<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Xebia Blog &#187; generated keys</title>
	<atom:link href="http://blog.xebia.com/tag/generated-keys/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.xebia.com</link>
	<description>Software development done right!</description>
	<lastBuildDate>Wed, 01 Feb 2012 00:30:00 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>A quest for Generated Keys in Kettle</title>
		<link>http://blog.xebia.com/2009/08/28/a-quest-for-generated-keys-in-kettle/</link>
		<comments>http://blog.xebia.com/2009/08/28/a-quest-for-generated-keys-in-kettle/#comments</comments>
		<pubDate>Fri, 28 Aug 2009 08:00:17 +0000</pubDate>
		<dc:creator>Maarten Winkels</dc:creator>
		<category><![CDATA[General]]></category>
		<category><![CDATA[generated keys]]></category>
		<category><![CDATA[Hibernate]]></category>
		<category><![CDATA[hsqldb]]></category>
		<category><![CDATA[kettle]]></category>
		<category><![CDATA[postgresql]]></category>

	
		<guid isPermaLink="false">http://blog.xebia.com/?p=3023</guid>
		<description><![CDATA[For my current project we use Kettle to process data from a number of sources and store it in a shared database. Kettle has great support for parsing data from a variety of sources, transforming it and writing it to a variety of destinations. One problem that often arises when inserting data in a relational [...]]]></description>
			<content:encoded><![CDATA[<p>For my current project we use <a href="http://kettle.pentaho.org/">Kettle</a> to process data from a number of sources and store it in a shared database. Kettle has great support for parsing data from a variety of sources, transforming it and writing it to a variety of destinations. One problem that often arises when inserting data into a relational database is the need for a synthetic, unique key that is generated when a new row is inserted and used later in the process by other rows that refer to the primary row. There are many solutions to this problem, both in the RDBMS and in the Java space. This blog reports on a search through several code bases for a good solution in Kettle which, unfortunately, still evades me.<br />
<span id="more-3023"></span></p>
<p>A complication is that the solution should work both in the production setup and in our Kettle unit tests. For production we use <a href="http://www.postgresql.org">Postgresql</a> and for testing we use <a href="http://www.hsqldb.org">HsqlDB</a>. In itself, this is a challenge with Kettle, but it is feasible by using a generic database connector and injecting the JDBC properties through variables.</p>
<p>Both Postgresql and HsqlDB have some form of support for generated keys:<br />
<b>Postgresql</b> To create an auto-increment column in Postgresql you use the <a href="http://www.postgresql.org/docs/current/static/datatype-numeric.html#DATATYPE-SERIAL">(big)serial datatype</a>. This creates a plain numeric column that owns a sequence, which supplies the default value of the column. The name of the sequence is <i>tablename</i>_<i>columnname</i>_seq.<br />
<b>HsqlDB</b> An auto-increment column is created using the following column definition: <span style="font-family: monospace">&#8230; generated by default as identity (start with 1)</span>.<br />
When a row is inserted that does not specify a value for the auto-increment column, its value is generated. So far so good.</p>
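<p>As a rough illustration of the two column definitions and the Postgresql sequence naming rule described above, here is a small sketch. The class and method names are made up for this example, the table and column names are placeholders, and the naming helper ignores the truncation Postgresql applies to very long identifiers.</p>

```java
// Hypothetical helper, not part of Kettle, Postgresql or HsqlDB.
class SerialColumnSketch {

    /** Sequence name Postgresql derives for a (big)serial column: tablename_columnname_seq. */
    static String postgresSequenceName(String table, String column) {
        return table + "_" + column + "_seq";
    }

    /** Auto-increment key column definition for Postgresql. */
    static String postgresIdColumn(String column) {
        return column + " bigserial PRIMARY KEY";
    }

    /** Auto-increment key column definition for HsqlDB, using the identity clause quoted above. */
    static String hsqldbIdColumn(String column) {
        return column + " bigint generated by default as identity (start with 1) PRIMARY KEY";
    }
}
```

<p>For a table <i>person</i> with key column <i>id</i>, the helper yields the sequence name <i>person_id_seq</i>.</p>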
<p>The problem arises when we need the generated value for the just inserted row. And thus the quest begins&#8230;</p>
<p><b>JDBC3 &#8212; Generated Keys</b></p>
<p>On the surface, this should be an easy problem, since JDBC (from version 3 onwards) supports generated keys through several interfaces:</p>
<ul>
<li><a href="http://java.sun.com/j2se/1.4.2/docs/api/java/sql/Connection.html#prepareStatement(java.lang.String,%20int)"><span style="font-family: monospace">Connection.prepareStatement(String sql, int autoGeneratedKeys)</span></a></li>
<li><a href="http://java.sun.com/j2se/1.4.2/docs/api/java/sql/Statement.html#getGeneratedKeys()"><span style="font-family: monospace">Statement.getGeneratedKeys()</span></a></li>
</ul>
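<p>With a driver that does implement these interfaces, the two calls listed above combine roughly as follows. This is a minimal sketch: the class name, the method name and the single string parameter are assumptions made for the example, and the caller is assumed to own the connection.</p>

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper illustrating the JDBC 3 generated keys calls.
class GeneratedKeysSketch {

    /** Runs a one-parameter INSERT and returns the first generated key reported by the driver. */
    static long insertAndFetchKey(Connection conn, String insertSql, String value) throws SQLException {
        // Ask the driver to make generated keys available after the insert.
        try (PreparedStatement ps = conn.prepareStatement(insertSql, Statement.RETURN_GENERATED_KEYS)) {
            ps.setString(1, value);
            ps.executeUpdate();
            try (ResultSet keys = ps.getGeneratedKeys()) {
                if (!keys.next()) {
                    throw new SQLException("Driver returned no generated key");
                }
                return keys.getLong(1);
            }
        }
    }
}
```

<p>A call could look like <span style="font-family: monospace">insertAndFetchKey(conn, "INSERT INTO person (name) VALUES (?)", "Ada")</span>, where the table and column are placeholders.</p>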
<p>Kettle&#8217;s <a href="http://wiki.pentaho.com/display/EAI/Table+Output">TableOutput step</a> uses this feature to return the generated keys for each row inserted.<br />
Unfortunately, neither HsqlDB (latest version 1.8.0.10) nor Postgres (latest version 8.4) supports these interfaces.</p>
<p><b>Hibernate &#8212; Inspiration</b></p>
<p>In the same project we use <a href="http://www.hibernate.org">Hibernate</a> to generate the schema and to work with the data that has been gathered and integrated. Hibernate handles the situation quite gracefully. Using the <span style="font-family: monospace">@GeneratedValue(strategy=IDENTITY)</span> annotation, it generates the correct schema for both HsqlDB and Postgresql through its specialized dialects.<br />
To fetch the generated value, it uses the <span style="font-family: monospace">identity()</span> function on HsqlDB and it queries Postgresql for the current value of the sequence. This works perfectly, although in highly concurrent situations one might expect problems.<br />
Unfortunately, in Kettle this won&#8217;t work: splitting the insert statement and the query for the generated key across two steps makes them run in separate threads, because that is how Kettle executes steps. To fetch the last sequence value for each row, the query has to happen within the same step as the insert.</p>
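<p>The dialect-specific lookup described above can be sketched as a small switch. This is an illustration under assumptions, not Hibernate code: the enum and method names are invented, and the Postgresql branch assumes the default <i>tablename</i>_<i>columnname</i>_seq sequence name. Either query is only meaningful when run on the same connection as the preceding insert.</p>

```java
// Hypothetical helper that picks the "last generated id" query per database.
class IdentityQuerySketch {

    enum Db { HSQLDB, POSTGRESQL }

    /** Query that returns the key generated by the most recent insert on the same connection. */
    static String lastGeneratedIdQuery(Db db, String table, String column) {
        switch (db) {
            case HSQLDB:
                // identity() reports the last identity value generated on this connection.
                return "CALL IDENTITY()";
            case POSTGRESQL:
                // currval() reports the last value handed out by the sequence in this session.
                return "SELECT currval('" + table + "_" + column + "_seq')";
            default:
                throw new IllegalArgumentException("Unsupported database: " + db);
        }
    }
}
```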
<p><b>Postgresql specific &#8212; Inline solutions</b></p>
<p>So, does Postgres, being a cutting edge database, not support a feature that allows you to generate an id and then use it? Isn&#8217;t this something that is quite useful? Of course it does! It offers a custom syntax for this:<br />
<span style="font-family: monospace">INSERT INTO table (col1, col2, &#8230;) VALUES (&#8230;) RETURNING col0</span><br />
This instructs Postgres to return the value generated for the auto-increment column in a ResultSet. The statement therefore has to be executed using the <span style="font-family: monospace">executeQuery()</span> method instead of the <span style="font-family: monospace">executeUpdate()</span> method, and the TableOutput step has no support for this. The implementation of this step is on the whole quite rigid: there are no hooks for specifying custom SQL to execute.<br />
And of course HsqlDB does not support this syntax, so we would have to write some logic to determine what to execute when&#8230;</p>
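<p>Outside Kettle, the RETURNING approach could look roughly like this in plain JDBC. It is a sketch for Postgresql only: the class and method names are hypothetical, and table and column names are concatenated into the SQL without quoting, so they are assumed to come from trusted configuration.</p>

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Collections;

// Hypothetical helper for Postgresql INSERT ... RETURNING.
class InsertReturningSketch {

    /** Builds a parameterized INSERT that returns the generated key column. */
    static String insertReturningSql(String table, String keyColumn, String... columns) {
        String placeholders = String.join(", ", Collections.nCopies(columns.length, "?"));
        return "INSERT INTO " + table + " (" + String.join(", ", columns) + ")"
                + " VALUES (" + placeholders + ") RETURNING " + keyColumn;
    }

    /** Executes the INSERT through executeQuery() because it yields a ResultSet with the key. */
    static long insertReturningKey(Connection conn, String sql, Object... values) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (int i = 0; i < values.length; i++) {
                ps.setObject(i + 1, values[i]);
            }
            try (ResultSet rs = ps.executeQuery()) {
                if (!rs.next()) {
                    throw new SQLException("INSERT ... RETURNING produced no row");
                }
                return rs.getLong(1);
            }
        }
    }
}
```

<p>For example, <span style="font-family: monospace">insertReturningSql("person", "id", "name", "email")</span> produces <span style="font-family: monospace">INSERT INTO person (name, email) VALUES (?, ?) RETURNING id</span>.</p>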
<p><b>Query the sequence before inserting</b></p>
<p>This approach queries the database sequence in a separate step <b>before</b> inserting the data into the table. It works fine for Postgresql, although it bypasses the default value mechanism of the column. We have to make sure manually that we call the correct sequence, so that the column and the sequence stay in sync.<br />
On HsqlDB, a sequence is not automatically created for an identity column. We would have to manually add the sequences to the schema. Also the syntax for querying a sequence is different on HsqlDB.<br />
The problem with this approach is that Kettle does not support sequences in the Generic Database adapter. To implement this solution we would have to enhance the generic database adapter to support sequences and also make it flexible enough to work on both databases.</p>
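<p>To make the difference in sequence syntax concrete, here is a small sketch of the query such an enhanced adapter would have to issue per database. The names are invented for the example. The HsqlDB form shown is an assumption based on the 2.x syntax; the 1.8 line discussed here may require a different form, such as a SELECT against a one-row table, so it should be checked against the version in use.</p>

```java
// Hypothetical helper that picks the "next sequence value" query per database.
class SequenceQuerySketch {

    enum Db { HSQLDB, POSTGRESQL }

    /** Query that fetches the next value of a named sequence before the insert is executed. */
    static String nextValueQuery(Db db, String sequence) {
        switch (db) {
            case HSQLDB:
                // Assumption: HsqlDB 2.x syntax; older versions may differ.
                return "CALL NEXT VALUE FOR " + sequence;
            case POSTGRESQL:
                return "SELECT nextval('" + sequence + "')";
            default:
                throw new IllegalArgumentException("Unsupported database: " + db);
        }
    }
}
```

<p>The fetched value would then be passed explicitly as the key column in the subsequent insert, and reused for the rows that refer to it.</p>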
<p><b>Conclusion</b></p>
<p>Although this seems like a simple and common problem, a good solution in Kettle does not seem to exist. The quest continues&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.xebia.com/2009/08/28/a-quest-for-generated-keys-in-kettle/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>
