<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Xebia Blog &#187; XML Schema</title>
	<atom:link href="http://blog.xebia.com/tag/xml-schema/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.xebia.com</link>
	<description>Software development done right!</description>
	<lastBuildDate>Wed, 01 Feb 2012 00:30:00 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>The XML Instance Gamut</title>
		<link>http://blog.xebia.com/2009/10/19/the-xml-instance-gamut/</link>
		<comments>http://blog.xebia.com/2009/10/19/the-xml-instance-gamut/#comments</comments>
		<pubDate>Mon, 19 Oct 2009 05:48:48 +0000</pubDate>
		<dc:creator>Wilfred Springer</dc:creator>
		<category><![CDATA[Java]]></category>
		<category><![CDATA[XML]]></category>
		<category><![CDATA[XML Schema]]></category>

		<guid isPermaLink="false">http://blog.xebia.com/?p=3250</guid>
		<description><![CDATA[If you happen to be in the business of writing software serving XML documents or consuming XML documents &#8211; and if you read this post, then there is a fair chance you are &#8211; then there is always one big challenge: how do you make sure your service or client is capable of dealing with [...]]]></description>
			<content:encoded><![CDATA[<p>If you happen to be in the business of writing software serving XML documents or consuming XML documents &#8211; and if you read this post, then there is a fair chance you are &#8211; then there is always one big challenge: how do you make sure your service or client is capable of dealing with <em>all</em> of the XML documents you could possibly expect to be passed around?</p>
<p>And if you happen to come from the test-driven world, the answer is obviously: <em>by testing it</em>. However, if you try to do that, things might be harder than you expect at first.</p>
<p><strong>What about schemas?</strong></p>
<p><span id="more-3250"></span></p>
<p>I clearly remember having to integrate with Google&#8217;s Local Search Service. We managed to get them to send us their schema, but the schema was merely illustrative rather than normative. In fact, it didn&#8217;t even &#8216;parse&#8217; correctly. It was supposed to be a DTD, but in reality, it wasn&#8217;t. In that case, you are basically lost. The only thing you can really do is &#8216;test by poking around&#8217;: see what the web service replies, and then work those responses into your test harness.</p>
<p>If you <em>do</em> manage to get a schema, however, you are still not done. Sure, if it&#8217;s about SOAP-based web services, then you might be able to generate stubs and skeletons, and those would give you some guarantee that you are covering most cases. But there is still a chance that you would not cover all cases: inside your XML document there might be alternative content models, and when implementing your service you might be dealing with only one of them.</p>
<p>If the schema is small, then you can probably figure it out by <em>careful examination</em>. However, if the schema is <em>huge</em>, then the range and variety of XML document instances that you might get will make that impossible. And even if you created the schema yourself, it might sometimes cover a wider range of options than you expected. (I&#8217;m sure I am not the only one who has experienced this. <img src='http://blog.xebia.com/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' /> )</p>
<p><strong>XML Instance Generator to the rescue</strong></p>
<p>So, back to test-driven. The good news is that there are tools that take a schema and generate random instances, basically walking all of the different options. <a href="http://www.kohsuke.org/#java">Xmlgen</a> is one of those tools. It&#8217;s a little bit hard to find these days. If you follow the &#8216;XML Instance Generator&#8217; link on Kohsuke&#8217;s homepage, you will end up in no man&#8217;s land. I dug a little further, and found out it&#8217;s currently hosted at Sun&#8217;s <a href="https://msv.dev.java.net/">dev.java.net</a>.</p>
<p>Xmlgen is extremely simple. It takes a schema (any schema language), and will generate any number of sample documents from that. It&#8217;s exactly what you want, except… It doesn&#8217;t support all datatypes defined by the <a href="http://www.w3.org/TR/xmlschema-2/">XML Schema Datatypes specification</a>. And that&#8217;s a limitation I have run into more than once.</p>
<p>In fact, I had tried to use xmlgen on a couple of occasions before, and each time it broke on missing support for xs:dateTime or <a href="http://www.w3.org/TR/xmlschema-2/#rf-pattern">xs:pattern</a> restrictions. And there doesn&#8217;t seem to be an awful lot of work going into xmlgen to fix that.</p>
<p><strong>Fixing XML Instance Generator</strong></p>
<p>So I figured I&#8217;d fix this myself. It turned out that adding support for dateTime wasn&#8217;t all that hard, even though xmlgen does not really have extension points to implement. That leaves you with either a) hacking the source code big time, or b) hacking it just a little in order to add plug points, and then having something else implement those plug points. I went for the latter.</p>
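<p>To give a rough idea of what such a plug point could look like, here is a minimal sketch. It is not xmlgen&#8217;s actual code: the names ValueGenerator and DateTimeGenerator are made up for this illustration, and the year range and the fixed UTC suffix are arbitrary assumptions.</p>
<pre class="brush: java; title: ; notranslate">
import java.util.Locale;
import java.util.Random;

/** Hypothetical plug point: produces a lexical value for one schema datatype. */
interface ValueGenerator {
    String generate(Random random);
}

/** Hypothetical implementation producing xs:dateTime values in UTC. */
final class DateTimeGenerator implements ValueGenerator {
    @Override
    public String generate(Random random) {
        int year = 1970 + random.nextInt(60);   // arbitrary range, assumed for the sketch
        int month = 1 + random.nextInt(12);
        int day = 1 + random.nextInt(28);       // never above 28, so every month is valid
        return String.format(Locale.ROOT, "%04d-%02d-%02dT%02d:%02d:%02dZ",
                year, month, day,
                random.nextInt(24), random.nextInt(60), random.nextInt(60));
    }
}
</pre>
<p>The generator only needs a source of randomness, so the instance generator can stay ignorant of the datatype details and simply ask the plug point for a value.</p>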
<p><strong>Whoops, xs:pattern</strong></p>
<p>Adding support for xs:pattern turned out to be a little tricky. If you are new to this type of restriction, then you should know that it is about restricting content to fit a certain regular expression, as illustrated below.</p>
<pre class="brush: xml; title: ; notranslate">
&lt;simpleType name='better-us-zipcode'&gt;
  &lt;restriction base='string'&gt;
    &lt;pattern value='[0-9]{5}(-[0-9]{4})?'/&gt;
  &lt;/restriction&gt;
&lt;/simpleType&gt;
</pre>
<p>Now, if you want to generate valid data for this restriction, you need to be able to generate text from that regular expression. It turns out there are quite a few Java libraries out there capable of <em>matching</em> text, but I found nothing at all for <em>generating</em> text. So I implemented my own. I blogged about it <a href="http://blog.flotsam.nl/2009/10/xeger-has-arrived.html">here</a>, and it is hosted <a href="http://code.google.com/p/xeger">here</a>.</p>
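<p>To make the idea of generating (rather than matching) concrete, here is a small sketch that walks the zip code pattern above by hand. It is only an illustration for this one pattern, not how a general-purpose generator works, and the class name ZipCodeSamples is made up.</p>
<pre class="brush: java; title: ; notranslate">
import java.util.Locale;
import java.util.Random;

final class ZipCodeSamples {
    /** The pattern value from the better-us-zipcode type. */
    static final String PATTERN = "[0-9]{5}(-[0-9]{4})?";

    /** Walks the pattern by hand: five digits, then an optional four-digit suffix. */
    static String generate(Random random) {
        String zip = String.format(Locale.ROOT, "%05d", random.nextInt(100000));
        if (random.nextBoolean()) {
            zip += String.format(Locale.ROOT, "-%04d", random.nextInt(10000));
        }
        return zip;
    }

    public static void main(String[] args) {
        String zip = generate(new Random());
        // Matching is the easy direction: the standard library does it for us.
        System.out.println(zip + " matches: " + zip.matches(PATTERN));
    }
}
</pre>
<p>Hand-writing a generator like this per pattern obviously doesn&#8217;t scale, which is why a library that derives the text from the regular expression itself is the better option.</p>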
<p>Once that was done, extending xmlgen to have support for xs:pattern restrictions was easy. That means that &#8211; with just a few changes &#8211; I am now able to generate a test set for a fairly complicated schema. And I&#8217;m pretty sure that it will cover all cases, as long as I make the number of instance documents big enough.</p>
<p>So, now for a restriction like this:</p>
<pre class="brush: xml; title: ; notranslate">
&lt;xsd:simpleType name = &quot;TimeValue&quot;&gt;
  &lt;xsd:restriction base = &quot;xsd:string&quot;&gt;
    &lt;xsd:pattern value = &quot;[0-2][0-9]\:[0-5][0-9](\:[0-5][0-9])?&quot;/&gt;
  &lt;/xsd:restriction&gt;
&lt;/xsd:simpleType&gt;
</pre>
<p>… it will generate instances like this:</p>
<ul>
<li>07:36</li>
<li>10:16:26</li>
<li>etc.</li>
</ul>
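<p>A quick way to sanity-check generated values like these is to match them back against the pattern. The sketch below does that with plain java.util.regex; the class name TimeValueCheck is made up, and it assumes the pattern behaves the same in Java&#8217;s regex dialect as in XML Schema, which holds for this simple case but not for every pattern.</p>
<pre class="brush: java; title: ; notranslate">
final class TimeValueCheck {
    /** The pattern value from the TimeValue type, with backslashes doubled for Java. */
    static final String PATTERN = "[0-2][0-9]\\:[0-5][0-9](\\:[0-5][0-9])?";

    /** String.matches tests the whole string, much like a pattern restriction does. */
    static boolean isValid(String candidate) {
        return candidate.matches(PATTERN);
    }

    public static void main(String[] args) {
        System.out.println(isValid("07:36"));    // true
        System.out.println(isValid("10:16:26")); // true
        System.out.println(isValid("7:36"));     // false, the hour needs two digits
        System.out.println(isValid("29:59"));    // true, the pattern is wider than real clock times
    }
}
</pre>
<p>The last case shows why generated test sets are useful: the pattern accepts values such as 29:59, so a generator may produce them, and your code had better cope.</p>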
<p>You can download the modified version of xmlgen <a href="http://blog.xebia.com/?attachment_id=3252">here</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.xebia.com/2009/10/19/the-xml-instance-gamut/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
	</channel>
</rss>
