Binary XML vs. Excel

In this post I'd like to share some ideas about Binary XML and Excel.

The W3C's XML Binary Characterization Working Group started in the summer of 2003, and its first activity sparked protest from XML experts around the world. Binary XML had been discussed before, in debates over why a less verbose XML would be necessary.

One year ago the XBC Working Group published its documents, and the effort continued in the Efficient XML Interchange Working Group, sadly without attracting much interest. The problem is that Binary XML would require a whole new set of specifications, and that creates new problems. It would not be human-readable. New or updated tools, parsers and editors would be necessary. New agreements would have to be made on matters such as well-formedness validation.

Why invent a Binary XML when we already have a perfectly usable alternative: Microsoft Excel?

Excel is used for all kinds of purposes, including many it was not designed for: requirements, structured specifications, messages, issue tracking, timesheets, project management, time-series data and, of course, calculations. That makes Excel as much a 'general purpose' format as XML is.

What's more, it is already used by countless users worldwide. Nowadays non-Microsoft tools can read and write the Excel file format. This extends the usefulness of Excel spreadsheets beyond the Microsoft Windows platform.

An Excel sheet is a binary format that ordinary people know how to deal with. We can open spreadsheet files in our favourite office suite, then view, analyse and modify what's in there. It's even more user-friendly than XML itself! No more tag balancing, no more character escaping! Spreadsheets lend themselves perfectly to mixed human-computer message exchange. For instance, a questionnaire or a timesheet could be provided as an Excel template. A user would fill it in with their favourite spreadsheet application. An administrative system would then process it using software like Jakarta POI.
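Here is a minimal Java sketch of the processing side of the timesheet example. It assumes the sheet has already been loaded into a grid of cell values, for instance with Jakarta POI or from a comma-delimited export. The class name TimesheetReader and the template layout (day labels in column A, hours in column B) are hypothetical. The code checks the labels before reading the values, which is one way to notice that a user has moved cells around.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Sketch: reading a filled-in timesheet template.
 * Assumption: the sheet was already loaded into a String grid (e.g. via a
 * spreadsheet library or a CSV export); the layout below is hypothetical.
 */
public class TimesheetReader {

    // Hypothetical template: day label in column 0, hours entered in column 1
    static final String[] DAYS = {"Monday", "Tuesday", "Wednesday", "Thursday", "Friday"};

    /** Returns hours per day, or throws if the template's label cells were moved. */
    static Map<String, Double> read(String[][] grid) {
        Map<String, Double> hours = new LinkedHashMap<>();
        for (int row = 0; row < DAYS.length; row++) {
            // Check the label first, so a moved or renamed cell fails loudly
            String label = cell(grid, row, 0);
            if (!DAYS[row].equals(label)) {
                throw new IllegalArgumentException("Unexpected layout at row " + (row + 1)
                        + ": expected '" + DAYS[row] + "', found '" + label + "'");
            }
            // An empty hours cell counts as zero hours
            String value = cell(grid, row, 1);
            hours.put(DAYS[row], value.isEmpty() ? 0.0 : Double.parseDouble(value));
        }
        return hours;
    }

    // Treats missing rows, missing columns and null cells as empty
    static String cell(String[][] grid, int row, int col) {
        if (row >= grid.length || col >= grid[row].length || grid[row][col] == null) {
            return "";
        }
        return grid[row][col].trim();
    }

    public static void main(String[] args) {
        String[][] filledIn = {
            {"Monday", "8"}, {"Tuesday", "7.5"}, {"Wednesday", ""}, {"Thursday", "8"}, {"Friday", "4"}
        };
        System.out.println(read(filledIn));
    }
}
```

The template fixes where each value lives. That lets the reader refuse a sheet whose layout has changed, rather than silently reading the wrong cells.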

It's a shame that most spreadsheets are only used by people and almost never processed by software, even though that is a very real and practical option...

When will businesses go all the way and use spreadsheets to implement c2b or b2b messaging?

Comments (8)

  1. Silvester Van der Bijl

    August 2, 2006 at 1:52 am

    Unfortunately, hardened Excel users tend to move cells around, which makes it impossible to extract information through code. Other issues are differences between versions, user locales, etc.

    Microsoft itself apparently experienced the same problems, so newer Excel versions have the capability to export the data to XML (optionally with an XSD) 😉

    The POI libraries have some annoying issues that usually don't show up until it's too late to switch to a structured markup (the maximum number of rows, for starters), and POI doesn't seem to be in active development anymore.

    In short, I don't think using Excel as an interchangeable format is as practical as you suggest.

  2. Bart Guijt

    August 2, 2006 at 11:10 am

    To a certain extent, you are right: users moving cells around kind of screws up the processing software.

    However, just as you can validate an XML file using one of its validation mechanisms (DTD, Schema, RELAX-NG etc.), we can use protected cells and macros in spreadsheets to keep users from doing nasty things.

    I haven't seen much activity from the POI group myself, either, but according to their dev-list (http://marc.theaimsgroup.com/?l=poi-dev&r=1&w=2) they are alive and kicking 😉

  3. Silvester Van der Bijl

    August 2, 2006 at 11:46 am

    Yes, you can protect Excel cells, but then we would also have to know in advance how many rows of data to expect from, e.g., a table in a sheet. The user cannot add rows, because the cells are protected.

    Have you also looked at using a combination of the two? You can add an XSD to an Excel sheet, allowing the user to enter data, move cells, whatever. Excel remembers the mapping to the XSD, and allows for export to XML.
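    Below is a minimal Java sketch of the processing side of that approach, using only the JDK's javax.xml.validation and DOM APIs. The element names (timesheet, entry, day, hours), the schema and the class name TimesheetExport are hypothetical. They stand in for whatever XSD mapping is attached to the workbook and are not the format Excel itself produces. The schema allows an unbounded number of entry elements, so the number of rows does not have to be known in advance.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/**
 * Sketch: validating and reading an XML export of a spreadsheet.
 * Assumption: element names and schema are hypothetical stand-ins for the
 * XSD mapping attached to the workbook.
 */
public class TimesheetExport {

    // Hypothetical schema: any number of <entry> rows, each with a day and a decimal hours value
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='timesheet'><xs:complexType><xs:sequence>"
        + "<xs:element name='entry' minOccurs='0' maxOccurs='unbounded'>"
        + "<xs:complexType><xs:sequence>"
        + "<xs:element name='day' type='xs:string'/>"
        + "<xs:element name='hours' type='xs:decimal'/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    /** Validates the export against the schema, then sums the hours of all rows. */
    static double totalHours(String xml) throws Exception {
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
        // With no error handler set, validate() throws a SAXException on the first error
        schema.newValidator().validate(new StreamSource(new StringReader(xml)));

        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        // Read however many rows the user added; the count is not fixed in advance
        NodeList entries = doc.getElementsByTagName("entry");
        double total = 0;
        for (int i = 0; i < entries.getLength(); i++) {
            Element entry = (Element) entries.item(i);
            String hours = entry.getElementsByTagName("hours").item(0).getTextContent();
            total += Double.parseDouble(hours.trim());
        }
        return total;
    }

    public static void main(String[] args) throws Exception {
        String export = "<timesheet>"
            + "<entry><day>Monday</day><hours>8</hours></entry>"
            + "<entry><day>Tuesday</day><hours>7.5</hours></entry>"
            + "<entry><day>Friday</day><hours>4</hours></entry>"
            + "</timesheet>";
        System.out.println(totalHours(export)); // prints 19.5
    }
}
```

    Validating before reading plays the role that protected cells play in the sheet itself. A malformed export is rejected up front instead of being half-processed.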

  4. Bart Guijt

    August 2, 2006 at 1:20 pm

    Nope, I am not familiar with the XSD feature in Excel. I'll check it, see what it can do.

  5. Wilfred Springer

    August 20, 2006 at 5:58 pm

    ...I just hope you're kidding. FastInfoset allows you to keep using the existing APIs, like StAX, SAX and DOM, so it's not that intrusive at all.

    http://agilejava.com/blog/?p=19

  6. Bart Guijt

    September 6, 2006 at 3:17 pm

    Of course I'm not kidding 😉

    The point is this: XML is popular because it has the following characteristics:
    1) simple;
    2) human-readable/editable;
    3) machine readable/editable.

    Binary XML, or Fast Infoset, is not popular; it is neither simple nor human-readable, but Excel documents (or spreadsheets in general) are!

    (note: I consider a spreadsheet program to be as ubiquitous as the regular text editor used to edit a document)

  7. Bilal

    September 8, 2007 at 1:18 pm

    I just want to know: when formats such as 'Excel' and 'Text File (with comma delimited)' are already available for data exchange and representation, why use XML instead of these standards?

  8. John Taylor

    April 24, 2009 at 12:24 am

    I found your blog on Google. I've bookmarked it and will watch out for your next blog post.
