Why XML is Bad for Humans

XML is everywhere these days. It is used for passing data around, for specifying metadata and even as a programming language for tools such as Ant and Jetty.

When XML is generated by various development and run-time tools (e.g., for serializing Java objects into SOAP), its complexity and readability don't matter much since humans have to deal with raw XML only occasionally (e.g., to troubleshoot a problem).

However, more often than not, XML is written directly by developers, mostly with the help of a validating XML editor/IDE (that is, if the developers are lucky and a Schema/DTD is available). WSDL (in the case of the WSDL-to-Java approach), XML Schema and Ant build files are just a few examples of this.

Using XML as a mark-up language for documents that are otherwise mostly text (e.g., XHTML) is not a totally bad idea. However, XML is ill-suited for specifying complex metadata with dynamic dependencies, for wiring command-based logic (e.g., Ant) or for defining domain-specific languages. That is, ill-suited for humans.

For starters, XML is unlike any other programming language (or a natural language). Consider a basic XML construct: <name>value</name>. In any other language it would've been written as "name=value" (or "name:=value", or something similar). An assignment is a construct familiar to most of us from math, even though we may not understand the intricacies of r-value versus l-value. It is intuitive. XML relegates this basic construct to attributes that can only be used as part of an element. Using attributes, a simple assignment can be expressed as <name value="my value" />, which is a bit easier to understand than a purely element-based construct. However, the "value" attribute name still seems redundant.

Another annoying feature of XML is closing tags. Closing tags are what make XML verbose. (Interestingly, SGML, from which XML is derived, did not always require closing tags, so one could write something like <TAG/this/.) In most programming languages we express grouping and nesting using brackets, parentheses or braces. This is true for function arguments, arrays, lists, maps/dictionaries, tuples, you name it, in any modern programming language. XML's creators for some reason decided that repeating the name of a variable (tag) is the way to go. This is a great choice for XML parsers but a poor one when XML is written or read by humans.

Closing tags do help when the nesting level runs deep. But they hurt when there is a need to express a simple construct with just a few (or one!) data items. The problem is that our brain can only process a limited number of items at a time, so intermixing the data that needs to be processed with the tags that serve as delimiters for this data makes comprehension more difficult. For simple lists, a comma-delimited format could be a better choice in many situations.

In general, repeating the same set of tags over and over again to define repetitive groups makes XML difficult to read, especially when each element contains just text:

    <welcome-file-list>
        <welcome-file>index.html</welcome-file>
        <welcome-file>index.htm</welcome-file>
        <welcome-file>index.jsp</welcome-file>
    </welcome-file-list>

Compare this with a simple property/comma-delimited format:

    welcome-file-list = index.html, index.htm, index.jsp
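
To make the comparison concrete, here is a minimal Python sketch that reads both forms and arrives at the same list of file names. The function names `files_from_xml` and `files_from_property` are made up for this illustration, and the property parser assumes that file names never contain commas or an equals sign.

```python
import xml.etree.ElementTree as ET

# The two snippets from the article, as plain strings.
XML_TEXT = """
<welcome-file-list>
    <welcome-file>index.html</welcome-file>
    <welcome-file>index.htm</welcome-file>
    <welcome-file>index.jsp</welcome-file>
</welcome-file-list>
"""

PROPERTY_TEXT = "welcome-file-list = index.html, index.htm, index.jsp"


def files_from_xml(text):
    """Return the text of every <welcome-file> child of the root element."""
    root = ET.fromstring(text.strip())
    return [element.text.strip() for element in root.findall("welcome-file")]


def files_from_property(line):
    """Split 'name = a, b, c' into (name, [a, b, c]).

    Assumption for this sketch: values contain neither ',' nor '='.
    """
    name, _, value = line.partition("=")
    items = [item.strip() for item in value.split(",") if item.strip()]
    return name.strip(), items


if __name__ == "__main__":
    print(files_from_xml(XML_TEXT))
    print(files_from_property(PROPERTY_TEXT))
```

Both functions recover the same three names. The difference is in what a human has to type and read to get there, and in the escaping rules the simpler format would need once values stop being plain file names.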

Finally, what's up with angle brackets? I suppose brackets could be justified when an element has multiple attributes. In many cases, however, elements don't have attributes, so an angle bracket is simply a way to distinguish a tag name from data. This is, again, counter-intuitive and different from most modern programming languages. Normally, variable names are not bracketed or quoted; values are. Also, if there were a need to use a special symbol for denoting variables, wouldn't "$" or "${}" be a more intuitive option for most of us?

Of course, XML has many advantages, the key one being that it is very easy to develop grammars for XML documents (using DTD or Schema). Another one is the fact that grammars are extensible via namespaces. Finally, any XML grammar can be parsed by any XML parser; to a parser an XML document is just a hierarchy of elements.
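
The last point can be illustrated with a short Python sketch: one generic function walks two unrelated vocabularies without knowing anything about either. The helper `outline` and the two sample documents are invented for this example; the first only imitates the shape of an Ant build file and is not a complete one.

```python
import xml.etree.ElementTree as ET


def outline(element, depth=0):
    """Yield (depth, tag, attributes) for an element and its descendants,
    in document order, without any knowledge of the grammar in use."""
    yield depth, element.tag, dict(element.attrib)
    for child in element:
        yield from outline(child, depth + 1)


# Two made-up documents from unrelated vocabularies.
ANT_LIKE = '<project name="demo"><target name="build"><echo message="hi"/></target></project>'
WEB_LIKE = '<welcome-file-list><welcome-file>index.html</welcome-file></welcome-file-list>'

if __name__ == "__main__":
    for document in (ANT_LIKE, WEB_LIKE):
        for depth, tag, attributes in outline(ET.fromstring(document)):
            print("  " * depth + tag, attributes)
```

The parser sees only elements, attributes and nesting. Any meaning attached to "target" or "welcome-file" has to come from the application that consumes the tree.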

This simplicity, however, comes at a great price. The expressiveness of XML is extremely limited. It has only a limited number of constructs and no operators. While it's adequate for its role as a markup language for text files, it puts a lot of constraints on any more-or-less complex metadata format, let alone something requiring procedural logic, such as Ant or XSLT. As a result, the intuitiveness of XML-based grammars suffers.

I'm not saying that we must stop using XML altogether. It has its place. But we should not be applying it blindly just because it's the only widely available tool for creating domain-specific languages. For starters, BNF/EBNF should be part of any developer's arsenal (along with ANTLR). And the good old name/value pair and comma-delimited formats should be seriously considered for simple situations that do not require support for hierarchical structures.
