Many Different Data Formats, Why XML?
This post is part of a series in which I provide information on figuring out which data format is best to use, and why. Basically: when is XML better, when is XBRL better, and when is RDF/OWL better?
If you have been working with computers as long as I have, you have likely run across many different data formats. Here is a list of common data exchange formats (other than database file formats):
- CSV (Comma Separated Values): http://www.xbrlsite.com/Demos/StateFactBook/DifferentDataFormats/DataFormat-csv.txt
- Fixed Width: http://www.xbrlsite.com/Demos/StateFactBook/DifferentDataFormats/DataFormat-fixedWidth.txt
- DIF (Data Interchange Format): http://www.xbrlsite.com/Demos/StateFactBook/DifferentDataFormats/DataFormat-dif.txt
- PRN (Printer Information): http://www.xbrlsite.com/Demos/StateFactBook/DifferentDataFormats/DataFormat-prn.txt
- ASCII Text: http://www.xbrlsite.com/Demos/StateFactBook/DifferentDataFormats/DataFormat-text.txt
- Plain Text: http://www.xbrlsite.com/Demos/StateFactBook/DifferentDataFormats/DataFormat-plainText.txt
- Excel (Binary): http://www.xbrlsite.com/Demos/StateFactBook/DifferentDataFormats/DataFormat-excel.xls
- Excel XML: http://www.xbrlsite.com/Demos/StateFactBook/DifferentDataFormats/DataFormat-excel.xml
- Word (Binary): http://www.xbrlsite.com/Demos/StateFactBook/DifferentDataFormats/DataFormat-word.doc
- PDF: http://www.xbrlsite.com/Demos/StateFactBook/DifferentDataFormats/DataFormat-pdf.pdf
- RTF (Rich Text Format): http://www.xbrlsite.com/Demos/StateFactBook/DifferentDataFormats/DataFormat-rtf.txt
- HTML: http://www.xbrlsite.com/Demos/StateFactBook/DifferentDataFormats/DataFormat-html.html
- HTML (view text): http://www.xbrlsite.com/Demos/StateFactBook/DifferentDataFormats/DataFormat-html.txt
- XML (Traditional): http://www.xbrlsite.com/Demos/StateFactBook/DifferentDataFormats/DataFormat-traditionalXML.xml
- XBRL: http://www.xbrlsite.com/Demos/StateFactBook/DifferentDataFormats/DataFormat-xbrl.xml
- RDF/OWL (Draft, needs work): http://www.xbrlsite.com/Demos/StateFactBook/DifferentDataFormats/DataFormat-rdf_owl.xml
- JSON: http://www.xbrlsite.com/Demos/StateFactBook/DifferentDataFormats/DataFormat-json.txt
Each of those files expresses the same information in a different way. Another way of saying this is that each file format has a different syntax. There are lots of other file formats (see http://www.fileinfo.com/filetypes/data). Imagine having to write software to parse and use these different data formats.
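To get a feel for what that involves, here is a minimal Python sketch that reads the same two records from three of the syntaxes above. The sample values, field names, and function names are made up for illustration (they are not taken from the linked files), and only the Python standard library is used.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical sample data: the same two records in three different syntaxes.
CSV_TEXT = "state,population\nAlabama,100\nAlaska,200\n"
JSON_TEXT = (
    '[{"state": "Alabama", "population": 100},'
    ' {"state": "Alaska", "population": 200}]'
)
XML_TEXT = """<states>
  <state name="Alabama"><population>100</population></state>
  <state name="Alaska"><population>200</population></state>
</states>"""


def from_csv(text):
    # CSV values arrive as strings, so the number must be converted by hand.
    reader = csv.DictReader(io.StringIO(text))
    return [(row["state"], int(row["population"])) for row in reader]


def from_json(text):
    # JSON already distinguishes numbers from strings.
    return [(item["state"], item["population"]) for item in json.loads(text)]


def from_xml(text):
    # XML element text is a string, so the number is converted here as well.
    root = ET.fromstring(text)
    return [
        (el.get("name"), int(el.findtext("population")))
        for el in root.findall("state")
    ]


# All three parsers should yield the same list of (state, population) pairs.
records = from_csv(CSV_TEXT)
print(records == from_json(JSON_TEXT) == from_xml(XML_TEXT))
```

Three syntaxes means three separate parsing routines for identical information, which is the cost the rest of this post is about.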
To make a very long story short, XML is becoming quite popular. XML is a metalanguage, a language for building languages. XML is a syntax, and that syntax is becoming very popular for the following reasons:
- It works well over the Web.
- It is platform independent. (Meaning, it is just a text file, pretty much any computer can read XML.)
- It can express complex information structures well. (For example, a flat CSV file cannot directly express a hierarchy.)
- It is readable by both humans and by computers.
- It is self-describing. (Meaning, the information which describes the information, called metadata, is available with the information. A counterexample is that there is no standard way to describe the contents of, say, a CSV file.)
- There are lots of free or low-cost XML parsers available. There are many XML editors available. There are many XML schema editors available.
- XML is easy to create by hand or to generate from a computer application.
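Two of the points above, hierarchy and self-description, can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the region/state/city document and the `outline` function are hypothetical, and the parser is the one that ships with Python's standard library.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: nesting expresses the hierarchy, and the
# element and attribute names travel with the data.
XML_TEXT = """<region name="West">
  <state name="Alaska">
    <city name="Juneau"/>
    <city name="Anchorage"/>
  </state>
  <state name="Hawaii">
    <city name="Honolulu"/>
  </state>
</region>"""


def outline(element, depth=0):
    """Return one indented line per element, using only names found in the document."""
    lines = ["  " * depth + f"{element.tag}: {element.get('name')}"]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


print("\n".join(outline(ET.fromstring(XML_TEXT))))
```

Notice that `outline` knows nothing about regions, states, or cities ahead of time. It discovers both the structure and the labels from the file itself, which is hard to do with a flat CSV file.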
The Chamber of Commerce has an excellent explanation of XML. On the one hand, every file format has its pros and cons. (For example, JSON is far less verbose than XML and easy for many JavaScript developers to use.) On the other hand, agreeing on one format has its advantages.
Bottom line: Lots of people are agreeing on XML. This saves time and money.