This is a series of posts in which I provide information to help figure out which data format is best to use and why. Basically: when is XML better, when is XBRL better, and when is RDF/OWL better?
In another blog post I looked at different information exchange formats. In that post I mentioned that the world was standardizing on XML. But which form of XML? XML can come in many, many different forms.
I took a small data set which I had in a database and generated XML from that data set. The data set is simple enough: the population of each U.S. state. This PDF shows what the data set looks like in a rendered format.
Here are several different XML formats which I generated from the same Microsoft Access database information:
So, what is the point here? Well, actually, I have several points, which I will list and discuss.
Which form of XML is best? Well, all things considered, that depends on what you need from the information. At one extreme, if you just want to make a simple set of information available to a small group of people, any old XML will do. In fact, you could use pretty much any data format. But XML works well over the Web, it is in vogue, and it is a good general-purpose format.
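To make "many forms" concrete, here is a minimal Python sketch (not how the XML in this post was actually generated) that writes the same rows as two equally valid XML shapes, one element-centric and one attribute-centric, using the standard library's `xml.etree.ElementTree`. The element and attribute names and the population numbers are placeholders made up for illustration.

```python
import xml.etree.ElementTree as ET

# Placeholder rows for illustration only; these are NOT real census figures.
ROWS = [("Alabama", 100), ("Alaska", 200)]


def element_style(rows):
    """Element-centric shape: each value is its own child element."""
    root = ET.Element("States")
    for name, pop in rows:
        state = ET.SubElement(root, "State")
        ET.SubElement(state, "Name").text = name
        ET.SubElement(state, "Population").text = str(pop)
    return ET.tostring(root, encoding="unicode")


def attribute_style(rows):
    """Attribute-centric shape: each value is an attribute on <State>."""
    root = ET.Element("States")
    for name, pop in rows:
        ET.SubElement(root, "State", {"name": name, "population": str(pop)})
    return ET.tostring(root, encoding="unicode")


def read_back(xml_text):
    """A consumer that must special-case every shape it wants to accept."""
    root = ET.fromstring(xml_text)
    result = []
    for state in root.iter("State"):
        if "name" in state.attrib:  # attribute-centric shape
            result.append((state.get("name"), int(state.get("population"))))
        else:  # element-centric shape
            result.append((state.findtext("Name"), int(state.findtext("Population"))))
    return result


print(element_style(ROWS))
print(attribute_style(ROWS))
```

Both strings carry identical information, but `read_back` needs a separate branch for each shape. Every additional shape means more special-case code on the consumer side, which is fine for a small audience and painful once many parties exchange data.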
If you are, say, a government agency or other enterprise that works with one data set, does not need to exchange that information with other government agencies, and will only ever have one data format, then traditional XML could work for you. But what if you want to verify that numeric information adds up correctly? Well, you could build your own validation mechanism, because your data set is small and you don't have complex computations.
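Here is roughly what "build your own validation mechanism" might look like for plain XML: a minimal sketch, assuming a hypothetical document that carries a reported `total` attribute alongside each state's `population`. The document shape is invented for this example and the numbers are placeholders, not census data.

```python
import xml.etree.ElementTree as ET

# Hypothetical document shape with placeholder numbers (not census data).
DOC = """<States total="300">
  <State name="Alabama" population="100"/>
  <State name="Alaska" population="200"/>
</States>"""


def check_total(xml_text):
    """Hand-rolled summation check: do the state populations add up to the total?

    Returns (ok, computed_sum, reported_total).
    """
    root = ET.fromstring(xml_text)
    reported = int(root.get("total"))
    computed = sum(int(s.get("population")) for s in root.findall("State"))
    return computed == reported, computed, reported


print(check_total(DOC))
```

This works for one small, simple data set. The rule itself, though, lives in code that only this one application knows about, so anyone else consuming the XML has to rediscover and reimplement it.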
But how many government agencies or other enterprises never have to interact with other government agencies, enterprises, subsidiaries, and so on? If you interact with others, you have to agree on a format. To agree, you need some sort of framework to agree on. For example, the National Information Exchange Model (NIEM) is a framework that helps government agencies involved in public safety and security create XML that is easier to share. The framework adds discipline to how they create their XML formats. Rather than each agency creating point solutions for exchanging information, the framework provides the discipline needed to create a canonical standard format, which makes exchanging information easier. (Their introduction document does a great job of explaining this.)
XBRL is also a framework for agreement. For example, the US GAAP Taxonomy Architecture is part of a framework for using XBRL in a specific way, creating what amounts to an application profile (e.g. no XBRL tuples, no XBRL typed dimensions, no use of the XBRL scenario context element, [Table]s built in a specific way, etc.). The XBRL framework also provides mechanisms for things which are commonly needed in business reporting. For example, it provides the ability to add labels, add multiple labels, express computations between numeric information, express additional types of relations between concepts, and so on. If you need these things and you are using plain XML, you have to build them yourself.
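By contrast, a framework can express computations as shared data instead of private code. The following minimal sketch is loosely modeled on XBRL calculation relationships, where a parent concept is expected to equal the weighted sum of its children (weights are typically 1 or -1). It is not an XBRL processor: the concept names and values are hypothetical, and the XBRL 2.1 specification's actual rules for rounding, missing facts, and duplicate facts are more detailed than the simplifications made here.

```python
from typing import Dict, List, Tuple

# Hypothetical calculation relationships, loosely modeled on XBRL
# summation-item relationships: parent = sum(weight * child).
CALC: Dict[str, List[Tuple[str, int]]] = {
    "GrossProfit": [("Revenues", 1), ("CostOfRevenue", -1)],
}


def check_calculations(facts: Dict[str, int], calc) -> List[Tuple[str, int, int]]:
    """Return (parent, reported, computed) for every inconsistent relationship."""
    errors = []
    for parent, children in calc.items():
        if parent not in facts:
            continue  # nothing reported for the parent, nothing to check
        # Simplification: only check when every child fact is present.
        if not all(child in facts for child, _ in children):
            continue
        computed = sum(weight * facts[child] for child, weight in children)
        if computed != facts[parent]:
            errors.append((parent, facts[parent], computed))
    return errors


# Hypothetical facts: 1000 - 600 should equal the reported 400.
print(check_calculations({"Revenues": 1000, "CostOfRevenue": 600, "GrossProfit": 400}, CALC))
```

The design point is that `CALC` is data: it can be published alongside the taxonomy so every consumer runs the same checks, which is the kind of mechanism you would otherwise build yourself with plain XML.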
Sharing information with a large number of users is one thing. A framework helps make these systems work better, but what if you want to connect information across all of these systems, with some people using traditional XML, some using XBRL, and some using other formats? That is what RDF/OWL and the Semantic Web are all about. For example, this Data.gov project has converted numerous data sets into RDF/OWL. (This is a great book for understanding how the Semantic Web will change your life.)
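To illustrate why RDF helps connect data from different sources, here is a minimal sketch that turns a state's population into RDF triples serialized as N-Triples, using only the Python standard library. The `http://example.org/` vocabulary and the `state_uri` naming scheme are hypothetical placeholders; a real project would reuse published vocabularies and an RDF library. Because every subject is a global URI, triples from two independent sources about the same state can be merged by a simple set union.

```python
EX = "http://example.org/"  # hypothetical namespace, for illustration only
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def state_uri(name):
    """Build a URI for a state; spaces become underscores (a simplification)."""
    return f"<{EX}state/{name.replace(' ', '_')}>"


def literal(text):
    """Quote a plain string literal, escaping backslashes and double quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def state_triples(name, population):
    """Describe one state as (subject, predicate, object) triples."""
    subject = state_uri(name)
    return [
        (subject, f"<{EX}name>", literal(name)),
        (subject, f"<{EX}population>", f'"{population}"^^<{XSD_INTEGER}>'),
    ]


def merge(*graphs):
    """Union of triples from any number of sources; duplicates collapse."""
    return sorted(set().union(*graphs))


def to_ntriples(triples):
    """Serialize triples one per line, each terminated by ' .'."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


# Two hypothetical sources describing the same state (placeholder values).
source_a = state_triples("New York", 100)
source_b = [(state_uri("New York"), f"<{EX}region>", literal("Northeast"))]
print(to_ntriples(merge(source_a, source_b)))
```

Unlike the two XML shapes earlier, no shape-specific parsing is needed to combine the sources: agreeing on identifiers is enough for the statements to line up.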
The bottom line here as I see it is this: When you build your information exchange systems, be sure you are considering the right things for the long term. I see four groups of XML:
This is not to say that one type of XML is better than another; it is more about understanding what you need to consider when you try to determine your needs. Using the wrong type of XML is like trying to fit a square peg in a round hole. You can do it, but it isn't pretty.