Many Different Forms of RDF
Thursday, April 22, 2010 at 09:12AM
Charlie in Modeling Business Information Using XBRL, OWL, RDF, US GAAP Taxonomy, XBRL and the Semantic Web, RDF/OWL, XML, XBRL or RDF/OWL?

This is one of a series of posts in which I try to figure out which data format is best to use, and why. Basically: when is XML better, when is XBRL better, and when is RDF/OWL better?

I have posted a number of blog entries relating to RDF, OWL, and the Semantic Web, which you can find here. In this post I want to summarize what I have figured out with regard to RDF.

RDF (Resource Description Framework) is one of the cornerstones of the Semantic Web. RDF can be used to describe pretty much anything. The core of RDF seems to be the subject-predicate-object relation, which, it seems, was used by Aristotle. This is what RDF looks like in one form, XML. I am not going to explain RDF in any more detail; go look at the other blog posts for that.
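
To make the subject-predicate-object idea a little more concrete, here is a minimal sketch in Python that holds a few statements as plain tuples. The `ex:` names and the `objects_of` helper are made up for illustration and do not come from any real vocabulary or library; a real application would more likely use an RDF toolkit than bare tuples.

```python
# Each RDF statement is a triple: (subject, predicate, object).
# The "ex:" identifiers are hypothetical and used for illustration only.
triples = [
    ("ex:Washington", "ex:hasCapital", "ex:Olympia"),
    ("ex:Washington", "ex:hasAbbreviation", "WA"),
    ("ex:Oregon", "ex:hasCapital", "ex:Salem"),
]


def objects_of(subject, predicate, statements):
    """Return every object stated for the given subject and predicate."""
    return [o for s, p, o in statements if s == subject and p == predicate]


print(objects_of("ex:Washington", "ex:hasCapital", triples))
```

The point is only that every fact has the same three-part shape, so one generic function can answer questions about any of them.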

What I do want to document is the forms of RDF:

So, what the heck does all this mean? Let me try to explain, using a small collection of data that I put together. Browse through these different data sets, which I found on the web; I decided to grab 20 of them. Imagine you had the following data sets:

  1. When states entered the union.
  2. State violent crime statistics.
  3. Miscellaneous population statistics by state.
  4. State capitals and largest cities.
  5. Population estimates by state. (This is the specific CSV file which will open in Excel.)
  6. Financial information by state. (This is the specific Excel file.)
  7. State areas.
  8. State symbols.
  9. State mottoes.
  10. State nicknames.
  11. Origin of state names.
  12. State GDP.
  13. State GDP per capita.
  14. State population density.
  15. State tax revenues.
  16. State unemployment rates.
  17. Gross state product per capita.
  18. State by most educated.
  19. State by health index.
  20. State by personal income.
  21. (Extra) Red, Blue, and Purple states

Suppose you wanted to use the data in one of those data sets. What would you do? Copy and paste it into Excel, most likely. What if you wanted to use two of those data sets together? No problem, just copy and paste both sets into Excel and put them together. When you try to do something like this, though, you run into problems: for example, the key value (in this case probably the state name) could differ between sets. This list, for instance, uses the state abbreviation, not the state name. Now, this is not a huge deal if you don't need the information on a timely basis, or if you have small sets of data like the 50 states.
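
The key mismatch can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two dictionaries stand in for two of the data sets, the numeric values are placeholders rather than real statistics, and `join_on_state` is a hypothetical helper name.

```python
# Hypothetical sample rows; the numeric values are placeholders, not real data.
capitals_by_name = {"Washington": "Olympia", "Oregon": "Salem"}
value_by_abbrev = {"WA": 1.0, "OR": 2.0}

# Lookup table that reconciles the two key styles.
ABBREV_TO_NAME = {"WA": "Washington", "OR": "Oregon"}


def join_on_state(by_name, by_abbrev, abbrev_to_name):
    """Combine two data sets into one table keyed by full state name."""
    combined = {}
    for abbrev, value in by_abbrev.items():
        name = abbrev_to_name.get(abbrev)
        if name is None or name not in by_name:
            continue  # skip rows that cannot be matched to the other set
        combined[name] = (by_name[name], value)
    return combined


print(join_on_state(capitals_by_name, value_by_abbrev, ABBREV_TO_NAME))
```

Somebody has to build and maintain that lookup table by hand, which is fine for 50 states and much less fine for larger or faster-moving data.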

So what if this information was in XML, like this data set of state population? It would be pretty easy to write a simple Excel macro to go get the data. But what if each set of data used a different XML syntax? See this blog post on different XML formats. OK, so not a huge problem: just write multiple Excel import macros, one for each XML file. Right? Well, that will get old.
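
Here is roughly what "one importer per XML file" looks like, sketched in Python rather than as Excel macros. Both XML layouts and both reader functions are invented for this example, and the population figure is a placeholder; the only real API used is the standard library's `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# Two hypothetical XML layouts carrying the same kind of data.
# The number 100 is a placeholder, not a real population figure.
XML_A = "<states><state name='Oregon' population='100'/></states>"
XML_B = "<data><row><st>Oregon</st><pop>100</pop></row></data>"


def read_format_a(text):
    """Layout A keeps the values in attributes of <state> elements."""
    root = ET.fromstring(text)
    return {s.get("name"): int(s.get("population")) for s in root.iter("state")}


def read_format_b(text):
    """Layout B keeps the values in child elements of <row> elements."""
    root = ET.fromstring(text)
    return {r.findtext("st"): int(r.findtext("pop")) for r in root.iter("row")}


# Same information, but each layout needs its own reader.
print(read_format_a(XML_A) == read_format_b(XML_B))
```

Every new layout means another reader function, which is exactly the part that gets old.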

OK, so what if everyone used the SAME XML format?  Say, RDF.  Well, then you could read the RDF by just pointing an application at the file, right?  Not quite.  What if the RDF used different logical models (or ontologies) to describe the data? If that happens, well, then you are back to mapping one file at a time, adjusting the multiple logical models or ontologies into one common model. You can do this, but it is a lot of work.
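
A small Python sketch of that mapping work, under some loud assumptions: the triples, the `a:`, `b:`, and `common:` predicate names, and the `to_common_model` helper are all hypothetical, and the values are placeholders. Two sources say the same kind of thing with different predicates, and a hand-written table translates both into one common model.

```python
# Two sources state the same kind of fact using different (hypothetical) predicates.
# The values 100 and 200 are placeholders, not real statistics.
source_a = [("ex:Oregon", "a:population", 100)]
source_b = [("ex:Idaho", "b:numberOfResidents", 200)]

# Hand-written mapping from each source's predicate to one common predicate.
PREDICATE_MAP = {
    "a:population": "common:population",
    "b:numberOfResidents": "common:population",
}


def to_common_model(statements, predicate_map):
    """Rewrite predicates to the common model; unknown predicates pass through."""
    return [(s, predicate_map.get(p, p), o) for s, p, o in statements]


merged = to_common_model(source_a, PREDICATE_MAP) + to_common_model(source_b, PREDICATE_MAP)
print(merged)
```

The syntax was never the hard part here. The mapping table is, because someone has to decide that the two predicates really mean the same thing.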

But what if there were another way? What if you created one standard logical model, documented it using OWL, and then made every piece of data available in a common format? Check out this Data-gov Wiki. Look at this web site, or wiki. More specifically, look at this complete data set of RDF. Per the web site, they have converted about 280 data sets into RDF.

OK, so what is the bottom line here with regard to RDF? First, the Semantic Web is about making information on the web more readable to computers. The best way to do that is to have one data format (semantics and syntax). Short of that, one can take the many different data formats and map them to one syntax, but you have to be sure the semantics (the meaning) of the data is consistent. Much of the data needs to work together. Most may never be used together, but some, like the state information I pointed out, will be. XML is a syntax that most people on the web are moving to, so RDF in XML makes sense. You need OWL to articulate your ontology, or your model, so that people can understand your model and so that the data made available complies with that model.

But my next question is: when should XBRL be used, and when should RDF/OWL be used?

Article originally appeared on XBRL-based structured digital financial reporting (http://xbrl.squarespace.com/).
See website for complete article licensing information.