This is one in a series of posts where I try to figure out which data format is best to use, and why. Basically: when is XML better, when is XBRL better, and when is RDF/OWL better?
I have posted a number of blog entries relating to RDF, OWL, and the Semantic Web which you can find here. I want to summarize what I have figured out with regard to RDF here.
RDF (Resource Description Framework) is one of the cornerstones of the Semantic Web. RDF can be used to document pretty much anything. The core of RDF seems to be the subject-predicate-object relation, an idea which, it seems, goes back to Aristotle. This is what RDF looks like in one form, XML. I am not going to explain RDF in any more detail here; go look at the other blog posts for that.
What I do want to document is the forms of RDF:
- Triples: There are lots of notations (syntaxes) for writing subject-predicate-object relations. Here are some of them: N3, N-Triples, TriG, TriX, Turtle, RDF/XML, RDFa. Each of these probably has its pros and cons. The point here is that these are, for the most part, many different ways of doing the same thing.
- RDF/XML: Because I want to stick with XML, I will focus on RDF/XML as "the" format for RDF for my purposes. The format really does not matter; what matters is what I talk about below. Using RDF alone is like using XML without a schema: you can basically include anything, right or wrong.
- RDFa: RDFa is an approach to embedding metadata into HTML web pages. Something similar to this is eRDF. RDFa and eRDF are similar to iXBRL.
- RDF plus OWL: Web Ontology Language (called OWL) can be thought of as a schema for RDF, loosely similar to how XML Schema constrains XML. But OWL is quite different in that it is used to constrain semantics, not syntax. What this means is that RDF by itself seems somewhat useless, really. You have to make sure both that you build your RDF relations correctly and that you understand those relations. That is what OWL seems to do: it defines a semantic model which both explains the RDF and constrains the RDF.
- RDF and standard OWL ontologies: The next step in the spectrum is what I am calling standard OWL ontologies. It is one thing for someone to post an ontology to the web. You could have hundreds or thousands of ontologies which express the same thing, and those ontologies could have different logical models and not even interoperate. Compare that to having one agreed-upon ontology for some specific model.
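To make the "triples" idea from the list above a bit more concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions: the `http://example.org/states#` namespace and the predicate names are made up, and the serializer handles only the simplest N-Triples cases (no escaping, no datatypes, no language tags).

```python
# A hypothetical namespace; these URIs are made up for illustration.
EX = "http://example.org/states#"

# Each statement is one (subject, predicate, object) triple.
triples = [
    (EX + "Washington", EX + "capital", EX + "Olympia"),
    (EX + "Washington", EX + "abbreviation", "WA"),
]


def to_ntriples(statements):
    """Write triples as N-Triples lines: URIs in <>, plain literals in quotes.

    Simplified on purpose: literal escaping and datatypes are ignored.
    """
    def term(value):
        if value.startswith("http://"):
            return "<" + value + ">"
        return '"' + value + '"'

    return "\n".join(
        f"{term(s)} {term(p)} {term(o)} ." for s, p, o in statements
    )


ntriples_text = to_ntriples(triples)
print(ntriples_text)
```

The same two statements could just as well be written in Turtle or RDF/XML; only the notation would change, not the triples themselves.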
So, what the heck does all this mean? Let me try to explain, using a small collection of data which I put together. Browse through these different data sets, which I found on the web; I decided to grab 20 of them. Imagine you had the following data sets:
- When states entered the union.
- State violent crime statistics.
- Miscellaneous population statistics by state.
- State capitals and largest cities.
- Population estimates by state. (This is the specific CSV file which will open in Excel.)
- Financial information by state. (This is the specific Excel file.)
- State areas.
- State symbols.
- State mottoes.
- State nicknames.
- Origin of state names.
- State GDP.
- State GDP per capita.
- State population density.
- State tax revenues.
- State unemployment rates.
- Gross state product per capita.
- State by most educated.
- State by health index.
- State by personal income.
- (Extra) Red, Blue, and Purple states
Suppose you wanted to use the data in one of those data sets. What would you do? Copy and paste into Excel, most likely. What if you wanted to use two of those data sets together? No problem, just copy and paste both sets into Excel and put them together. But when you try to do something like this you run into problems: the key value (in this case, probably the state name) could be different from one set to the next. For example, this list uses the state abbreviation, not the state name. Now, this is not a huge deal if you don't need the information on a timely basis, or if you have small sets of data like the 50 states.
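The key mismatch described above can be sketched in a few lines of Python. Everything here is hypothetical: the numbers are placeholders rather than real statistics, and the names `population_by_name`, `area_by_abbreviation`, and `merge_by_state` are just ones I picked for the illustration.

```python
# Hypothetical sample data; the figures are placeholders, not real statistics.
population_by_name = {"Washington": 100, "Oregon": 50}
area_by_abbreviation = {"WA": 10, "OR": 20}

# The lookup table someone has to build to reconcile the two key styles.
ABBREVIATION_TO_NAME = {"WA": "Washington", "OR": "Oregon"}


def merge_by_state(by_name, by_abbreviation, abbreviation_to_name):
    """Combine two data sets whose keys differ (state name vs. abbreviation)."""
    merged = {}
    for abbreviation, area in by_abbreviation.items():
        name = abbreviation_to_name[abbreviation]
        merged[name] = {"population": by_name.get(name), "area": area}
    return merged


merged = merge_by_state(
    population_by_name, area_by_abbreviation, ABBREVIATION_TO_NAME
)
print(merged["Washington"])
```

The merge itself is trivial. The real cost is the lookup table, which has to be built and maintained for every pair of data sets that disagree on their keys.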
So what if this information were in XML, like this data set of state population? It would be pretty easy to write a simple Excel macro to go get the data. But what if each set of data used a different XML syntax? See this blog post on different XML formats. OK, so not a huge problem, just write multiple Excel import macros, one for each XML file. Right? Well, that will get old.
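Here is a rough sketch of why "one importer per XML file" gets old, written in Python rather than as an Excel macro. Both XML layouts are invented for the example (they are not the formats of any of the data sets above), and the numbers are placeholders.

```python
import xml.etree.ElementTree as ET

# Two made-up XML layouts carrying the same (placeholder) information.
FORMAT_A = """<states>
  <state name="Washington" population="100"/>
  <state name="Oregon" population="50"/>
</states>"""

FORMAT_B = """<data>
  <record><region>Washington</region><value>100</value></record>
  <record><region>Oregon</region><value>50</value></record>
</data>"""


def read_format_a(xml_text):
    """Importer for the attribute-based layout."""
    root = ET.fromstring(xml_text)
    return {
        state.get("name"): int(state.get("population"))
        for state in root.findall("state")
    }


def read_format_b(xml_text):
    """Importer for the element-based layout."""
    root = ET.fromstring(xml_text)
    return {
        record.findtext("region"): int(record.findtext("value"))
        for record in root.findall("record")
    }


from_a = read_format_a(FORMAT_A)
from_b = read_format_b(FORMAT_B)
print(from_a == from_b)
```

Both importers end up with the same dictionary, but each one had to be written by hand against its own layout. Twenty data sets in twenty layouts means twenty of these.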
OK, so what if everyone used the SAME XML format? Say, RDF. Well, then you could read the RDF by just pointing an application at the file, right? Not quite. What if the RDF used different logical models (or ontologies) to describe the data? If that happens, well, then you are back to mapping one file at a time, adjusting the multiple logical models or ontologies into one common model. You can do this, but it is a lot of work.
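A small sketch of that situation, again in Python and again with made-up names: two sources both publish triples (same syntax), but each uses its own predicate URI for the same idea. The `example.org` URIs and the `PREDICATE_MAP` table are assumptions of mine, not anything from a real ontology.

```python
# Two hypothetical sources: same triple structure, different vocabularies.
SOURCE_ONE = [("Washington", "http://example.org/one#population", "100")]
SOURCE_TWO = [("Washington", "http://example.org/two#residents", "100")]

# A hand-built mapping from each source's predicate to one common predicate.
PREDICATE_MAP = {
    "http://example.org/one#population": "http://example.org/common#population",
    "http://example.org/two#residents": "http://example.org/common#population",
}


def to_common_model(triples, predicate_map):
    """Rewrite each triple's predicate using the common vocabulary."""
    return [(s, predicate_map[p], o) for s, p, o in triples]


common_one = to_common_model(SOURCE_ONE, PREDICATE_MAP)
common_two = to_common_model(SOURCE_TWO, PREDICATE_MAP)
print(common_one == common_two)
```

Once mapped, the two sources say the same thing, but somebody had to know that "population" in one model and "residents" in the other mean the same thing, and had to write that down for every predicate in every source.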
But what if there were another way? What if you created one standard logical model, documented it using OWL, and then made every piece of data available in a common format? Check out this Data-gov Wiki. More specifically, look at this complete data set of RDF. Per the web site, they have converted about 280 data sets into RDF.
OK, so what is the bottom line here with regard to RDF? First, the Semantic Web is about making information on the web more readable to computers. The best way to do that is to have one data format (semantics and syntax). Short of that, one can take the many different data formats and map them to one syntax, but you have to be sure the semantics (the meaning) of the data is consistent. Much of the data needs to work together. Most may never be used together, but some, like the state information I pointed out, will be. XML is a syntax that most people on the web are moving to, so RDF in XML makes sense. You need OWL to articulate your ontology, or your model, so that people understand your model and so that the data made available complies with that model.
But my next question is when should XBRL be used and when should RDF/OWL be used?