Experimentation with RDF, RDFS, OWL, and SPARQL
I have been doing some experimentation with RDF, RDFS, OWL, and SPARQL. I have not been able to create a lot so far; the learning curve is rather flat now. What I have created is interesting from a number of perspectives.
I am working with this randomly selected SEC XBRL financial filing. What I want to do is two things. First, generate the model structure of the filing in RDF and second, validate the model structure against an OWL ontology (or RDFS) to see if it is possible.
To start, consider a few things. This is the presentation linkbase of that filing. That is expressed in XBRL using XLink. It is VERY hard to work with this raw XBRL. No problem, send the XBRL file to an XBRL processor and generate an easier-to-use XML infoset. If you look at that XML infoset, you can start to understand the model. It is way easier to use than the raw XBRL/XLink. Next, I serialized that exact same information in RDF.
If you look at that you might say two things. First you might say, "Now wait a minute, that is harder to read than the XML infoset." And you would be right. The second thing you might see if you looked at this is the remarkable parallel between XLink and RDF.
All three of those technical syntaxes say EXACTLY the same thing: the XBRL presentation linkbase expressed in XLink, the XML infoset in just raw XML, and the RDF. EXACTLY the same thing.
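To make the parallel between XLink and RDF concrete, here is a minimal sketch in Python of how one presentation arc reads as one RDF-style triple (subject, predicate, object). The XML fragment, the file name example.xsd, and the element names ex_BalanceSheetTable and ex_LegalEntityAxis are made up for illustration and are not from the filing; only the linkbase and XLink namespaces and the parent-child arcrole are real XBRL.

```python
import xml.etree.ElementTree as ET

LINK = "http://www.xbrl.org/2003/linkbase"
XLINK = "http://www.w3.org/1999/xlink"

# Hypothetical presentation linkbase fragment (not taken from the actual filing).
SAMPLE = f"""
<link:presentationLink xmlns:link="{LINK}" xmlns:xlink="{XLINK}"
    xlink:type="extended" xlink:role="http://example.com/role/BalanceSheet">
  <link:loc xlink:type="locator" xlink:label="loc_Table"
      xlink:href="example.xsd#ex_BalanceSheetTable"/>
  <link:loc xlink:type="locator" xlink:label="loc_Axis"
      xlink:href="example.xsd#ex_LegalEntityAxis"/>
  <link:presentationArc xlink:type="arc"
      xlink:arcrole="http://www.xbrl.org/2003/arcrole/parent-child"
      xlink:from="loc_Table" xlink:to="loc_Axis" order="1"/>
</link:presentationLink>
""".strip()


def arcs_to_triples(xml_text):
    """Turn each XLink arc into a (subject, predicate, object) tuple."""
    root = ET.fromstring(xml_text)
    # Each locator label stands for the report element its href points at.
    targets = {
        loc.get(f"{{{XLINK}}}label"): loc.get(f"{{{XLINK}}}href")
        for loc in root.iter(f"{{{LINK}}}loc")
    }
    triples = []
    for arc in root.iter(f"{{{LINK}}}presentationArc"):
        subject = targets[arc.get(f"{{{XLINK}}}from")]
        predicate = arc.get(f"{{{XLINK}}}arcrole")
        obj = targets[arc.get(f"{{{XLINK}}}to")]
        triples.append((subject, predicate, obj))
    return triples


for s, p, o in arcs_to_triples(SAMPLE):
    print(s, p, o)
```

The "from" end of the arc plays the role of the RDF subject, the arcrole plays the role of the predicate, and the "to" end plays the role of the object. A real converter would also have to carry over the order attribute and the network role, which this sketch ignores.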
However, there are HUGE differences between the three serializations.
- The XBRL presentation relations expressed in XLink can be validated. BUT, it can be validated only to the extent that the XBRL processor understands the information. All XBRL processors understand the XBRL syntax of course. HOWEVER, what XBRL processors do NOT understand is how the presentation relations should be structured, whether those presentation relations are consistent with the XBRL calculation relations and XBRL definition relations. Why? Well, because XBRL only has "parent-child" (http://www.xbrl.org/2003/arcrole/parent-child) type relations in the presentation linkbase. What does the parent-child relationship mean? What relations are allowed? You cannot express that in XBRL beyond "parent-child" and you therefore cannot validate that you are building the relations correctly or consistently across all linkbases with XBRL.
- The XML infoset relations are WAY clearer. They are WAY easier for a human to read, WAY easier for an XML parser to work with, and ALL the information you want to work with is there. (If you go back to the XBRL presentation relations you will note that you have to go grab information from the XSD file to have information about the report elements.) If this format is so much better, then why doesn't XBRL use this format? Well, because the XML infoset format is not extensible. That is WHY XBRL used XLink. But there is something else wrong with the XML infoset format. You still cannot tell whether (a) the information expressed is CORRECT or (b) the information is CONSISTENT with the XBRL calculation relations and XBRL definition relations. You can write a validator very, very easily to perform the tests to see if the relations are CORRECT; but, that is work.
- RDF (work in progress), I think, can solve the validation problem. I say "I think" because I have not actually gotten this to work yet. THAT is what I am trying to make work. (I did get this to validate per the W3C RDF validator.)
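To show what kind of test I mean by "what relations are allowed," here is a rough Python sketch of a hand-written check. Everything in it is an assumption made for illustration: the category names, the rule table, and the ex: report elements are hypothetical, not my actual ontology. It is the sort of rule I would rather state once in OWL than program by hand.

```python
# Hypothetical category assigned to each report element.
CATEGORY = {
    "ex:BalanceSheetTable": "Table",
    "ex:LegalEntityAxis": "Axis",
    "ex:ParentCompanyMember": "Member",
    "ex:BalanceSheetLineItems": "LineItems",
    "ex:Cash": "Concept",
}

# Hypothetical rules: which child categories each parent category may have.
ALLOWED_CHILDREN = {
    "Table": {"Axis", "LineItems"},
    "Axis": {"Member"},
    "Member": {"Member"},
    "LineItems": {"Abstract", "Concept"},
    "Abstract": {"Abstract", "Concept"},
    "Concept": set(),
}


def find_violations(relations):
    """Return (parent, child, reason) for every parent-child pair that breaks a rule."""
    violations = []
    for parent, child in relations:
        parent_cat = CATEGORY[parent]
        child_cat = CATEGORY[child]
        if child_cat not in ALLOWED_CHILDREN[parent_cat]:
            reason = f"{parent_cat} may not have a {child_cat} child"
            violations.append((parent, child, reason))
    return violations


# Made-up parent-child relations; the last one is deliberately wrong.
RELATIONS = [
    ("ex:BalanceSheetTable", "ex:LegalEntityAxis"),
    ("ex:LegalEntityAxis", "ex:ParentCompanyMember"),
    ("ex:BalanceSheetTable", "ex:BalanceSheetLineItems"),
    ("ex:BalanceSheetLineItems", "ex:Cash"),
    ("ex:LegalEntityAxis", "ex:Cash"),
]

for parent, child, reason in find_violations(RELATIONS):
    print(parent, "->", child, ":", reason)
```

Plain "parent-child" cannot say any of this, which is why neither the XBRL processor nor the XML infoset can catch the last relation on its own.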
To do this, I first built a simple ontology using OWL. This is the ontology. If you look at this and criticize how bad this is right now, you are totally missing the point. Yes, it is bad. I don't understand how to best use OWL yet. (If YOU do, please rewrite the OWL ontology, send it to me, and I won't have to spend the time figuring this stuff out. Please!) Not helping things is the fact that if you think XBRL is flexible and hard to use, you should try RDF, RDFS, OWL, and SPARQL!!! So why do I bother? Well, (a) because it is far easier for me and other business people to use RDF, RDFS, and OWL than it is to learn to program all this stuff but more importantly (b) there are WAY, WAY, WAY more things that I want to be able to validate.
To fiddle around with the RDF I am using Protege. (This is a web-based version of Protege; it works in Google Chrome but does not seem to work in Microsoft Internet Explorer.) Now, Protege is not a business user tool. It is VERY hard to figure out. But again, it is worth it. Part of Protege is a SPARQL query tool. Here is my first SPARQL query (paste it into Protege, try it for yourself):
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX model: <http://www.xbrlsite.com/2013/FinancialReportOntology/ReportElement.xml#>
SELECT ?subject ?object WHERE { ?subject model:hasAxis ?object }
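For anyone who does not have Protege handy, here is a plain Python sketch of what that query asks for: every subject/object pair connected by model:hasAxis. This is not SPARQL and not how Protege works internally, just the same pattern match done by hand. The sample triples, the ex: names, and the hasLineItems property are made up for illustration; only the model namespace and hasAxis come from the query above.

```python
MODEL = "http://www.xbrlsite.com/2013/FinancialReportOntology/ReportElement.xml#"

# Hypothetical triples standing in for the RDF generated from a filing.
TRIPLES = [
    ("ex:BalanceSheetTable", MODEL + "hasAxis", "ex:LegalEntityAxis"),
    ("ex:BalanceSheetTable", MODEL + "hasLineItems", "ex:BalanceSheetLineItems"),
    ("ex:SegmentsTable", MODEL + "hasAxis", "ex:SegmentAxis"),
]


def select_subject_object(triples, predicate):
    """Mimic SELECT ?subject ?object WHERE { ?subject <predicate> ?object }."""
    return [(s, o) for s, p, o in triples if p == predicate]


for subject, obj in select_subject_object(TRIPLES, MODEL + "hasAxis"):
    print(subject, obj)
```

The query keeps only the triples whose predicate is hasAxis and returns the two ends of each one, which is why the hasLineItems triple drops out.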
Why am I doing all this? Folks, this is extremely powerful stuff! There is an absolute boatload of leverage which seems achievable from RDF, RDFS, OWL, and SPARQL. That is what the Financial Report Ontology is all about. Other domains have created ontologies. Economics. Biomedical. Others.
Frankly, I don't get all the details of this stuff yet. But, then again, I did not get XBRL when I first started either. But, hundreds if not thousands of hours trying to figure this stuff out is paying off. I can tell you this: while it might be a lot of work for accountants to understand this technology stuff, it would be WAY more work for technology people to grasp accounting. Business users don't need to learn everything about the technology, just enough to communicate effectively with the technology people who do understand this stuff.