I have wanted to understand things like the semantic web, RDF, and OWL, and how these things fit with XBRL (or how XBRL fits with these things), for some time. Someone recommended the book Semantic Web for Dummies, so I picked that up and am making my way through it. I certainly don't have all this figured out yet, but there are some things which are very clear.
So, the Semantic Web is basically a "database in the sky". It is super-metadata. It allows data stored in different formats and systems to all "look" like one database, one system. This enables easy access to the information using web standards. It enables reuse of information, allowing anyone to "remix" the data.
Semantic Web (capital) and semantic web (lower case)
The "Semantic Web" and "semantic webs" are different things. This is just like the difference between the Internet, intranets, and extranets. "The Semantic Web" (upper case) will be sitting out there in cyberspace just like the Web is sitting there and available for all to use. The Web and the Semantic Web will co-exist and they serve different purposes really.
Companies will also have "semantic webs" (lower case). These will be private, for internal use only by employees of a company, like an intranet. Companies will participate in other "semantic webs" with suppliers and customers, much like an extranet.
EDGAR as compared to IDEA
The SEC EDGAR system has been described as "one of the federal government's most valuable and important databases". However, EDGAR does not fit into the Semantic Web or even into semantic webs. The best that the EDGAR system can do is get you to documents which relate to a company. The database cannot get you INSIDE the document, to the information the document contains. EDGAR is a big filing cabinet. There is some value in providing semantic information to get you to the EDGAR filings, but that will leave people wanting. For example, EDGAR Online probably spends millions of dollars writing parsing algorithms to get what people really want: the information in the documents.
Now, IDEA will be part of the Semantic Web. IDEA is a database (or will be once filings start coming in, starting in June/July 2009). You can get inside the documents, to the information reported by companies. If EDGAR was valuable (which I believe it is), IDEA will be killer!
XBRL and the Semantic Web
So, what does XBRL provide to the Semantic Web or to semantic webs?
- XBRL is a database. Or maybe it is better said that XBRL is more like "rows in a database". What XBRL provides is a way to articulate information which you can extract and use. The SGML or HTML documents in EDGAR cannot really be parsed cost effectively, so little of their information can be reused (thus the need for IDEA). The information in IDEA, by contrast, will be very usable. If you build a Semantic Web interface into EDGAR, you don't get much. If you build a Semantic Web interface into IDEA, much more is possible. So, XBRL provides the format of the information within the IDEA system, something that IDEA and systems like it can use to expose information to the Semantic Web or semantic webs.
- XBRL is metadata. You have to describe the information in those documents somehow. XBRL does that. XBRL allows for the communication of meaning. For example, XBRL Formula allows for the communication of business rules. So, XBRL is useful in that way. I don't know how far RDF and/or OWL will get you in terms of expressing metadata at the level XBRL does. This is still a little vague to me. But I do know this: I am beginning to hear people talk about building some rules language for the semantic web, meaning such a language does not currently exist. Besides, Paul Warren, Gareth Reakes, and Alberto Massari pointed this out in 2003.
- XBRL is a "transport protocol". Companies need to get the information to the SEC somehow. RDF, OWL, and the other stuff on the semantic web cannot do that. So, that is a function XBRL provides.
- XBRL is specific, important dictionaries. This probably should come under metadata. But it really is not about metadata itself; it is about the existence of actual, specific metadata. I think XBRL brought the IFRS and US GAAP Taxonomies (the actual concepts, rules, and other metadata expressed for the financial reporting domain) into existence. Further, because IFRS is being used around the world (rather than 80 different sets of accounting standards), it becomes even more valuable.
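To make the "rows in a database" idea a little more concrete, here is a rough Python sketch of how the facts in an XBRL instance might be flattened into rows, and then into simple subject/predicate/value statements of the kind the semantic web deals in. Everything specific here is an assumption for illustration: the tiny instance document, the "ex" taxonomy namespace, the concept names, the placeholder CIK, and the helper functions facts_as_rows and rows_as_triples are all made up, and a real filing is far richer (durations, dimensions, footnotes, and so on).

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified XBRL instance. The "ex" taxonomy namespace,
# the concept names, the CIK, and the values are invented for this sketch.
INSTANCE = """\
<xbrl xmlns="http://www.xbrl.org/2003/instance"
      xmlns:iso4217="http://www.xbrl.org/2003/iso4217"
      xmlns:ex="http://example.com/taxonomy">
  <context id="FY2008">
    <entity><identifier scheme="http://www.sec.gov/CIK">0000000000</identifier></entity>
    <period><instant>2008-12-31</instant></period>
  </context>
  <unit id="USD"><measure>iso4217:USD</measure></unit>
  <ex:Assets contextRef="FY2008" unitRef="USD" decimals="0">1000</ex:Assets>
  <ex:Liabilities contextRef="FY2008" unitRef="USD" decimals="0">600</ex:Liabilities>
</xbrl>
"""

XBRLI = "{http://www.xbrl.org/2003/instance}"


def facts_as_rows(xml_text):
    """Flatten each fact into one row: entity, period, concept, unit, value."""
    root = ET.fromstring(xml_text)

    # Index the contexts so each fact can look up who reported it and when.
    contexts = {}
    for ctx in root.findall(XBRLI + "context"):
        entity = ctx.find(f"{XBRLI}entity/{XBRLI}identifier").text
        instant = ctx.find(f"{XBRLI}period/{XBRLI}instant").text
        contexts[ctx.get("id")] = (entity, instant)

    rows = []
    for element in root:
        context_id = element.get("contextRef")
        if context_id is None:
            continue  # <context> and <unit> elements are not facts
        # Tags look like "{namespace}LocalName"; keep only the local name.
        _, _, concept = element.tag[1:].partition("}")
        entity, instant = contexts[context_id]
        rows.append({
            "entity": entity,
            "period": instant,
            "concept": concept,
            "unit": element.get("unitRef"),
            "value": int(element.text),
        })
    return rows


def rows_as_triples(rows):
    """Restate each row as (subject, predicate, value) statements."""
    for index, row in enumerate(rows):
        subject = f"fact{index}"  # made-up identifier for the fact
        for predicate, value in row.items():
            yield (subject, predicate, value)


if __name__ == "__main__":
    rows = facts_as_rows(INSTANCE)
    for row in rows:
        print(row)
    for triple in rows_as_triples(rows):
        print(triple)
```

The point of the sketch is only that each reported number arrives already tagged with who reported it, for what date, in what unit, and under which concept. That is exactly what you cannot get cheaply out of an SGML or HTML filing.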
That is what I see thus far.