BLOG: Digital Financial Reporting
This is a blog for information relating to digital financial reporting. This blog is basically my "lab notebook" for experimenting with and learning about XBRL-based digital financial reporting. This is my brainstorming platform. This is where I think out loud (i.e. publicly) about digital financial reporting. This information is for innovators and early adopters who are ushering in a new era of accounting, reporting, auditing, and analysis in a digital environment.
Much of the information contained in this blog is synthesized, summarized, condensed, better organized, and articulated in my book XBRL for Dummies and in the chapters of Intelligent XBRL-based Digital Financial Reporting. If you have any questions, feel free to contact me.
Entries from April 1, 2010 - April 30, 2010
Many Different Forms of RDF
This is one of a series of posts where I am providing information relating to figuring out which data format is best to use and why. Basically: when is XML better, when is XBRL better, and when is RDF/OWL better?
I have posted a number of blog entries relating to RDF, OWL, and the Semantic Web which you can find here. I want to summarize what I have figured out with regard to RDF here.
RDF (Resource Description Framework) is one of the cornerstones of the Semantic Web. RDF can be used to document pretty much anything. The core of RDF is the subject-predicate-object relation, which it seems goes back to Aristotle. This is what RDF looks like in one form, XML. I am not going to explain RDF in any more detail here; go look at the other blog posts for that.
What I do want to document is the forms of RDF:
- Triples: There are lots of terms for subject-predicate-object relations. Here are some of those notations (syntaxes): N3, N-Triples, TriG, TriX, Turtle, RDF/XML, RDFa. Each of these probably has its pros and cons. The point here is that these are, for the most part, many different ways of doing the same thing.
- RDF/XML: Because I want to stick with XML, I will focus on RDF/XML as "the" format for RDF for my purposes. The format really does not matter; what matters is what I talk about below. Using RDF alone is like using XML without a schema: you can basically include anything, right or wrong.
- RDFa: RDFa is an approach to embedding metadata into HTML web pages. Something similar to this is eRDF. RDFa and eRDF are similar to iXBRL.
- RDF plus OWL: Web Ontology Language (called OWL) can be thought of as a schema for RDF, loosely similar to how XML Schema constrains XML. But OWL is different in that it is used to constrain semantics, not syntax. What this means is that RDF by itself seems somewhat useless really: you have to both build your RDF relations correctly and understand those relations. That is where OWL comes in. OWL defines a semantic model which both explains the RDF and constrains the RDF.
- RDF and standard OWL ontologies: The next step in the spectrum is what I am calling standard OWL ontologies. It is one thing for someone to post an ontology to the web. You could have hundreds or thousands of ontologies which express the same thing; the ontologies could have different logical models and not even interoperate. Compare that to having one agreed-upon ontology for some specific model.
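To make the subject-predicate-object idea concrete, here is a minimal sketch in plain Python, not using any RDF library. The URIs, predicate names, and data values are hypothetical, purely for illustration:

```python
# A tiny triple store as a list of (subject, predicate, object) tuples.
# URIs and values are made-up examples, not real identifiers.
triples = [
    ("http://example.org/state/WV", "hasName", "West Virginia"),
    ("http://example.org/state/WV", "hasPopulation", 1814468),
    ("http://example.org/state/WV", "hasCapital", "Charleston"),
]

def objects_for(subject, predicate, store):
    """Return every object asserted for a given subject-predicate pair."""
    return [o for s, p, o in store if s == subject and p == predicate]

print(objects_for("http://example.org/state/WV", "hasName", triples))
```

The various notations (Turtle, N-Triples, RDF/XML, and so on) are just different serializations of a store like this; the triples themselves are the same.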
So, what the heck does all this mean? Let me try to explain using a small data set which I have created. Browse through these different data sets which I found on the web; I decided to grab 20 different sets of data. Imagine you had the following data sets:
- When states entered the union.
- State violent crime statistics.
- Miscellaneous population statistics by state.
- State capitals and largest cities.
- Population estimates by state. (This is the specific CSV file which will open in Excel.)
- Financial information by state. (This is the specific Excel file.)
- State areas.
- State symbols.
- State mottoes.
- State nicknames.
- Origin of state names.
- State GDP.
- State GDP per capita.
- State population density.
- State tax revenues.
- State unemployment rates.
- Gross state product per capita.
- State by most educated.
- State by health index.
- State by personal income.
- (Extra) Red, Blue, and Purple states
Suppose you wanted to use the data in one of those data sets; what would you do? Copy and paste into Excel, most likely. What if you wanted to use two of those data sets together? No problem, just copy and paste both sets into Excel and put them together. When you try to do something like this you run into problems, such as the key value (in this case probably the state name) being different between the sets. For example, this list uses the state abbreviation, not the state name. Now, this is not a huge deal if you don't need this information on a timely basis, or if you have small sets of data like the 50 states, etc.
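The key-value mismatch can be sketched in a few lines of Python. The figures and the tiny mapping table here are illustrative, not real statistics; the point is that someone has to build and maintain that mapping by hand before the two sets can be joined:

```python
# Two data sets about the same states, keyed differently:
# one by full state name, the other by USPS abbreviation.
# Values are made-up for illustration.
population = {"West Virginia": 1814468, "Ohio": 11485910}
unemployment = {"WV": 7.9, "OH": 10.2}

# A hand-built mapping is required before the sets can be combined.
abbrev_to_name = {"WV": "West Virginia", "OH": "Ohio"}

combined = {}
for abbr, rate in unemployment.items():
    name = abbrev_to_name[abbr]  # fails if a mapping entry is missing
    combined[name] = {"population": population[name], "unemployment": rate}

print(combined["West Virginia"])
```

With 50 states this is manageable; with hundreds of data sets and thousands of keys, the hand-built mappings become the real cost.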
So what if this information was in XML, like this data set of state population? It would be pretty easy to write a simple Excel macro to go get the data. But what if each set of data used a different XML syntax? See this blog post on different XML formats. OK, so not a huge problem: just write multiple Excel import macros, one for each XML file. Right? Well, that will get old.
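Here is what "one parser per format" looks like in practice. This is a sketch in Python rather than an Excel macro, and the two XML shapes and element names are invented examples, but the pattern is the same: every new syntax means another little parser to write and maintain.

```python
import xml.etree.ElementTree as ET

# Two hypothetical XML shapes carrying the same data:
# one element-based, one attribute-based.
doc_a = ("<States><State><Name>Ohio</Name>"
         "<Population>11485910</Population></State></States>")
doc_b = '<States><State name="Ohio" population="11485910"/></States>'

def parse_elements(xml_text):
    """Parser for the element-based shape."""
    root = ET.fromstring(xml_text)
    return {s.findtext("Name"): int(s.findtext("Population"))
            for s in root.findall("State")}

def parse_attributes(xml_text):
    """Parser for the attribute-based shape."""
    root = ET.fromstring(xml_text)
    return {s.get("name"): int(s.get("population"))
            for s in root.findall("State")}

# Same meaning, different syntax, two parsers.
print(parse_elements(doc_a))
print(parse_attributes(doc_b))
```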
OK, so what if everyone used the SAME XML format? Say, RDF. Well, then you could read the RDF by just pointing an application at the file, right? Not quite. What if the RDF used different logical models (or ontologies) to describe the data? If that happens, well, then you are back to mapping one file at a time, adjusting the multiple logical models or ontologies into one common model. You can do this, but it is a lot of work.
But what if there were another way? What if you created one standard logical model, documented it using OWL, and then made every piece of data available in a common format? Check out this Data-gov Wiki. Look at this web site, or wiki. More specifically, look at this complete data set of RDF. Per the web site, they have converted about 280 data sets into RDF.
OK, so what is the bottom line here with regard to RDF? First, the Semantic Web is about making information on the web more readable to computers. To do that, the best way is to have one data format (semantics and syntax). Short of that, one can take the many different data formats and map them to one syntax, but you have to be sure the semantics (the meaning) of the data is consistent. Much of the data needs to work together. Most may never be used together, but some, like the state information I pointed out, will be used together. XML is a syntax that most people on the web are moving to, so RDF in XML makes sense. You need OWL to articulate your ontology, or your model, so that people understand your model and so that data made available can be checked for compliance with that model.
But my next question is when should XBRL be used and when should RDF/OWL be used?




A Paper: Quality of XBRL US GAAP Taxonomy: Empirical Evaluation using SEC Filings
A paper written by Hongwei Zhu and Harris Wu of Old Dominion University, Quality of XBRL US GAAP Taxonomy: Empirical Evaluation using SEC Filings, looks at the quality of the US GAAP Taxonomy. The following is an abstract of that paper:
The primary purpose of a data standard is to improve the comparability of data created by multiple standard users. Given the high cost of developing and implementing data standards, it is desirable to be able to assess the quality of data standards. We develop metrics for measuring completeness and relevancy of a data standard. These metrics are evaluated empirically using the US GAAP taxonomy in XBRL and SEC filings produced using the taxonomy by approximately 500 companies. The results show that the metrics are useful and effective. Our analysis also reveals quality issues of the GAAP taxonomy and provides useful feedback to the taxonomy users. The SEC has mandated that all publicly listed companies must submit their filings using XBRL beginning mid 2009 to late 2014 according to a phased-in schedule. Thus our findings are timely and have practical implications that will ultimately help improve the quality of financial data.
While the paper just scratches the surface (note that the authors refer to this as their initial work), it does offer helpful insight into using the US GAAP Taxonomy to create SEC XBRL Filings and using those filings.
Here is a summary of some of the more helpful and interesting findings of this analysis. I have added my commentary to this information using italics:
- The authors point out that from a syntactical perspective, all the XBRL documents (XBRL instances and the filer XBRL taxonomy extensions) prepared by SEC XBRL filers are interoperable because they use the same syntax (i.e. XBRL). However, from a semantic perspective (business meaning), the XBRL documents can be difficult to compare when different companies use different data elements in their documents. (I would point out that an XBRL taxonomy contains two things: elements (better referred to as concepts, really) and relations. Differences not only in concepts but also in relations can cause comparability difficulties.)
- The authors point out that from the perspective of the users of this information, the metadata (i.e. the taxonomy concepts and, as I said above, their relations to other concepts) are also data.
- The US GAAP Taxonomy has 10,537 "active concepts". (There are also 2,653 abstract concepts which can never be used to report information and 346 deprecated concepts which should not be used by SEC filers.)
- Many companies are reporting using the deprecated concepts. The authors point out that 195 out of 481 filers used deprecated concepts, and they provide a list of the deprecated concepts. (It seems to me that the SEC should provide a submission test that does not allow these deprecated concepts to be used. The US GAAP Taxonomy clearly identifies these concepts. Or perhaps the US GAAP Taxonomy should simply remove these concepts rather than providing them and marking them deprecated. Either of these would solve this problem.)
- All SEC XBRL filings combined, a total of 2,558 US GAAP Taxonomy concepts were used and a total of 10,168 custom concepts were introduced by filers.
- The average filing contained approximately 125 concepts of which 109 were from the US GAAP Taxonomy and 16 were added by filers.
- A list of the top 50 most used US GAAP Taxonomy concepts is provided. (I have two points relating to that list. First, I am very curious why "Assets" is used 1,229 times while "LiabilitiesAndStockholdersEquity" is used 1,217 times. Seems to me the counts should be the same. Maybe this is because some of the filers are partnerships or something. Second, it seems to me that the frequency of the items on that list is a good indicator of comparability between filings.)
- A list of the top 50 custom concepts added by SEC XBRL filers is provided. Of that list of 50, the authors point out that 15 concepts have identical names as concepts which exist in the US GAAP Taxonomy. Of that list of 50, an additional 13 concepts added are very similar to US GAAP Taxonomy concepts. (I would point out that these lists are a nice little validation check which could help SEC XBRL filers not use duplicate concepts. These checks would be quite easy for a software application to implement and seems helpful to filers.)
- A comment was made by the authors: "The data shows that the lengthier-named elements are certainly less frequently used." (I would point out from my experience in creating the US GAAP Taxonomy that the lengthier concepts are in the disclosures, and SEC XBRL filers are not yet doing detailed tagging of the disclosures; therefore the lengthy concepts clearly would not be used at this point. The authors also note that only 2,653 of the existing 10,537 concepts were used. Wait until detailed tagging kicks in; the number of concepts used will grow.)
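The submission test I suggested above for deprecated concepts would be trivial to implement. Here is a minimal sketch; the concept names are hypothetical placeholders, not the actual deprecated list from the taxonomy:

```python
# Sketch of a submission check: flag any deprecated taxonomy concepts
# a filing uses. Concept names below are made-up examples.
deprecated = {"us-gaap:OldConceptA", "us-gaap:OldConceptB"}

def check_filing(used_concepts, deprecated_concepts):
    """Return the sorted list of deprecated concepts used by a filing."""
    return sorted(set(used_concepts) & deprecated_concepts)

filing = ["us-gaap:Assets", "us-gaap:OldConceptA", "us-gaap:Liabilities"]
violations = check_filing(filing, deprecated)
print(violations)  # a non-empty list means the filing should be rejected
```

The deprecated concepts are clearly identified in the taxonomy, so a check like this could run automatically at submission time.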
All in all this research provides useful information but, as the authors point out, there is a lot of opportunity for more research. Just like the SEC XBRL filers, the researchers will figure out the really interesting things to research when they have more experience with the taxonomy.
Also, I would point out that the researchers who did this analysis have an IT background, not an accounting or financial reporting background. So while this information is helpful, personally I think more accounting people need to be involved in this type of research. I have several blog posts which raise interesting accounting and financial reporting questions which need to be answered, such as this post about taxonomy architectures, this post relating to the top 10 errors in SEC XBRL filings that I have run across, or this post where I evaluate the criteria for investor friendliness of SEC XBRL filings.




Want Minimal Complexity from XBRL? Use XBRL Without Linkbases
XBRL often gets criticized for being complex. Imagine that you had this simple set of data, a list of the U.S. states and their populations. What is complex about this (if you cannot see these files in your browser, try your view source option to see the file contents)?
- XBRL Instance: This looks a lot like what someone who might create traditional XML would come up with. (I know that there are many different ways this XML could be structured. That is just one commonly used way.)
- XBRL Taxonomy: This XML Schema looks a lot like what someone who might create a traditional XML Schema might come up with. You can even validate this using an XML validator.
Anyone who can write an Excel macro could generate this form of XML. Why might you use this approach to creating your XML? Well, you would be following a global standard rather than rolling your own form of XML. XBRL provides a framework to work within rather than inventing your own. There are other reasons one might consider this approach.
Eventually you will start asking some questions, or you should. How would you verify the data to be sure that the populations of the states add up to the total US population (see that total in the PDF)? Well, if you added a calculation linkbase, XBRL could provide you with that functionality. For example, if the calculation linkbase were provided, you would get a validation report which looks something like this. Sure, you could write your own proprietary validation routine...but that takes time and money. Also, because your validation is proprietary, you will not be able to share it with anyone else.
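To see what that roll-up check amounts to, here is a sketch of the kind of validation a calculation linkbase automates. The XML shape and the numbers are illustrative only; a real XBRL processor reads the computation relations from the linkbase instead of hard-coding them:

```python
import xml.etree.ElementTree as ET

# Hand-rolled version of the check "state populations add up to the
# reported total". Data and element names are made-up examples.
doc = """<Populations>
  <State name="A">100</State>
  <State name="B">250</State>
  <Total>350</Total>
</Populations>"""

root = ET.fromstring(doc)
computed = sum(int(s.text) for s in root.findall("State"))
reported = int(root.findtext("Total"))
print("OK" if computed == reported else "ERROR")
```

This is easy for one small file, but every new document shape means rewriting the check, which is exactly the proprietary-validation trap described above.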
Another thing you might want to add to your XML is labels for the element names so you don't have to deal with things that look like "WestVirginia" (XML element names cannot have spaces), but rather "West Virginia", which most users might prefer. No problem, you could write your own way to add labels to your XML and not use the label linkbase which XBRL offers. What about providing USPS codes for the states such as "WV", and abbreviations like "W. Vir.", and other such labels? And what about multilingual capabilities? Sure, you could add all this to your XML.
Do you see my point here? There are two, actually. The first point is that you could use XBRL without the linkbases. That would make your implementation of XBRL significantly less complex, almost like traditional XML. You would be constrained slightly in terms of what you can do; you would have to follow the approach to creating XML that the XBRL framework requires.
The second point is that you will run into issues such as how to validate computations. That is why XBRL has XBRL calculations and XBRL Formula, not because people wanted to make XBRL more complex, but rather because the real world of business reporting has numbers and those numbers have relations. XML Schema validation cannot handle the validation of computations.
Basically, if you rolled your own XML, you would end up creating a lot of the functionality which XBRL already has. That is why the functionality is in XBRL, because the business people creating it needed the functionality. Business reporting is complex. XBRL has to serve the real needs of everyday business reporting.
You don't need to use all the features of XBRL. Start with only an XML Schema. Experiment. When you realize that you do need something like the calculation linkbase or label linkbase, you can add them later. When you realize that you do need XBRL's extensibility, it is there waiting for you to discover.
Besides, XBRL is a global standard. It is a framework. Sure, it takes a little more discipline to use XBRL than just writing your own XML. That can be a good thing, or that can be a bad thing. You can decide based on your needs.




Many Different Forms of XBRL
This is one of a series of posts where I am providing information relating to figuring out which data format is best to use and why. Basically: when is XML better, when is XBRL better, and when is RDF/OWL better?
In another blog post I looked at different information exchange formats, among those formats was XML. In yet another blog post I looked at different forms of XML.
In this blog post I look at many different forms of XBRL. What I mean by different forms of XBRL is using different XBRL features to model the exact same meaning. What I did was use a simple data set, the same data set I used to look at different data exchange formats and the different forms of XML. That is, I used population data for the U.S. States. A rendering of this data set can be seen in this PDF.
(I became well exposed to issues relating to different forms of XML while working as part of the teams creating the US GAAP Taxonomy Architecture and the US GAAP Taxonomy. Rene van Egmond and I summarized what we saw in the following document, How a Simpler XBRL can Make a Better XBRL. This work resulted in something Rene, myself, and a few others created, called XBRLS, which tries to overcome the issues we noticed.)
The data exists in a Microsoft Access database. I constructed the different taxonomies I would need to represent the data using an off-the-shelf taxonomy creation and validation tool. I generated the XBRL instances from within the Microsoft Access database. Basically, this shows the exact same meaning being modeled using different approaches; each automatically generated from exactly the same relational database information.
Here are the different approaches. For each approach an XBRL instance, the XBRL taxonomy, the validation of the computations, and a rendering of the presentation linkbase of the taxonomy is shown so you can get a quick sense of the XBRL taxonomy structure. You can load the instance and taxonomy within an XBRL tool to better look at them if you wish. (This discussion is more for business people with some technical understanding of XML and XBRL or at least willing to dig into this a little. So, you will need to look at some XML to truly grasp these issues. I have made some renderings of things available, but the good information is in the XML/XBRL files. Sorry, you cannot understand these things without getting your hands a little dirty.):
- Model using XBRL Tuples: (XBRL instance, XBRL taxonomy, Computations validation, Taxonomy rendering) In this taxonomy you have only two concepts (the name of the state and the population). Those two concepts are wrapped inside a tuple. You can then use that tuple to express the states. The downside is that you can put in any state name, meaning you cannot control which state name the user of the taxonomy adds. You could add an enumerated list to the name concept, but then you could not extend the list should a new state be needed. Note that there is no way to arrange tuples into a hierarchy, for example if you wanted to summarize the states by region. You cannot change the tuple; it is hard coded with only those two concepts. If you wanted to add, say, the state bird to the tuple, you would have to create a new tuple; tuples are not extensible. That is why XBRL generally avoids content models (that is what a tuple is, a content model).
- Model using XBRL Items: (XBRL instance, XBRL taxonomy, Computations validation, Taxonomy rendering) In this approach, each state is its own concept. That concept contains the value for the population. The downside is that you have 50 concepts. On the other hand, you cannot use a wrong state name like you could using a tuple. With items, the fact values can be arranged into a hierarchy by region if you desire. You can easily add new items; there is no content model to get in the way. But if you wanted to add a state bird to the information set, you would have to add 50 additional concepts, one per state.
- Model using XBRL Dimensions: (XBRL instance, XBRL taxonomy, Computations validation, Taxonomy rendering) In this approach, each state still has its own element, but the state is a member (a value of the "State" dimension). There is one additional concept, "Population". The state dimension is used to describe the fact value expressed with the population concept, which helps distinguish which state the population relates to. This approach is extensible; in fact, when you want to add another data point for, say, the state bird, you only have to add one new concept, not 50. You can also easily express the states in a hierarchy, say by region. (In this approach hypercubes are used to hook the dimension members to facts using the multidimensional model.)
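The trade-off between the item approach and the dimensional approach can be sketched in a few lines. The concept names and values below are hypothetical; the point is how the cost of adding a new data point differs:

```python
# Contrast: item modeling bakes the state into the concept name;
# dimensional modeling keeps one concept and qualifies it with a
# State dimension. Names and numbers are illustrative.

# Item approach: one concept per state per data point.
item_facts = {"PopulationOhio": 11485910, "PopulationWestVirginia": 1814468}

# Dimensional approach: one "Population" concept plus a State dimension.
dimensional_facts = [
    {"concept": "Population", "State": "Ohio", "value": 11485910},
    {"concept": "Population", "State": "WestVirginia", "value": 1814468},
]

# Adding a new data point (say, StateBird) costs one concept per state
# under items, but only one new concept under dimensions.
states = [f["State"] for f in dimensional_facts]
new_item_concepts = ["StateBird" + s for s in states]  # 50 in practice
new_dimensional_concepts = ["StateBird"]               # just one
print(len(new_item_concepts), len(new_dimensional_concepts))
```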
There are two very important points you should keep in the back of your mind when figuring out which approach to use. First, the approach you choose needs to serve your needs. Second, when extension of the XBRL used within your system is allowed, you have to control which approach those extending the XBRL taxonomy can use; if you don't, different users can extend using different approaches, leading to inconsistencies.
Looking at these three approaches to modeling XBRL has nothing to do with which modeling approach is better: tuples, items, or XBRL Dimensions. The point here is that when multiple approaches exist, you need to guide users toward the approach you want them to use if you desire consistency across the users within your system.
Each approach has its pros and cons; none is perfect. You will want to pick the modeling approach which best serves your needs. This is more about expressing your meaning consistently and accurately; the syntax matters little, if at all.




Many Different Forms of XML
This is one of a series of posts where I am providing information relating to figuring out which data format is best to use and why. Basically: when is XML better, when is XBRL better, and when is RDF/OWL better?
In another blog post I looked at different information exchange formats. In that post I mentioned that the world was standardizing on XML. But which form of XML? XML can come in many, many different forms.
I took a small data set which I had in a database and generated XML from that data set. The data set is simple enough: the population of each U.S. state. This PDF shows what the data set looks like in a rendered format.
Simple enough. Here are some forms of XML which I generated from the same Microsoft Access database information:
- Variation 1 of Traditional: This form of XML uses elements (rather than attributes).
- Variation 2 of Traditional: This form of XML uses one element 'State' and the values are attributes.
- Variation 3 of Traditional: This form of XML is much like the first variation, but using some different element names and a slightly different configuration.
- Variation 4 of Traditional: This form of XML is much like the first and third variations, but using different element names.
- Variation 5 of Traditional: This form of XML is much like the second variation using attributes, but different element names are used. Also note that the ID is the abbreviation of the state name.
- Microsoft Access Auto XML: This form of XML was auto-generated by Microsoft Access. This is an XML Schema for this form of XML.
- Microsoft Excel Auto XML: This form of XML was auto-generated by Microsoft Excel, it is the Excel XML format.
- RDF/OWL: This form of XML is RDF/OWL.
- XBRL: This form of XML is XBRL in a very simple form. This is an XML Schema for this XML. This is a validation report that shows that the population of the individual states adds up to the total population. (I will explain this in a bit.)
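To show how easily the same records can produce structurally different XML, here is a sketch generating two of the variations above from one set of records, mirroring how each form came from the same database. The element names are invented for illustration:

```python
import xml.etree.ElementTree as ET

# One set of records, two XML serializations.
# Names and values are illustrative examples.
records = [("Ohio", 11485910), ("WestVirginia", 1814468)]

def element_style(rows):
    """Variation 1 style: values carried in child elements."""
    root = ET.Element("States")
    for name, pop in rows:
        state = ET.SubElement(root, "State")
        ET.SubElement(state, "Name").text = name
        ET.SubElement(state, "Population").text = str(pop)
    return ET.tostring(root, encoding="unicode")

def attribute_style(rows):
    """Variation 2 style: values carried in attributes."""
    root = ET.Element("States")
    for name, pop in rows:
        ET.SubElement(root, "State", name=name, population=str(pop))
    return ET.tostring(root, encoding="unicode")

# Different syntax, identical meaning.
print(element_style(records))
print(attribute_style(records))
```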
So, what is the point here? Well actually, I have several points which I will list and discuss.
- Every one of those forms of XML represents the exact same set of information, the information which you can see in that PDF. While the syntax of each of the files (the different XML forms above) differs, the semantics of the information (the meaning of the information) is exactly the same.
- Some information is expressed more explicitly in some forms of XML than in others. For example, the population data is an estimate as of July 1, 2008. The point is that this fact (that the information is an estimate, and as of what point in time) is sometimes very explicit and other times somewhat implicit within the different forms of XML.
- The populations of each state are supposed to add up to the total for all the states. Here is another version of the first variation of XML with an error in it. Can you see the error? The last two digits of the total have been transposed. Some formats are better than others at communicating that the information should add up. Meaning, in XBRL you could quite easily communicate that the information adds up and get a report which shows whether it does. This is a validation report.
- The states are related to each other in different ways. For example, you can break the states down by, say, region: South, Northeast, West, Midwest, and so forth. That information is not communicated in any of these XML formats. However, any of these forms of XML could communicate that information, in XML, in whatever way desired.
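The region breakdown mentioned in the last bullet is just another relation kept alongside the data. Here is a sketch; the region assignments and population figures are illustrative and partial:

```python
# State-to-region relations kept alongside the data, then used to
# roll populations up by region. All values are made-up examples.
region_of = {"Ohio": "Midwest", "WestVirginia": "South", "Vermont": "Northeast"}
population = {"Ohio": 11485910, "WestVirginia": 1814468, "Vermont": 621270}

by_region = {}
for state, pop in population.items():
    region = region_of[state]
    by_region[region] = by_region.get(region, 0) + pop

print(by_region)
```

In XBRL this kind of relation would live in a linkbase; in ad hoc XML each producer would invent their own way to express it, which is exactly the interoperability problem this post is about.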
Which form of XML is the best? Well, that all depends on what you need from the information all things considered. On the one extreme, if you just want to make a simple set of information available to a small group of people, any old XML will do. In fact, you could use pretty much any data format. But XML works well over the Web, it is in vogue, it is a good general format.
If you are, say, a government agency or other enterprise and you want to work with one data set and you don't need to exchange that information with other government agencies and you will only have one data format, traditional XML could work for you. But what if you want to verify that numeric information adds up correctly? Well, you could build your own validation mechanism because your data set is small and you don't have complex computations.
But how many government agencies or other enterprises don't have to interact with other government agencies or enterprises, subsidiaries, etc.? If you interact with others, you have to agree. To agree, you need some sort of framework to agree on. For example, the National Information Exchange Model (NIEM) is a framework to help government agencies involved with public safety and security create XML which is easier to share. The framework adds discipline to creating their XML formats. Rather than each agency creating point solutions for exchanging information, the framework provides the discipline needed to create a canonical standard format which makes exchanging information easier. (Their introduction document does a great job of explaining this.)
XBRL is also a framework for agreement. For example, the US GAAP Taxonomy Architecture is part of a framework for using XBRL in a specific way, creating what amounts to an application profile (i.e. no XBRL tuples, no XBRL typed dimensions, no use of the XBRL scenario context element, build [Table]s in a specific way, etc.) Also, the XBRL framework provides mechanisms for achieving things which are commonly needed in business reporting. For example, it provides the ability to: add labels, add multiple labels, express computations between numeric information, express additional types of relations between concepts, etc. If you need this and you are using XML, you would have to build these things yourself.
Sharing information with a large number of users is one thing. A framework helps make these systems work better, but what if you want to connect information between all these systems? Some people use traditional XML, some use XBRL, some use other formats. That is what RDF/OWL and the Semantic Web are all about. For example, this Data.gov project has converted numerous data sets into RDF/OWL. (This is a great book for understanding how the Semantic Web will be changing your life.)
The bottom line here as I see it is this: When you build your information exchange systems, be sure you are considering the right things for the long term. I see four groups of XML:
- XML unconstrained by a framework (ad hoc XML)
- XML constrained by some framework
- XBRL, one specific type of XML framework for a specific purpose (This blog post helps you see how XBRL builds on top of traditional XML)
- RDF/OWL
This is not to say that one type of XML is better than another; it is more about understanding what you need to consider when you try to determine your needs. Using the wrong type of XML is like trying to fit a square peg in a round hole. You can do it, but it isn't pretty.



