BLOG:  Digital Financial Reporting

This is a blog for information relating to digital financial reporting. This blog is basically my "lab notebook" for experimenting and learning about XBRL-based digital financial reporting. This is my brainstorming platform. This is where I think out loud (i.e. publicly) about digital financial reporting. This information is for innovators and early adopters who are ushering in a new era of accounting, reporting, auditing, and analysis in a digital environment.

Much of the information contained in this blog is synthesized, summarized, condensed, better organized and articulated in my book XBRL for Dummies and in the chapters of Intelligent XBRL-based Digital Financial Reporting. If you have any questions, feel free to contact me.

Entries from April 18, 2010 - April 24, 2010

Exchanging Business Information: XML, XBRL or RDF/OWL, Which is 'Best'?

This blog post summarizes several other blog posts. It may seem rather stream of consciousness and along the lines of brainstorming; if it does, that is because that is what this blog post is. I am summarizing this information to help myself understand it and learn to better communicate it to other business users. It is hard to say how many years of thinking have gone into this. But I have to answer this question over and over and I wanted to understand the real answer for myself. The questions are:

  • Is XBRL 'better' than XML?
  • Is RDF/OWL 'better' than XBRL?
  • What considerations go into deciding which syntax (XML, XBRL, and RDF/OWL are all syntaxes) is 'best'?

The first thing one needs to do is define the problem you are trying to solve.  In general, the problem which seems to need solving is getting business information out of one system, be it internal to your organization or external to your organization, and then automating the process of using that business information within another business system.

Helpful background information

Here is some information which provides helpful background in understanding the moving pieces of this issue. This may seem like a lot of stuff to know, and it is. But if you want to make the right choice, you do need to understand the moving pieces or have someone help you who does. This is not about providing you with a two minute sound bite; this is about providing you with the information you need to truly understand the issues you need to consider.

  • Structured information, not unstructured information: Two points here.   First, I am talking about structured information.  Second, the world is moving toward structured information because computers cannot parse unstructured information reliably enough and it costs too much.  This video, How XBRL Works, helps you understand the difference between the two.
  • Structured for meaning, not structured for presentation: I am talking about information structured for meaning, not information structured for presentation.  Again, the How XBRL Works video helps you understand the difference.
  • Global standard, not point solutions: If information is structured you can always convert it into some other structure using some mapping process.  If everyone used their own structure, everyone would have to map to everyone else's structure.
  • Use XML:  There are many different data formats.  The world is standardizing on XML.
  • Many different forms of XML: There are many different forms of XML.
  • Many different forms of XBRL: There are many different forms of XBRL.
  • Many different forms of RDF: There are many different forms of RDF.
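To put a rough number on the "global standard, not point solutions" item above: if each of n parties keeps its own structure, every ordered pair of parties needs its own mapping, whereas a shared standard needs only one mapping into and one mapping out of the standard per party. The sketch below is plain arithmetic; the function names are made up for illustration.

```python
def point_to_point_mappings(n: int) -> int:
    """Directed mappings needed when each of n parties maps to every other party."""
    return n * (n - 1)


def shared_standard_mappings(n: int) -> int:
    """Mappings needed when each party maps to and from one shared standard."""
    return 2 * n


for parties in (3, 10, 50):
    print(parties, point_to_point_mappings(parties), shared_standard_mappings(parties))
```

The gap grows quickly: the point-to-point count grows with the square of the number of parties, while the shared-standard count grows linearly.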

XBRL builds upon XML

In a previous blog post I explained how XBRL builds on top of XML. Let me summarize these points here; you can go to that blog post to drill into this information further. This is also explained in my book XBRL for Dummies (page 33).

  • XBRL is XML
  • XBRL expresses semantics (meaning) in a standard format
  • XBRL allows content validation against the expressed meaning
  • XBRL separates concept definitions from the content model
  • XBRL can express multiple hierarchies of explicit relations
  • XBRL provides prescriptive extensibility
  • XBRL easily fits into relational databases
  • XBRL provides multidimensional models
  • XBRL enables "intelligent", metadata driven connections to information

XBRL's "Sweet Spot"

XBRL has a 'sweet spot'. This sweet spot is discussed in detail in my book XBRL for Dummies (page 172); I summarize the points for you here.

  • Flexibility within rigid systems
  • Reconfigurable information
  • Rules engine-based validation
  • Clear communication and sharing of rich business-level semantics
  • Metadata-driven configuration, no IT involvement required
  • Zero tolerance for errors
  • Achieving agreement with exterior parties

XBRL, RDF/OWL, and the Semantic Web

When people talk about the Semantic Web, terms such as RDF and OWL come up as the information formats of the Semantic Web. If RDF/OWL are the formats of the Semantic Web, then it seems obvious that all information should be expressed in RDF/OWL. Right? Do away with all other information formats, move everything to RDF/OWL and life will be good. After all, you can only write "queries" against the information on the Web if all of that information is in the same format.

RDF/OWL has benefits beyond what XBRL can provide.  These benefits seem to be:

  • OWL has way more power to express semantic meaning than XBRL.  That is what OWL is for, expressing semantic meaning within an ontology.  XBRL is more in the "taxonomy" expression business than the "ontology" expression business. (To understand the difference between a dictionary, a classification system, a taxonomy, and an ontology, see this blog post.)
  • RDF/OWL are the W3C information formats for the Semantic Web.
  • RDF/OWL can express anything, XBRL is more focused on business information. 

Bottom line: XML, XBRL, RDF/OWL; Which is 'Best'?

From what I can tell, the answer to the question of whether to use XML, XBRL, or RDF/OWL is that it depends on what you are using it for. What is crystal clear is that XML, XBRL, and RDF/OWL are syntaxes. What is important to business users is semantics, not syntax. Whatever syntax you choose, you should be able to convert it to any other syntax, be that an external exchange format or an internal storage format such as your relational database. The semantics (meaning) must be the same in any business system or the information exchange simply will not work.

There are lots of obvious places where clearly XML is the way to go. It seems XML is perfect for specifying large, fixed documents such as DocBook, expressing Excel spreadsheets, XHTML, and such.  XML is also perfect for fixed transactions which rarely change.  What seems to be key here is "fixed".

For "ad hoc" projects, tightly controlled systems which are closed, XML will probably work fine.  When you start talking about enterprise class systems, lots of users, the need to scale, things which need to be rock solid, you need to have some sort of framework.  XML frameworks can be created. NIEM is such a framework (National Information Exchange Model).  The NIEM Introduction provides a very good explanation of why frameworks are important.  Basically, frameworks provide discipline and leverage.

XBRL is a framework. It provides discipline and leverage. A primary benefit of XBRL is XBRL Formula, the ability to model business rules in a global standard format. Being able to express those business rules means that you can validate the semantics (not just the syntax) of information in a global standard way and exchange those business rules with others. XML cannot do this, and it probably never will be able to. RDF/OWL cannot do this now, but the W3C seems to be working on this.

RDF/OWL offers a powerful tool to express complex semantics in a global standard way, far beyond the capabilities of  XBRL.  RDF/OWL will be the least common denominator of the Web, the way to get different syntaxes to be able to work together.

It seems as though the answer to the question about which is better is that it depends on the system you are implementing really.  What is clear is that clear semantics are critical.  RDF/OWL can help in this regard.  If you cannot clearly express your information model in RDF/OWL, then your information model is broken.  If you can express your model in RDF/OWL, the least common denominator of the Semantic Web and a very powerful tool for expressing semantics, then it will not matter what syntax you use because you will be able to convert to any syntax and the RDF/OWL will document exactly how to do that.

A lot of these details are discussed in my book XBRL for Dummies. The book lays many of these things out so business readers can get their heads around them and understand the right questions to be asking the technical people who have to help them use XML, XBRL and RDF/OWL within their business systems.  The area of RDF/OWL is rather weak in the book, but the key concepts are there.  Watch my blog for more information should you need such information.

Many Different Forms of RDF

This is one of a series of posts where I am providing information relating to figuring out which data format is best to use and why. Basically, when is XML better, when is XBRL better, and when is RDF/OWL better.

I have posted a number of blog entries relating to RDF, OWL, and the Semantic Web which you can find here. I want to summarize what I have figured out with regard to RDF here.

RDF (Resource Description Framework) is one of the cornerstones of the Semantic Web. RDF can be used to document pretty much anything. The core of RDF seems to be the subject-predicate-object relation, which it seems was used by Aristotle. This is what RDF looks like in one form, XML. I am not going to explain RDF in any more detail; go look at the other blog posts for that.
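As a minimal sketch of the subject-predicate-object idea, here are a few triples held as plain Python tuples, with a small pattern-matching helper. This is not an RDF library; the `ex:` identifiers, the `Triple` type, and the `match` function are all hypothetical names chosen for illustration.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical identifiers, for illustration only.
triples = [
    Triple("ex:Washington", "ex:capital", "Olympia"),
    Triple("ex:Washington", "ex:abbreviation", "WA"),
    Triple("ex:Oregon", "ex:capital", "Salem"),
]


def match(store, subject=None, predicate=None, obj=None):
    """Return the triples matching the given parts; None acts as a wildcard."""
    return [
        t for t in store
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


capitals = match(triples, predicate="ex:capital")
print([(t.subject, t.obj) for t in capitals])
```

Every statement has the same three-part shape, which is why one generic query helper can serve any kind of data.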

What I do want to document is the forms of RDF:

  • Triples: There are lots of notations for subject-predicate-object relations. Here are some of those notations (syntax): N3, N-Triples, TriG, TriX, Turtle, RDF/XML, RDFa. Each of these probably has its pros and cons. The point here is that these are, for the most part, many different ways of doing the same thing.
  • RDF XML: So because I want to stick with XML, I will focus on RDF XML as "the" format for RDF for my purposes.  The format really does not matter, what matters is what I talk about below.  Using RDF alone is like using XML without a schema.  You can basically include anything, right or wrong.
  • RDFa: RDFa is an approach to embedding metadata into HTML web pages. Something similar to this is eRDF. RDFa and eRDF are similar to iXBRL.
  • RDF plus OWL: Web Ontology Language (called OWL) can be thought of as a schema for RDF, loosely similar to how XML Schema constrains XML.  But, OWL is much different in that it is used to constrain semantics, not syntax.  What this means is that RDF by itself seems somewhat useless really.  You have to both make sure you build your RDF relations correctly and you understand those relations.  That is what OWL seems to do. OWL defines a semantic model which both explains the RDF and constrains the RDF.
  • RDF and standard OWL ontologies: The next step in the spectrum is what I am calling standard OWL ontologies. It is one thing for someone to post an ontology to the web. You could have hundreds or thousands of ontologies which express the same thing. The ontologies could have different logical models and not even interoperate. Compare that to having one agreed-to ontology for some specific model.
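To make the "many different ways of doing the same thing" point about triple notations concrete, the sketch below writes the same two triples as N-Triples lines and as Turtle lines. It is a deliberately narrow illustration: it assumes every IRI lives in one hypothetical namespace (`http://example.org/`) and every object is a plain string literal, and the helper names are made up.

```python
EX = "http://example.org/"  # hypothetical namespace, for illustration only

# (subject IRI, predicate IRI, plain string literal)
triples = [
    (EX + "Washington", EX + "capital", "Olympia"),
    (EX + "Oregon", EX + "capital", "Salem"),
]


def escape(literal):
    """Escape backslashes and double quotes inside a string literal."""
    return literal.replace("\\", "\\\\").replace('"', '\\"')


def to_ntriples(rows):
    """One full statement per line, with every IRI written out in full."""
    return "\n".join(f'<{s}> <{p}> "{escape(o)}" .' for s, p, o in rows)


def to_turtle(rows, prefix="ex", namespace=EX):
    """Same statements, with IRIs shortened through a declared prefix."""
    lines = [f"@prefix {prefix}: <{namespace}> .", ""]
    for s, p, o in rows:
        s_local = s[len(namespace):]  # assumes s starts with namespace
        p_local = p[len(namespace):]  # assumes p starts with namespace
        lines.append(f'{prefix}:{s_local} {prefix}:{p_local} "{escape(o)}" .')
    return "\n".join(lines)


print(to_ntriples(triples))
print()
print(to_turtle(triples))
```

The two outputs look different, but they carry exactly the same statements.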

So, what the heck does all this mean? Let me try and explain. I will use a small collection of data sets, which I gathered from the web, to explain. Browse through them if you like; I decided to grab 20 different sets of data, plus one extra. Imagine you had the following data sets:

  1. When states entered the union.
  2. State violent crime statistics.
  3. Miscellaneous population statistics by state.
  4. State capitals and largest cities.
  5. Population estimates by state. (This is the specific CSV file which will open in Excel.)
  6. Financial information by state. (This is the specific Excel file.)
  7. State areas.
  8. State symbols.
  9. State mottoes.
  10. State nicknames.
  11. Origin of state names.
  12. State GDP.
  13. State GDP per capita.
  14. State population density.
  15. State tax revenues.
  16. State unemployment rates.
  17. Gross state product per capita.
  18. State by most educated.
  19. State by health index.
  20. State by personal income.
  21. (Extra) Red, Blue, and Purple states

Suppose you wanted to use the data in one of those data sets, what would you do? Copy and paste into Excel most likely. What if you wanted to use two of those data sets together? No problem, just copy and paste both sets into Excel and put them together. When you try and do something like this you run into problems such as the key value (i.e. in this case probably the state name) being different. For example, this list uses the state abbreviation, not the state name. Now, this is not a huge deal if you don't need this information on a timely basis, or if you have small sets of data like the 50 states, etc.
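The key mismatch described above can be sketched in a few lines. Here one data set is keyed by state name and the other by state abbreviation, so a crosswalk table is needed before the two can be combined. All values are made-up placeholders and the function name is hypothetical.

```python
# Made-up placeholder values, for illustration only.
capital_by_name = {"Washington": "Olympia", "Oregon": "Salem", "Idaho": "Boise"}
figure_by_abbreviation = {"WA": 100, "OR": 200}

# The crosswalk someone has to build and maintain by hand.
name_to_abbreviation = {"Washington": "WA", "Oregon": "OR"}


def join_on_state(by_name, by_abbreviation, crosswalk):
    """Combine two data sets whose keys differ, reporting rows that cannot be matched."""
    joined = {}
    unmatched = []
    for name, value in by_name.items():
        abbreviation = crosswalk.get(name)
        if abbreviation is None or abbreviation not in by_abbreviation:
            unmatched.append(name)
            continue
        joined[name] = (value, by_abbreviation[abbreviation])
    return joined, unmatched


joined, unmatched = join_on_state(capital_by_name, figure_by_abbreviation, name_to_abbreviation)
print(joined)
print(unmatched)
```

With 50 states the crosswalk is a nuisance; with thousands of keys and frequent updates it becomes real work.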

So what if this information were in XML, like this data set of state population? It would be pretty easy to write a simple Excel macro to go get the data. But what if each set of data used a different XML syntax? See this blog post on different XML formats. OK, so not a huge problem, just write multiple Excel import macros, one for each XML file. Right? Well, that will get old.
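A small sketch of the "one importer per XML format" problem, written in Python rather than as an Excel macro. The two XML layouts below are invented for illustration and carry the same made-up facts, yet each one needs its own reader.

```python
import xml.etree.ElementTree as ET

# Two hypothetical XML layouts carrying the same made-up facts.
ELEMENT_STYLE = """<states>
  <state><name>Washington</name><population>100</population></state>
  <state><name>Oregon</name><population>200</population></state>
</states>"""

ATTRIBUTE_STYLE = """<data>
  <row state="Washington" pop="100"/>
  <row state="Oregon" pop="200"/>
</data>"""


def read_element_style(xml_text):
    """Reader for the layout that uses child elements."""
    root = ET.fromstring(xml_text)
    return {
        state.findtext("name"): int(state.findtext("population"))
        for state in root.iter("state")
    }


def read_attribute_style(xml_text):
    """Reader for the layout that uses attributes."""
    root = ET.fromstring(xml_text)
    return {row.get("state"): int(row.get("pop")) for row in root.iter("row")}


same = read_element_style(ELEMENT_STYLE) == read_attribute_style(ATTRIBUTE_STYLE)
print(same)
```

Both readers return the same dictionary, but neither can read the other's file, so every new layout means another reader to write and maintain.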

OK, so what if everyone used the SAME XML format?  Say, RDF.  Well, then you could read the RDF by just pointing an application at the file, right?  Not quite.  What if the RDF used different logical models (or ontologies) to describe the data? If that happens, well, then you are back to mapping one file at a time, adjusting the multiple logical models or ontologies into one common model. You can do this, but it is a lot of work.
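Sharing a syntax does not remove the mapping work if the vocabularies differ. In the sketch below, two sources both use triples, but each names the population predicate differently, so each predicate has to be mapped by hand into a common model. The prefixes, predicate names, and values are all hypothetical.

```python
# Both sources use triples, but each has its own (hypothetical) vocabulary.
source_a = [("Washington", "a:population", 100)]
source_b = [("Oregon", "b:residentCount", 200)]

# One hand-written entry per source predicate, pointing at the common model.
TO_COMMON = {
    "a:population": "common:population",
    "b:residentCount": "common:population",
}


def to_common(triples, mapping):
    """Rewrite each predicate into the common model; fail loudly on unmapped ones."""
    converted = []
    for subject, predicate, obj in triples:
        if predicate not in mapping:
            raise KeyError(f"no mapping for predicate {predicate!r}")
        converted.append((subject, mapping[predicate], obj))
    return converted


merged = to_common(source_a, TO_COMMON) + to_common(source_b, TO_COMMON)
print(merged)
```

The mapping table is where the effort goes: someone has to decide, predicate by predicate, that two differently named things mean the same thing.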

But what if there were another way? What if you created one standard logical model, documented it using OWL, and then made every piece of data available in a common format? Check out this Data-gov Wiki. Look at this web site, or wiki. More specifically, look at this complete data set of RDF. Per the web site, they have converted about 280 data sets into RDF.

OK, so what is the bottom line here with regard to RDF? First, the Semantic Web is about making information on the web more readable to computers. To do that, the best way is to have one data format (semantics and syntax). Short of that, one can take the many different data formats and map them to one syntax. You have to be sure the semantics (the meaning) of the data is consistent. Much of the data needs to work together. Most may never be used together, but some, like the state information I pointed out, will be used together. XML is a syntax that most people on the web are moving to, so RDF in XML makes sense. You need OWL to articulate your ontology, or your model, so that people understand your model and so that the data made available complies with that model.

But my next question is when should XBRL be used and when should RDF/OWL be used?

A Paper: Quality of XBRL US GAAP Taxonomy: Empirical Evaluation using SEC Filings

A paper written by Hongwei Zhu and Harris Wu of Old Dominion University, Quality of XBRL US GAAP Taxonomy: Empirical Evaluation using SEC Filings, looks at the quality of the US GAAP Taxonomy. The following is an abstract of that paper:

The primary purpose of a data standard is to improve the comparability of data created by multiple standard users. Given the high cost of developing and implementing data standards, it is desirable to be able to assess the quality of data standards. We develop metrics for measuring completeness and relevancy of a data standard. These metrics are evaluated empirically using the US GAAP taxonomy in XBRL and SEC filings produced using the taxonomy by approximately 500 companies. The results show that the metrics are useful and effective. Our analysis also reveals quality issues of the GAAP taxonomy and provides useful feedback to the taxonomy users. The SEC has mandated that all publicly listed companies must submit their filings using XBRL beginning mid 2009 to late 2014 according to a phased-in schedule. Thus our findings are timely and have practical implications that will ultimately help improve the quality of financial data.

While the paper just scratches the surface (note that the authors refer to this as their initial work), it does offer helpful insight into using the US GAAP Taxonomy to create SEC XBRL Filings and using those filings.

Here is a summary of some of the more helpful and interesting findings of this analysis. I have added my commentary to this information in parentheses:

  1. The authors point out that from a syntactical perspective, all the XBRL documents (XBRL instances and the filer XBRL taxonomy extensions) prepared by SEC XBRL filers are interoperable because they use the same syntax (i.e. XBRL). However, from a semantic perspective (business meaning), the XBRL documents can be difficult to compare when different companies use different data elements in their documents. (I would point out that an XBRL taxonomy contains two things: elements (better referred to as concepts really) and relations. Not only differences in concepts but also differences in relations can cause comparability difficulties.)
  2. The authors point out that from the perspective of the users of this information, the metadata (i.e. the taxonomy concepts and like I said above their relations to other concepts) are also data.
  3. The US GAAP Taxonomy has 10,537 "active concepts".  (There are also 2,653 abstract concepts which can never be used to report information and 346 deprecated concepts which should not be used by SEC filers.)
  4. Many companies are reporting using the deprecated concepts.  The authors point out that 195 out of 481 filers used deprecated concepts.  They point out a list of the deprecated concepts.  (It seems to me that the SEC should provide a submission test to not allow these deprecated concepts to be used.  The US GAAP Taxonomy clearly identifies these concepts.  Or, perhaps the US GAAP Taxonomy should simply remove these concepts rather than providing them and marking them deprecated.  Either of these would solve this problem.)
  5. All SEC XBRL filings combined, a total of 2,558 US GAAP Taxonomy concepts were used and a total of 10,168 custom concepts were introduced by filers.
  6. The average filing contained approximately 125 concepts of which 109 were from the US GAAP Taxonomy and 16 were added by filers.
  7. A list of the top 50 most used US GAAP Taxonomy concepts is provided. (I have two points relating to that list. First, I am very curious why "Assets" is used 1229 times and "LiabilitiesAndStockholdersEquity" was used 1217 times. Seems to me they should be the same. Maybe this is because some of the filers are partnerships or something. Second, it seems to me that the frequency of the items on that list is a good indicator of comparability between filings.)
  8. A list of the top 50 custom concepts added by SEC XBRL filers is provided.  Of that list of 50, the authors point out that 15 concepts have identical names as concepts which exist in the US GAAP Taxonomy. Of that list of 50, an additional 13 concepts added are very similar to US GAAP Taxonomy concepts.  (I would point out that these lists are a nice little validation check which could help SEC XBRL filers not use duplicate concepts.  These checks would be quite easy for a software application to implement and seems helpful to filers.)
  9. A comment was made by the authors, "The data shows that the lengthier-named elements are certainly less frequently used." (I would point out from my experience in creating the US GAAP Taxonomy that the lengthier concepts are in the disclosures and the SEC XBRL filers are not doing detailed tagging of the disclosures, therefore the lengthy concepts clearly would not be used at this point. The authors also note that only 2,558 of the existing 10,537 concepts were used. Wait until detailed tagging kicks in; the number of concepts used will grow.)
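The duplicate-concept check suggested in item 8 above could be sketched as follows. This is only an illustration of the idea, not how any SEC or vendor tool works: it compares names only, uses Python's `difflib.SequenceMatcher` for the "very similar" test, and the 0.9 threshold, the function name, and the sample extension names are all assumptions.

```python
from difflib import SequenceMatcher


def check_extension_concepts(base_names, extension_names, threshold=0.9):
    """Flag extension concepts whose names duplicate or closely resemble base concepts."""
    base_by_lower = {name.lower(): name for name in base_names}
    findings = []
    for ext in extension_names:
        if ext.lower() in base_by_lower:
            findings.append((ext, base_by_lower[ext.lower()], "identical"))
            continue
        for base in base_names:
            ratio = SequenceMatcher(None, ext.lower(), base.lower()).ratio()
            if ratio >= threshold:
                findings.append((ext, base, "similar"))
                break
    return findings


# A tiny stand-in for the base taxonomy, plus hypothetical filer extensions.
base = ["Assets", "LiabilitiesAndStockholdersEquity", "Revenues"]
extensions = ["Assets", "LiabilitiesAndStockholderEquity", "WidgetBacklog"]

for finding in check_extension_concepts(base, extensions):
    print(finding)
```

A name match is only a hint. Whether two concepts really mean the same thing still takes an accountant's judgment, but a check like this would catch the obvious cases before filing.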

All in all this research provides useful information, but like the authors point out...there is a lot of opportunity for more research.  Just like the SEC XBRL filers, the researchers will figure out the really interesting things to research when they have more experience with the taxonomy.

Also, I would point out that the researchers who did this analysis have an IT background, not an accounting or financial reporting background. So while this information is helpful, personally I think more accounting people need to be involved in this type of research. I have several blog posts which raise interesting accounting and financial reporting questions which need to be answered, such as this post about taxonomy architectures, this post relating to the top 10 errors in SEC XBRL filings that I have run across, or this post where I evaluate the criteria for investor friendliness of SEC XBRL filings.

 

Want Minimal Complexity from XBRL? Use XBRL Without Linkbases

XBRL often gets criticized for being complex.  Imagine that you had this simple set of data, a list of the U.S. states and their population.  What is complex about this (if you cannot see these files in your browser, try your view source option to see the file contents):

  • XBRL Instance: This looks a lot like what someone who might create traditional XML would come up with.  (I know that there are many different ways this XML could be structured. That is just one commonly used way.)
  • XBRL Taxonomy: This XML Schema looks a lot like what someone who might create a traditional XML Schema might come up with.  You can even validate this using an XML validator.

Anyone who can write an Excel macro could generate this form of XML. Why might you use this approach to creating your XML?  Well, you would be following a global standard rather than rolling your own form of XML.  XBRL provides a framework to work within rather than inventing your own.  There are other reasons one might consider this approach.

Eventually you will start asking some questions, or you should ask some questions. How would you verify the data to be sure that the populations of the states add up to the total US population (see that total in the PDF)? Well, if you added a calculation linkbase, XBRL could provide you with that functionality. For example, if the calculation linkbase were provided you would get a validation report which looks something like this. Sure, you could write your own proprietary validation routine...but that takes time and money. Also, because you used your own proprietary validation, you will not be able to share it with anyone else.
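For comparison, here is roughly what the "write your own proprietary validation routine" option looks like. This is not the XBRL calculation linkbase, which declares the summation relation in the taxonomy so that any XBRL processor can check it; this is a hand-rolled check with made-up numbers and a hypothetical function name.

```python
def check_total(parts, reported_total, tolerance=0):
    """Compare the sum of the parts with a separately reported total."""
    computed = sum(parts.values())
    difference = computed - reported_total
    return {
        "computed": computed,
        "reported": reported_total,
        "difference": difference,
        "consistent": abs(difference) <= tolerance,
    }


# Made-up placeholder figures; three states stand in for fifty.
populations = {"Washington": 100, "Oregon": 200, "Idaho": 50}

print(check_total(populations, reported_total=350))
print(check_total(populations, reported_total=360))
```

The routine is short, but only the person who wrote it knows the rule it enforces. Expressing the same relation in a standard form is what lets someone else's software run the same check.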

Another thing you might want to add to your XML is labels for the element names so you don't have to deal with things that look like "WestVirginia" (XML element names cannot have spaces), but rather "West Virginia" which most users might prefer. No problem, you could write your own way to add labels to your XML and not use the label linkbase which XBRL offers. What about providing USPS codes for the states such as "WV", and abbreviations like "W. Vir.", and other such labels? And what about multilingual capabilities? Sure, you could add all this to your XML.
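A rough sketch of the "write your own way to add labels" route: a lookup keyed by element name, label role, and language. The role names used here (`standard`, `usps`, `abbreviation`) are invented for this example and are not XBRL label role URIs; the fallback rule is likewise just one possible design.

```python
# Hypothetical roles and a home-grown structure, for illustration only.
LABELS = {
    "WestVirginia": {
        ("standard", "en"): "West Virginia",
        ("usps", "en"): "WV",
        ("abbreviation", "en"): "W. Vir.",
        ("standard", "es"): "Virginia Occidental",
    },
}


def label_for(element_name, role="standard", lang="en"):
    """Return the requested label, falling back to the standard English
    label and finally to the raw element name."""
    labels = LABELS.get(element_name, {})
    return labels.get((role, lang)) or labels.get(("standard", "en")) or element_name


print(label_for("WestVirginia"))
print(label_for("WestVirginia", role="usps"))
print(label_for("WestVirginia", lang="es"))
print(label_for("Ohio"))
```

Every decision in there (how roles are named, how languages are keyed, what the fallback is) is one that each home-grown format would make differently.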

Do you see my point here? There are two. The first point is that you could use XBRL without the linkbases. That would make your implementation of XBRL significantly less complex, almost like traditional XML. You would be constrained slightly in terms of what you can do; you would have to follow the approach to creating XML that the XBRL framework requires.

The second point is that you will run into issues such as how to validate computations.  That is why XBRL has XBRL calculations and XBRL Formula, not because people wanted to make XBRL more complex, but rather because the real world of business reporting has numbers and those numbers have relations.  XML Schema validation cannot handle the validation of computations.

Basically, if you rolled your own XML, you would end up creating a lot of the functionality which XBRL already has.  That is why the functionality is in XBRL, because the business people creating it needed the functionality.  Business reporting is complex. XBRL has to serve the real needs of everyday business reporting.

You don't need to use all the features of XBRL.  Start with only an XML Schema.  Experiment.  When you realize that you do need something like the calculation linkbase or label linkbase, you can add them later.  When you realize that you do need XBRL's extensibility, it is there waiting for you to discover.

Besides, XBRL is a global standard. It is a framework. Sure, it takes a little more discipline to use XBRL than just writing your own XML. That can be a good thing, or that can be a bad thing. You can decide based on your needs.

 

Many Different Forms of XBRL

This is one of a series of posts where I am providing information relating to figuring out which data format is best to use and why. Basically, when is XML better, when is XBRL better, and when is RDF/OWL better.

In another blog post I looked at different information exchange formats, among those formats was XML.  In yet another blog post I looked at different forms of XML. 

In this blog post I look at many different forms of XBRL. What I mean by different forms of XBRL is using different XBRL features to model the exact same meaning. What I did was use a simple data set, the same data set I used to look at different data exchange formats and the different forms of XML. That is, I used population data for the U.S. states. A rendering of this data set can be seen in this PDF.

(I became well exposed to issues relating to different forms of XML when working as part of the teams creating the US GAAP Taxonomy Architecture and the US GAAP Taxonomy. Rene van Egmond and I summarized what we saw in the following document, How a Simpler XBRL can Make a Better XBRL. This work resulted in something Rene, I, and a few others created called XBRLS, which tries to overcome the issues we noticed.)

The data exists in a Microsoft Access database.  I constructed the different taxonomies I would need to represent the data using an off-the-shelf taxonomy creation and validation tool.  I generated the XBRL instances from within the Microsoft Access database.  Basically, this shows the exact same meaning being modeled using different approaches; each automatically generated from exactly the same relational database information.

Here are the different approaches. For each approach an XBRL instance, the XBRL taxonomy, the validation of the computations, and a rendering of the presentation linkbase of the taxonomy is shown so you can get a quick sense of the XBRL taxonomy structure.  You can load the instance and taxonomy within an XBRL tool to better look at them if you wish.  (This discussion is more for business people with some technical understanding of XML and XBRL or at least willing to dig into this a little.  So, you will need to look at some XML to truly grasp these issues.  I have made some renderings of things available, but the good information is in the XML/XBRL files. Sorry, you cannot understand these things without getting your hands a little dirty.):

  • Model using XBRL Tuples: (XBRL instance, XBRL taxonomy, Computations validation, Taxonomy rendering) In this taxonomy you have only two concepts (the name of the state and the population). Those two concepts are wrapped inside a tuple. You can then use that tuple to express the states. The downside is that you can put in any state, meaning you cannot control which state name the user of the taxonomy adds. You could add an enumerated list to the name concept, but then you could not extend the list should a new state be needed. Note that there is no way to arrange tuples into a hierarchy, for example if you wanted to summarize the states by region. You cannot change the tuple; it is hard coded with only those two concepts. If you wanted to add, say, the state bird to the tuple, you would have to create a new tuple, because tuples are not extensible. That is why XBRL does not use the content models (that is what a tuple is, a content model).
  • Model using XBRL Items: (XBRL instance, XBRL taxonomy, Computations validation, Taxonomy rendering) In this approach, each state is its own concept. That concept contains the value for the population. The downside is that you have 50 concepts. You cannot use the wrong state name like you could using a tuple. With items, the fact values can be arranged into a hierarchy by region if you desire. You can easily add new items. There is no content model to get in the way. If you wanted to add a state bird to the information set, you would have to add 50 additional concepts, one per state.
  • Model using XBRL Dimensions: (XBRL instance, XBRL taxonomy, Computations validation, Taxonomy rendering) In this approach, each state has its own concept, but the state is a measure (a value of the "State" dimension). There is an additional concept "Population". The state dimension is used to describe the fact value expressed in the population concept, which helps to distinguish which state the population relates to. This approach is extensible. In fact, when you want to add another data point for, say, the state bird, you only have to add one new concept, not 50. You can also easily express the states in a hierarchy, say by region. (In this approach hypercubes are used to hook measures with facts using the multidimensional model.)
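The three approaches above can be mimicked with plain Python structures to show that they carry the same meaning in different shapes. This is an analogy only, not XBRL: the values are made up, two states stand in for fifty, and the concept names (`StateName`, `Population`, `StateBird`) are chosen for illustration.

```python
# Tuple-like: repeated groups of the same two concepts.
tuple_facts = [
    {"StateName": "Washington", "Population": 100},
    {"StateName": "Oregon", "Population": 200},
]

# Item-like: one concept per state, each holding a population value.
item_facts = {"Washington": 100, "Oregon": 200}

# Dimension-like: one Population concept, qualified by a State dimension.
dimension_facts = [
    {"concept": "Population", "State": "Washington", "value": 100},
    {"concept": "Population", "State": "Oregon", "value": 200},
]


def from_tuples(facts):
    return {f["StateName"]: f["Population"] for f in facts}


def from_items(facts):
    return dict(facts)


def from_dimensions(facts, concept="Population"):
    return {f["State"]: f["value"] for f in facts if f["concept"] == concept}


# All three shapes reduce to the same state -> population mapping.
assert from_tuples(tuple_facts) == from_items(item_facts) == from_dimensions(dimension_facts)

# Adding a second data point in the dimension-like shape needs one new
# concept name, reused for every state (placeholder value shown).
dimension_facts.append({"concept": "StateBird", "State": "Washington", "value": "placeholder bird"})
print(from_dimensions(dimension_facts, concept="StateBird"))
```

In the item-like shape the same addition would need a new key per state, and in the tuple-like shape every group would have to be redefined with a third field, which mirrors the extensibility trade-offs described in the list.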

There are two very important points you should keep in the back of your mind when it comes to figuring out which approach to use. First, the approach you choose needs to serve your needs. Second, when extension of the XBRL used within your system is allowed, you have to control which approach those extending the XBRL taxonomy can use. If you don't, they could use any approach (i.e. different users can extend using different approaches, leading to inconsistencies).

Looking at these three approaches to modeling XBRL has nothing to do with which modeling approach is better: tuples, items, or XBRL Dimensions.  The point here is that when multiple approaches exist you need to guide the user to what approach you want them to use if you desire consistency between the users within your system. 

Each approach has its pros and cons; none is perfect. You will want to pick the modeling approach which best serves your needs. This is more about expressing your meaning consistently and accurately; the syntax matters little, if at all really.
