In a prior post I talked a little about XBRL and databases. In this post I want to go a little deeper and talk about querying financial information which was reported using XBRL and then stored in a database. There are several key points I want to make.
I am fiddling with the business segment, legal entity, and geographic area breakdowns.
The first point I want to make is that you can read this information directly from the XBRL presentation relations, XBRL labels, and XBRL taxonomy schema(s). Reading it that way is possible, but you have to pull all these pieces together yourself. You could use an XBRL processor to do that.
Or, you could simply read an XML infoset such as this which has had all that information pulled together for you. If you are a bit of a programmer, using XPath axes is a great way to walk these infosets and work with specific sets of information.
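To make the idea of walking an infoset concrete, here is a minimal sketch in Python. The element and attribute names in the sample XML (infoset, network, axis, member, name, label) are my own invention for illustration; the actual XBRL Cloud infoset layout may well differ. Note also that Python's standard library supports only a subset of XPath (child steps, descendants via `//`, parent via `..`, and attribute predicates), so a fuller XPath engine would be needed for axes such as `ancestor::`.

```python
import xml.etree.ElementTree as ET

# Hypothetical infoset layout, invented for this example.
# The real infoset format is not shown in this post and may differ.
SAMPLE_INFOSET = """
<infoset>
  <network name="Segment Reporting">
    <axis name="us-gaap:StatementGeographicalAxis">
      <member name="country:US" label="United States"/>
      <member name="iphs:UnitedStatesMember" label="United States [Member]"/>
    </axis>
    <axis name="us-gaap:StatementBusinessSegmentsAxis">
      <member name="abc:WidgetsMember" label="Widgets [Member]"/>
    </axis>
  </network>
</infoset>
"""


def members_of_axis(xml_text, axis_name):
    """Return (name, label) pairs for every member under the named axis."""
    root = ET.fromstring(xml_text)
    # Descendant step (.//), attribute predicate, then child step (/member).
    path = f".//axis[@name='{axis_name}']/member"
    return [(m.get("name"), m.get("label")) for m in root.findall(path)]


if __name__ == "__main__":
    for name, label in members_of_axis(
        SAMPLE_INFOSET, "us-gaap:StatementGeographicalAxis"
    ):
        print(name, "|", label)
```

The same pattern would work for the business segment or legal entity breakdowns by passing a different axis name.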
I used XML infosets generated by XBRL Cloud to read information about the business segments, legal entities, and geographic areas of the 7160 public company 10-Ks I am analyzing. This Excel spreadsheet has that information.
Another point I want to make relates to the problem of inappropriate extension concepts and how they hurt the ability to query information.
Say I want to search for sales reported for the geographic area which represents the United States. The SEC provides a taxonomy with country information. That taxonomy, "country", has an element named "US" which represents the United States. In an XBRL file, the value would look like "country:US".
If you go through the Excel spreadsheet above or you query SEC filings, you find "country:US" used 479 times. If you queried a database for reported facts which relate to the US, the results might look something like this:
This is what that information looks like within SEC XBRL financial filings:
So sure, you do get the 479 SEC filers who properly used "country:US" as shown in the first graphic. In addition, you get 184 other variations of how "United States" is represented. The namespace prefix which precedes the report element name, such as "iphs:", indicates that the filer created their own concept (as opposed to "country:", which means that the report element came from the SEC taxonomy). Even if different filers spelled the report element name exactly the same, say "UnitedStatesMember", each one is a different concept because it lives in a different namespace.
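Here is a rough sketch of why this hurts queries, using Python and an in-memory SQLite database. The table layout, the filer names, the values, and the "abc:" prefix are all made up for illustration; only "country:US" and the "iphs:" prefix come from the discussion above. The second query relies on a name-matching heuristic which I am assuming for the sake of the example, and a heuristic like that can easily miss variations or match the wrong thing.

```python
import sqlite3


def build_demo_db():
    """Create a tiny, made-up facts table (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE facts "
        "(filer TEXT, concept TEXT, geo_member TEXT, value REAL)"
    )
    rows = [
        ("FilerA", "us-gaap:SalesRevenueNet", "country:US", 100.0),
        ("FilerB", "us-gaap:SalesRevenueNet", "country:US", 250.0),
        ("FilerC", "us-gaap:SalesRevenueNet", "iphs:UnitedStatesMember", 75.0),
        ("FilerD", "us-gaap:SalesRevenueNet", "abc:UnitedStatesMember", 40.0),
        ("FilerE", "us-gaap:SalesRevenueNet", "country:CA", 10.0),
    ]
    conn.executemany("INSERT INTO facts VALUES (?, ?, ?, ?)", rows)
    return conn


def us_facts_exact(conn):
    """The simple query: only filers who used the SEC-provided member."""
    sql = (
        "SELECT filer, value FROM facts "
        "WHERE geo_member = 'country:US' ORDER BY filer"
    )
    return conn.execute(sql).fetchall()


def us_facts_with_extensions(conn):
    """A workaround: also guess at extension members by local name."""
    sql = (
        "SELECT filer, geo_member, value FROM facts "
        "WHERE geo_member = 'country:US' "
        "   OR geo_member LIKE '%:UnitedStates%' "
        "ORDER BY filer"
    )
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    print(us_facts_exact(conn))
    print(us_facts_with_extensions(conn))
```

The exact-match query silently drops the filers who created extension members, while the workaround query depends on guessing how each filer happened to spell the name.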
What you see is the problem of SEC filers creating their own concept to represent something which the SEC already provides. This makes it extremely challenging to query SEC XBRL financial filings and return meaningful results. Can this little issue be worked around? Sure it can. But the real problem is that it is not just this one concept "country:US" where there is a problem. This problem exists with virtually every concept reported by SEC XBRL financial filers to some degree.
Some reported facts are worse than others. The most consistent reported facts are things like "dei:EntityRegistrantName" which is required and where inbound SEC validation tests to see if that required concept exists. 100% of filers comply, due to the validation/verification process used when a filing is submitted.
Some reported facts such as "us-gaap:Assets" are used by 98% or more of SEC filers. A success rate of 98% is pretty good. But when you have, say, 500 different reported facts, a 2% error rate, and 8,000 SEC filers, that adds up to 80,000 errors (500 x 2% x 8,000). Inbound validation rules created by the SEC can minimize the number of errors.
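The arithmetic is simple enough to check with a few lines of Python. The function name is mine, and the inputs are the illustrative figures from the paragraph above, not measured values.

```python
def expected_errors(reported_facts, error_rate, filers):
    """Estimate total errors as facts x error rate x filers."""
    return round(reported_facts * error_rate * filers)


if __name__ == "__main__":
    # 500 reported facts, 2% error rate, 8,000 filers
    print(expected_errors(500, 0.02, 8000))
```

Even halving the error rate to 1% would still leave tens of thousands of errors across the full set of filers.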
What is worse, there is no real reward for properly using "country:US" and no penalty for a filer who creates their own extension concept, which makes using the information harder. Perhaps RoboCop will resolve this unfairness.
So finally I will repeat my mantra about meaningful information exchange and point out that proper inbound validation provided by the SEC can significantly reduce errors: "The only way a meaningful exchange of information can occur is the prior existence of agreed upon semantics and syntax rules."
There is a direct correlation between good rules and meaningful queries.
If I can find errors within filings after they get into the SEC system, these errors can also be found by inbound verification/validation processes PRIOR to allowing a submission to go through. Finding these errors and forcing SEC XBRL financial filers to fix them before the SEC accepts the filing will vastly improve quality. Not all errors can be found in this manner; however, many more could be found than are being found now.
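As a rough idea of what one such inbound rule might look like, here is a small Python sketch which flags extension members that appear to duplicate a member the SEC already provides. This is not an actual SEC validation rule. The lookup table, the function names, and the name-normalization heuristic are all assumptions I am making for illustration, and the "abc:" prefix is made up.

```python
# Hypothetical lookup from a normalized local name to the standard member.
# A real rule would need a far more complete and carefully reviewed table.
STANDARD_EQUIVALENTS = {
    "unitedstates": "country:US",
}


def normalize(local_name):
    """Lower-case the local name and drop a trailing 'Member' suffix."""
    name = local_name.lower()
    if name.endswith("member"):
        name = name[: -len("member")]
    return name


def check_members(member_qnames, standard_prefixes=("country",)):
    """Return (extension member, suggested standard member) pairs."""
    problems = []
    for qname in member_qnames:
        prefix, _, local = qname.partition(":")
        if prefix in standard_prefixes:
            continue  # already uses the SEC-provided taxonomy
        suggestion = STANDARD_EQUIVALENTS.get(normalize(local))
        if suggestion is not None:
            problems.append((qname, suggestion))
    return problems


if __name__ == "__main__":
    filing_members = [
        "country:US",
        "iphs:UnitedStatesMember",
        "abc:WidgetsMember",
    ]
    for found, suggested in check_members(filing_members):
        print(f"{found}: consider using {suggested} instead")
```

A check like this, run before a submission is accepted, would let the filer fix the problem at the source rather than leaving every user of the data to work around it.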