The XBRL-based financial information in US SEC XBRL financial filings offers many clues which are helpful when figuring out how to architect your XBRL-based system. This blog post helps you see some of those clues. Playing around with the XBRL information in the SEC EDGAR system offers even more.
It is helpful to read two prior blog posts which provide background information. First, this thought experiment (also in chapter 18, page 330 of XBRL for Dummies) helps you see issues related to financially oriented information. Second, the sample code in the Dow Jones Industrial Average prototype can be helpful in creating pieces of your prototypes.
So let's walk through some of the steps involved in working with a set of XBRL-based information.
- Locate the XBRL instances: The first thing you need to do is locate the XBRL instances which contain the information. For SEC XBRL information, the SEC provides an RSS feed, but that feed contains only the last 100 filings the SEC has received. Another resource is the Edgar Dashboard provided by XBRL Cloud. While that dashboard relates to validation information about XBRL filings, it offers an XML file which helps you identify filings. The downside is that this feed is not official. In your system, you will want to be sure you have all the filings you want to work with. Note that neither the SEC RSS feed nor the Edgar Dashboard provided by XBRL Cloud is a standard, and neither is in an XBRL format.
- Cross XBRL instance queries: Try a simple query as shown in the Dow Jones Industrial Average prototype and you will quickly realize a significant downside of working with XBRL instances directly: they take a long time to load. Run the sample code and you will see that it takes about 5 minutes to iterate through only 30 files with no XBRL-related processing taking place. Now, imagine trying to glean information from thousands of XBRL instances. S.....L.....O.....W. Perhaps not always, but in many cases you will want to load your XBRL-based information into some sort of database.
- XBRL format: Beyond the load-time problem described in the previous item, XBRL is a great global standard but not a great format to run queries against. You will want to either put your information into a database, as mentioned above, or create some other format. This is the infoset format which I came up with.
- Grabbing sets of information: When working with information you will commonly want to grab a set of it. For example, with financial information you might want a balance sheet, income statement, cash flow statement, accounting policies, or some specific disclosure. In SEC XBRL filings this can be problematic as there are no "handles" to grab. This issue is discussed in this blog post. Basically, because every SEC filer creates their own networks and because [Table]s are polymorphic (can have more than one meaning), there are no high level "handles" to grab. This can be overcome by applying prototype theory and identifying components by what they contain. In building your system, keep this difference in mind.
- Dealing with metadata inconsistencies: The US GAAP taxonomy has some metadata issues which help you see the problems that inconsistencies in your metadata can create. Four of these inconsistencies are explained in this blog post. Testing and proper documentation can help you identify and eliminate these sorts of issues. Metadata inconsistencies can make it extremely difficult to query information. Algorithms can generally be written to correct many types of inconsistencies. But the more inconsistencies you have, and the higher the level of variability, the more costly it will be to create these correction mechanisms. It is better, and less costly, to not have such inconsistencies in the first place.
- Adding additional helpful metadata: Additional metadata can be very helpful in working with XBRL-based information. For example, in SEC XBRL filings the SIC code is very helpful in classifying entities by industry. Yet in many cases the SIC code is not contained within the actual SEC filings. Your system will need to add these types of helpful metadata and maintain that metadata. The best thing to do in many cases is not to separate the sets of metadata; for example, the SEC could simply have required the SIC code in every SEC XBRL financial filing. Note that the SEC has that information somewhere, and XBRL Cloud has this information in the Edgar Dashboard.
- Dealing with periods: The difference between a fiscal period and a calendar period points out an issue in working with periods. First, XML Schema data types have no notion of the "quarter", yet financial reporting makes heavy use of the quarter. Second, not every company's year end is December 31. Many companies have other year ends; that is, their fiscal period differs from the calendar period. Some companies, particularly in the retail industry, have what is called a 52/53 week fiscal period.
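The period issue in the last item can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular system does it: the function name `fiscal_quarter`, the convention of labeling a fiscal year by the calendar year in which it ends, and the `snap_days` tolerance for 52/53 week filers are all hypothetical choices made for this example.

```python
from datetime import date, timedelta


def fiscal_quarter(period_end: date, fye_month: int, snap_days: int = 7) -> tuple[int, int]:
    """Return (fiscal_year, fiscal_quarter) for a period ending on period_end.

    fye_month is the month (1-12) in which the company's fiscal year ends.
    snap_days is an assumed tolerance: a 52/53 week filer may close a period
    a few days into the next month, so a date in the first snap_days days of
    a month is treated as the last day of the previous month.
    """
    if period_end.day <= snap_days:
        # Snap back to the last day of the previous month.
        period_end = period_end.replace(day=1) - timedelta(days=1)

    # Number of months into the fiscal year (1..12) at the period end.
    months_into_fy = (period_end.month - fye_month - 1) % 12 + 1
    quarter = (months_into_fy + 2) // 3

    # Assumed convention: a fiscal year is named after the calendar year
    # in which it ends.
    if period_end.month <= fye_month:
        fiscal_year = period_end.year
    else:
        fiscal_year = period_end.year + 1
    return fiscal_year, quarter
```

For a calendar-year company, `fiscal_quarter(date(2011, 3, 31), 12)` gives `(2011, 1)`. For a company with a June year end, a period ending September 30, 2010 falls in the first quarter of fiscal 2011. A real system would still need to decide how to label fiscal years and how wide the tolerance should be.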
These are just a few of the realities which exist when working with a set of XBRL-based information. SEC financial filings are used as an example as you can experiment with them, learning what to do and what to avoid when you implement your XBRL-based system. Each system has its own unique challenges. Not realizing this can cause architectural problems which could have been avoided.
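To make the "load it into a database" suggestion from the list above more concrete, here is a minimal sketch in Python using only the standard library. Everything specific in it is an assumption made for illustration: the `facts` table layout, the function names, and the tiny inline sample instance (which is made up, not a real filing). It reads only simple top-level facts and ignores dimensions, tuples, footnotes, and taxonomy information, all of which a real system would need to handle.

```python
import sqlite3
import xml.etree.ElementTree as ET

XBRLI = "http://www.xbrl.org/2003/instance"

# A made-up, minimal instance used only to exercise the sketch.
SAMPLE = """<xbrli:xbrl xmlns:xbrli="http://www.xbrl.org/2003/instance"
    xmlns:iso4217="http://www.xbrl.org/2003/iso4217"
    xmlns:us-gaap="http://fasb.org/us-gaap/2011-01-31">
  <xbrli:context id="FY2010">
    <xbrli:entity>
      <xbrli:identifier scheme="http://www.sec.gov/CIK">0000000001</xbrli:identifier>
    </xbrli:entity>
    <xbrli:period><xbrli:instant>2010-12-31</xbrli:instant></xbrli:period>
  </xbrli:context>
  <xbrli:unit id="USD"><xbrli:measure>iso4217:USD</xbrli:measure></xbrli:unit>
  <us-gaap:Assets contextRef="FY2010" unitRef="USD" decimals="0">1000</us-gaap:Assets>
  <us-gaap:Liabilities contextRef="FY2010" unitRef="USD" decimals="0">600</us-gaap:Liabilities>
</xbrli:xbrl>"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create a single flat table of facts (hypothetical layout)."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS facts (
               source TEXT, entity TEXT, concept_ns TEXT, concept TEXT,
               period_end TEXT, unit TEXT, value TEXT)"""
    )


def load_instance(conn: sqlite3.Connection, xml_text: str, source: str) -> int:
    """Parse one instance and insert its simple facts; return the row count."""
    root = ET.fromstring(xml_text)

    # Map each context id to (entity identifier, instant or end date).
    contexts = {}
    for ctx in root.findall(f"{{{XBRLI}}}context"):
        ident = ctx.findtext(f"{{{XBRLI}}}entity/{{{XBRLI}}}identifier") or ""
        end = (ctx.findtext(f"{{{XBRLI}}}period/{{{XBRLI}}}instant")
               or ctx.findtext(f"{{{XBRLI}}}period/{{{XBRLI}}}endDate") or "")
        contexts[ctx.get("id")] = (ident.strip(), end.strip())

    rows = []
    for el in root:
        ref = el.get("contextRef")
        if ref not in contexts:
            continue  # not a fact (context, unit, schemaRef, ...)
        namespace, _, name = el.tag[1:].partition("}")
        entity, period_end = contexts[ref]
        rows.append((source, entity, namespace, name, period_end,
                     el.get("unitRef"), (el.text or "").strip()))

    conn.executemany("INSERT INTO facts VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    load_instance(conn, SAMPLE, "sample.xml")
    for row in conn.execute("SELECT entity, concept, period_end, value FROM facts"):
        print(row)
```

Once the facts from many instances sit in one table, a cross-filing query becomes a single SQL statement rather than a loop that reloads every XBRL instance. The flat table is a deliberate simplification; it is one possible starting point, not a recommended final design.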