The following is a post I made to an XBRL news list which walks you through, step-by-step, extracting financial facts from SEC XBRL financial filings from public companies. The explaination provides a "happy path" which eliminates details to make the explaination easier. Extracting any fact works in this same manner, it just involves additional details. Use this prototype to work through this process if you desire. You can turn the debugger on and step through the process as it unfolds. Basically, extracting other information is a variation of this same process.
STEP 1: You need to know WHERE the information is (i.e. what companies, what filings).
The first thing you need to do is have a list of companies or know the list of companies you want to pull information for and which filings contain that information. So XBRL does not deal with this issue at all and is, in my view, a flaw in XBRL because you cannot express this and therefore every software tool consuming information works differently.
In the SEC's case, they provide an RSS feed thich contains lists of reoprting entities and the submissions of those entities. You can find that RSS feed here:
If you go into one of those RSS feeds (look at the XML, THAT is what a computer application will read; you can VIEW this via a browser).
And so, if you notice my Excel application; it eliminates that step by providing a workbook which contains all this information.
STEP 2: What information do you want?
The next thing is that you need to know WHAT information you are looking for. Because the US GAAP Taxonomy uses multiple XBRL elements to express the exact same financial reporting concept, you are forced to "map" the XBRL concepts to what you are looking for.
For example, the US GAAP Taxonomy uses the elements "us-gaap:LiabilitiesAndStockholdersEquity" and "us-gaap:LiabilitiesAndPartnersCapital" to represent the one financial reporting concept "LiabilitiesAndEquity" which I am using within my system. This is a pretty basic example, "Revenues" is a much more complex example as you pointed out. But, the issue is EXACTLY the same: you need to understand WHAT to look for in the SEC XBRL financial filing.
I did this for 51 concepts that I am looking for, see the VBA code. I did this the WRONG way in my code, basically hard coding it. I have subsequently moved the "tree" of IF statements into a database.
STEP 3: Which period, which piece of the entity, which reporting scenario?
The next thing you need to do is for each concept that you want, you have to get the correct fact within the correct context. For example, you have "current period" and "prior period" and "year to date" and "quarter to date". You also have "consolidated entity" and different subsidiaries or business segements or geographic areas and so forth.
The SEC did a VERY, VERY nice thing. They made it EASY to identify the "current period" and the root reporting entity. So you can leverage the way the SEC says the information needs to be structured and the reported concept "dei:DocumentPeriodEndDate" (which is basically the balance sheet date of the current period). From that you can find the YEAR TO DATE period of the income statement. And because of the way the "root entity" or consolidated entity is consistently articulated because of the SEC rules, yuo can leverage that consistency to get the "consolidated" information.
What may be confusing you here is that I am "taking the happy path" through the information. I ONLY want (a) the CURRENT period, (b) the CONSOLIDATED entity, (c) of the ROOT entity. (Clearly if you want EVERY period and EVERY entity breakdown and EVERY scenario this is more complicated, but the same general ideas apply.)
STEP 4: Impute missing information
Because filers inconsistently report facts, I impute or "plug" missing information based on existing information and known relationships. For example, not all filers report "us-gaap:Assets". If a filer, say, has only one asset, "us-gaap:Cash", many leave out the concept "us-gaap:AssetsCurrent", "us-gaap:AssetsNoncurrent" and "us-gaap:Assets". However, I know that for 100% of all companies who report a classified balance sheet that "Assets = CurrentAssets + NoncurrentAssets". That is universally true, that relation NEVER changes and the fact that 98+ percent of filers FOLLOW that rule PROVES that the rule is true. The other 2% have confirmed errors in their filing or do strange things which are beyond the capabilities of my algorithm to handle.
STEP 5: TEST, TEST, TEST
To be SURE that I am getting the correct information I run the information through a battery of 21 tests which are universally true about balance sheets, income statements, and cash flow statements. Again, these tests are played out in the 7160 SEC filings that I am looking at (10-K only, not the other forms). There is SOME variability that I have dealt with such as a classified versus an unclassified balance sheet and a single-step or multi-step income statement.
STEP 6: Populate Excel
That is it, put the result in Excel for each of the concepts that you want.
Again, like I have said, this is "the happy path" to a small set of information. Getting at any OTHER information is some variation on the steps above: what entity, what filing, what fact, what part of the entity, what period, what scenario. There are sometimes other characteristics; but these ideas work for EVERYTHING. I can PROVE that, it is a fact.
YES, the more you deal with the other variations, the more complicated the computer code necessary. Things can be made easier by improving the US GAAP Taxonomy and eliminating duplicate instances of the same information, better organizing the taxonomy to enable the ability to more easily map the concepts used to the information you want. The system can also be improved by better inbound validation by the SEC to eliminate errors in the information. The system can be improved by minimizing the need to imply things because a filer did not explicitly make it available. But, these issues do exist now so they need to be dealt with.
All of the above is reflected in the VBA code shown in this prototype and in the working Excel application which you can run, turn your debugger on if you want, and WATCH the entire process unfold in front of your own eyes:
Does this get you to where you want to be? If not, tell me what I am missing and I will attempt to fill in those gaps.
Much of this stuff can be simplified by using a web service or someone who sorts this out for you. Data aggregators do this including the mappings, that is HOW they get that information. There is no magic, only hard work. That is the "high end". Other web serivces such as the XBRL Cloud web service don't provide the "final" interface, but they do all the "stuff" necessary to get you the information you want, you build your own interface.
The GOOD news is that you can now get the financial information of thousands of companies in minutes. Think of how long this process took three years ago when this task was either (a) 100% manual or (b) the parsing necessary was so complicated and cost prohibitive that only big companies could afford to do it. Maybe this is not perfect, but it works and is useful. In time, things will get even easier!