My first attempt to test a fundamental set of accounting concepts and relations showed an accuracy rate of 95%. The second attempt tuned the algorithms and metadata and increased that accuracy rate to 96.6%. I have further tuned the algorithms and metadata and have achieved an accuracy rate of 97.3%. You can see the detailed computation here.
Now, I don't know if I am computing this correctly, because there are two tests, IS1 and IS2, which do not apply to all filers. However, I am leaving these tests in the computation because they point out two problems: finding revenues reliably, and differentiating nonoperating income (loss) from interest and debt expense.
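To make the effect of tests that do not apply to every filer concrete, here is a minimal Python sketch of how an accuracy rate and a "passes all checks" count might be computed, with or without certain tests. It assumes each filing's results are stored as a mapping from test name to True (passed), False (failed) or None (test does not apply). It also assumes the accuracy rate means passed checks divided by applicable checks. That definition, the function names and the test name BS1 are hypothetical and are not necessarily how the detailed computation is actually done.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical structure: test name -> True (passed), False (failed),
# or None (the test does not apply to this filer).
Results = Dict[str, Optional[bool]]


def accuracy_rate(filings: List[Results], exclude: Iterable[str] = ()) -> float:
    """Share of applicable, non-excluded checks that passed, across all filings."""
    skip = set(exclude)
    passed = total = 0
    for tests in filings:
        for name, outcome in tests.items():
            if name in skip or outcome is None:
                continue  # excluded or not applicable: not counted at all
            total += 1
            if outcome:
                passed += 1
    return passed / total if total else 0.0


def passing_all(filings: List[Results], exclude: Iterable[str] = ()) -> int:
    """Number of filings with no failed check, ignoring excluded tests."""
    skip = set(exclude)
    return sum(
        1
        for tests in filings
        if all(outcome is not False
               for name, outcome in tests.items() if name not in skip)
    )


# Made-up sample results for three filings (BS1 is a hypothetical test name).
sample = [
    {"BS1": True, "IS1": True, "IS2": True},
    {"BS1": True, "IS1": False, "IS2": None},
    {"BS1": False, "IS1": True, "IS2": True},
]

print(accuracy_rate(sample), passing_all(sample))
print(accuracy_rate(sample, exclude={"IS1", "IS2"}),
      passing_all(sample, exclude={"IS1", "IS2"}))
```

Passing `exclude={"IS1", "IS2"}` mirrors leaving those two tests out of the computation. Treating a not-applicable test as "not counted" rather than as a failure is one possible way to handle tests that only apply to some filers; counting it as a failure would lower both numbers.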
Want to try an Excel-based version of the tool used to grab this information from SEC XBRL financial filings? Well, here you go.
- Download working prototype: This is an Excel-based application written in VBA which grabs information from my set of 7160 SEC XBRL financial filings (10-Ks).
- VBA Code: This is the VBA code which grabs the information from the filings. (Please don't use this to judge my programming skills; rather, use it to reverse engineer the process.)
- Sample output: This is sample output of the code, which writes results directly to the debug window rather than populating the Excel spreadsheet. (The point is that there are two algorithms.)
- Python code: Someone is converting my code into Python. You can get to that code here.
- Raw data: Here is the raw data in Excel which was created using the same algorithm applied to each of the 7160 SEC 10-K filings.
- Prototype Microsoft Access Database Application with Data: This is a prototype database application created using Microsoft Access which provides an interface and metadata for working with the fundamental accounting information for the set of 7160 SEC filers.
Here is a screenshot of the Excel prototype populated with the data for Hewlett Packard:
Here is a dump of other information which I learned from this process:
- Of the set of 7160 SEC 10-K filings, 1071 (15%) pass all checks. Here is the list. The list is also provided in the Excel prototype.
- If I leave out checks IS1 and IS2, then 3325 (46%) of all SEC filers pass all of these checks.
- Two areas stand out as challenging when grabbing information: revenues, and differentiating nonoperating income (loss) from interest and debt expense.
- Services that provide information based on only one XBRL concept per item will generally not let users create useful comparisons. The reason is that filers use many different concepts to express the same fundamental accounting concept. Look through the algorithm which grabs the data: there are often alternatives which filers use to express key information. For example, net income (loss) could be any of 7 different concepts.
- At this level of information, the vast majority of SEC filings are comparable. Some are not. For example, a filer providing a statement of net assets rather than a balance sheet does not totally fit into this representation. Situations like that are just additional edge cases which need to be provided for.
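The point about alternative concepts is essentially a fallback chain: for each fundamental accounting concept, try a prioritized list of XBRL concepts and take the first one the filer actually reported. Here is a minimal Python sketch of that idea. The concept lists below are illustrative only (a few real US GAAP taxonomy element names, not the full set of alternatives the algorithm uses), and the `resolve` function and the fact dictionaries are hypothetical simplifications that ignore contexts, periods and dimensions.

```python
from typing import Dict, List, Optional, Tuple

# Illustrative priority lists only; the real algorithm uses its own,
# longer lists (e.g. seven alternatives for net income (loss)).
FALLBACKS: Dict[str, List[str]] = {
    "NetIncomeLoss": [
        "us-gaap:NetIncomeLoss",
        "us-gaap:ProfitLoss",
        "us-gaap:NetIncomeLossAvailableToCommonStockholdersBasic",
    ],
    "Revenues": [
        "us-gaap:Revenues",
        "us-gaap:SalesRevenueNet",
        "us-gaap:SalesRevenueServicesNet",
    ],
}


def resolve(fundamental: str,
            facts: Dict[str, float]) -> Tuple[Optional[float], Optional[str]]:
    """Return (value, concept used) for the first alternative the filer reported."""
    for concept in FALLBACKS[fundamental]:
        if concept in facts:
            return facts[concept], concept
    return None, None  # none of the known alternatives was reported


# Two hypothetical filers expressing the same fundamentals with different concepts.
filer_a = {"us-gaap:NetIncomeLoss": 120.0, "us-gaap:Revenues": 900.0}
filer_b = {"us-gaap:ProfitLoss": 80.0, "us-gaap:SalesRevenueNet": 700.0}

for facts in (filer_a, filer_b):
    print(resolve("NetIncomeLoss", facts), resolve("Revenues", facts))
```

A service that only looked up `us-gaap:NetIncomeLoss` would find nothing for `filer_b`, which is exactly why single-concept lookups make comparisons unreliable. Returning the concept that was used, not just the value, also makes it easier to audit which alternative each filer relied on.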
This is a boatload of information! If you have good reverse-engineering skills, all these samples provide ideas on how accountants can make use of the financial information in public companies' SEC XBRL filings.
I am trying to get commercial software vendors interested in creating tools such as this for accountants and other business users.