Most people that look at XBRL-based financial reports and then want to store XBRL-based financial information from those reports in some sort of database make the fundamental mistake of "storing XBRL". This is, as contrast to, storing financial facts or information that is helpful in using those financial facts in some sort of "fact database".
This blog post provides information that helps you not repeat this sort of mistake.
First, have a look at the UML model of a business report created by SBRM (keeping in mind that a financial report is a type of business report). Note that if you want to work with XBRL-based financial reports, you are going to want to understand the OMG Standard Business Report Model (SBRM).
Second, while you can store properly modeled financial information within a relational database very effectively; relational databases might not always be the best storage alternative. For example, TigerGraph and ArangoDB are a graph database that seems to be a rather interesting information storage alternative. That may, or may not, be an appropriate data storage scheme.
Years ago, I helped a company named 28msec put all of the XBRL-based financial reports submitted to the SEC into a NoSQL database. They really studied XBRL and frankly, I learned a tremendous amount from this experience. After this project was done, Ghislain Fourny who is a data scientist and database expert wrote this paper, Cell Stores, and the book, The XBRL Book: Simple, Technical, Precise. If you want to put XBRL-based financial information into a database; I highly recommend that you read both that paper and the book. That offers clues about the best information storage formats.
In the Cell Stores paper, Ghislain makes an analogy of financial information to the different states of matter:
Cells are the primary citizens and exist in different forms, which can be explained with an analogy to the state of matter: as a gas for efficient storage, as a solid for efficient retrieval, and as a liquid for efficient interaction with the business users.
Now, I personally do not understand how to store information the most effectively in a database. But I can tell you the general use cases that business professionals have. You can distill those use cases down into three query patterns:
That is really about it. The "magic" is to filter all that XBRL-based information to get exactly what you are looking for. To do that, the information needs to be stored approprately to allow for effective information retrieval. And so, you want to store the information such that it can easily be reconfigured into the three types of queries above at a minimum.
There are a lot of details involved in that process. Key is storing meaning, not syntax. What is critically important to understand is that financial reports ARE NOT FORMS!!!. Does that make the work involved harder? Absolutely it does.
But, the task for all three common query use cases can be distilled into three specific sets of information (XML infoset examples provided for each information set) that "convert" the "gas" into a "solid" per the analogy above:
All of this is explained in detail in the document Fact Sets. A financial report is not one big thing, it is a system of lots of little things. That system is logical. That system has logical patterns. A pattern is any form of correlation between the states of elements within a system. Software engineers can leverage patterns.
XBRL Cloud does a lot of this in their Edgar Report Information database. Watch this video to get a general idea of the information you can extract for any piece of any XBRL-based financial report. What XBRL Cloud has is not perfect, but it is pretty close and is available via an easy to use REST API.
Here is specific information for the 2017 Microsoft 10-K. In the ZIP file are Excel spreadsheets showing the fact sets discovered by XBRL Cloud which does a pretty good job and Pesseract which does an excellent job. Also included are the XML Infosets that were generated by each application. You can look at a human readable rendering of the 2017 10-K of Microsoft here.
When represented correctly, XBRL presentation relations, XBRL calculation relations, XBRL definition relations, and XBRL formula become information that a query can use to extract facts.
For example, the Microsoft 2017 10-K contains 128 networks, 128 hypercubes, and 194 different "blocks" of information. A "block" and a "fact set" are synonyms. Facts are fine to work with, when you are after individual facts such as "Assets" or "Net Income (Loss)". But if you want to work with a specific disclosure, you need some specific set of facts. XBRL presentation, calculation, and definition relations are helpful in extracting that information and XBRL formulas and other rules are helpful in making sure you put the pieces together correctly. (here is similar information for Microsoft's 2016 10-K report)
The facts within an XBRL instance are organized per the rules of the XBRL technical specification. But, you can reorganize those facts to make the information easier to work with. You can specify how to reorganize those facts using global standard XBRL!
Now, again, it is unclear to me how to best store this sort of information in a database or what type of database is best. But what is crystal clear is that if you want to work with that information, you want to work with it logically and therefore it should be stored logically. Packing a bunch of XBRL into a database will be unsatisfying.
Here is some information about how 28msec queries worked which uses my model of the meaning of XBRL-based financial reports which is similar to how XBRL Cloud implemented a fact table and how Pesseract implemented a fact table and consistent with where SBRM is going:
More information coming soon.