« Future Directions for Semantic Systems | Main | SBRM Progress Report »

Storing Financial Information in a Database

Most people that look at XBRL-based financial reports and then want to store XBRL-based financial information from those reports in some sort of database make the fundamental mistake of "storing XBRL".  This is, as contrast to, storing financial facts or information that is helpful in using those financial facts in some sort of "fact database".

This blog post provides information that helps you not repeat this sort of mistake.

First, have a look at the UML model of a business report created by SBRM (keeping in mind that a financial report is a type of business report). Note that if you want to work with XBRL-based financial reports, you are going to want to understand the OMG Standard Business Report Model (SBRM).

Second, while you can store properly modeled financial information within a relational database very effectively; relational databases might not always be the best storage alternative.  For example, TigerGraph and ArangoDB are a graph database that seems to be a rather interesting information storage alternative.  That may, or may not, be an appropriate data storage scheme.

Years ago, I helped a company named 28msec put all of the XBRL-based financial reports submitted to the SEC into a NoSQL database.  They really studied XBRL and frankly, I learned a tremendous amount from this experience.  After this project was done, Ghislain Fourny who is a data scientist and database expert wrote this paper, Cell Stores, and the book, The XBRL Book: Simple, Technical, Precise.  If you want to put XBRL-based financial information into a database; I highly recommend that you read both that paper and the book.  That offers clues about the best information storage formats.

In the Cell Stores paper, Ghislain makes an analogy of financial information to the different states of matter:

Cells are the primary citizens and exist in different forms, which can be explained with an analogy to the state of matter: as a gas for efficient storage, as a solid for efficient retrieval, and as a liquid for efficient interaction with the business users.

Now, I personally do not understand how to store information the most effectively in a database.  But I can tell you the general use cases that business professionals have.  You can distill those use cases down into three query patterns: 

  1. Specific information: Getting some set of information from one specific report. For example, "Give me the most current balance sheet of Microsoft" or "Give me 'assets' for Microsoft as of December 31, 2019." (Example 1 Microsoft 10-K | Example 2 Microsoft 10-K |Example 3 XBRL Cloud | Example 4 SEC RSS Feed)
  2. Period comparison: Comparing some set of information across a set of reports of a specific economic entity across periods.  For example, "Give me a comparison of the balance sheet of Microsoft for the last 10 years." (Example 1 Cash Flow | Example 2 Comprehensive Income)
  3. Entity comparison: Comparing some set of information across a set of reports across a number of economic entities for some specific period.  For example, "Give me a comparison of the balance sheets of every economic entity in the DOW 30 for December 31, 2019." (Example 1 Dow | Example 2 Airlines)

That is really about it.  The "magic" is to filter all that XBRL-based information to get exactly what you are looking for.  To do that, the information needs to be stored approprately to allow for effective information retrieval.  And so, you want to store the information such that it can easily be reconfigured into the three types of queries above at a minimum. 

There are a lot of details involved in that process.  Key is storing meaning, not syntax.  What is critically important to understand is that financial reports ARE NOT FORMS!!!.  Does that make the work involved harder?  Absolutely it does.

But, the task for all three common query use cases can be distilled into three specific sets of information (XML infoset examples provided for each information set) that "convert" the "gas" into a "solid" per the analogy above: 

  1. What structure? Do you want the balance sheet, the income statement, the revenue recognition policy, or the long-term debt maturities disclosure?  Each set of information is essentially some "structure".
  2. What facts?  Do you want information for fiscal year 2019, fiscal year 2018, year-to-date, quarter-to-date, actual, budgeted, etc?
  3. What rules?  What are the mathematical relationships between the facts in the structures?  If you want to render this information in human-readable form, what are the rendering rules that should be used?

All of this is explained in detail in the document Fact Sets. A financial report is not one big thing, it is a system of lots of little things.  That system is logical.  That system has logical patterns. A pattern is any form of correlation between the states of elements within a system. Software engineers can leverage patterns.

XBRL Cloud does a lot of this in their Edgar Report Information database.  Watch this video to get a general idea of the information you can extract for any piece of any XBRL-based financial report.  What XBRL Cloud has is not perfect, but it is pretty close and is available via an easy to use REST API.

Here is specific information for the 2017 Microsoft 10-K. In the ZIP file are Excel spreadsheets showing the fact sets discovered by XBRL Cloud which does a pretty good job and Pesseract which does an excellent job.  Also included are the XML Infosets that were generated by each application.  You can look at a human readable rendering of the 2017 10-K of Microsoft here.

When represented correctly, XBRL presentation relations, XBRL calculation relations, XBRL definition relations, and XBRL formula become information that a query can use to extract facts.

For example, the Microsoft 2017 10-K contains 128 networks, 128 hypercubes, and 194 different "blocks" of information. A "block" and a "fact set" are synonyms.  Facts are fine to work with, when you are after individual facts such as "Assets" or "Net Income (Loss)".  But if you want to work with a specific disclosure, you need some specific set of facts.  XBRL presentation, calculation, and definition relations are helpful in extracting that information and XBRL formulas and other rules are helpful in making sure you put the pieces together correctly.  (here is similar information for Microsoft's 2016 10-K report)

The facts within an XBRL instance are organized per the rules of the XBRL technical specification.  But, you can reorganize those facts to make the information easier to work with.  You can specify how to reorganize those facts using global standard XBRL!

Now, again, it is unclear to me how to best store this sort of information in a database or what type of database is best.  But what is crystal clear is that if you want to work with that information, you want to work with it logically and therefore it should be stored logically.  Packing a bunch of XBRL into a database will be unsatisfying.

Here is some information about how 28msec queries worked which uses my model of the meaning of XBRL-based financial reports which is similar to how XBRL Cloud implemented a fact table and how Pesseract implemented a fact table and consistent with where SBRM is going:

 

 

More information coming soon.

Posted on Sunday, February 2, 2020 at 08:30AM by Registered CommenterCharlie in | CommentsPost a Comment

PrintView Printer Friendly Version

EmailEmail Article to Friend

Reader Comments

There are no comments for this journal entry. To create a new comment, use the form below.

PostPost a New Comment

Enter your information below to add a new comment.

My response is on my own website »
Author Email (optional):
Author URL (optional):
Post:
 
All HTML will be escaped. Hyperlinks will be created for URLs automatically.