Understanding Cell Stores and NOLAP, the Future of the Spreadsheet
Ghislain Fourny of 28msec, the creators of SECXBRL.info, came up with the notion of the cell store, which he lays out in a paper he wrote on the subject. The paper describes a cell store this way:
Cell stores provide a relational-like, tabular level of abstraction to business users while leveraging recent database technologies, such as key-value stores and document stores. This allows to scale up and out the efficient storage and retrieval of highly dimensional data. Cells are the primary citizens and exist in different forms, which can be explained with an analogy to the state of matter: as a gas for efficient storage, as a solid for efficient retrieval, and as a liquid for efficient interaction with the business users.
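To make the idea of a cell concrete, here is a minimal sketch in Python. It is not taken from the paper or from 28msec's implementation; the class name `CellStore`, the dimension names (`concept`, `entity`, `period`), and the values are all hypothetical. The only point is that a cell is a value addressed by a set of dimension/member pairs, which fits naturally into a key-value store.

```python
from typing import Dict, List, Tuple, Union

Value = Union[int, float, str]
Key = Tuple[Tuple[str, str], ...]  # sorted (dimension, member) pairs


def make_key(aspects: Dict[str, str]) -> Key:
    """Canonical, hashable key for a cell: its dimension/member pairs, sorted."""
    return tuple(sorted(aspects.items()))


class CellStore:
    """Toy in-memory stand-in for a key-value-backed cell store (hypothetical)."""

    def __init__(self) -> None:
        self._cells: Dict[Key, Value] = {}

    def put(self, aspects: Dict[str, str], value: Value) -> None:
        """Store one cell under the key built from its aspects."""
        self._cells[make_key(aspects)] = value

    def query(self, **criteria: str) -> List[Tuple[Dict[str, str], Value]]:
        """Return every cell whose aspects match all of the given criteria."""
        hits = []
        for key, value in self._cells.items():
            aspects = dict(key)
            if all(aspects.get(dim) == member for dim, member in criteria.items()):
                hits.append((aspects, value))
        return hits


# Made-up example data, not real filing values.
store = CellStore()
store.put({"concept": "Assets", "entity": "ExampleCo", "period": "2013-12-31"}, 1000)
store.put({"concept": "Liabilities", "entity": "ExampleCo", "period": "2013-12-31"}, 600)
store.put({"concept": "Assets", "entity": "ExampleCo", "period": "2012-12-31"}, 900)

# All Assets cells for ExampleCo, across every period.
assets = store.query(concept="Assets", entity="ExampleCo")
```

A real cell store would scan far less than this linear `query` does, but the shape of the data is the same: no rows, columns, or worksheets, only cells and the dimensions that give them meaning.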
The XBRL standard helped to inspire the idea of the cell store. One thing that the paper does not mention, but should have, is the XBRL Abstract Model 2.0, which articulates the high-level semantic model of a business report.
There are a handful of other adjustments that I would make to the Cell Stores paper.
First, the paper is not clear enough on how business rules are used both to enforce quality and to enforce semantics between cells. Cells can be related to other cells. Part of the power of a cell store is the complexity of what can be represented. A perfect example of this is public company financial filings. Those are complex documents with many, many complex relations, from fundamental relations such as "assets = liabilities and equity" to far more complex ones. These business rules are critical because they keep the information quality high. But the business rules provide something else: increased reasoning capacity of software applications.
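As a rough illustration of both roles, the sketch below expresses "assets = liabilities and equity" as a rule over a handful of cells, then uses a related roll-up rule to infer a value that was not reported. The concept names, the function names, and the numbers are hypothetical; real XBRL business rules are expressed differently, and this is only meant to show the idea.

```python
from typing import Callable, Dict, List

Facts = Dict[str, int]  # concept name -> reported value for one entity and period


def balance_sheet_balances(facts: Facts) -> bool:
    """Assets = liabilities and equity."""
    return facts["Assets"] == facts["LiabilitiesAndEquity"]


def liabilities_and_equity_rolls_up(facts: Facts) -> bool:
    """Liabilities and equity = liabilities + equity."""
    return facts["LiabilitiesAndEquity"] == facts["Liabilities"] + facts["Equity"]


RULES: Dict[str, Callable[[Facts], bool]] = {
    "balance_sheet_balances": balance_sheet_balances,
    "liabilities_and_equity_rolls_up": liabilities_and_equity_rolls_up,
}


def failed_rules(facts: Facts) -> List[str]:
    """Quality: names of the rules that the reported facts violate."""
    return [name for name, rule in RULES.items() if not rule(facts)]


def derive_equity(facts: Facts) -> Facts:
    """Reasoning: use the roll-up rule to infer Equity when it was not reported."""
    if "Equity" in facts:
        return facts
    derived = dict(facts)
    derived["Equity"] = facts["LiabilitiesAndEquity"] - facts["Liabilities"]
    return derived


# Made-up values for illustration.
good = {"Assets": 1000, "Liabilities": 600, "Equity": 400, "LiabilitiesAndEquity": 1000}
bad = dict(good, Assets=1100)  # the balance sheet no longer balances
partial = {"Assets": 1000, "Liabilities": 600, "LiabilitiesAndEquity": 1000}

good_failures = failed_rules(good)
bad_failures = failed_rules(bad)
completed = derive_equity(partial)
```

The same rule does double duty: it flags a filing whose cells contradict each other, and it lets software fill in a cell that the filer left out.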
Second, the notions of "concept arrangement patterns" and "member arrangement patterns" are not explained well in the paper, and some errors exist. This is partly my fault. These names have changed (from pattern, to metapattern, to accounting concept arrangement pattern, and finally to concept arrangement pattern). These relations exist and they are critical to making this system work correctly. The notion of a "whole-part" relation is left out altogether.
Third, the paper says that these concept arrangement patterns and member arrangement patterns exist "according to me". That is not the case. What I have done is observe and document. These patterns exist in SEC XBRL financial filings. It is those financial reports that show that these patterns exist. I simply made the observation and wrote the information down. Empirical evidence, not my opinion, shows that these relations exist.
Finally, the paper points out very clearly that XBRL enables the interchange of "data" between different systems. What the paper does not point out clearly enough is that not only can you exchange "the data" but you can also exchange "the model", the metadata. This is a crucial distinction. Which brings us to NOLAP.
Another notion that Ghislain came up with was NOLAP. NOLAP, which is explained in the Cell Store paper and which I explain here, overcomes issues with OLAP. Here is a summary of issues with OLAP:
- There is no global standard for OLAP
- Cube rigidity
- Limited computation support, mainly roll ups
- Limited business rule support and inability to exchange business rules between implementations
- Inability to transfer cubes between systems, each system is a "silo" which cannot communicate with other silos
- Inability to articulate metadata which can be shared between OLAP systems
- Focus on numeric-type information and inconsistent support for text data types
- OLAP systems tend to be internally focused within an organization and do not work well externally, for example across a supply chain
- OLAP tends to be read only
Over the years I have mentioned the notion of a "semantic spreadsheet". NOLAP (not only OLAP) is essentially the same thing as my idea of a semantic spreadsheet. This video of the application Quantrix is the closest visual that I have of what a semantic spreadsheet is and how it might work. (Here are more videos)
But Quantrix has issues:
- It is OLAP, not NOLAP
- The format is proprietary, not a global standard
- Quantrix does not understand domain semantics, only report level semantics
I point out these issues not to knock Quantrix, but to point out what NOLAP needs to be. Imagine a ubiquitous spreadsheet format (i.e. not owned by Microsoft) that works within Microsoft Excel, Google Sheets, Apple Numbers, OpenOffice, and other proprietary applications, and that can be connected to relational databases.
The application is not like a normal spreadsheet, which is "glued" together presentationally with the notion of a workbook, worksheet, row, column, or cell. The spreadsheet is glued together with meaning. Because the spreadsheet is glued together with meaning, in order to understand how to use it you only need to understand the meaning. The spreadsheet acts more like a pivot table, and that pivot table is read/write. Information is not stored locally (although it could be, in which case the format would be XBRL), but within a cell store. The information could also be taken out of one cell store and transferred into another cell store, internal or external to your organization.
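The pivot-table behavior can be sketched in a few lines of Python. This is only an illustration under assumed names: the `pivot` function, the dimension names, and the values are hypothetical, and nothing here reflects how Quantrix or any cell store product actually works. The point is that the same cells can be laid out along any pair of dimensions, because the layout comes from the meaning of the cells rather than from fixed rows and columns.

```python
from collections import defaultdict
from typing import Dict, List

Cell = Dict[str, object]  # dimension name -> member, plus a "value" entry


def pivot(cells: List[Cell], rows: str, cols: str) -> Dict[object, Dict[object, object]]:
    """Lay cells out as a table keyed by the chosen row and column dimensions."""
    table: Dict[object, Dict[object, object]] = defaultdict(dict)
    for cell in cells:
        table[cell[rows]][cell[cols]] = cell["value"]
    return {row: dict(columns) for row, columns in table.items()}


# Made-up cells for illustration.
cells: List[Cell] = [
    {"concept": "Assets", "period": "2012", "value": 900},
    {"concept": "Assets", "period": "2013", "value": 1000},
    {"concept": "Liabilities", "period": "2012", "value": 550},
    {"concept": "Liabilities", "period": "2013", "value": 600},
]

# The same cells, arranged two different ways.
by_concept = pivot(cells, rows="concept", cols="period")
by_period = pivot(cells, rows="period", cols="concept")
```

Neither arrangement is the "real" one. Swapping the two dimensions changes the presentation and nothing else.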
Folks, you don't have to imagine this. You can experience this, right now, today. Call me crazy, but the XBRL-based EDGAR system of public company financial information is exactly that. But there are some issues.
There are some details that are not quite working as they need to work. OK fine, fix the issues. Here are the issues and how to fix each issue:
- Performance: Querying the SEC EDGAR system information is not performant; the system is simply a bunch of files within a file system. THE FIX: Put the files into a cell store or some other database.
- Information quality: The quality of the data of XBRL-based financial filings is poor. However, it is improving and it can be improved even more by using more rules. The rules keep the relations between the cells, the information, correct. THE FIX: Add more rules where the information is incorrect to force it to be correct.
- Creation software: Most software vendors made two mistakes when building software for creating XBRL-based public company SEC filings. (a) The software is too hard to use because it was not built correctly and (b) the software vendor took a myopic view and made the software ONLY work with SEC filings. THE FIX: Build better software and make the software work with XBRL-based SEC financial filings, but not ONLY with SEC XBRL-based financial filings.
- General profile: Six years ago, Rene van Egmond and I pointed out that XBRL implementations are not interoperable. No system uses all aspects of XBRL. However, there is no real "application profile" defined which can be used to create "general" XBRL-based semantic spreadsheets. You have specifications for creating SEC filings, and you have all the rules for that. Those rules come from US GAAP itself, the US GAAP XBRL Taxonomy, and the SEC EDGAR Filer Manual (EFM). Someone defined that profile. THE FIX: Create an agreed upon "general profile" which software vendors can implement. For example, this is a general profile that I created which is based on and leverages the good aspects of the US GAAP XBRL Taxonomy Architecture/SEC system architecture, but overcomes the bad aspects.
Maybe I missed a few details. But that is all they are, details. I count 22 software vendors who already support the creation of XBRL-based SEC financial filings. 28msec already has a database. There are several other databases that I am aware of. XBRL Cloud has the validation capabilities, see their EDGAR Dashboard. There is already analysis software. What is missing is (a) software vendors collaborating to put these pieces together and (b) an understanding of the existence of a general application profile for doing this.
Companies that don't innovate risk becoming extinct. Do you want to take that risk?