BLOG: Digital Financial Reporting
This is a blog for information relating to digital financial reporting. It is for innovators and early adopters who are ushering in a new era of digital financial reporting.
Much of the information contained in this blog is summarized, condensed, better organized and articulated in my book XBRL for Dummies and in the three documents on this digital financial reporting page.
Quality of public company XBRL-based financial filings to the SEC improved yet again as measured against a set of 21 fundamental accounting concept relations.
Last month (June 30) there were 3 software vendors/filing agents who had 80% or more of their filings consistent with all fundamental accounting concept relations. As of July 31, that number has grown to 5. We are closing in on the point where 75% of all public companies are consistent with all the fundamental accounting concept relations, standing at 74.7% currently. We are also closing in on 99% consistency on a per-test basis.
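To make the two measures concrete (share of filings consistent with all relations versus consistency on a per-test basis), here is a minimal Python sketch. The two relations and the sample numbers are assumptions chosen for illustration; they are not the actual set of 21 relations or real filing data.

```python
from typing import Callable, Dict, List

Report = Dict[str, float]

# Hypothetical relations for illustration only; the real set of 21 is not reproduced here.
RELATIONS: Dict[str, Callable[[Report], bool]] = {
    "Assets = LiabilitiesAndEquity":
        lambda r: r["Assets"] == r["LiabilitiesAndEquity"],
    "LiabilitiesAndEquity = Liabilities + Equity":
        lambda r: r["LiabilitiesAndEquity"] == r["Liabilities"] + r["Equity"],
}


def consistency(reports: List[Report]) -> Dict[str, float]:
    """Return the share of reports passing every relation and the per-test pass rate."""
    fully_consistent = sum(
        all(check(r) for check in RELATIONS.values()) for r in reports
    )
    passed_tests = sum(check(r) for r in reports for check in RELATIONS.values())
    total_tests = len(reports) * len(RELATIONS)
    return {
        "per_report": fully_consistent / len(reports),
        "per_test": passed_tests / total_tests,
    }


# Made-up sample data: the second report fails the second relation (70 + 40 != 100).
reports = [
    {"Assets": 100, "LiabilitiesAndEquity": 100, "Liabilities": 60, "Equity": 40},
    {"Assets": 100, "LiabilitiesAndEquity": 100, "Liabilities": 70, "Equity": 40},
]
result = consistency(reports)
print(result)
```

The per-test rate is always at least as high as the per-report rate, because a single failed test makes the whole report inconsistent.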
Per generator (software vendor or filing agent) results:
Per consistency test results: (note that on a per-test basis, all tests are now at least 95% consistent)
A reasoner is software that is able to infer logical consequences from a set of asserted facts. Every reasoner uses some sort of logic. For example, first-order predicate logic is a type of logic. Every reasoner works with some set of axioms. An axiom describes some logical fact. The capabilities of a reasoner depend on the expressiveness of the kind of logic that the reasoner uses and the axioms provided for the reasoner and logic to work against.
Reasoners are sometimes referred to as inference engines because, while reasoners work with asserted facts as stated above, they can also use the rules of logic to deduce theorems. Theorems are indirectly deduced facts: deductions which can be proven by constructing a chain of reasoning by applying axioms. Basically, a reasoner and an inference engine are the same thing.
A rules engine is also a reasoner. Another name for a reasoner, inference engine, or rules engine is a semantic reasoner.
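The idea of deducing theorems from asserted facts can be sketched in a few lines of Python. This is only a toy forward-chaining loop over simple "if all premises hold, then conclusion" rules; the fact names and rules are hypothetical, and real reasoners are far more capable.

```python
from typing import FrozenSet, List, Set, Tuple

# A rule is (premises, conclusion): if every premise is known, the conclusion follows.
Rule = Tuple[FrozenSet[str], str]


def forward_chain(facts: Set[str], rules: List[Rule]) -> Set[str]:
    """Apply the rules repeatedly until no new fact can be deduced."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known


# Hypothetical asserted facts and rules, for illustration only.
facts = {"reports_assets", "reports_liabilities"}
rules = [
    (frozenset({"reports_assets", "reports_liabilities"}), "can_derive_equity"),
    (frozenset({"can_derive_equity"}), "balance_sheet_complete"),
]

derived = forward_chain(facts, rules) - facts
print(sorted(derived))
```

The second deduced fact depends on the first, which is the "chain of reasoning" mentioned above. The loop always stops because there is a finite number of rules and each pass must add at least one new fact to continue.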
An XBRL Formula Processor is basically a reasoner. Did you realize that? I will get back to that in a moment.
Clearly a human's capacity to apply logic is greater than a computer's capacity to apply logic. In fact, computers are machines and really can't think or apply logic. All that a computer can do is mimic or simulate or emulate a human's ability to think. Some computer programs that mimic human thought or perform some task for humans are called expert systems. Every expert system uses a reasoner to figure out what that system needs to do for the human and how to do it.
I pointed out that care has to be taken in order to express facts in a form that is safe, reliable, predictable, and repeatable. There are four catastrophic problems that a computer can run up against:
- Undecidability (i.e. must be decidable)
- Infinite loops (i.e. must eliminate possibility of cycles)
- Unbounded structures or pieces (i.e. must have known set of structures)
- Unspecific or imprecise logic (i.e. things like fuzzy logic are not allowed in this type of system)
Correctly balancing the expressiveness of a logic against the safety, reliability and predictability of a piece of software that must return useful information takes conscious, skillful effort and execution. Years of experimentation in the area of expert systems and artificial intelligence have yielded invaluable information for achieving this balance.
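As an illustration of the second item in the list above, a system can check up front that a set of relations contains no directed cycle. The sketch below is a standard depth-first search in Python; the concept names are made up for the example and the function is not part of any XBRL tool.

```python
from typing import Dict, List


def has_directed_cycle(graph: Dict[str, List[str]]) -> bool:
    """Return True if following parent -> child arcs can ever lead back to a node."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, fully explored
    nodes = set(graph) | {c for children in graph.values() for c in children}
    color = {n: WHITE for n in nodes}

    def visit(node: str) -> bool:
        color[node] = GRAY
        for child in graph.get(node, []):
            if color[child] == GRAY:  # arc back into the current path: a cycle
                return True
            if color[child] == WHITE and visit(child):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in nodes)


# Hypothetical parent -> children relations.
tree = {"Assets": ["CurrentAssets", "NoncurrentAssets"], "CurrentAssets": ["Cash"]}
looped = {**tree, "Cash": ["Assets"]}
# Two paths reach "D", but no path leads back to where it started.
diamond = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}

print(has_directed_cycle(tree), has_directed_cycle(looped), has_directed_cycle(diamond))
```

The diamond case matters: reaching the same node by two different routes is harmless, while an arc that leads back onto the current path is what creates a potentially infinite loop.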
First-order predicate logic is a formal way of expressing logic in a manner that is machine-readable.
While first-order predicate logic is expressive and powerful in performing work, it is not decidable and other problems can occur.
PROLOG is an attempt to address issues with first-order logic. In creating PROLOG, the problem of decidability and cycles was partially addressed by limiting the first-order predicate logic statements that can be used to Horn clauses. But even PROLOG had issues, and so further restrictions were placed on first-order logic expressed using Horn clauses, and DATALOG was created.
DATALOG is a restricted subset of PROLOG. DATALOG is described as a query language based on logic. People are combining relational databases and DATALOG and creating what they call "deductive databases". Datomic is one such database. It seems that DATALOG is a de-facto standard deductive query language. (Here is more information on DATALOG.)
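To give a feel for why a restricted language like DATALOG is safer, here is a rough Python sketch of bottom-up evaluation for the classic two-rule program `ancestor(X,Y) :- parent(X,Y).` and `ancestor(X,Z) :- parent(X,Y), ancestor(Y,Z).` This is not a DATALOG engine, just an illustration of the fixpoint idea under the assumption of a finite set of facts.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]


def ancestors(parent: Set[Pair]) -> Set[Pair]:
    """Compute every ancestor pair by applying the rules until nothing new appears."""
    ancestor = set(parent)  # rule 1: every parent is an ancestor
    while True:
        # rule 2: join parent(X, Y) with ancestor(Y, Z) to get ancestor(X, Z)
        new = {(x, z) for (x, y) in parent for (y2, z) in ancestor if y == y2}
        if new <= ancestor:  # fixpoint reached: no new facts
            return ancestor
        ancestor |= new


chain = {("a", "b"), ("b", "c"), ("c", "d")}
print(sorted(ancestors(chain)))

# Even cyclic data terminates, because only finitely many pairs can exist.
cyclic = {("a", "b"), ("b", "a")}
print(sorted(ancestors(cyclic)))
```

Because the facts are drawn from a finite set of names, the loop can only add a finite number of pairs, so it always stops. A depth-first, top-down evaluation of the same rules over the cyclic data may not terminate.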
The semantic web folks seem to have had a similar evolution. They started with OWL FULL or older versions of OWL and then created limitations to deal with the problem of decidability. State-of-the-art semantic web technologies such as OWL 2 DL have been limited to solve the problem of decidability by limiting the logic to SROIQ description logic which is decidable.
OWL 2 DL has a boatload of reasoners. What I don't understand is the relative expressive power of an OWL 2 reasoner and something like DATALOG.
However, SROIQ description logic does not support expressing mathematical relations. The reason is, some math is not decidable. Eventually they will fix that most likely.
Back to XBRL Formula Processors. An XBRL Formula Processor is generally seen as something that validates XBRL instance facts. Says so right here in the XBRL Formula 1.0 Specification, see the Abstract section. But it is becoming pretty clear to me that what an XBRL Formula processor really is, is a business report reasoning engine. Or rather, that is what it SHOULD be in my opinion.
XBRL Formula has some distinct advantages over something like OWL 2 DL. The first advantage is that XBRL Formula does math. The second thing is that XBRL Formula has an understanding of XBRL Dimensions. That means that not only can XBRL Formula do math, it also supports a dimensional model.
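The combination of math and a dimensional model can be illustrated with a small Python sketch: checking that the members of one axis add up to the total. This is not XBRL Formula syntax; the concept, the member names, and the `check_rollup` helper are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# A fact is keyed by (concept, member of a hypothetical segment axis).
# A member of None stands for the total across the axis.
FactKey = Tuple[str, Optional[str]]


def check_rollup(
    facts: Dict[FactKey, float],
    concept: str,
    members: List[str],
    tolerance: float = 0.0,
) -> bool:
    """Return True if the member values sum to the reported total for the concept."""
    total = facts[(concept, None)]
    parts = sum(facts[(concept, m)] for m in members)
    return abs(total - parts) <= tolerance


# Made-up sample facts.
facts = {
    ("Revenues", None): 1000.0,
    ("Revenues", "Americas"): 600.0,
    ("Revenues", "Europe"): 300.0,
    ("Revenues", "Asia"): 100.0,
}
members = ["Americas", "Europe", "Asia"]

print(check_rollup(facts, "Revenues", members))
```

The rule is written against the meaning of the facts (concept plus dimension member), not against any particular position in a report.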
However, there are several deficiencies in XBRL Formula processors:
- XBRL Formula processors do not support process chaining. Supporting chaining was discussed but they decided not to do it. PROLOG and DATALOG support chaining. Not sure if OWL 2 DL supports chaining.
- XBRL Formula processors do not understand and use the "general-special" or "essence-alias" standard XBRL arcroles. Basically, XBRL Formula processors don't understand class relations.
- XBRL Formula processors are focused on XBRL instances; they don't provide much functionality for working with XBRL taxonomy information.
My personal opinion is that the world would be a better place if something existed that had the combined functionality of something like DATALOG and an XBRL Formula Processor; if that combined piece of software struck the correct balance between expressive power and safety/reliability/predictability (i.e. it avoided those four logical catastrophes); and if there was a layer built on top that helped business professionals work with all this stuff effectively and successfully.
Per the law of conservation of complexity and the idea of irreducible complexity, not until this business report reasoner exists can XBRL ever really be usable by the average business professional. But imagine if such software did exist. Any business professional could build their own little or even big expert system inexpensively.
This is the definition of a spreadsheet provided by Wikipedia:
A spreadsheet is an interactive computer application program for organization, analysis and storage of data in tabular form. Spreadsheets developed as computerized simulations of paper accounting worksheets. The program operates on data represented as cells of an array, organized in rows and columns. Each cell of the array is a model–view–controller element that may contain either numeric or text data, or the results of formulas that automatically calculate and display a value based on the contents of other cells.
A spreadsheet is essentially a domain-specific programming language. What? A spreadsheet is a programming language??? There are essentially two fundamental pieces to a spreadsheet: (a) the model and (b) the spreadsheet language, or macro language, that can be used to relate one cell with another cell. The model is expressed via a modeling language which expresses the rules that outline the structure of a spreadsheet. The language states things like a workbook is made up of spreadsheets, and a spreadsheet is made up of rows and columns which intersect to form cells. The macro language is used for expressing relations between cells and manipulating the values of cells or even the structure of the spreadsheets, columns, rows and cells of a workbook or even a set of workbooks.
There are three key things about spreadsheets that one should be aware of:
- Note the statement "data in tabular form" in the Wikipedia definition of a spreadsheet.
- Note that "workbook" and "spreadsheet" and "column" and "row" and "cell" are presentation oriented terms and structures.
- Note that the programming language or macro language specifically understands what a workbook is, what a spreadsheet is, what a column is, what a row is, and what a cell is. The programming language also has general features such as "if...then" statements, "case" statements, and other such common programming functionality.
This is my definition of a semantic spreadsheet:
A semantic spreadsheet is an interactive computer application program for organization, analysis and storage of multi-dimensional information. Semantic spreadsheets developed as computerized simulations of sets of paper accounting worksheets. The program operates on information represented as cells of an array, which can be visualized in rows and columns of something similar to a dynamic pivot table. Each cell of the array is a model–view–controller element that may contain either numeric or text information.
Unlike a spreadsheet, which is connected presentationally via the rows, columns, and cells of a sheet which don't have names but rather labels such as "Row 1" or "Column B", semantic spreadsheets are connected together via the meaning and logic of the information itself.
Unlike a spreadsheet, whose cells are manipulated by a programming paradigm that is generally procedural in nature, a semantic spreadsheet is described, and verified to be represented correctly against that description, using a logic-based programming language. PROLOG is an example of one such logic-based language. Procedural and other types of programming paradigms can still be used to manipulate a semantic spreadsheet; but rather than interacting with the row numbers and column letters of the spreadsheet, programs interact with the meaning of the information.
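The difference between addressing by position and addressing by meaning can be sketched in Python. The `Fact` and `SemanticSheet` classes below are hypothetical names invented for this illustration; they do not come from any existing product or specification.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Fact:
    """One piece of information described by its characteristics, not its position."""
    concept: str
    period: str
    entity: str
    value: float


class SemanticSheet:
    """Hypothetical container that is queried by meaning instead of row and column."""

    def __init__(self, facts: List[Fact]) -> None:
        self._facts = list(facts)

    def query(self, **characteristics: str) -> List[Fact]:
        """Return every fact whose characteristics match all the given values."""
        return [
            f for f in self._facts
            if all(getattr(f, name) == wanted for name, wanted in characteristics.items())
        ]


# Presentation-oriented addressing: the program must know that "B2" holds assets.
grid: Dict[str, float] = {"B2": 100.0, "B3": 60.0}

# Meaning-oriented addressing: the program asks for what it wants (made-up data).
sheet = SemanticSheet([
    Fact("Assets", "2014", "SampleCo", 100.0),
    Fact("Liabilities", "2014", "SampleCo", 60.0),
    Fact("Assets", "2013", "SampleCo", 90.0),
])
assets_2014 = sheet.query(concept="Assets", period="2014")
print(grid["B2"], assets_2014[0].value)
```

If a row is inserted into the grid, "B2" silently points at something else. The query by concept and period keeps working no matter how the information is laid out on screen.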
The best semantic spreadsheets support the import and export from/to global-standard information exchange formats such as XBRL or OWL 2 DL. Support for global-standard formats enables the exchange of information between different semantic spreadsheet implementations.
Semantic spreadsheets allow for the use of OLAP-based information but do not require the use of OLAP. Semantic spreadsheets overcome many of the problems of OLAP and problems of presentation-oriented electronic spreadsheets.
While semantic spreadsheets are very powerful and in the class of software deemed to be expert systems, they are also very easy to use for three specific reasons:
- Semantic spreadsheets are business domain specific tools rather than general purpose tools.
- Business users making the use of semantic spreadsheets interact using business domain terms familiar to their business domain.
- Semantic spreadsheets strike an optimal balance between expressive power, reasoning capacity, and the reliability/predictability demanded for many business use cases.
Functionality is achieved by burying most knowledge engineering principles deep within software platforms and software applications used by business professionals (see the law of conservation of complexity). What business professionals lose in terms of the flexibility to solve any problem using general purpose tools, they gain in ease of use, through both the absorption of complexity within software and generous doses of the 80/20 rule.
Enterprise-class software extends the sound base established by global-standard semantic spreadsheets, enabling business use cases with additional needs to both leverage that solid foundation and extend it.
A digital financial report is a specific type of semantic spreadsheet and follows the same architecture; however, the metadata is specific to the financial reporting scheme used by the economic entity creating the financial report.
The first semantic spreadsheet tool was created by _____(insert company here)_____ .
I have mentioned the notion of "decidability" when I did a blog post related to description logic. When I discussed the notion of decidability with others, many times they seemed to be lumping other things in with decidability.
And so, I am tuning and I think improving my ability to express what I am trying to say. This is an improved attempt to summarize and synthesize these ideas.
There appear to be four "logical catastrophes" or "failure points" that the type of business system that I am working to create, and many other similar types of business systems, MUST NEVER HAVE. These characteristics are so catastrophic to the system that they must never exist. Besides, these characteristics never exist in the reality that the system is trying to represent in machine-readable form.
This is a summary of these four logical catastrophes or "failure points" which must never exist:
- Undecidability: "I don't know" or "unknown" is NOT an option as an answer to any question. A big part of this is making the closed world assumption rather than the open world assumption. What is interesting is that XBRL 1.0 and I am pretty sure XBRL 2.0 allowed for explicitly stating whether the closed world assumption was being made. Also, relational databases make the closed world assumption. On the other hand, many of the "anyone can say anything about anything" folks working to build the semantic web take the open world assumption by default. It is not to say that one assumption is right and the other is wrong. It is to point out that one assumption works one way and the other works another way, and business systems generally need to be decidable and would make the closed world assumption. And this is not an "either/or" type question. All one needs to do is be explicit and not make others guess. Digital financial reporting needs to make the closed world assumption and therefore be decidable for the reasons explained here. Why? For the same reason OWL 2 DL was restricted to a decidable logic, even though OWL 2 DL itself makes the open world assumption.
- Infinite loops: It is not hard to understand the problems caused by getting into an infinite loop from which a system can never escape. The reality represented by the business systems that I want to create does not have infinite loops. Therefore, there is no need to express something as an infinite loop. Another term from graph theory or network theory that helps to understand loops is a cycle. This stuff hurts my head to think about, but the basic thing a business professional needs to know is the difference between a directed cycle and an undirected cycle. Basically, never use directed cycles; they can cause infinite loops. Again, what is interesting is that XBRL consciously provided a means to eliminate directed cycles from ever appearing in an XBRL taxonomy. I don't think OWL 2 DL has this ability.
- Unbounded pieces; unbounded sets: First-order logic, also known as first-order predicate logic, can only work on finite systems. An infinite system can never be explained successfully using first-order logic. The pieces that make up XBRL are: fact, characteristic, parenthetical explanation (XBRL footnote), network, hypercube ([Table]), dimension ([Axis]), member, primary item ([Line Items]), abstract, and concept. EVERYTHING in XBRL is one of those things. You can add any number of those things. You cannot invent new things and arbitrarily add them to a system. While the XBRL Technical Specification does allow for the addition of new things, most systems created have some bounded set of pieces. Further, every set is likewise bounded. There is a specific, countable number of facts in an XBRL instance; always. This blog post and this blog post help you see that the pieces that make up XBRL are well bounded. Likewise, every set of such pieces is finite.
- Unspecific logic: It is not expected that the business system, at the level of describing the things in the system, be able to support "fuzzy logic" or "probabilistic reasoning" or other such stuff. Now, when you use the information from the system, you can do whatever you want. But describing what is in the system and what is not is not a "probability"; it is a fact, and the answer is that it is there or it is not there. There is no in between and the answer is not a statistical probability. For example, the answer to "What is the value of assets?" is a number, not a probability.
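The difference between the closed world and open world assumptions in the first item above can be shown with a deliberately simplified Python sketch. The function names and the sample statements are made up, and a real open world reasoner works by logical entailment rather than a simple lookup.

```python
from enum import Enum
from typing import Set


class Answer(Enum):
    TRUE = "true"
    FALSE = "false"
    UNKNOWN = "unknown"


def ask_closed_world(facts: Set[str], statement: str) -> Answer:
    """Closed world: whatever is not stated is taken to be false."""
    return Answer.TRUE if statement in facts else Answer.FALSE


def ask_open_world(facts: Set[str], known_false: Set[str], statement: str) -> Answer:
    """Open world: whatever is not stated either way remains unknown."""
    if statement in facts:
        return Answer.TRUE
    if statement in known_false:
        return Answer.FALSE
    return Answer.UNKNOWN


# Hypothetical statements about one report.
facts = {"report has Assets", "report has Liabilities"}
question = "report has Goodwill"

print(ask_closed_world(facts, question).value)
print(ask_open_world(facts, set(), question).value)
```

Under the closed world assumption every question gets a definite true or false, which is the behavior argued for above. Under the open world assumption the same question comes back as "unknown".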
That is my best attempt at describing the requirements of the type of business system that digital financial reporting needs to be. This is not a personal preference; this is about science. This is the only type of system that will work the way professional accountants need the system to work. There are a lot of other systems which would have similar requirements. And this is not to say that other systems cannot have different requirements.
And so the question is this: What logic or calculus should be used to represent such a system? OWL 2 DL might work, but OWL 2 DL does not support mathematical computations because some such computations cause a system not to be decidable.
XBRL can do this. There are exactly ZERO catastrophic failures if you read the entire set of XBRL-based financial filings which have been submitted to the SEC. Not one. While the closed world assumption is not explicitly stated, it is assumed. No infinite loops. A bounded set of pieces can be used to construct an XBRL-based report. A bounded, finite number of pieces exist for each XBRL-based report. Fuzzy logic was not used to create the reports; the creation rules are specific.
Now, within the XBRL-based reports there are mistakes in the articulation of meaning. Many inconsistencies such as that. But, that is not remotely close to a catastrophic failure. That is a detail. Those inconsistencies are being detected and corrected.
I would be very interested in the thoughts of others who are knowledgeable about how to make such a system work. I am not 100% sure that I am describing this correctly. But I am 100% certain about what I am trying to achieve. What I really don't understand is exactly the best way to achieve it. I have some ideas, but I don't know the answer yet. So please let me know what you think.
The video, A Theory of a System for Educators and Managers, discusses Dr. W. Edwards Deming's view of systems in general by looking at the education system. If you don't know who Deming is, he is the guy that the U.S. automotive industry ignored but the Japanese automotive industry did not ignore, allowing the Japanese automotive industry to overtake the U.S.
I don't understand the relation between Deming's ideas and Six Sigma, but they seem very related.
These ideas were developed to improve production processes. But they are just as applicable to services and even to information systems.
Cooperation and collaboration are key to systems:
Working together is the main contribution to systemic thinking as opposed to working apart separately.