BLOG: Digital Financial Reporting
This is a blog for information relating to digital financial reporting. It is for innovators and early adopters who are ushering in a new era of digital financial reporting.
Much of the information contained in this blog is summarized, condensed, better organized and articulated in my book XBRL for Dummies and in the three documents on this digital financial reporting page.
What I am noticing is that one can detect accounting anomalies in the HTML versions of SEC financial filings by leveraging XBRL. This document, Detecting Accounting Anomalies Using Structured Information, summarizes a handful of accounting anomalies. These are not XBRL errors; these are accounting anomalies (some would say errors) in the HTML version of SEC financial filings.
Don't make the mistake of thinking that these are obvious or simplistic flaws or typos. What this shows is the tip of the iceberg. These issues are just scratching the surface of the real value which structured information, such as XBRL, provides to accountants creating financial reports. I am keeping this basic and as uncontroversial as possible to help you see what structured information such as XBRL really offers.
Here is a quick summary of the accounting anomalies which I detected in HTML SEC filings by public companies:
- Balance sheets that don't balance: 41 cases.
- Improperly created classified balance sheets where current assets are reported, but current liabilities are not clearly indicated: about 185 cases.
- Equity attributable to parent not reported when a noncontrolling interest is reported: about 120 cases.
- Noncontrolling interest reported at the temporary equity level rather than within equity: 1 case.
- Two different approaches to computing net cash flow; one where exchange gains/losses are part of net cash flow, another where they are not: why are there two ways of doing this? This is odd and some accountants say this is an error.
In addition to these accounting anomalies (and remember, this is just the tip of a much bigger iceberg) the document above points out some things to consider in this age of digital financial reporting.
Analysts using software have to grab this information from the digital financial filing; in this case I am grabbing SEC XBRL financial filings from the EDGAR system. Automated reuse of this information by software should not be a guessing game. Software should be able to clearly identify and extract fundamental accounting concepts. If this is not straightforward, different software will provide different answers to exactly the same question. That cannot be a good thing.
Additionally, there are safe ways to report information, and there are unsafe ways to report information. Being explicit and unambiguous is a good thing if you want software to use your financial information with the meaning that you intended when the information was created. Providing key totals is the safe way. Forcing software developers to spend time creating sophisticated software programs to derive those totals, programs which potentially act in different ways, is not really what is needed to make digital financial reporting work the way it needs to work.
So, how am I gathering all this information? Well, I built a software application in Microsoft Access which does most of the work, but then I use the XBRL Cloud Edgar Report Information web service to retrieve human readable renderings of the financial information. Below is a screen shot of the form which filters the reporting entities in the ways that I need:
Here is a YouTube video which shows this digital financial report analysis tool in action. Clearly this won't win any design awards, but it will give you an idea of things to come.
Software vendors, something to think about: If I can detect these errors after the fact, why is it that you cannot detect them during the financial report creation process and keep accountants from making accounting mistakes in the first place? Note that I did not say XBRL errors. I said accounting mistakes. That is the value proposition which will get accountants to value structured information and your products.
Imagine a Big 4 public accounting firm which has a policy that certain transactions, events, or other circumstances should be handled in certain specific ways. Imagine, for example, that a firm believes that balance sheets do in fact balance or that exchange gains and losses are always part of net cash flow or that noncontrolling interest is always part of equity. If a firm wanted to do that, they could establish a business rule and enforce that rule. If there are exceptions to that rule for any client, that fact would stand out.
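To make the idea of a business rule concrete, here is a minimal sketch in Python. It is only an illustration, not how any firm or vendor actually does this. The fact names, the entity names, and the numbers are all made up for the example; the only rule shown is the one mentioned above, that balance sheets balance.

```python
from typing import Callable, Dict, List, Tuple

# A filing is reduced here to a dictionary of reported facts.
# The keys are hypothetical names chosen for this sketch.
Facts = Dict[str, int]
Rule = Tuple[str, Callable[[Facts], bool]]

# Each rule is a name plus a test that returns True when the rule holds.
RULES: List[Rule] = [
    ("Balance sheet balances",
     lambda f: f["Assets"] == f["LiabilitiesAndEquity"]),
]

def find_exceptions(filings: Dict[str, Facts]) -> List[Tuple[str, str]]:
    """Return (entity, rule name) for every rule that a filing breaks."""
    exceptions = []
    for entity, facts in filings.items():
        for name, test in RULES:
            if not test(facts):
                exceptions.append((entity, name))
    return exceptions

# Made-up filings for illustration only.
filings = {
    "Example Co A": {"Assets": 1000, "LiabilitiesAndEquity": 1000},
    "Example Co B": {"Assets": 2500, "LiabilitiesAndEquity": 2400},
}

exceptions = find_exceptions(filings)
print(exceptions)
```

Adding another firm policy would just mean adding another entry to the list of rules, and any client that does not follow the policy shows up in the list of exceptions.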
Each public accounting firm might have slightly different interpretations of the accounting rules. These are not random interpretations and there are not endless interpretations. For example, there are two interpretations as to where exchange gains and losses would be placed. Not 8 interpretations, only two. How do I know? Because 100% of SEC public company filings use one of two different approaches.
Is one approach wrong for where exchange gains and losses are placed on the cash flow statement? Maybe. Why are two approaches necessary? Some accountants say one of the approaches is wrong. Other accountants say that US GAAP can be interpreted to include either approach. Is that a bug in US GAAP? Maybe. The ability to analyze digital financial reports will enable discussions like this to occur because anomalies such as this can easily be discovered. US GAAP will likely be tuned as a result.
There are thousands and thousands of little issues such as this. Like I said, what I am showing is only the tip of the iceberg.
For the 7160 SEC XBRL financial filings I have been analyzing, all 10-Ks, I did an analysis of the line items of different sections of the balance sheet.
First off, let me say that this analysis is not perfect just yet. For one, I am only able to get to the 91% of filers who provided calculation relations for their Assets and Liabilities and Equity roll ups. (i.e. 9% did not provide XBRL calculation relations for the roll ups and I am not reading those because I am using the calculation relations to find the children.)
Second, I realized that a better assessment would come from separating the filers who have classified balance sheets from those who do NOT have classified balance sheets.
Given the above, this is what I see:
You can get the raw data from this Excel spreadsheet if you desire.
Consider the column "Current Assets". There was an average of about 5 line items for the category current assets in the set of filings analyzed. The extension rate for current assets was 4.2%, slightly less than the average extension rate for the overall balance sheet, which is 5.3%.
Here are the most commonly occurring current asset line items:
In the Excel workbook above there is a spreadsheet which contains a list of every extension added to current assets for all the filers. I took that list, I did a search on the term "inventor" (to catch inventory and inventories). I then removed the US GAAP XBRL taxonomy concepts. What was left were these extensions:
Really. The concept "us-gaap:InventoryNet" is not good enough?
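The filtering I describe above can be sketched in a few lines of Python. This is an illustration only; I did the actual work in Excel. The extension names in the list are made up for the example (the two "us-gaap:" concepts are real), and the function name is just something I picked.

```python
US_GAAP_PREFIX = "us-gaap:"

def inventory_extensions(concepts):
    """Keep concepts that mention 'inventor' and are not US GAAP taxonomy concepts."""
    return [
        c for c in concepts
        if "inventor" in c.lower() and not c.startswith(US_GAAP_PREFIX)
    ]

# Hypothetical current asset line items; the abc: and xyz: names are invented.
concepts = [
    "us-gaap:InventoryNet",
    "abc:InventoriesNet",
    "us-gaap:CashAndCashEquivalentsAtCarryingValue",
    "xyz:TotalInventory",
    "xyz:PrepaidRent",
]

found = inventory_extensions(concepts)
print(found)
```

Searching on "inventor" rather than "inventory" is what catches both "inventory" and "inventories".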
Go grab the Excel spreadsheet and slice and dice it. If you find something interesting please let me know.
Something to note: because I am grabbing this information about the line items per balance sheet section, I can clearly identify the components of that section, whether the concept is an extension or not. While I cannot tell you WHAT the extension is without reading the concept documentation, I can get the line item.
As I write this it occurs to me that another interesting bit of information is the number of extensions per balance sheet section. I sort of have that information. If, for example, there are an average of 5 current assets line items and the extension rate is 4.2%, then the average filer has 0.21 extension concepts (5 x 0.042). Plus, given the obvious misuse of extensions for the concept inventories above, the extension rate can likely be reduced.
Anyway, have fun looking at the data if you are so inclined.
Many people focus on XBRL saying that XBRL is about not having to rekey information. It is true that if XBRL formatted information is expressed correctly, that information does not need to be "rekeyed". But I would suggest that a more enlightened focus might be the following: What does it mean if you don't have to rekey information?
What if you could predictably, reliably, repeatedly grab a piece of information; ascertain the appropriate context of the information which was an OUTPUT of that process; and send that information on as an INPUT to some subsequent process? What if a meaningful exchange of information could be achieved between two computer processes?
What if you could "chain" processes together? That is why people often refer to processes as a supply chain.
One of the things which could occur is that business users can delegate work to software agents.
So what the heck is a "software agent"? People sometimes use other terms such as "intelligent agent" or "smart agent". Using terms such as intelligent and smart, in my view, just adds hype into the equation, which often comes with things related to technology.
I don't want to contribute to this hype by setting unrealistic expectations of the state-of-the-art. People often make exaggerated claims which they can never deliver on, fooling uninformed buyers in order to make a quick buck. That is not my goal here.
My goal is to explain that some very basic work can be delegated to software agents and if this is done correctly, software agents can perform meaningful work, can reduce costs, and can increase quality. The first step which is required to make a software agent work appropriately is the meaningful exchange of information to a software agent.
As is said, "garbage in, garbage out". If a process outputs 98% accurate information, that process can never be completely automated because humans need to be involved in order to fix the 2% of the information which is incorrect. So before processes can be 100% automated, information exchanged via that process needs to be 100% correct, complete, consistent, and accurate.
I found the following two definitions of a software agent which help you understand what a software agent is. In his paper, Software Agents: an Overview, Hyacinth S. Nwana defines a software agent as:
we define an agent as referring to a component of software and/or hardware which is capable of acting exactingly in order to accomplish tasks on behalf of its user
In their paper Intelligent Agent, Hanh Tran & Thaovy Tran use the term intelligent agent which has the following definition:
An intelligent agent is a software that assists people and act on their behalf. Intelligent agents work by allowing people to delegate work that they could have done, to the agent software. Agents can perform repetitive tasks, remember things you forgot, intelligently summarize complex data, learn from you and even make recommendations to you.
I like both definitions. Neither has hype in my view. Two key aspects I would point out are the following. The first definition states "capable of acting exactingly". That is critical. What an agent does needs to be predictable and reliable. The second key aspect is "repetitive tasks". A software agent needs to repeat the same predictable, reliable process over, and over, and over.
I want to take the definition of a software agent just a little bit further and show the categories of software agents pointed out by Hyacinth S. Nwana's paper. Here is a diagram which is provided in that paper:
That paper provides a lot of detail on a lot of rather sophisticated stuff which software agents might be able to do, but I don't want to go there in this very basic explanation of software agents. Go read the paper if you want that detail.
But I do want to provide some insight on the sorts of things software agents can do. One good example of a software agent can be seen in the vision of what is called the SEC's RoboCop. What the media calls "RoboCop" is a software agent which enforces an accounting quality model established by the SEC. What is in the accounting quality model? Who knows. Perhaps something as basic as checking to see if SEC filers have balance sheets that balance, or more likely, more sophisticated checks of SEC XBRL-based financial filings.
RoboCop might seem "out there" because we cannot see how it works yet. Here is an example of something which you might be able to better get your head around because you can go check it out. I pulled reported information for a set of 51 fundamental accounting concepts from SEC XBRL financial filings. I also tested 21 relations between those concepts to try and make sure that the information I was grabbing was accurate. I found the information to be about 98% accurate. Eventually the information will be 100% correct, I predict. That fundamental set of information has the reported facts necessary to compute the sustainable growth rate of every SEC filer. Here is that formula:
Sustainable Growth Rate = ((Net Income (Loss) / Revenues) * (1+((Assets - Equity) / Equity))) / ((1 / (Revenues / Assets))-(((Net Income (Loss) / Revenues) * (1+(((Assets - Equity) / Equity))))))
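For those who would rather read code than nested parentheses, here is the same formula as a small Python sketch. The intermediate names (profit margin, leverage, asset turnover) are my labels for the pieces of the formula, and the input numbers are made up. The sketch does not handle zero revenues, zero equity, or a zero denominator.

```python
def sustainable_growth_rate(net_income, revenues, assets, equity):
    """Compute the sustainable growth rate exactly as written in the formula above."""
    profit_margin = net_income / revenues          # Net Income (Loss) / Revenues
    leverage = 1 + (assets - equity) / equity      # 1 + ((Assets - Equity) / Equity)
    asset_turnover = revenues / assets             # Revenues / Assets
    numerator = profit_margin * leverage
    denominator = (1 / asset_turnover) - numerator
    return numerator / denominator

# Made-up facts for illustration only.
rate = sustainable_growth_rate(
    net_income=10.0, revenues=100.0, assets=200.0, equity=100.0
)
print(round(rate, 4))
```

With these made-up inputs the numerator is 0.2 and the denominator is 1.8, so the rate works out to about 11.1%.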
Imagine a software agent which monitors the sustainable growth rate of public companies for an analyst looking for investments. The software makes the analyst aware of companies that reach certain thresholds. This would allow the analyst to then focus additional analysis on the specific entities which have met her criteria.
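A very basic version of such an agent is not much more than a filter. The sketch below assumes the sustainable growth rates have already been computed and sit in a dictionary; the entity names, rates, and the 8% threshold are all invented for the example.

```python
def entities_meeting_threshold(rates, threshold):
    """Return the names of entities whose rate is at or above the threshold."""
    return sorted(name for name, rate in rates.items() if rate >= threshold)

# Hypothetical sustainable growth rates keyed by entity name.
rates = {
    "Example Co A": 0.12,
    "Example Co B": 0.04,
    "Example Co C": 0.09,
}

alerts = entities_meeting_threshold(rates, threshold=0.08)
print(alerts)
```

A real agent would run this repeatedly as new filings arrive and notify the analyst, but the core of the work is the same predictable, repeatable check.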
Another really simple example of an agent is XBRL Cloud's EDGAR Dashboard. That dashboard monitors verification/validation of SEC XBRL financial filings, which is useful for making sure the SEC XBRL financial filings are properly created. It is not that big of a stretch to imagine other types of these dashboards which could be created.
One final example of the utility of a software agent is the analytical review process performed by auditors. Finding the information, lining it up in an application, computing the variance, finding and adding information about how peers of an entity performed is mindless work which software applications can easily perform for an auditor. That frees the auditor to focus on the analysis of the actual information and they will have more time for the analysis because software agents performed the mindless part of obtaining and providing the information.
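Here is a rough sketch of the mindless part of an analytical review: computing the variance between two periods and lining it up against peers. The function names and every number are made up for illustration; a real review would pull the facts from the filings.

```python
def variance(current, prior):
    """Return the change in amount and the change as a fraction of the prior period."""
    change = current - prior
    return change, change / prior

def peer_average(values):
    """Simple average of the peers' percentage changes."""
    return sum(values) / len(values)

# Hypothetical revenues for the entity under review.
change, pct_change = variance(current=1200.0, prior=1000.0)

# Hypothetical percentage changes in revenues for three peers.
peers = [0.05, 0.10, 0.15]
difference_from_peers = pct_change - peer_average(peers)

print(change, pct_change, round(difference_from_peers, 2))
```

The software does the gathering and the arithmetic; deciding whether a 20% change against a peer average of about 10% means anything is still the auditor's job.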
The bottom line here is that the notion of software agents or intelligent agents or smart agents has been around for some time. The understanding of the sorts of things that these software agents can do for business users is in its early stages. As agents are better understood and as the quality of the information these software agents rely on to function correctly improves, these software agents will deliver increasing utility to business users. These software agents will never replace highly skilled professionals working with something like financial information. They will mainly perform mindless, repetitive work; the sort of work computer software does well. Information gathering will be significantly easier so the humans can focus on the sorts of things that only humans can do. Costs can be reduced and quality can be increased.
A hypercube is simply a table which can have any number of dimensions, or "n" dimensions. There are two constraints which must be differentiated in your mind: the medium used to display the information and the characteristics one is working with which describe the information.
A scalar is a piece of information which has no characteristics; it stands on its own. A scalar does not need to be characterized. For example, the value of pi is a scalar: it never changes; it always has the same value for everyone. (Pi, or π, is the ratio of a circle's circumference to its diameter and is always approximately equal to 3.14.)
Visually, this is what a scalar looks like:
A list has one dimension. Dimensions are a model for expressing characteristics of information. Dimensions effectively contextualize information so that it can be interpreted unambiguously.
For example, the following is a list of numbers. But you don't know any more about the characteristics of the number. You don't know if the number is for, say, revenue, net income, assets; you don't know if the number is in US Dollars, Euros, or Yen; you don't know if the number is for Microsoft, Apple, IBM, or some other company. All you have is a list of numbers:
A table has two physical dimensions: rows and columns which intersect to form cells. A spreadsheet is a good example of presenting a table. Other terms used for table are matrix and array.
A table (spreadsheet, matrix, array) can have any number of rows and columns (or, to the limit of the software application; Excel for example does have limits).
So a table can hold pretty much any set of information and the characteristics which describe the information. But as the number of characteristics, rows, and columns increases, it gets harder and harder to work with the information. Also, if you are working with DIFFERENT tables which have different dimensions/characteristics, things get more challenging.
This is why relational databases are so handy. Relational databases store information in the form of tables which can basically hold anything. But as the shapes of the tables change, it gets more and more challenging to put things into relational databases or manage all the things you have in a relational database.
Here is a small table with a value and four characteristics which contextualize the value in the column "Fact Value":
A cube can be thought of as a three dimensional presentation of a table/matrix/array. "Space" has three dimensions. Paper only has two dimensions though. You can present three dimensional information on the two dimensional medium of paper by grouping information and printing one group, then the next, then the next linearly down the piece of paper. The software application Excel achieves three dimensions by adding the notion of a "sheet". In the old days of spreadsheets there was only one sheet. But today, Excel has the notion of a "workbook" which contains "sheets" which then contain rows and columns which intersect and form cells.
A pivot table is an approach to visualizing a complex table of information. There are two things going on with pivot tables. The first is the number of characteristics a value has and the second is how to show all of the information, including the characteristics, in usable ways. So you have both the information that you are dealing with and how to present the information. Again, a computer screen only has two dimensions, it is flat. It can be a challenge to render information which has more than two dimensions in a two dimensional space. A pivot table is one way of achieving that goal.
A hypercube is an "n" dimensional table/matrix/array, where "n" represents any number of dimensions or characteristics. And so every table is basically a hypercube. Different tables or hypercubes can have different numbers of characteristics. It is easy to work with tables that have exactly the same characteristics or dimensions together. The more different the dimensions/characteristics, the more challenging it is to put different tables/hypercubes together.
Saying this another way, different tables/hypercubes have different shapes.
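One way to picture "shape" in software terms: treat each fact as a value plus a set of named characteristics, and call the set of characteristic names the shape. The sketch below is just an illustration under that assumption. The characteristic names ("entity", "period", "concept", "segment") and all of the data are made up.

```python
from collections import defaultdict

# Each fact is a value plus the characteristics which describe it.
facts = [
    {"entity": "Example Co A", "period": "2012-12-31", "concept": "Assets",
     "value": 1000},
    {"entity": "Example Co A", "period": "2012-12-31", "concept": "Revenues",
     "segment": "Widgets", "value": 400},
    {"entity": "Example Co B", "period": "2012-12-31", "concept": "Assets",
     "value": 2500},
]

def shape(fact):
    """The shape of a fact is the set of characteristic names, ignoring the value."""
    return frozenset(name for name in fact if name != "value")

# Facts with the same shape fit in the same table/hypercube.
by_shape = defaultdict(list)
for fact in facts:
    by_shape[shape(fact)].append(fact)

for s, group in by_shape.items():
    print(sorted(s), len(group))
```

The two "Assets" facts share a three-characteristic shape and sit together easily. The "Revenues" fact has a fourth characteristic, so it belongs to a different hypercube.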
The more dimensions or characteristics that a hypercube has, the more challenging it is to render that information in two dimensional space such as paper or a spreadsheet. Computers can be better for presenting this sort of information to humans in readable forms because computers can be dynamic, like a pivot table.
Imagine an application which is good at handling n-dimensional information (i.e. any number of dimensions or characteristics). Consider this visualization of a four dimensional object or hypercube:
Or this visualization of a five dimensional object or hypercube:
Because a computer is dynamic, it is easier for a computer to take n-dimensional information and display it (a) in human readable terms and (b) dynamically, so that the user can change the representation to suit his or her specific perceived needs or demands.
A pivot table is one example of that sort of application.
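To show what "the user can change the representation" means, here is a bare-bones pivot in Python. It is only a sketch, not how Excel or any viewer works internally. The facts are made up, and each fact is a simple (concept, period, value) tuple.

```python
# Made-up facts: (concept, period, value).
facts = [
    ("Assets", "2011", 900),
    ("Assets", "2012", 1000),
    ("Revenues", "2011", 350),
    ("Revenues", "2012", 400),
]

def pivot(facts, row_index, col_index, value_index=2):
    """Arrange facts into rows and columns chosen by the caller."""
    table = {}
    for fact in facts:
        row = table.setdefault(fact[row_index], {})
        row[fact[col_index]] = fact[value_index]
    return table

# Same information, two different views.
by_concept = pivot(facts, row_index=0, col_index=1)  # concepts down, periods across
by_period = pivot(facts, row_index=1, col_index=0)   # periods down, concepts across

print(by_concept)
print(by_period)
```

Nothing about the facts changes between the two views. Only the choice of which characteristic goes on the rows and which goes on the columns changes.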
SEC XBRL financial filings are nothing more than many, many hypercubes. Those hypercubes are either explicit [Table]s or implied tables which have only a reporting entity (or CIK), a period, and a concept to contextualize reported facts. They must be managed by the software applications which work with the information.
Some of those software applications show static, unchangeable views of the information. The SEC Interactive Data Viewer is an example of a static application. Other applications let you change the view of the information, such as the XBRL Cloud Viewer or CoreFiling's Magnify.
An ontology seems to be a means to an end. Probably the most important notion to understand about digital financial reporting is that of secondary use ontologies. Secondary use ontologies are a currency of the information age. Understanding this currency is a key to thriving in the information age.
I don't have all of these pieces synthesized in my mind yet. Like many of my blog posts, this is more of a stream of consciousness dump on the journey toward understanding.
Something that I thought was true may not be true at all. Or rather, it may only be partially true. The premise of "the semantic web" which was held by some is "anyone can say anything about anything." I thought that that statement was hogwash. Why? Well, think about the accounting equation: Assets = Liabilities and Equity. Who would dispute that fundamental building block of accounting?
Well, perhaps no one would dispute that specific fact. However, there are many other facts which exist which might be disputed by others. Said another way, there are likely some facts that few if anyone disputes and there are likely some facts that people do rightly dispute.
So what is the point? The point is this: maybe there is not just one financial report ontology. Maybe there are many. Perhaps there is some base ontology that everyone agrees on such as "Assets = Liabilities and Equity" and other fundamental accounting concepts and relations. Everyone agrees at this level. It is kind of like how two attorneys who are arguing a case establish the facts of a case, the things that neither side disputes. This allows them to focus the case they are arguing on the things where they don't agree.
This is actually not my notion; there are actually names for these different camps. Remember that an ontology is a description of the "things" and the "relations between things" which exist within some determined field of study or "domain".
So, an ontology is a representation of something that exists in reality. An ontology is a "window on reality". There are two views of "reality" which are common in philosophy:
- Ontological materialism: The belief that reality exists regardless of human observers. (i.e. there is only ONE reality)
- Ontological idealism: The belief that reality is constructed in the mind of the observer. (i.e. reality is determined by the observer)
Does this mean that everyone can create their own reality? No. That is where epistemology comes in. Epistemology is the theory of knowledge.
So, what is knowledge? Some say that knowledge is "justified true belief". Basically, it seems that you need to have solid grounds for holding a belief and that you must be aware of such grounds. You can't just make stuff up. You must be able to justify your claims or beliefs.
How do you justify beliefs or claims? Evidence. Evidence must be of good quality and evidence should be logical and reasonable. Hard to argue with that. But how do you gather this evidence? Well, again from philosophy, there seem to be two approaches to that:
- Empiricism: Evidence of true knowledge is primarily founded on input from our senses; by experience and by observations.
- Rationalism: Evidence of true knowledge is primarily found based on reasoning; by research results which are verified by reasoning.
How does knowledge differ from opinion or belief?
Epistemology has other aspects which explain how knowledge is obtained such as "propositional knowledge" which is the knowledge of facts and "practical knowledge" which is know how.
Why is knowledge important? Here is a great quote by Nicholas Rescher, a philosopher:
...Knowledge brings great benefits. The release of ignorance is foremost among them. We have evolved within nature into the ecological niche of an intelligent being. In consequence, the need for understanding, for "knowing one's way about," is one of the most fundamental demands of the human condition.
"Knowing one's way about." Before digital financial reporting, the only way one had to "know one's way about" was the information one had in their head. What one person had in their head may have been different than what another person had in their head.
The rigor and discipline of trying to articulate in the form of an ontology what is in your head yields two things as I see it. First, it yields clarity. Articulating this information in a form that both parties can better observe yields clarity. Second, it yields something that a computer can understand. Digitizing financial reports is a lot of work, but it makes it so computers can interact with those digital financial reports.
At some level, there needs to be agreement on "reality". For example, the fundamental accounting concepts and relations between those concepts of a financial report is such a level. At other levels, secondary use ontologies which build upon the fundamental level, there may be different views of "reality". Distinguishing between these two levels seems critical. It is highly likely that more variability will exist in secondary use ontologies. However, those creating those secondary use ontologies cannot just make things up. They must justify their beliefs and claims with quality evidence, logic, and reasoning.
I put together a playlist of a handful of short videos which dig a little deeper into semantics, ontology, and epistemology. These are terms that business people are not that comfortable with. Given that digital financial reporting is inevitable, perhaps business people need to get more comfortable.
Business people need to understand the currency of the information age. Computer readable knowledge is part of that currency.