BLOG: Digital Financial Reporting
This is a blog for information relating to digital financial reporting. This blog is basically my "lab notebook" for experimenting and learning about XBRL-based digital financial reporting. This is my brainstorming platform. This is where I think out loud (i.e. publicly) about digital financial reporting. This information is for innovators and early adopters who are ushering in a new era of accounting, reporting, auditing, and analysis in a digital environment.
Much of the information contained in this blog is synthesized, summarized, condensed, better organized, and better articulated in my book XBRL for Dummies and in the chapters of Intelligent XBRL-based Digital Financial Reporting. If you have any questions, feel free to contact me.
Entries from April 20, 2014 - April 26, 2014
Understanding Database/Query Options (Part 1)
OK, I am getting closer and closer to where I need to be in terms of understanding format/database/query options and related issues/opportunities. Conversations with others have provided additional insights and information. Building on this prior post:
People tend to agree that there are three primary formats for representing/modeling highly structured and semi-structured information, and that there are different types of databases or other approaches to storing this information (a short sketch after this list shows the same fact in each form):
- Table-type format (homogeneous, tabular, consistent): relational databases, CSV, spreadsheets, or tabular-type representations which allow only one level of hierarchy within each table; but hierarchies can be constructed by relating tables
- Tree-type format (heterogeneous, arborescent): XML, XBRL (using tuples), JSON and other tree-hierarchy-type information which allow for the expression of one hierarchy; hierarchical databases, object databases (Note that a tree is a special case of a graph. See here to understand the difference)
- Open-type or Graph-type format (heterogeneous, networked): RDF, EAV, XBRL (using dimensions) and other open schema-type or graph-type representations which are more graph-oriented or network-oriented (many-to-many) and allow for dynamically creating virtually any number of hierarchies; support the notion of cycles; very flexible; network-type databases; RDF triple stores.
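To make those three formats concrete, here is a minimal sketch of my own (the field names are invented and this is not any standard serialization) showing one reported fact expressed in table form, tree form, and graph form:

```python
# One reported fact expressed three ways. A toy illustration of my own;
# the field names are made up and follow no particular XBRL serialization.

# Table-type: a flat row; every fact in the table has exactly the same columns.
table_row = {
    "entity": "ABC Company",
    "concept": "CashAndCashEquivalents",
    "period": "2013-12-31",
    "value": 1000,
}

# Tree-type: the same fact nested inside one fixed hierarchy (entity -> period -> facts).
tree = {
    "entity": "ABC Company",
    "periods": [
        {"period": "2013-12-31",
         "facts": [{"concept": "CashAndCashEquivalents", "value": 1000}]},
    ],
}

# Graph-type: the same fact as subject/predicate/object triples; additional
# characteristics (dimensions) can be attached without changing any schema.
graph = [
    ("fact1", "entity", "ABC Company"),
    ("fact1", "concept", "CashAndCashEquivalents"),
    ("fact1", "period", "2013-12-31"),
    ("fact1", "value", 1000),
]

print(table_row, tree, graph, sep="\n")
```

Notice that in the graph form, attaching one more characteristic to the fact is just one more triple; that point comes up again below when dimensions enter the picture.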
People tend to agree that data formats are 100% interchangeable. A "tree" can be expressed in the form of a table and put into a relational database. A "graph" can be expressed in the form of a table and likewise be put into a relational database. A "table" can be expressed in the form of a tree or graph. These formats are syntax and any of these syntaxes can be used to store any type of information.
Now, performance is important. Performance is achieved in any data format through the use of indexes. Relational databases, hierarchical databases, and network-type databases all have indexing capabilities.
People tend to agree that information is more interesting than data. It is not data that people are after; people are after information for the sake of knowledge. Relational databases are popular because of the 'relational' piece, not the 'data' piece. Relational databases are about, in part, organizing sets and relating one set to another, getting answers to questions back quickly and easily across multiple sets. This is about using information with other information, comparing information. It is not about having a store of data. It is about making use of that data.
People tend to agree that the volume of information is growing rapidly. "Big data" is one of the new buzzwords. Ask the average business person what "big data" means and they probably could not tell you, but they will tell you that they need some because some software vendor says everyone needs big data! I have heard two good definitions of big data. The first definition is:
BIG DATA: Data that's disparately located, varied in structure, voluminous, and rapidly changing.
The second definition is something like this:
BIG DATA: Data that is generated by machines. The data is "big" because the machines can generate the data faster than humans can consume the data. Humans really cannot create big data.
I would synthesize those two definitions and some other things that I know into the following explanation of what big data is:
"BIG DATA" is the notion that you no longer have the luxury of treating one database as 'the database' and putting all of the information you have into that one database. Data that you need exists within your organization and external to your organization. The data varies in the representation format (table, tree, graph). It varies by system (Windows, Mac, Linux, etc.). It varies by structure. The volume of information is high and it is getting higher. The velocity at which data grows is increasing rapidly. Some of the information changes rapidly. Some of the data is generated by machines faster than humans can consume it. Welcome to the information age!
People tend to agree that there are four things which make it possible for one system to interoperate with another system. Another way to look at it is that these four things determine HOW two systems CAN interoperate. The clarity of an interaction is determined by these four things:
- Classification system used
- Power of the technical syntax to express information
- Business rules used to force information quality to be high
- Interoperability between systems (system, information syntax, information structure, domain semantics, process/workflow protocol)
This seems like a lot of work. Why the heck would you want to even do this? The answer to that question is simple: automation. This is all about enabling computers to do more work. Again, welcome to the information age.
People tend to agree that financial information tends to be "flat" and it seems to fit nicely into spreadsheets and relational database tables. Some believe this so strongly that they wonder why XBRL isn't more like CSV.
While financial information does seem to be flat and you can put it into spreadsheets and relational databases, each report component of a financial report tends to have, or at least could have, a different shape, and therefore you tend to have to put each report component into a different spreadsheet.
For example, take a look at this financial report which is expressed digitally. Here is the rendering of how a human might work with the information. Here is the underlying fact table which shows how the information might be stored in a database. This is where people tend to stop thinking about what is stored. But you also need to store the information about the representation, which I call the model structure. And you will also want to store the business rules which help communicate important relations and verify that the relations are intact.
So you can see that while each piece of a financial report can be seen as tabular, each table is potentially different and is determined by the different "dimensions" (or [Axis], or Aspects, or characteristics, people use different terms) associated with the information.
That is what causes the different information shapes: different dimensions used on different report components of a financial report. For example, the "Business segment dimension" is appropriate for the "business segment breakdown" but it is not used on the balance sheet or the inventory components breakdown. So, that is why dimensions can vary across tables/spreadsheets, and that is what causes the need to store the differently shaped report components in different relational database tables.
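Here is a small sketch of what I mean (the facts and numbers are made up): the balance sheet facts need only the entity and period characteristics, while the business segment breakdown needs an additional business segment dimension, so the two sets of rows do not share the same shape:

```python
# Toy illustration with made-up numbers: two report components, two shapes.

# Balance sheet component: characteristics are entity and period only.
balance_sheet_rows = [
    {"entity": "ABC", "period": "2013-12-31", "concept": "Assets", "value": 5000},
    {"entity": "ABC", "period": "2013-12-31", "concept": "Liabilities", "value": 3000},
]

# Business segment breakdown: one extra dimension (the business segment [Axis]).
segment_rows = [
    {"entity": "ABC", "period": "2013-12-31", "segment": "Widgets", "concept": "Revenues", "value": 700},
    {"entity": "ABC", "period": "2013-12-31", "segment": "Gadgets", "concept": "Revenues", "value": 300},
]

# The second component has a column the first does not, so in a relational design
# the two components end up in two differently shaped tables.
print(set(segment_rows[0]) - set(balance_sheet_rows[0]))  # -> {'segment'}
```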
Now, some people have the *brilliant* idea of not using those dimensions: let's just ignore the fact that the dimensions exist. Really? This idea is so absurd that I am not even going to address it. Financial information has dimensions.
And so, while financial information can be considered tabular and that tabular information can fit into a spreadsheet or relational database, every report component needs to be stored in a different table because every report component could have a different shape. No problem, you can create as many tables as you want in a relational database.
Or rather, the IT department can add that new table for you.
To add a new dimension in a relational database, what do you have to do? You have to modify the database schema. That is bad for business users; it is an IT task. Most business users cannot modify database schemas, create the proper indexes, etc. Now, it is possible to automate some or much of the task of adding tables, indexes, and all the other stuff needed to modify a database schema. But that takes programmers writing code. And writing code takes time and costs money. So business users have three choices: (a) learn to make these sorts of changes to relational databases themselves, (b) wait for the IT department to do it for them, or (c) pay for the creation of software which makes these technically oriented adjustments for them.
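Here is a hedged sketch of what that IT task looks like, using SQLite from Python (the table and column names are hypothetical, my own invention): adding a new dimension to a table-per-component design means altering the table, adding an index, and so on.

```python
import sqlite3

# Hypothetical fact table for one report component; names are my own invention.
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE segment_revenues (
        entity  TEXT,
        period  TEXT,
        concept TEXT,
        value   NUMERIC
    )
""")

# A new dimension appears in the report (say, a geographic area [Axis]).
# In a relational design that is a schema change plus a supporting index --
# exactly the kind of task that normally lands on the IT department.
con.execute("ALTER TABLE segment_revenues ADD COLUMN geographic_area TEXT")
con.execute("CREATE INDEX ix_seg_geo ON segment_revenues (geographic_area)")

print([column[1] for column in con.execute("PRAGMA table_info(segment_revenues)")])
```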
People tend to agree that there are different query languages (a short sketch contrasting their syntax follows this list):
- SQL: Structured Query Language, a global standard query and functional programming language used by relational databases. SQL queries table-type data.
- XQuery: XQuery is a global standard query and functional programming language that is designed to query and transform collections of structured and unstructured data, usually in the form of XML or text. XQuery queries tree-type data.
- JSONiq: JSONiq is an open, third-party extension of XQuery which, among other things, expands XQuery to be used with JSON formatted information.
- SPARQL: SPARQL Protocol and RDF Query Language is a global standard RDF query language; it is used to query graph-type or open-type data formatted in RDF.
- MDX: Multidimensional eXpressions, a query language for OLAP databases. It was first developed by Microsoft but has become something of a de facto standard and is quite popular.
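To give a feel for the flavor of each, here is a sketch showing roughly the same question (give me the Assets fact for one period) written in SQL, XQuery, and SPARQL. The table, document, and prefix names are invented for the illustration, and the queries are simply held in Python strings; nothing here is tied to any particular product.

```python
# Illustrative queries only; 'facts', 'facts.xml', and the ex: prefix are made up.

sql_query = """
SELECT entity, value
FROM facts
WHERE concept = 'Assets' AND period = '2013-12-31';
"""

xquery_query = """
for $fact in doc('facts.xml')//fact
where $fact/concept = 'Assets' and $fact/period = '2013-12-31'
return <result>{ $fact/entity, $fact/value }</result>
"""

sparql_query = """
PREFIX ex: <http://example.org/>
SELECT ?entity ?value
WHERE {
  ?fact ex:concept "Assets" ;
        ex:period  "2013-12-31" ;
        ex:entity  ?entity ;
        ex:value   ?value .
}
"""

for name, query in [("SQL", sql_query), ("XQuery", xquery_query), ("SPARQL", sparql_query)]:
    print(name, query)
```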
People tend to agree that SQL is a fantastic query tool. Personally, I love SQL. And, because we said above that any information can be expressed using any data format, clearly we can express all of this information as tables in a relational database. You can create foreign keys and relate any relational database table to any other relational database table. No problem there. IT can do that for you or you can do that yourself.
However, the queries you have to create to get information out of that relational database get increasingly complex with more and more of these types of relations. Further, the database schema becomes increasingly hard to understand because how the information is structured is really not that intuitive.
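As a hedged illustration of that growing complexity (all table and column names invented for the sketch): once each level of the model structure lives in its own related table, even a simple question turns into a chain of joins, and every new level means another join and another foreign key to keep straight.

```python
# Illustrative only: one reported fact pulled back out of a report stored across
# several related tables. The schema is made up for this sketch.
query = """
SELECT f.value
FROM facts             AS f
JOIN line_items        AS li ON li.id = f.line_item_id
JOIN report_components AS rc ON rc.id = li.component_id
JOIN reports           AS r  ON r.id  = rc.report_id
JOIN entities          AS e  ON e.id  = r.entity_id
WHERE e.name   = 'ABC Company'
  AND li.name  = 'CashAndCashEquivalents'
  AND f.period = '2013-12-31';
"""
print(query)
```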
Some people believe that you can convert SQL and SPARQL queries to XQuery. I don't totally grasp this presentation.
What does this mean?
People don't tend to realize that trying to make a relational database do things that it was not really built to do causes complexity which makes things harder to understand, harder to create, and harder to maintain.
Now again, technology people can hide some, or perhaps even all, of this complexity from business users. But when they have to deal with maintenance issues (adding dimensions, maintaining schemas, writing queries, etc.), spend time writing code to automate tasks and make queries easier, and otherwise give business users what they need, that comes at a cost and it takes time.
People tend to agree that relational databases are very mature, tested, stable, well-understood, popular, robust, sophisticated tools. There are a lot of people who understand how to administer relational databases, develop relational databases, maintain relational databases, etc. There are a lot of incredibly useful features which relational databases have, such as fault tolerance, commit-rollback, replication, etc. However, relational databases do have their weaknesses. No tool can do everything.
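For example, here is a minimal sketch of the commit/rollback feature, using SQLite from Python with a hypothetical table: if anything goes wrong part-way through a set of changes, the whole set is undone.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE facts (concept TEXT, value NUMERIC)")

try:
    with con:  # one transaction: commits on success, rolls back on any exception
        con.execute("INSERT INTO facts VALUES ('Assets', 5000)")
        raise RuntimeError("something went wrong part-way through the load")
except RuntimeError:
    pass

# The partial insert was rolled back, so the table is still empty.
print(con.execute("SELECT COUNT(*) FROM facts").fetchone()[0])  # -> 0
```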
People tend to agree that business users LOVE Excel. Excel is an incredibly useful tool. I am not saying electronic spreadsheets, I am saying Microsoft Excel electronic spreadsheets. Until someone comes up with something better, Excel is the king of the hill. Lotus 1-2-3 WAS the king of the hill at one point, Microsoft changed that; someone else COULD dethrone Microsoft but they have not done so yet.
Excel, and spreadsheets in general, do have their problems. "Spreadsheet hell" and the disadvantages of spreadsheets are well understood.
It seems to me that if business users could keep their spreadsheets; if they could run powerful queries against all that "big data" inside and outside their organization, across any business system; if the information had no errors in it (i.e. information quality were high); and if the IT department could control and maintain everything cheaply and easily; then life would be good.
Personally I have zero allegiance to Microsoft or Excel. I do have allegiance to good, useful software. I used VisiCalc, ditched that for Lotus 1-2-3, and then ditched that for Excel because Excel was better.
I am getting tired and the dog needs a walk, so I am going to call this good for now. Stay tuned for Part 2!
FORTUNE: One step backward for corporate accountability
In his article One step backward for corporate accountability published by Fortune, Ethan Rouen reaches the following conclusion:
XBRL is broken, but that does not diminish its potential importance to markets and regulators. By voting to eliminate reporting requirements for a majority of publicly traded firms, Congress is doing a disservice to investors and signaling to companies that make questionable accounting decisions that they have a sympathetic ear in Washington.
What is needed for XBRL now is not a step back to reflect, but a sprint forward to increase oversight and improve its usability.
I agree with Ethan that a "sprint forward" is the way to go, particularly since 95% of all SEC XBRL financial filings are fewer than 5 errors away from having fundamentally useful information.
I don't agree with the statement "Extensions damage XBRL's usability." As I have pointed out before, it is not extensions which cause problems, it is inappropriate extensions which cause problems. Extensions are actually a good thing, if used appropriately.
One interesting thing is that Ethan pointed out the same flaw in EXXON's financial report, relating to the extension of revenues, that I had pointed out.
I find this statement quite interesting: (emphasis is mine)
In research I conducted with 28msec, a company that has developed a platform to extract information from XBRL filings, I found evidence that companies may be learning how to use extensions to effectively hide bad results.
I look forward to seeing that research. I have been able to find accounting anomalies, but I never looked for a correlation between those accounting anomalies and extensions or poor earnings results.
And oh, by the way...I have used the 28msec platform for analyzing SEC XBRL financial filings. It is quite nice and should be broadly available very soon. It will put a lot of analysis power into the hands of the average accountant.
Stay tuned for more information.
Understanding SQL-type Queries as Compared to XQuery-type Queries
I have been trying to understand the specific differences between:
- a relational-type query using SQL
- an XQuery-type query
- a SPARQL-type query
Why? Every tool has strengths and weaknesses. It blows my mind how many people love Microsoft Excel but have never even tried Microsoft Access. Excel and Access are two different tools; each has pros and cons. Many Excel users who only know Excel think that Excel is the be-all, end-all solution for everything. It is not. Access is a vastly superior tool for many things, vastly outperforming Excel. As the saying goes, "If the only tool you have in your toolbox is a hammer, then everything is a nail."
Information has different forms, and the databases that store that information likewise can take many forms. The relational database is one very popular data storage system. SQL is one of the most useful things that I have ever used.
Another useful thing is XML.
XML and relational databases are different. As I understand it there are three primary schemes/models/representations for formatting and processing structured (semi-structured, highly-structured) information:
- Table-type (homogeneous, tabular, consistent): relational databases, CSV, spreadsheets, or tabular-type representations which allow only one level of hierarchy within each table; but hierarchies can be constructed by relating tables
- Tree-type (heterogeneous, arborescent): XML, XBRL (using tuples), JSON and other tree-hierarchy-type information which allow for the expression of one hierarchy
- Open-type or Graph-type (heterogeneous, networked): RDF, EAV, XBRL (using dimensions) and other open schema-type or graph-type representations which are more graph-oriented and allow for dynamically creating virtually any number of hierarchies; support the notion of cycles; very flexible
Now, not surprisingly, there are three primary query languages which match these three information representation schemes: SQL, XQuery, and SPARQL.
I don't want to get into SPARQL right now. I want to understand the difference between SQL and XQuery.
I stumbled across this paper: PERFORMANCE ANALYSIS OF XQUERY VS. SQL. The abstract of that paper says the following: (emphasis is mine)
Early in the XML history there were thoughts about whether XML is sufficiently different from other data formats to require a query language of its own, since SQL was already a very well established standard for retrieving information from relational databases.
But there were some differences that justified a new query language for XML data. Relational data is 'flat.' This means it is organized in the form of a two dimensional array of rows and columns. XML data is 'nested' and its depth of nesting can be irregular. On the one hand, relational databases can represent nested data structures by using tables with foreign keys, but it is still difficult to search these structures for objects at an unknown depth of nesting. On the other hand, in XML it is very natural to search for objects with an unknown position in a document. There are other differences, which also convinced the W3C workgroup to design a new XML Query language with a more efficient semantic definition rather than extending a relational language. It would be useful to focus research on programming an Analysis tool between nested databases and relational databases to show performance and scalability with respect to data volume and complexity for both types of queries.
The "Group By" clause in SQL is not easily replaced by features of XQuery. The thesis shows ways to achieve similar functionality and demonstrates that the current XQuery implementations are either non-compliant to the XQuery standard or much slower than SQL when performing "Group By" equivalent requests.
The direct result of this thesis was an Open Source tool, PerfanX, and the performance statistics gathered from a variety of queries run under that tool. The tool uncovered several counter-intuitive performance implications. The study also exposed several bugs in XQuery implementations. The author has reported these to the development groups.
Now, that article was written in 2004, when only XQuery 1.0 existed. Subsequent to that article, grouping support was added to the XML query/transformation languages: XSLT 2.0 gained grouping constructs, and XQuery 3.0 added a group by clause to FLWOR expressions. Now we are on XQuery 3.0.
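As a hedged illustration of what changed (the table, document, and element names are made up): XQuery 3.0 added a group by clause to FLWOR expressions, so a grouping that is natural in SQL now has a reasonably direct XQuery counterpart. Both queries are shown below as Python strings purely for comparison.

```python
# Roughly equivalent grouping queries; 'facts' and 'facts.xml' are invented names.

sql_group_by = """
SELECT concept, SUM(value) AS total
FROM facts
GROUP BY concept;
"""

xquery3_group_by = """
for $fact in doc('facts.xml')//fact
group by $concept := $fact/concept
return <total concept="{$concept}">{ sum($fact/value) }</total>
"""

print(sql_group_by)
print(xquery3_group_by)
```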
An obvious question might be, "wouldn't it be better if there were ONE query language which supported all three information representation formats?" Well, others had that question. See Translating SPARQL and SQL to XQuery. I don't understand the paper, but I did speak to some people about this and they say that XQuery can represent anything that SQL and SPARQL can represent.
So where am I on this? It seems to me that a relational database can represent anything, but both relational databases and SQL were designed for tabular-type information, rows and columns. Tree-type XML and open-type XML are far richer in their ability to express information because of the ability to express hierarchies. So while it is possible to get SQL to do what XQuery was designed to do (query such hierarchies), the SQL and the databases get increasingly complex as the information becomes richer.
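Here is a hedged sketch of what I mean by 'increasingly complex', using SQLite from Python with an invented parent/child table: finding everything underneath a node at an unknown depth of nesting takes a recursive query in SQL, whereas in XQuery the equivalent is essentially a one-line path expression.

```python
import sqlite3

# An invented parent/child table storing a hierarchy relationally.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE structure (id INTEGER, parent_id INTEGER, name TEXT);
INSERT INTO structure VALUES
  (1, NULL, 'Assets'),
  (2, 1,    'CurrentAssets'),
  (3, 2,    'CashAndCashEquivalents'),
  (4, 1,    'NoncurrentAssets');
""")

# Everything underneath 'Assets', at any depth, needs a recursive query in SQL.
rows = con.execute("""
WITH RECURSIVE under_assets(id, name) AS (
    SELECT id, name FROM structure WHERE name = 'Assets'
    UNION ALL
    SELECT s.id, s.name
    FROM structure AS s JOIN under_assets AS u ON s.parent_id = u.id
)
SELECT name FROM under_assets;
""").fetchall()

print([name for (name,) in rows])
# In XQuery the same question is roughly the path expression: doc('structure.xml')//Assets//*
```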
Bottom line: XQuery does a better job of querying tree-type information, and there is a lot more that you can do with that information because of (a) its inherent richness and (b) XQuery's features for working with hierarchical information. Relational databases and SQL are far more mature than XQuery and have other features which XQuery does not have. The author's last sentence in the paper mentioned above is this:
XQuery provides a powerful way to query this information and shouldn’t be underestimated.
I don't know that this is an either/or type question. Maybe it is, maybe not. Another outstanding question is performance. I want to personally see the performance.
Another outstanding question is the specific differences between SPARQL and XQuery. Stay tuned!
Sign of Things to Come: XBRL US Releases Pension Disclosure Analyzer
XBRL US announced a new product yesterday, the US GAAP Pension Analyzer. This tool, made available for free to public company reporting managers, is described in the press release XBRL US GAAP Pension Analyzer Identifies Significant Errors in XBRL SEC Filings for Q1 2014.
This tool is a sign of things to come. This is what I mean.
First, the pension disclosure(s) is only one of many disclosures made within a financial report. Here is a prototype list of disclosures required by US GAAP. Why couldn't disclosure analyzers be created for every disclosure? If you read the press release, it describes the US GAAP Pension Analyzer as performing the following:
The rules check each concept and combination of concepts to identify incorrect sign (negative or positive), invalid axis/member and axis/line item combinations, problems in roll forwards, missing values and incorrect dates, among other potential problems.
Second, while XBRL US started at one end of the problem, I started at the other end. What I mean is that XBRL US has their consistency suite which checks an SEC XBRL financial filing, and now they have their US GAAP Pension Analyzer. Is that everything necessary to validate an SEC XBRL financial filing and be sure that the filing is correct? Certainly not, and they are not holding these validation steps out as being comprehensive.
As I said, I started at the other end. I have my seven minimum criteria which I use to verify the usability, and therefore the fundamental correctness, of an SEC XBRL financial filing. Are my criteria sufficient to be sure that everything about an SEC XBRL financial filing is correct? No; I make it clear that they are likewise not comprehensive. However, meeting each of my criteria is necessary for working with any reported information. More comprehensive business rules (criteria) are necessary to be sure the entire report is created correctly.
You need both what I created, which is a framework and high-level criteria, plus what XBRL US created, which is lower-level, more detailed criteria, plus those "analyzers" for all the other disclosures which might exist in an SEC XBRL financial filing. It is the combination of all these things which gets significantly closer to what is truly necessary to get all the details of these digital financial reports correct so that the information within the reports will be usable. You likely need even more.
Third, I cannot speak to precisely what the US GAAP Pension Analyzer does, but I can safely say that it goes far beyond XBRL technical syntax validation. Go back and read the quote above. "Problems in roll forwards" has nothing to do with syntax; that is semantics (meaning). "Incorrect dates": that is meaning. "Invalid axis/member combinations": meaning. There is a lot of US GAAP related stuff which can be verified because of the structured nature of the information reported in an SEC XBRL financial filing. That has tremendous value to external reporting managers. Detecting accounting anomalies will start to show the power of XBRL-based digital financial reports over traditional financial reports. When this happens, the external reporting managers of public companies will not be the only ones who value XBRL-based digital financial reports; private companies will value them as well, whether they press the "save as XBRL" button in their application or not.
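To make 'semantics, not syntax' concrete, here is a minimal sketch of the kind of rule such a tool might run. The facts, numbers, and rules are my own invention, and this is not a description of how the US GAAP Pension Analyzer actually works; the point is that a filing can be perfectly valid XBRL syntax and still fail checks like these.

```python
# Toy semantic checks over made-up facts; not how any actual analyzer is implemented.

facts = {
    "Assets": 5000,
    "Liabilities": 3000,
    "Equity": 2000,
    "PensionObligationBeginning": 900,
    "PensionObligationChanges": 150,
    "PensionObligationEnding": 1050,
}

errors = []

# Accounting equation: Assets = Liabilities + Equity.
if facts["Assets"] != facts["Liabilities"] + facts["Equity"]:
    errors.append("Balance sheet does not balance")

# Roll forward: beginning balance + changes = ending balance.
if (facts["PensionObligationBeginning"] + facts["PensionObligationChanges"]
        != facts["PensionObligationEnding"]):
    errors.append("Pension obligation roll forward does not tie")

print(errors or "All semantic checks pass")
```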
Fourth, back in the 1970s the way automobiles and other things were made changed dramatically because of two things: a guy named Deming and Toyota. Toyota adopted Deming's principles (I learned them as "World Class Manufacturing Techniques") and changed the auto industry forever. The old way was: build, find mistakes, fix mistakes. The new way was: fix processes, then build (i.e. don't make mistakes in the first place).
The point is this: why would you want to build your financial report incorrectly, validate that report after the fact, and then correct the errors detected? Why not use business rules to keep you from creating the mistake in the first place?
It is the integration of the sorts of validation offered by things like the US GAAP Pension Analyzer into the financial report creation workflow which will change accounting work practices. Just as a CAD (computer aided design) program understands the relation between a window and a wall, a digital financial reporting software application will understand the relation between assets, liabilities and equity, and a balance sheet. Does your current financial report creation software understand these sorts of relations?
That is why the US GAAP Pension Analyzer is a sign of things to come.
Digital financial reporting harnesses computers for speed, accuracy
TechTarget published an article, Digital financial reporting harnesses computers for speed, accuracy.