Blog: Digital Financial Reporting (using XBRL) - XBRL-based structured digital financial reporting

BLOG: Digital Financial Reporting

This is a blog for information relating to digital financial reporting. This blog is basically my "lab notebook" for experimenting and learning about XBRL-based digital financial reporting. This is my brainstorming platform. This is where I think out loud (i.e., publicly) about digital financial reporting. This information is for innovators and early adopters who are ushering in a new era of accounting, reporting, auditing, and analysis in a digital environment.

Much of the information contained in this blog is synthesized, summarized, condensed, better organized and articulated in my book XBRL for Dummies and in the chapters of Intelligent XBRL-based Digital Financial Reporting. If you have any questions, feel free to contact me.

Entries from March 30, 2014 - April 5, 2014

Understanding Syntax

Syntax, or technical syntax, is the physical format of the information. There are presentation formats and there are representation formats. These are some of the more common file formats (which are technical syntaxes):

CSV (Comma Separated Values): Example
Fixed Width: Example
DIF (Data Interchange Format): Example
PRN (Printer Information): Example
ASCII Text: Example
Plain Text: Example
Excel (Binary): Example
Excel XML: Example
Word (Binary): Example
PDF: Example
RTF (Rich Text Format): Example
HTML: Example
HTML (view text): Example
XML (Traditional): Example
XBRL: Example
RDF/OWL (Draft, needs work): Example
JSON: Example

Each of those examples expresses the same information in a different way, using a different file format or technical syntax. Another way of saying this is that each file format has a different syntax but the information expressed is identical. There are lots of other file formats (see http://www.fileinfo.com/filetypes/data). Imagine having to write software to parse and use these different data formats. That is why global standard file formats are popular.

Looking at CSV you can learn some things about syntax. CSV (comma-separated values) is a simple ASCII file format that is widely used by business and scientific applications to exchange tabular data.

First, CSV is not a formal global standard format. CSV can be called a best practice or an ad hoc or informal standard. However, CSV is widely used, widely supported, and there are even a number of informal specifications. Excel supports CSV import and export, which makes Excel a de facto specification: if a file works in Excel it is right, and if it does not it is wrong. IETF (Internet Engineering Task Force) created a specification, RFC 4180. CSV-1203 is another. Super CSV is another. Some guy in New Jersey created another.
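To make the "countless variations" problem concrete, here is a minimal sketch in Python. The sample record is made up for illustration. It compares a naive split on commas with the standard library csv module, which follows the quoting convention that RFC 4180 describes.

```python
import csv
import io

# Hypothetical sample: two fields contain commas, so they are quoted
# in the way RFC 4180 describes.
raw = 'Name,Address,Balance\r\n"Smith, John","12 Main St, Tacoma",1000\r\n'

# Naive approach: split on every comma. The commas inside the quoted
# fields are wrongly treated as field separators.
naive_rows = [line.split(",") for line in raw.splitlines()]

# The standard library csv module honours the quoting convention.
parsed_rows = list(csv.reader(io.StringIO(raw, newline="")))

print(len(naive_rows[1]), "fields found by the naive split")
print(len(parsed_rows[1]), "fields found by the csv module")
print(parsed_rows[1])
```

Both parsers read the same bytes, but only the one that agrees with the producer about quoting recovers the three fields that were intended. That agreement is what an informal specification provides.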

The CSV-1203 best practice standard describes the problem they are trying to solve as follows: (emphasis is mine)

By adopting this standard, you place a prudent limit on the otherwise countless variations that could be implemented by systems developers. Its benefits are clearest when a CSV file forms an information bridge between two companies. Typically you should expect this standard to help reduce the time it takes to establish a data processing connection between your company and your clients or your service providers.

Business people tend to be very practical. Easier is better than harder. But for something to be useful to business people it has to work reliably, predictably, repeatedly.

But CSV "has problems" or limitations. Using CSV for information it was not designed to hold is like trying to put a square peg into a round hole.

First off, there are three primary types (schemes/models/representations) for formatting and processing structured information (semi-structured, highly-structured):

Table-type (homogeneous, tabular, consistent): relational databases, CSV, spreadsheets, or tabular-type representations which allow only one level of hierarchy within each table; but hierarchies can be constructed by relating tables
Tree-type (heterogeneous, arborescent): XML, some XBRL application profiles, JSON and other tree-hierarchy-type information which allow for the expression of one hierarchy
Graph-type or Open-type (heterogeneous, arborescent): RDF, EAV, some XBRL application profiles, and other open schema-type or graph-type representations which are more graph-oriented and allow for dynamically creating virtually any number of hierarchies; very flexible
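The three types can be sketched with plain Python data structures. This is only an illustration: the entity "ABC", the period, the values, and the helper name `regroup` are all hypothetical, and real table, tree, or graph stores are far richer than this.

```python
# Hypothetical facts: two line items reported by one entity for one period.

# Table-type: every row has the same columns.
table = [
    {"entity": "ABC", "period": "2013", "concept": "Assets", "value": 1000},
    {"entity": "ABC", "period": "2013", "concept": "Liabilities", "value": 600},
]

# Tree-type: one fixed hierarchy (entity -> period -> concept).
tree = {"ABC": {"2013": {"Assets": 1000, "Liabilities": 600}}}

# Graph-type: subject, predicate, object triples with no fixed hierarchy.
triples = [
    ("fact1", "entity", "ABC"),
    ("fact1", "period", "2013"),
    ("fact1", "concept", "Assets"),
    ("fact1", "value", 1000),
    ("fact2", "entity", "ABC"),
    ("fact2", "period", "2013"),
    ("fact2", "concept", "Liabilities"),
    ("fact2", "value", 600),
]


def regroup(triples, key):
    """Build a hierarchy on demand by grouping fact values under any predicate."""
    facts = {}
    for subject, predicate, obj in triples:
        facts.setdefault(subject, {})[predicate] = obj
    grouped = {}
    for fact in facts.values():
        grouped.setdefault(fact[key], []).append(fact["value"])
    return grouped


by_concept = regroup(triples, "concept")
by_entity = regroup(triples, "entity")
print(by_concept)
print(by_entity)
```

The tree commits to one hierarchy up front. The triples commit to none, so a different hierarchy can be created each time `regroup` is called with a different predicate.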

Basically, CSV was not designed to represent tree-type information; it was designed to represent table-type or tabular information. I am not going to go into explaining the differences between table, tree, and graph type data here; perhaps I will in another blog post.

Another limitation of the CSV syntax is that CSV does not define data types. By contrast, XML Schema defines data types (see section 3, Built-in datatypes) that can be used within an XML file. XML Schema data types are fast becoming the global standard data types, it seems.
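A small sketch of what "does not define data types" means in practice, using Python's standard csv module. The column names and the `converters` mapping are hypothetical; the point is that the types have to be agreed on somewhere outside the CSV file itself.

```python
import csv
import io
from datetime import date
from decimal import Decimal

# Hypothetical one-row file with a text, a numeric, and a date column.
raw = "concept,value,as_of\nAssets,1000.50,2013-12-31\n"

row = next(csv.DictReader(io.StringIO(raw)))

# Every value comes back as a string; the file carries no type information.
types_from_csv = {name: type(value).__name__ for name, value in row.items()}

# The receiver has to supply the types from some agreement outside the file.
converters = {"concept": str, "value": Decimal, "as_of": date.fromisoformat}
typed = {name: converters[name](value) for name, value in row.items()}

print(types_from_csv)
print(typed)
```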

Another limitation of the CSV syntax is that CSV does not really have much of a schema. By contrast, a relational database provides the ability to define a schema, that is, the structure of and relations between tables in that relational database. XML Schema lets you define the structure that is allowed within an XML file.
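Because CSV has no schema of its own, any structural check has to be written by whoever consumes the file. The following is a rough sketch of such a hand-rolled check. The expected column list and the function name `check_structure` are assumptions made for this example, not part of any CSV specification.

```python
import csv
import io

# Hypothetical "schema": the only structure we can state is the column list.
EXPECTED_COLUMNS = ["concept", "value", "as_of"]


def check_structure(text):
    """Return a list of structural problems found in the CSV text."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader, None)
    problems = []
    if header != EXPECTED_COLUMNS:
        problems.append(f"unexpected header: {header}")
    for line_number, row in enumerate(reader, start=2):
        if len(row) != len(EXPECTED_COLUMNS):
            problems.append(
                f"line {line_number}: expected {len(EXPECTED_COLUMNS)} "
                f"fields, found {len(row)}"
            )
    return problems


good = "concept,value,as_of\nAssets,1000,2013-12-31\n"
bad = "concept,value,as_of\nAssets,1000\n"
print(check_structure(good))
print(check_structure(bad))
```

With a relational database or XML Schema, this kind of rule is declared once in the schema instead of being re-implemented by every receiver.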

Another limitation of the CSV syntax is that CSV was designed to transfer data. There is a difference between data, information, knowledge, and wisdom. One explanation of the difference is given by Gene Bellinger, Durval Castro, and Anthony Mills.

There is a subtle, but very significant, difference here. The description of CSV above states that the objective is exchanging data between business and scientific applications. But the real goal is not the exchange of the data, but making use of the data: the real goal is business system interoperability.

So another way to cast this is that different technical syntaxes have differing powers of expressing the semantics, or meaning, of the relations, patterns, and principles which are contained in the information. There are two reasons why you want to express this type of information: (1) verifying or validating that the information is expressed correctly, which is done with what are called business rules, and (2) communicating to others the important relations, patterns, and principles represented within the information. The more business rules you have, the stronger the semantics. The stronger the semantics, the more you can do with the information.
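As a rough sketch of what a business rule looks like when it is written down so a machine can check it, consider the following Python. The facts, the two rules, and the `validate` helper are all hypothetical and are only meant to illustrate the idea.

```python
# Hypothetical reported facts.
facts = {"Assets": 1000, "Liabilities": 600, "Equity": 400, "CurrentAssets": 300}

# Each business rule is a name plus a function that returns True when satisfied.
business_rules = {
    "Assets = Liabilities + Equity":
        lambda f: f["Assets"] == f["Liabilities"] + f["Equity"],
    "Current assets do not exceed assets":
        lambda f: f["CurrentAssets"] <= f["Assets"],
}


def validate(facts, rules):
    """Evaluate every rule against the facts and report pass or fail by name."""
    return {name: rule(facts) for name, rule in rules.items()}


results = validate(facts, business_rules)
for name, passed in results.items():
    print("PASS" if passed else "FAIL", name)
```

The rule names serve the second purpose mentioned above: they communicate the relations to a human reader, while the functions verify them.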

Another point which can be made using CSV is the difference between a "presentation format" and a "representation format". CSV actually performs both presentation and representation functions well. It is easy for humans to read because it is tabular, but it is also easy for machines to consume because it is tabular. This is a good explanation of the difference between representation and presentation:

Presentation is just showing. Representation is showing with another meaning behind it. Representation is indicating something below the surface of what you’re presenting.

That is not the greatest explanation of the difference between presentation and representation, but it will act as a placeholder for now.

I don't have this exactly where I want this but I am going to call this good and tune it up later.

Posted on Sunday, March 30, 2014 at 01:54PM by Charlie in Becoming an XBRL Master Craftsman

Understanding Classification Systems

(This blog post relates to trying to sort out two very good graphs (graphic 1, graphic 2) for the purpose of creating a better graphic which includes XBRL and other things related to XBRL. The first step is to sort out all the terms used in the diagrams. This blog post focuses on classification systems.)

A classification system is a grouping of something based on some criteria. There are many different types of classification systems. David Weinberger's book Everything Is Miscellaneous points out two important things to remember about classification systems:

That every classification scheme ever devised inherently reflects the biases of those who constructed the classification system.
The role metadata plays in allowing you to create your own custom classification system so you can have the view of something that you want.

Everything Is Miscellaneous also describes the three "orders of order" of classification systems:

First order of order. Putting books on shelves is an example of the first order of order.
Second order of order. Creating a list of books on the shelves you have is an example of second order of order. This can be done on paper or it can be done in a database.
Third order of order. Adding even more information to information is an example of third order of order. Using the book example, you can classify books by genre, best sellers, featured books, bargain books, or books which one of your friends has read; basically there are countless ways to organize something.
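The role of metadata in the third order of order can be sketched in a few lines of Python. The book records and the `classify` helper below are invented for illustration; the idea is simply that once each item carries metadata, anyone can build their own classification from it.

```python
from collections import defaultdict

# Hypothetical catalog: each book carries metadata beyond its shelf position.
books = [
    {"title": "Book A", "genre": "History", "bestseller": True},
    {"title": "Book B", "genre": "Science", "bestseller": False},
    {"title": "Book C", "genre": "History", "bestseller": False},
]


def classify(items, attribute):
    """Group titles by whichever metadata attribute the reader cares about."""
    groups = defaultdict(list)
    for item in items:
        groups[item[attribute]].append(item["title"])
    return dict(groups)


by_genre = classify(books, "genre")
by_bestseller = classify(books, "bestseller")
print(by_genre)
print(by_bestseller)
```

The same three books yield a different view each time a different attribute is chosen, and adding a new attribute adds a new possible classification without moving any book.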

The following is a summary of terminology used to describe different types of classification systems on the two graphics pointed out above plus other items which seem to belong in the list of classification systems:

List: A set of items or things.
Dictionary: A dictionary is much like a list; a dictionary has no hierarchy.
Glossary: A glossary contains explanations of concepts relevant to a certain field of study or action. In this sense, the term is related to the notion of ontology.
Thesaurus: Lists grouped together according to similarity of meaning.
Controlled vocabulary: Controlled vocabularies provide a way to organize knowledge for subsequent retrieval. They are used in subject indexing schemes, subject headings, thesauri, taxonomies and other forms of knowledge organization systems. Controlled vocabulary schemes mandate the use of predefined, authorised terms that have been preselected by the designer of the vocabulary, in contrast to natural language vocabularies, where there is no restriction on the vocabulary.
Taxonomy: A taxonomy is a classification system which does have a hierarchy, but the hierarchy tends to be less formal.
Folksonomy: A folksonomy is a system of classification derived from the practice and method of collaboratively creating and translating tags to annotate and categorize content; this practice is also known as collaborative tagging, social classification, social indexing, and social tagging.
Ontology: An ontology is a set of well-defined concepts which describes a specific domain. Ontologies tend to be more formal, more complete, and more precise classification systems. The goal of an ontology is to provide a formal, machine-readable, referenceable set of concepts which are used in communications within a domain and which precisely describe the domain. An ontology is also expressed as a hierarchy, but the hierarchy is more explicit and much richer in meaning than a taxonomy.
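One rough way to see the progression from list to taxonomy to ontology is to look at how much structure each one needs when written as data. The sketch below uses made-up accounting terms and made-up relation names ("is-a", "has-balance-type"); it is an illustration of the idea, not a real taxonomy or ontology language.

```python
# List: a flat set of terms with no hierarchy.
terms = ["Assets", "Current Assets", "Cash", "Liabilities"]

# Taxonomy: one informal hierarchy, recorded as child -> parent.
parent_of = {"Current Assets": "Assets", "Cash": "Current Assets"}


def ancestors(term):
    """Walk up the taxonomy from a term to the top of its hierarchy."""
    path = []
    while term in parent_of:
        term = parent_of[term]
        path.append(term)
    return path


# Ontology-like: relations are named, so each link says what it means.
relations = [
    ("Cash", "is-a", "Current Assets"),
    ("Current Assets", "is-a", "Assets"),
    ("Assets", "has-balance-type", "debit"),
    ("Liabilities", "has-balance-type", "credit"),
]


def related(term, relation):
    """Return everything linked to a term by one specific named relation."""
    return [obj for subj, rel, obj in relations if subj == term and rel == relation]


print(ancestors("Cash"))
print(related("Assets", "has-balance-type"))
```

The taxonomy can only say that one term sits under another. The named relations can also say why, which is the richer meaning the ontology definition above refers to.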

These things appear to be modeling systems which were somewhat intermingled with classification systems; they seem to be approaches to representing a classification system:

Entity-relationship diagram (ER model): An entity–relationship model (ER model) is a data model for describing the data or information aspects of a business domain or its process requirements, in an abstract way that lends itself to ultimately being implemented in a database such as a relational database. The main components of ER models are entities (things) and the relationships that can exist among them.
Conceptual model: A conceptual model is a model made of the composition of concepts, that thus exists only in the mind. Conceptual models are used to help us know, understand, or simulate the subject matter they represent.
Concept map: A concept map is a diagram that depicts suggested relationships between concepts.
Topic map: A topic map is a standard for the representation and interchange of knowledge, with an emphasis on the findability of information.
UML (Unified Modeling Language): The Unified Modeling Language (UML) is a general-purpose modeling language in the field of software engineering to model systems. The basic level provides a set of graphic notation techniques to create visual models of object-oriented software-intensive systems. Higher levels cover process-oriented views of a system.
XMI (XML Metadata Interchange): XML standard for exchanging metadata information.

All this stuff seems to fit into these general notions: (the big picture)

Network theory: Network theory concerns itself with the study of graphs as a representation of relations between objects.
Graph theory: Graph theory is the study of graphs, which are mathematical structures used to model pairwise relations between objects.
Theory of relations: A relation in mathematics is defined as an object that has its existence as such within a definite context or setting.
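To show what "pairwise relations between objects" looks like as data, here is a minimal sketch that stores a few links as an undirected graph and finds everything connected to a starting point. The specific links between the terms are assumptions made up for this example, loosely based on the definitions above.

```python
from collections import deque

# Hypothetical pairwise "is related to" links between terms from this post.
edges = [
    ("Controlled vocabulary", "Thesaurus"),
    ("Controlled vocabulary", "Taxonomy"),
    ("Taxonomy", "Ontology"),
    ("Concept map", "Topic map"),
]

# Build an undirected adjacency map: node -> set of neighbouring nodes.
neighbours = {}
for a, b in edges:
    neighbours.setdefault(a, set()).add(b)
    neighbours.setdefault(b, set()).add(a)


def connected(start):
    """Breadth-first search: every node reachable from start, including start."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in neighbours.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


print(sorted(connected("Thesaurus")))
```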

Posted on Sunday, March 30, 2014 at 10:53AM by Charlie in Becoming an XBRL Master Craftsman