US State Population Trends Added to State Fact Book

I have added another data set to the State Fact Book prototype I am creating. That data set, Set-08, contains population trends for the US states and the District of Columbia from 2000 to 2009.

Here is a walkthrough of how this was done. Hopefully it will give you an appreciation for some of the things XBRL (and other XML syntaxes) can provide. If you take the time to follow along, you may see some of the value XBRL brings to the table.

I am using data from this US Census web site, which provides population estimates. Specifically, I am using the "National population dataset". That data set is explained by this PDF file (File Layout), and this is the actual data file (CSV File).

The first thing to note is that the information describing the data (i.e., the PDF file) is not referenced by or connected to the data itself. When people say XML is "self-describing," part of what they mean is that XML both contains the data and describes the data. XBRL does this too: the XBRL instance is the data, and the XBRL taxonomy describes the data.

Another thing to notice is that the data is "flat". CSV has rows and columns, so you cannot really model complex relations. If you want to relate this data file to another data file you can, but there is no way to validate that you got the relations correct unless you build your own tools to check the relations between the two CSV files. XML and XBRL, by contrast, can connect the data with the information that describes it (referred to as metadata).
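To make the point about flat files concrete, here is a minimal Python sketch of the kind of hand-built check you would need in order to relate two CSV files. The file contents, the column names (`state_code`, `year`, `population`, `name`), and the numbers are hypothetical toy values, not the Census file layout or Census figures.

```python
import csv
import io

# Hypothetical file 1: population rows keyed by a state code (toy numbers, not Census data).
POPULATION_CSV = """state_code,year,population
CA,2009,100
TX,2009,80
ZZ,2009,5
"""

# Hypothetical file 2: the list of valid state codes and names.
STATES_CSV = """state_code,name
CA,California
TX,Texas
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def find_orphans(child_rows, parent_rows, key):
    """Return child rows whose key has no matching row in the parent file.

    CSV cannot declare this relation itself, so the check has to be hand-built.
    """
    parent_keys = {row[key] for row in parent_rows}
    return [row for row in child_rows if row[key] not in parent_keys]


orphans = find_orphans(read_rows(POPULATION_CSV), read_rows(STATES_CSV), "state_code")
for row in orphans:
    print("No matching state for:", row)
```

Every new pair of files needs another check like this, and nothing in the CSV itself tells a tool which columns are supposed to match.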

I grabbed the CSV and got it into Microsoft Excel, which is very easy, then moved it from Excel into Microsoft Access, again very easy. I put the information into Access because Access is vastly easier to use for some of the things I need to do. What, Access easier than Excel? Yes, because I need the relational database functionality of Access. You can do this in Excel, but you have to build the relations yourself. Why do that when Access can do it for you, and do it better?
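For readers without Access, the same idea can be sketched with Python's built-in `sqlite3` module, swapped in here for Access purely for illustration. The table and column names are hypothetical and the numbers are toy values. The point is that once the relation is declared in the schema, the database itself rejects a population row for an unknown state and can join the tables for you, with no custom checking code.

```python
import sqlite3


def build_database():
    """Create an in-memory database whose schema declares the state/population relation."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
    conn.execute("CREATE TABLE state (code TEXT PRIMARY KEY, name TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE population ("
        " code TEXT NOT NULL REFERENCES state(code),"
        " year INTEGER NOT NULL,"
        " value INTEGER NOT NULL,"
        " PRIMARY KEY (code, year))"
    )
    return conn


conn = build_database()
conn.executemany("INSERT INTO state VALUES (?, ?)", [("CA", "California"), ("TX", "Texas")])
# Toy values, not Census figures.
conn.executemany("INSERT INTO population VALUES (?, ?, ?)", [("CA", 2009, 100), ("TX", 2009, 80)])

# The database, not hand-written code, rejects a row that points at an unknown state.
try:
    conn.execute("INSERT INTO population VALUES ('ZZ', 2009, 5)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# A join relates the two tables through the declared key.
rows = conn.execute(
    "SELECT s.name, p.value FROM population p JOIN state s ON s.code = p.code ORDER BY s.name"
).fetchall()
print(rejected, rows)
```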

I used Access to generate XHTML, XML, and XBRL versions of this data. All of these formats are "XML"; they are just different XML syntaxes. Each one expresses semantics, but each does so in a different way, sometimes explicitly and sometimes implicitly.
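As a rough illustration of how an XBRL instance carries more than the raw numbers, here is a minimal Python sketch that turns (state, year, population) rows into an XBRL-style instance using `xml.etree.ElementTree`. Only the `xbrli` namespace is the real XBRL 2.1 instance namespace. The `sfb` namespace, the `Population` concept, the identifier scheme, and the July 1 date are assumptions made up for this example, and the `link:schemaRef` that a valid instance requires is omitted. This is not the markup Access generated for the State Fact Book.

```python
import xml.etree.ElementTree as ET

XBRLI = "http://www.xbrl.org/2003/instance"  # real XBRL 2.1 instance namespace
SFB = "http://example.com/statefactbook"     # hypothetical taxonomy namespace
SCHEME = "http://example.com/state-codes"    # hypothetical entity identifier scheme

ET.register_namespace("xbrli", XBRLI)
ET.register_namespace("sfb", SFB)


def build_instance(rows):
    """Build a minimal XBRL-style instance from (state_code, year, population) rows.

    Each fact points at a context (who and when) and a unit (how it is measured),
    so the data carries part of its own description. A real instance also needs a
    link:schemaRef to its taxonomy; it is omitted here for brevity.
    """
    root = ET.Element(f"{{{XBRLI}}}xbrl")
    unit = ET.SubElement(root, f"{{{XBRLI}}}unit", id="pure")
    ET.SubElement(unit, f"{{{XBRLI}}}measure").text = "xbrli:pure"

    for code, year, value in rows:
        ctx_id = f"{code}_{year}"
        ctx = ET.SubElement(root, f"{{{XBRLI}}}context", id=ctx_id)
        entity = ET.SubElement(ctx, f"{{{XBRLI}}}entity")
        ET.SubElement(entity, f"{{{XBRLI}}}identifier", scheme=SCHEME).text = code
        period = ET.SubElement(ctx, f"{{{XBRLI}}}period")
        # Assumption: each estimate is treated as an instant dated July 1 of its year.
        ET.SubElement(period, f"{{{XBRLI}}}instant").text = f"{year}-07-01"

        fact = ET.SubElement(
            root, f"{{{SFB}}}Population",
            contextRef=ctx_id, unitRef="pure", decimals="INF",
        )
        fact.text = str(value)
    return root


# Toy values, not Census figures.
root = build_instance([("CA", 2009, 100), ("TX", 2009, 80)])
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

The concept's namespace is what ties each fact back to a taxonomy that describes it, which is the connection the CSV and its separate PDF lack.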

One piece of semantics that is not explicitly expressed in the CSV, XML, or HTML is that the numbers are supposed to add up: the states plus the District of Columbia add up to the total US population. XBRL can express this. Here, that information is expressed by the population trends XBRL taxonomy, which consists of two pieces: a definition linkbase and a formula linkbase. Those two pieces, combined with the general XBRL taxonomies used by the State Fact Book information, give me this report when run through an XBRL processor that supports XBRL Formula. The report shows that the information adds up correctly. You can run the XBRL through any XBRL application that supports XBRL Formula and get the same results, because XBRL, a global standard, specifies the rules for how XBRL works, and different software vendors implement those rules.
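The arithmetic that the formula linkbase checks can be approximated in plain Python. This is a minimal sketch of the summation only, not XBRL Formula: the labels and all numbers are toy values invented for illustration, and the 2009 total is deliberately wrong so the check has something to catch.

```python
# Toy figures, not Census data: component populations per year and the reported total.
COMPONENTS = {
    2008: {"State A": 100, "State B": 80, "District of Columbia": 5},
    2009: {"State A": 102, "State B": 81, "District of Columbia": 5},
}
REPORTED_TOTAL = {2008: 185, 2009: 190}  # 2009 is deliberately inconsistent


def check_totals(components, totals):
    """For each year, compare the sum of the components with the reported total.

    Returns {year: (passed, computed_sum, reported_total)}.
    """
    results = {}
    for year, parts in components.items():
        computed = sum(parts.values())
        results[year] = (computed == totals[year], computed, totals[year])
    return results


results = check_totals(COMPONENTS, REPORTED_TOTAL)
for year, (passed, computed, reported) in sorted(results.items()):
    status = "OK" if passed else "MISMATCH"
    print(f"{year}: {status} (sum of parts {computed}, reported total {reported})")
```

Running it reports OK for 2008 and a MISMATCH for 2009, where the parts sum to 188 against a reported 190.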

Validating that the numbers add up does two things, as explained in this blog post. First, it documents what should add up. Second, it proves that it does add up. Many people don't realize at first how big a deal this is. They think, "Well, I do validation in my application to check whether things add up." But you have to build that validation yourself, and you cannot exchange it with anyone else because it is proprietary. With XBRL, the validation rules can be exchanged, they are processed by a rules engine, and they can be reused many times, not just once inside your own system.
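To illustrate the difference between a rule buried in application code and a rule that can be exchanged, here is a toy Python sketch in which the summation rule lives in a separate JSON document and a small generic engine applies it to whatever facts it is given. The JSON format, the rule id, and the facts are hypothetical; XBRL Formula defines its own, much richer syntax.

```python
import json

# A hypothetical, exchangeable rule document: the rule is data, not application code.
RULES_JSON = """
[
  {"id": "TotalEqualsParts", "total": "US",
   "parts": ["State A", "State B", "District of Columbia"]}
]
"""


def evaluate(rules, facts):
    """Apply every summation rule to a dict of facts; return (rule id, passed) pairs."""
    outcomes = []
    for rule in rules:
        computed = sum(facts[name] for name in rule["parts"])
        outcomes.append((rule["id"], computed == facts[rule["total"]]))
    return outcomes


rules = json.loads(RULES_JSON)
# Toy facts, not Census figures.
facts_2008 = {"US": 185, "State A": 100, "State B": 80, "District of Columbia": 5}
print(evaluate(rules, facts_2008))
```

Because the engine knows nothing about populations, the same rule document could be sent to someone else and run against their data unchanged.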

A final point: the population trends information, the general information, and the financial information can all easily be hooked together, because those three data sets use the same base metadata, expressed in the XBRL taxonomy each one uses. Grab the XBRL from the index page (the blue image with "XBRL" on it) and try the following:

Hook the financial information and population information together.
Separate the population information by "red", "blue" and "purple" state.
Create your own data set and connect it to one of the data sets from above.
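The first two exercises can be sketched in Python once the facts have been pulled out of the XBRL into simple dictionaries keyed by a shared state identifier. Everything here is hypothetical: the state labels, the financial measure, the numbers, and the red/blue/purple classification, which is not part of the State Fact Book data. The join works only because every data set uses the same keys, which is the role the shared base metadata plays.

```python
from collections import defaultdict

# Hypothetical data sets keyed by the same state identifiers (toy values).
POPULATION = {"State A": 100, "State B": 80, "State C": 20}
FINANCIAL = {"State A": 5000, "State B": 3200, "State C": 900}         # hypothetical measure
POLITICS = {"State A": "blue", "State B": "red", "State C": "purple"}  # hypothetical labels


def join(*datasets):
    """Inner-join dicts on their shared keys; a key missing from any data set is dropped."""
    shared = set(datasets[0]).intersection(*datasets[1:])
    return {key: tuple(d[key] for d in datasets) for key in sorted(shared)}


joined = join(POPULATION, FINANCIAL, POLITICS)

# Exercise 1: hook financial and population information together (a per-capita figure).
per_capita = {state: money / people for state, (people, money, _) in joined.items()}

# Exercise 2: separate the population by red, blue, and purple state.
population_by_group = defaultdict(int)
for state, (people, _, group) in joined.items():
    population_by_group[group] += people

print(per_capita)
print(dict(population_by_group))
```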

You can actually do this with XML, XBRL, or RDF. Each syntax has its pros and cons. I will be discussing those pros and cons in future blog posts.

Posted on Thursday, May 6, 2010 at 08:40AM by
