Business Intelligence Architect
“If you can measure that of which you speak and can express it by a number, you know something of your subject; but if you cannot measure it, your knowledge is meager and unsatisfactory."
This quote from the 1800s attributed to William Thomson, 1st Baron Kelvin, is often paraphrased as a much catchier phrase, which still holds true in modern business:
"If you can't measure it, you can't manage it.”
Businesses measure all sorts of things in order to better manage their business. They measure how much a customer has spent on their products, what products have the best margin and how long it takes to get a product delivered to the customer after the order is confirmed. You name it, and most assuredly someone or some business out there is measuring it.
But measurements for a business are only as good as the delivery of this information to the people who manage what is being measured. This is where business reports come into play. Reports offer consolidation of the items measured. Generally speaking, the higher up in an organization a manager is, the more consolidated his or her reports need to be.
Computerized database technologies have been developed over the last few decades to allow businesses to store their data in highly efficient electronic formats, a.k.a. databases. Businesses rely on database systems geared toward capturing data from the users. These ‘front-end’ systems are often referred to as “On-Line Transactional Processing” systems (OLTP). Most transaction systems, in addition to their data entry and capture capabilities, also offer some amount of reporting. Even the most basic of reports can offer consolidation and some bit of context. For example, “How much did we sell last quarter as opposed to our projections?” But even the best and most complex of OLTP systems will have some reporting shortcomings for one of several reasons.
The concept of a “data warehouse” was developed to combat some of these issues. A data warehouse can be thought of as another database with high volume reporting and analytics as its main purpose, as opposed to row-by-row data retrieval and manipulation. They typically contain copies of data already managed by transaction systems, but they are designed and indexed for efficient bulk retrieval and reporting. In the following sections, we’ll explore some possible shortcomings of OLTP systems in regard to reporting and analysis and see how the implementation of a data warehouse can alleviate them.
Report Response Time
OLTP systems can be quite complex. For example, consider the complexity of the database structure needed to run an airline’s ticketing, passenger boarding and luggage tracking processes. And it’s this complexity and detail that are often their downfall when it comes to high level reports. A ticket agent may be able to tell you how many empty seats are on the next flight, but to find the number of empty seats on all the flights this year, compared to the total seating capacity (a measure of lost opportunities) may take several minutes or hours to compile.
A data warehouse may make use of aggregated data sets for faster response times. Summarizing the data by day, week or month will drastically reduce reporting times.
Often, aged data is purged from a front-end system. Without such periodic housecleaning, the volumes of data to plow through to get to a specific record may become prohibitive. In most cases, these historical data are archived before they are removed, and stored in other tables, databases or even on completely different systems. Reporting on historical data then becomes difficult at best.
Such periodic purging of data in a data warehouse is seldom required and often frowned upon. Trend reporting, for example by season or month of the year, is then more accurate if there are more years to average. Global climate databases look at seasonal temperature averages over millennia!
To pull up a single row out of millions in an OLTP system requires a specific indexing structure. Think of a typical white pages phone book that is indexed by town, then by last name, then by first name. How impossible would it be to use such an indexing structure to find all the people with the first name of John?
Data warehouses, on the other hand, employ multiple indexes to allow this type of data searching. The local library will most likely have at least two sets of card catalogs: one by author and one by subject. Each set of card catalogs can be thought of as an index to the books on the shelf.
Non-Intuitive and Complex Database Structure
Data warehouses are typically a stripped-down version of their OLTP counterpart. Very seldom is everything in the source system a candidate to be tracked in the data warehouse. For this reason, data warehouses usually have fewer tables and fewer fields or columns in those tables. In addition, multiple tables related to one item in an OLTP system can often be represented by a single table in a data warehouse. Having a single table for customers and a single table for products, instead of several or dozens for each, make the report designer’s job much easier.
No Insight from Other Sources
OLTP systems are good at what they do, but sometimes they don’t do enough to generate truly meaningful reports. A system that tracks and monitors manufacturing plant performance may have a lot of data points it collects, but the price of the raw materials may not be one of them.
A data warehouse, on the other hand, can collect price fluctuations from some other external sources, such as the supply chain, and lay it alongside the performance data for a more representative picture of the plant’s cost effectiveness.
Historical Changes are Lost
Most OLTP systems do not store historical changes to base data (base data being the nouns of the business: who and what, as opposed to the how much). A customer may move from one state of residence to the next, and all the OLTP system records are where that customer currently resides. All transactions for that customer are then associated with where the customer lives now, not where they lived at the time of the transaction.
To solve this issue, a data warehouse will, in the preceding example, have two rows to represent that one customer, one row for before they moved, and one for their new place of residence. This allows the data warehouse to accurately report truly historical data because the transaction can be thought of as staying in the location at which it occurred, instead of transferring to the new location.
On-Line Transaction Processing systems have become almost ubiquitous in today’s business world, though most business users will call them by their varied commercial names. But business managers often experience frustration with these systems because they are slow to report large volumes of data, lack data past a few months or years, are difficult to understand for customized reports, lack certain pieces, are simply inaccurate regarding historical changes or any combination thereof.
The solution is a properly designed data warehouse built for speed, comprehensiveness, completeness, and, in a word, true business analytics. In future installments on the data warehouse theme, we’ll dive into some of the details of how a data warehouse addresses some of the issues above.