Spend Analysis VI: New Horizons (Part 2)

Today I'd like to welcome back Eric Strovink of BIQ who, as I indicated in part I of this series, is authoring the first part of this series on next generation spend analysis and why it is more than just basic spend visibility. Much, much more!

Federation

One of the most serious limitations of OLAP analysis is the schema structure itself -- typically a "star" schema, where a voluminous "fact" or "transaction" file is surrounded by supporting files, or "dimensions." In the case of spend analysis, dimensions are Supplier, Cost Center, Commodity, and so on; transactions are typically AP records.

Why is this schema limiting? Because there are only certain ways that a dimension file can be linked to transaction files, and it isn't always clear which file ought to be the transaction file and which files ought to be dimensions. For example, suppose that the transaction file consists of AP transactions, and a dimension file consists of invoice line items. The problem is that the invoice line item file is "moving faster" than the AP file; i.e. for every invoice number that appears in AP, there are multiple invoice lines that match. Which invoice line item should we link to?

Well, we could invert the problem and build the dataset from the invoice detail file instead, except that we typically won't have invoice detail for every AP record, so that probably won't work. Here's a couple of ideas that will work: (1) we could build a separate measure column for invoice line items, and include them as AP record equivalents (coercing the two record types into a common format); (2) we could drop the associated AP record whenever we have invoice line item data, and include the AP information inside those line items, redundantly.

There are other options, too.

But the essential problem is that we have two separate datasets, and we're trying to join them at the hip. There is an AP dataset, and there is an invoice line item dataset, and never the twain shall meet, except artificially. Even when there is no granularity issue at all, and when one dataset can be normalized or snowflaked such that every matching line item can be joined through from the other, the amount of effort required to set up the index->index->index relationships can be daunting.

Instead, why not create two separate datasets, efficiently and quickly; and then as a final step, federate them together on a common dimension? Suppose the federation logic was "join" -- in that case, we'd drill on an element in dataset A; dataset B would drill on the common dimension from A; and then A would drill again on the common dimension from B. What we'd see is the perfect join of all of the records from A and from B that shared a common key in the common dimension; and we'd have the ability to reference all data from any dimension of both A and B.

There are many forms of federation in addition to join -- for example, "master-slave," where we drill on A, and B shows us its common nodes; but does not feed those back to A. That relationship can go the other way, as well, from B to A. In addition, there's a "disjoint" operation -- show me all the nodes in B that don't share a key in the common dimension with A (and vice versa).

Federation represents a key productivity enhancer for dataset creation, as well as a simplification to the dataset building process in general. Federation also passes the "usability" litmus test, in that the resulting datasets are much easier to understand than massive levels of index indirection and snowflaking, and have the potential to produce richer results.

The technical challenges for federation are considerable: maintaining multiple connections to multiple datasets; representing multiple data dimensions inside the context of a single data viewer; providing mechanisms for pulling data seamlessly from multiple datasets for reports and analyses; and last but not least, augmenting the OLAP engine to perform federation operations effectively and quickly.

Is federation worth it? I think so, emphatically.

This brings to an end our initial Spend Analysis series. Thanks for the opportunity, Michael; and thanks to everyone who took the time to wade through it.

As Eric said, this ends Sourcing Innovation's initial series on spend analysis. I'd like to thank Eric for his enlightening posts and hope that you learned something from them.

 

What did you think of this article?




Trackbacks
  • Trackbacks are closed for this entry.
Comments

  • 2/6/2007 11:00 AM Mark Usher wrote:
    Great series of articles, Eric. So am I right in reading that "federation" would allow separate datasets from multiple source systems (AP, e-procurement, p-card, etc.) to be linked by common dimensions and analyzed in a single view? A common one I run into is having, say, $10B of spend from AP with no line-item detail (but with some PO numbers) and then $2B of spend from the PO system with line item data. So federation lets me drill down in a single view to look at the whole $10B including the $2B with line item detail? And ditto for any other portions of the $10B where line item data is available from some other system, e.g. inventory system?
  • 2/6/2007 1:10 PM Eric Strovink wrote:
    Right. You create each of the independent cubes; then federate them on a common dimension; and then you have a coherent view across all of them (and you can draw data from any of them) per your overall drill position and per the federation logic between them.

    This should be significantly easier to understand from an end-user perspective than having to deal with (on the cube-building side) granularity issues, complex index linkages, and so on. One also ends up with separate cubes, which are likely themselves to be useful.

    Note that it is certainly possible to build useful index linkages and work around granularity issues without federation; but not nearly as easily or elegantly.
Leave a comment

Comments are closed.