Spend Analysis: Another Book Review ... And This One's NOT Positive!

Pandit and Marmanis recently published a book titled Spend Analysis: The Window into Strategic Sourcing that has received a fair amount of praise from prince and pauper alike. Since I am currently in the process of co-authoring a text on the subject (now that my first book, The e-Sourcing Handbook [free e-book version], is almost ready to go to press), I figured that I should do proper due diligence, obtain their book, and read it cover to cover. I did - and I was disappointed.

Although the book would have been interesting ten years ago, good seven years ago, and still somewhat relevant five years ago, today it adds very little insight. In fact, the book is filled with fallacies, incorrect definitions, and poor advice.

Problems start to surface as early as the third paragraph (page 5) where the authors attempt to 'simplify' the definition of Spend Analysis, stating that "spend analysis is a process of systematically analyzing the historical spend (purchasing) data of an organization in order to answer the following types of questions". There are at least three problems with this 'simplification':

  • Spend analysis is NOT systematic. Sure, each analysis starts out the same ... build a cube ... run some basic reports to analyze spend distribution by common dimensions ... dive in. However, after this point, each analysis diverges. Good analysts chase the data, look for anomalies, and try to identify patterns that haven't been identified before. If a pattern isn't known, it can't be systematized. Every category sourcing expert will tell you that real savings levers vary by commodity, by vendor, by contract, and by procuring organization -- to name just a few parameters.
  • Good spend analysis analyzes more than A/P spend. It also analyzes demand, costs, related metrics, contracts, invoices, and any other data that could lead to a cost saving opportunity.
  • The questions the authors provide are narrow, focused, and only cover low hanging fruit opportunities. You don't know a priori where savings are going to come from, and no static list of questions will ever permit you to identify more than a small fraction of opportunities.

From here, problems quickly multiply. But I'm going to jump ahead to the middle of the book (page 101) where the authors (finally) present their thesis to us, which they summarize as follows:

A complete and successful spend analysis implementation requires four modules:
  • DDL : Data Definition and Loading
  • DE : Data Enrichment
  • KB : Knowledge Base
  • SA : Spend Analytics

Huh? I don't know about you, but I always thought that spend analysis was about, well, THE ANALYSIS! A colleague of mine likes to say, when aggravated, "it's the analysis, stupid". And I agree. A machine can only be programmed to detect previously identified opportunities. And guess what? Once you've identified and fixed a problem, it's taken care of. Unless your personnel are incompetent, the same problem isn't going to crop up again next month ... and if it does, you need a pink slip, not a spend analysis package. DDL? Okay - you need to load the data into the tool - but if you don't know what you're loading, or you can't come up with coherent spend data from your ERP system, you have a different problem entirely (again, you're in pink slip territory). Enrichment? It's nice - and can often help you identify additional opportunities, but if you can't analyze the data you already have, you have problems that spend analysis alone isn't going to solve. Knowledge base? Are the authors trying to claim that the process of opportunity assessment can be fully automated, and that sourcing consultants and commodity managers should pack their bags and head for the hills? Last time I checked, sourcing consultants and commodity managers seem to have no difficulty finding work.

So let's focus on the analysis. According to the authors,

an application that addresses the SA stage of spend analysis must be able to perform the following functions:
  • web-centric application with Admin & restricted access privileges
  • specialized visualization tools
  • reporting and dashboards
  • feedback submission for suggesting changes to dimensional hierarchy
  • feedback submission for suggesting changes to classification
  • immediate update of new data
  • 'what-if' analysis capability

I guess I'll just take these one-by-one.

  • Web-centric? If the authors meant that users should be able to share data over the web, then I'd give them this one ... but the rest of the book strongly implies that they are referring to their preferred model, which is web-based access to a central cube. I'm sorry, that is not analysis. That is simply viewing standardized reports on a central, inflexible warehouse. We'll get back to this point later.
  • They got this one right. However, the most specialized "visualization tool" they discuss in their book is a first generation tree-map ... so maybe it was just luck they got this one right.
  • Reporting is a definite must - as long as it includes ad-hoc and user-driven analyses and models. Dashboards? How many times do I have to repeat that today's dashboards are dangerous and dysfunctional?
  • Feedback submission for suggesting changes? There's a big "oops!" Where's the analysis if you can't adjust the data organization yourself, right now, in real time? And if you have to give "feedback" which goes to a "committee" where everyone else has to agree on the change, which typically negates or generalizes the desired change - guess what? That's right! The change never actually happens, or if it does happen, the delay has caused it to become irrelevant.
  • Feedback submission for suggesting fixes to the data? How can you do a valid analysis if you can't fix classification errors, on the fly, in real time?
  • If the authors meant immediate update of new data as soon as it was available, then I'd give them this one. But it seems that what they really mean is that "the analysis cube should be updated as soon as the underlying data warehouse is updated", but considering that they state on page 182, "in our opinion, there is no need for a frequent update of the cube" (note the singular case, which I'll return to later), and then go on to state that quarterly warehouse updates are usually sufficient, I can't give them this one either.
  • I agree that what-if analysis capability is a must - but how can you do "what if" analysis if you can't change the data organization or the data classification, or even build an entirely new cube, on the fly?

The authors then dive into the required capabilities of the analytics module, which, in their view, should be:

  • OLAP tool capable of answering questions with respect to several, if not all, of the dimensions of your data
  • a reporting tool that allows for the creation of reports in various formats; cross-tabulation is very important in the context of reporting
  • search enabled interface

Which, at first glance, seems to be on the mark -- except for the fact that the authors' world-view does not include real-time dimension and data re-classification, which means that any cross-tabs that are not supported by the current data organization of the warehouse are impossible. Furthermore, it's not the format of the reports that matters, but the data the user can include in them. Users should be able to create and populate any model they need, whether it's cross-tabular or not. Finally, we're talking about spend analysis, not a web search engine. Search is important in any good BI tool, but if it's one of the three fundamental properties that is supposed to make the tool 'unique', I'm afraid that's a pretty ordinary tool indeed.

The authors apparently don't understand that spend analysis is separate from, and does not need to be based on, a data warehouse. Specifically, they state (on page 12) that "data warehousing involves pulling periodic transactional data into a dedicated database and running analytical reports on that database ... it seems logical to think that this approach can be used effectively to capture and analyze purchasing data ... indeed ... using this approach is possible".

It's possible to build a warehouse, but it's not a good idea for spend analysis. The goal of warehousing is to centralize and normalize all of the data in your organization in one, and only one, common format that is supposed to be useful to everyone. Unfortunately, and this is the dirty little secret with data warehouses, this process ends up being useful to no one in the organization, which is why most analysts simply download raw transactions to their desktops for private analysis, and ignore the warehouse. But the authors don't stop there. In a later chapter, they go on to imply that the schema is very important and that the target schema for spend analysis should be carefully chosen based on several considerations (page 177), namely:

  1. are your domains adequately represented?
  2. will your schema be evolving to support a centralized PIM system?
  3. is your company global? is internationalization an important requirement?
  4. is any taxonomy already implemented at a division level?
  5. has the schema been maintained in recent months?

To this, all I can say is:

  1. Doesn't matter. What matters is that the analyst has the data she needs for the analysis she is currently conducting.
  2. Who cares? There should be no link between your PIM and your SA system. PIM is just another potential data source to use, or ignore, as your analysts see fit.
  3. Whatever. If you have a good ETL tool, you can define a few rules to do language and currency mapping on import.
  4. Irrelevant. We're talking SA, not ERP.
  5. I would think it would have been, since the only way in the authors' worldview to change spend data representations is to change the underlying schema of the warehouse!

The authors cheerily state (on page 14) that "a good commodity schema is at the heart of a good spend analysis program because most opportunities are found by commodity managers". But hold on just a minute! If most of your opportunities are being found by your commodity managers using a basic A/P spend cube, then they're limiting themselves to very simple low hanging fruit - which is picked clean in the first few months in a typical organization that makes a commitment to spend analysis. That's why the traditional spend analysis value curve drops to almost zero within a year - meaning that if you don't recover the cost of the effort in the first three months, you'll never recover it. An A/P cube is just the beginning of the discovery process, not the endpoint.

The authors also make a strong argument for auto-classification, stating that (on page 100) "the reader must note that classifying millions of transactions is a task that should be done by using auto-classification and rules-based classification technology" and that "unless you license spend analysis applications, data scrubbing can be a very manual time consuming activity which requires a team of content specialists".

Actually, nothing about rules-based classification mandates that the rules must be built by a robot, and there are many reasons why that can be a bad idea (not the least of which is the fact that robots are far from infallible). Classification rules can be built easily and effectively by hand ... by a clerk ... even in a very large organization with many disparate business units. Once built, this set of rules can then be applied in a fully automated way to every new transaction added to the system. So let's not confuse "automation of creation" with "automation of application," please. Of course, you do need a good, modern, spend analysis tool that allows for the creation of rules groups of different priorities, and you need a rules creation mechanism that's easy to use and easy to understand.
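
To make the "automation of creation" versus "automation of application" distinction concrete, here is a minimal sketch of hand-built rules in prioritized groups, applied automatically to every transaction. Everything in it (the `Rule` class, the `classify` function, the field names, the vendors, the GL codes, and the commodity labels) is a hypothetical illustration, not the design of any particular spend analysis tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    """A hand-written mapping rule; a lower priority number wins."""
    priority: int
    commodity: str
    gl_code: Optional[str] = None   # None means "match any GL code"
    vendor: Optional[str] = None    # None means "match any vendor"

    def matches(self, txn: dict) -> bool:
        return ((self.gl_code is None or txn["gl_code"] == self.gl_code)
                and (self.vendor is None or txn["vendor"] == self.vendor))


def classify(txn: dict, rules: list, default: str = "Unmapped") -> str:
    """Apply the rules in priority order; unmatched spend is kept, not dropped."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.matches(txn):
            return rule.commodity
    return default


# Hypothetical rules a clerk might write: the most specific group first.
RULES = [
    Rule(priority=1, gl_code="6100", vendor="Acme Corp", commodity="IT Hardware"),
    Rule(priority=2, vendor="Acme Corp", commodity="Office Supplies"),
    Rule(priority=3, gl_code="6100", commodity="IT Services"),
]

transactions = [
    {"vendor": "Acme Corp", "gl_code": "6100", "amount": 1200.0},
    {"vendor": "Acme Corp", "gl_code": "6400", "amount": 300.0},
    {"vendor": "Beta LLC", "gl_code": "6100", "amount": 5000.0},
    {"vendor": "Gamma Inc", "gl_code": "7000", "amount": 50.0},
]

labels = [classify(t, RULES) for t in transactions]
```

The rules are written once by a person, but `classify` runs unattended on each new transaction, and the number of rules does not grow with the number of transactions.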

Have you ever wondered why skilled consultants can build and map a spend cube to 90% accuracy very quickly? Well, here's one tried-and-true "manual" methodology that builds terrific "automated" rules:

  1. map the top 200 GL codes
  2. map the top 200 vendors
  3. map the GL code + Vendor for vendors who sell you more than one item, or items in more than one category, depending on the level of detail you need

If you want to, you can get to 95-97% accuracy by extending to the top 1000 GL codes and the top 1000 vendors -- if you really believe you are going to source 1000 vendors (and of course you're not). To check your work, you'll need to run reports that show you:

  • top GL's and top commodities by vendor
  • top vendors and top GL's by commodity
  • top vendors and top commodities by GL

Simply keep mapping until all three reports are consistent, and you are as accurate as you'll need to be -- and you'll have the advantage of having built your own mapping rules, that you understand. The alternative, which is error-checking the work of an automaton (a process that must be done, because no robot is perfect), is difficult, tedious, and error-prone -- and it must be repeated on every data refresh.
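
The three cross-check reports above can be sketched with nothing more than a grouped total. This is a rough illustration under assumed names: the `top_by` helper, the transaction fields, and the sample data are all hypothetical, and a real tool would produce these views interactively.

```python
from collections import defaultdict


def top_by(transactions, group_key, detail_key, n=3):
    """For each value of group_key, rank detail_key values by total spend."""
    totals = defaultdict(lambda: defaultdict(float))
    for t in transactions:
        totals[t[group_key]][t[detail_key]] += t["amount"]
    return {
        group: sorted(details.items(), key=lambda kv: kv[1], reverse=True)[:n]
        for group, details in totals.items()
    }


# Hypothetical transactions that have already been mapped to a commodity.
mapped = [
    {"vendor": "Acme Corp", "gl_code": "6100", "commodity": "IT Hardware", "amount": 1200.0},
    {"vendor": "Acme Corp", "gl_code": "6400", "commodity": "Office Supplies", "amount": 300.0},
    {"vendor": "Beta LLC", "gl_code": "6100", "commodity": "IT Services", "amount": 5000.0},
    {"vendor": "Beta LLC", "gl_code": "6100", "commodity": "IT Services", "amount": 700.0},
]

# One dictionary per report: each pairs the two "top" views for a dimension.
by_vendor = {
    "gl_codes": top_by(mapped, "vendor", "gl_code"),
    "commodities": top_by(mapped, "vendor", "commodity"),
}
by_commodity = {
    "vendors": top_by(mapped, "commodity", "vendor"),
    "gl_codes": top_by(mapped, "commodity", "gl_code"),
}
by_gl = {
    "vendors": top_by(mapped, "gl_code", "vendor"),
    "commodities": top_by(mapped, "gl_code", "commodity"),
}
```

A vendor that shows up under a commodity it obviously doesn't sell, or a GL code split across unrelated commodities, is the cue to add or tighten a mapping rule and run the reports again.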

When the authors state (on page 116) that "manual editing is sufficient, but it is also extremely inefficient ... it is not scalable with respect to the size of the data", this is flatly untrue. The creation of dimensional mapping rules is wholly unrelated to the volume of the transactions -- the same effort is required for 1M transactions as is required for 100M, and most spend datasets can be mapped very effectively with dimensional rules only. The only exception is datasets whose only component is a text description; and here, too, the authors' "scalability" argument falls apart, since human-directed phrase mapping can divide-and-conquer quite effectively.

To top it all off, the authors go on to violate the first rule of spend analysis, which is "NEVER, EVER, EVER EXCLUDE DATA". They take great pains to classify all of the errors that can occur in the ETL process and then bluntly state that (on page 109) "if you have errors in category iv (root cause is undocumented and cannot be inferred), then you have two alternatives ... the first alternative, if possible, is to exclude these data from your sources ... errors of category iv are unacceptable and could jeopardize your entire analysis ... so they should be eliminated".

No, NO, NO! YOU MUST ACCEPT ALL OF THE RECORDS and YOU MUST DO SOMETHING SENSIBLE with the records that don't fit into your notion of reality. For example -- create a new Vendor ID, and family it automatically under Not Found, or Missing. Dropping data jeopardizes your analysis much more than creating an "Uncategorized" or "Missing" data node. What if errors represent 15% of your spend? Then you'd be reporting that you are spending 85M on a category when you are spending 100M. Your numbers won't add up ... and when the CFO makes an SEC filing based on data that the auditors later find to be incomplete, guess whose head is going to roll?
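
Here is a small sketch of the "do something sensible" approach, scaled down so that the totals mirror the 85M versus 100M example. The vendor table, the IDs, the amounts, and the `vendor_family` helper are all made up for illustration.

```python
# Hypothetical vendor master; any ID not listed here is an "error" record.
KNOWN_VENDORS = {"V001": "Acme Corp", "V002": "Beta LLC"}


def vendor_family(txn: dict) -> str:
    """Family unknown or absent vendor IDs under a 'Missing' node."""
    return KNOWN_VENDORS.get(txn.get("vendor_id"), "Missing")


transactions = [
    {"vendor_id": "V001", "amount": 60.0},
    {"vendor_id": "V002", "amount": 25.0},
    {"vendor_id": "V999", "amount": 10.0},   # ID not in the vendor master
    {"vendor_id": None, "amount": 5.0},      # no vendor ID at all
]

spend_by_vendor = {}
for t in transactions:
    name = vendor_family(t)
    spend_by_vendor[name] = spend_by_vendor.get(name, 0.0) + t["amount"]

# Keeping every record: the cube total reconciles with the source data.
total_kept = sum(spend_by_vendor.values())

# Excluding the problem records: 15% of the spend silently vanishes.
total_if_dropped = sum(v for k, v in spend_by_vendor.items() if k != "Missing")
```

With the "Missing" node in place, the total ties back to the source (100.0 here) and the size of the problem is visible on the report. Without it, the total reads 85.0 and nobody can tell that anything is absent.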

And before I forget, let's get back to that web-centric requirement where the authors imply that all of this means web-based access to a central cube (singular case). Throughout the entire book they refer to "the cube" (such as when they state that "in our opinion, there is no need for a frequent update of the cube") as if there's only ever one cube to be built. Turns out there isn't just one cube to be built -- there are dozens of cubes to be built. Some power analysts build 30 or 40 commodity-specific invoice-level cubes (what are those? you won't learn that from Pandit and Marmanis), and regularly maintain a dozen of these every month -- not every quarter (as the authors recommend).

The only real hint that the authors give that multiple cubes might be useful is where they state (on page 51) that "some companies are taking the approach of creating different cubes for different uses, rather than packing all possible information in a single cube for all users ... for example, all users might not be interested in auditing P-Card information ... rather than include all of the details related to P-Card transactions in the main cube, you can simply model the top-level info (supplier, cost center) in the main cube ... then ... create a separate 'microcube' that has all of the detailed transactional information ... the two cubes can be linked, and the audit team can be granted access to the microcube ... the microcube approach can be rolled out in a phased manner". Or, in short form, you can have multiple cubes if you have too much data, and the way you do it is to create ONE main cube, and then micro-cube drill-downs for relatively non-important data. I don't even know how to verbalize how wrong this is -- it completely inverts the value proposition. (Now, to be fair, they also state that "ideally, the cubes should be replicated on the user machine for the purposes of making any data changes", but they give no definition as to what form these cubes should take or what changes are to be permitted, so we are left assuming their previous definition, which is secondary micro-cubes and only minor, meaningless, alterations, since the dimensional and rule-based classifications require "approval").

At this point, you're probably asking yourself - did the authors get anything right? Sure they did! Specifically:

  • Chapter 4 on opportunity identification had a good list of opportunities to start with. Too bad most of them are the low-hanging fruit opportunities easily identified with out-of-the-box reporting and that there's no real insight on how to do serious untapped opportunity identification when there isn't a pre-canned report available.
  • Chapter 5 on the anatomy of spend transactions had a good overview of the formats used in various systems ... but if you're a real analyst, you probably know all this stuff anyway.
  • Chapter 7 on taxonomy considerations had good, direct, simple introductions to UNSPSC, eOTD, eCl@ss, and RUS. It's too bad these schemas are relatively useless when it comes to sourcing-based opportunity identification.
  • When the authors point out (on page 8) that adoption of spend analysis is still low across the board, they are correct ... but when they state that it's because we're talking tens or hundreds of millions of transactions, it's irrelevant and wrong. For any specific analysis, there are probably only a few million or tens of millions of transactions that are relevant, and a real spend analysis tool on today's desktops and laptops can operate on that number of transactions without issues. There is no need for a mainframe.
  • When they state that the categorization of errors is critical because not all errors are equally costly to fix, they're right ... but the data warehouse is irrelevant. Just add a new mapping rule and you're done. Two minutes, tops. What's the big deal? Oh, I forgot -- in the authors' world, you can't add a new mapping rule on the fly.

To sum up, when the authors state in their preface (on page xv) that "if implemented incorrectly, a spend analysis implementation program can become easily misdirected and fall short of delivering all of the potential savings", I wholeheartedly agree. Unfortunately, the authors themselves provide a road map for falling short.

 

Comments

  • 4/28/2008 11:26 PM Michael Mimo wrote:
    It is clear and most obvious that “the doctor” is most interested in promoting his own patient care (he is writing his own book). First, you lower yourself to personal attacks on the authors. You do a fine job of showing us your adolescent mind.
    Second, on page 5 they are simply laying out the foundation. Of course they are well aware that pattern matching is key to spend analysis. Did you read the book? They use the next 11 chapters to prove this point. That was a cheap shot if I ever saw one.
    Third, they are referring to the data warehousing implementation, the setup and structure. You cannot just drop all the data into a database and expect miracles. It is clear to me you're just a theorist and have never actually built a spend analysis system.
    Fourth, your critique of the analysis section is weak. In fact, you seem to be agreeing with most of what they say. Yes, yes, YES!
    Fifth, using a data warehouse system is by far the preferred method. You do not even have the guts to provide an alternative.
    Sixth, if you know what you are doing, you can exclude data.
    Seventh, your attack on the cubes is drawn out with no value. You have very little modeling experience from what I can tell. The book is an introduction to spend analysis, not an ordeal into cube theory. Please give it a break.
    Last, it is clear from what you wrote that your only interest is to discredit their excellent book and promote your ego for your own book. Let's just state one very true fact about their book that you purposely avoided discussing. That fact is they have a plethora of use case examples where they show real world results. They must be doing something right after all, regardless of your low opinion, which counts for very little.
    1. 4/29/2008 11:36 AM the doctor wrote:

      Michael, I did not personally attack the authors in this blog entry. I do not know them, and I make it a point to do my best to avoid attacks of such a nature in this blog. I am discussing the subject matter they put forth in their book. If you see that as a personal attack on them, I'm sorry, but there's nothing I can do. I firmly believe that data warehousing and spend analysis are two VERY different things (and, by the way, I have lots of experience with the former), and that presenting them as one idea is doing the space a disservice.

      I never stated pattern matching is key to spend analysis. It's a required component of a good rules engine, but when I said "good analysts chase the data ... and try to identify patterns" I was referring to the fact that analysts find opportunities by locating patterns in the data that haven't been found before. Pattern matching can only be done on known patterns. So I can't agree with you here, because I'm referring to new pattern identification. The key to spend analysis is being able to map, classify, and reclassify the data in multiple ways, not just one, so as to find those new patterns. (And yes, I read every single page. And more importantly, I understood what they were saying -- did you?)

      I'll wholeheartedly agree with you when you say that "you cannot just drop all of the data into a database and expect miracles". It doesn't happen. I'll also agree that the list of considerations on page 177 is a great list of questions to ask if you're building a data warehouse, but the authors are talking specifically about "spend analysis" and that's my problem. You don't really need a database to do "spend analysis". You can do it with a spreadsheet, and, if you're daring, flat files.

      I can agree that "bats have wings", "this is proof that wings can enable flight", and "ostriches have wings", but I cannot agree that "ostriches have wings and can fly because bats have wings and can fly and, thus, wings will enable their flight". When it comes to analysis, the authors have a lot of the pieces right, but they put them together wrong.

      I never said the use of a data warehouse system to attempt to provide a customer with a spend analysis wasn't the preferred method. 28 out of 30 "spend analysis" vendors agree. I'm saying it's not the right method. I was trying to be nice by not mentioning the alternatives. But since you asked, BIQ is one example. Furthermore, although they are taking a "base cube" approach, the direction Coexprise is heading in (by giving the user the ability to create sub-cubes, derived cubes, and federated cubes on related data) will give users essentially the same level of spend analysis capability, and this is also a correct approach when taken to its logical conclusion.

      For a specific analysis, yes it is true that irrelevant data can be eliminated, and that bad data can be eliminated if you know what the effect of its elimination will be. But the reality is that, in any given company, most people really don't know what they are doing. You can't be an expert in everything, and it's not fair to expect people to be. But if you just eliminate data because it doesn't fit into your "statistically acceptable threshold", then the answers come out skewed at best, and wrong at worst. Being an expert in optimization requires me to be an expert in modeling, and you can't, even accidentally, exclude even a single relevant piece of data and still be guaranteed to get the right result. You might get lucky. But most of the time, the best you will do is get close.

      I don't think you mean "cube theory" here, as it usually refers to areas of mathematics that (as far as I know) aren't really relevant to spend analysis, but I strongly disagree with you when you say that the book is an introduction to "spend analysis". As far as I'm concerned, it's an introduction to "data warehousing" with the purpose of extracting "spend reports". And, although 28 out of 30 vendors will agree with that approach, that's not "spend analysis".

      As for your second-to-last point where you say that "they have a plethora of use case examples where they show real world results", I was avoiding that because it doesn't prove anything. The fact of the matter is that any competent practitioner or consultant, with a little bit of sweat and elbow grease, can do a "spend analysis" project with Microsoft Access and Excel and, depending on just how poor the spend controls were within the target organization, find 10% to 30% in savings without using a spend analysis system at all - because they were doing it 15 years before the first "spend analysis" system ever appeared. After doing essentially the same thing again and again, but spending weeks and weeks, if not months and months, on a project where a lot of the work could be automated if you know what you're doing, some people came to the realization that a technology solution would be useful, and formed companies like Analytics back in the 1990s.

      And, these solutions, just like the solution provided by the company the authors are employed by, work. Spend gets consolidated, cleansed, and more-or-less organized in an understandable fashion. Spending reports can be generated by supplier, category, department, etc. and the buyers can see where the greatest opportunities for savings are likely to be. Data can be compared between time periods to see if costs were going up or down. Companies, including the clients of the authors' employer, find lots of opportunity - at first.

      But then, six to twelve months later, the value attributable to the spend analysis system starts to disappear! The value curve that emerges drops to almost zero within a year.

      The fact of the matter is that if you have a tool that limits you to a single cube with a single, default, set of reports, once you have exhausted all of the opportunities captured by those reports, you will be unable to extract additional value. But you'll still have to keep paying for what is, at least today, a very expensive system because you'll need to monitor all of the initiatives you put in place to clean up the spending problems you identified. And that's not theory - that's reality!

      It's sad, but if I was still a theorist (which I probably would have become had I stayed in the Ivory Tower), I would have proclaimed the authors' text brilliant. From a theoretical viewpoint, Chapter 6 on Spend Analysis Components is very beautiful. A discussion of information quality, detailed categorization and cleansing rules that account for every possibility, etc. But just because a theory is beautiful, does not mean it is useful - or correct for the real world problem you wish to apply it against.

      What's even more sad, as I alluded to in my post, is that if the authors had written this book seven years ago, I probably would have given it rave reviews. Heck, if they'd written it five years ago, because I was so focused on trying to define correct, usable optimization models at the time and was not keeping up with spend analysis or thinking hard about what it should be, I still would have given it rave reviews. My Ph.D. is in theory, and I used to think that if the theory was good enough, the solution would work. The problem is, theorists develop their "solution models" based on "domain models", which are never truly accurate because they are not in tune with the real world. It took me quite a few years in the "real world" to realize this, and, unfortunately, quite a few more to realize that just because a model was accepted and works well initially, it does not mean it's the right one. And even after I began to believe that data warehousing wasn't spend analysis, it took another year or two, and a large number of well-formed arguments, to convince me that data warehousing wasn't necessary at all (as I once believed it was a necessary first step). A data warehouse is certainly one way of collecting and centralizing the data needed for spend analysis, but it very seriously impedes data analysis, so it is a fundamentally incorrect approach.

      That's a big reason that Spend Analysis is a common topic on this blog, because I finally figured out what spend analysis isn't a few years ago, and I want to get that message out. I may not know exactly what Spend Analysis ideally should be from a technology perspective, and what I'm writing today may be wrong in ten years time, but I know what it has to do, and I'm going to keep writing about it until the message comes across. I expect that to be a very long time, because I've been trying to get the message out about strategic sourcing decision optimization for even longer, and my message is still, for the most part, falling on deaf ears.


  • 5/1/2008 9:08 AM SCMWise wrote:
    I have a hard time parsing the lengthy commentary of the 'doctor' and making sense of what he/she says. Too much text, too little value!! I agree with Mimo - I get the sense that this is nothing but competitor bashing at worst, one-upmanship at best. I have not read the book in its entirety, but I have read a few chapters, and what I read was good. What gives the 'doctor' the authority to claim that the material is old? Heck, many companies don't even know what spend analysis is!! Furthermore, the treatment proposed by the doctor can cause headaches, or perhaps even brain damage. For instance, how can someone argue with a straight face that rules-based classification can be used effectively to classify millions of transactions? I thought we answered that question ten years back. (Now who is outdated?) If you have 10,000 transactions from 100 suppliers, perhaps it may work. But I have seen companies with 50 million transactions from 500,000 suppliers attempt this using vendor mapping. Guess what happened. After manually mapping the top 2000 vendors, they gave up. Only 65% of the spend had been mapped. What about the remaining 35%? Well, the doctor says 'Sweep it under the carpet'. Great! That's the remedy. It appears to me that the doctor is trying to take a small, department solution and push it to solve the Enterprise Spend problem. That's a recipe for disaster. Please don't try it.
    1. 5/3/2008 9:39 AM the doctor wrote:
      SCMWise:  

      It's quite clear to me that you are unable to understand what I write, and that's a shame - because until you do, like the vast majority of spend analysis vendors out there (28 out of 30 agree), you'll be stuck in the past.

      I don't mind if you disagree with me, it's your right, but if you're going to paraphrase me, please do it correctly!

      (1) I'm not suggesting you "sweep unclassified spend under the rug" - and that's one of the reasons I'm bashing the book! If you read it closely, the book specifically tells you to drop any transaction you can't automatically classify with confidence. GASP! However, I do agree that once you have a sufficient percentage of your spend mapped, then you can simply classify the rest into a "miscellaneous bucket" which you can use to establish a confidence level for your mapping and analysis.

      (2) I'm not trying to take a small, department solution and push it to solve the enterprise spend problem - because that's precisely how current spend analysis systems, which I argue are insufficient, evolved.  The database was a department solution.  Then someone decided to merge them into a single database that they called a "data warehouse".  Then they decided to equate that with spend analysis.  Although a clean data warehouse can be used as a data source for spend analysis, it is not spend analysis.  Analysis is the ability to analyze data, and that should be independent of the storage format.

      (3) I'm not pushing my own company.  I don't offer a spend analysis solution, nor do I plan to, and I'm not interested in doing spend analysis projects - especially when there are a number of specialist consultancies out there that do it already.  And since I don't offer a competing product, you can't classify this as competitor bashing.  As I said before, I'm bashing the theory and presentation, not the author or the company.  

      As for your other questions:

      "How can someone argue with a straight face that rules-based classification can be used effectively to classify millions of transactions"


      Simple - a rule is NOT limited to a single transaction, and if it supports regular expressions, you are given more flexibility than you have with even the best AI systems out there.


      "After manually mapping the top 2000 vendors, they gave up. Only 65% of the spend had been mapped."


      By vendor is only one way to map spend. There's also GL code. There's purchase description. Although you should start with the top X vendors, top X GL codes, top X vendor-GL code combinations, you shouldn't try to map all the vendors - chances are thousands are minuscule one-time spends in a large multi-national and all you really care about is a category grouping, which might be easily accomplished by a simple rule that maps all purchases with a certain string, or string set, in the description. The key to quick effective mapping is to use your domain knowledge - knowledge that an AI solution will never have.


      "What gives the 'doctor' the authority to claim that the material is old?"


      Okay, this is snarky, but what gives you the authority to claim it's not? Just because you haven't progressed in your understanding of the technology doesn't mean that the technology hasn't progressed. If we let the lowest common denominator decide what's "new" and what's "old" then, given the number of people in the world who don't have access to this technology, a 1980s relational database product would still be new.


      Just as you are entitled to your opinions, I'm entitled to mine - and my opinion is that, since the fundamentals presented in the book are the same fundamentals developed by companies such as Analytics and FreeMarkets in the 1990's, it's old.  


