
Read and comment, that's what makes it work.

Wednesday, August 24, 2005

Colin White on MDM, CDI

Colin White has written an article entitled Understanding Master Data Management and Customer Data Integration posted today on b-eye-network, based on "an in-depth study on data integration for The Data Warehousing Institute." Colin White is the Founder of BI Research, and conference chair for DCI's Portal, Collaboration, and Content Management conference. We examine that study, the conclusions and the state of understanding of the core issues in CDI.

His key findings can be summarized as follows:
(1) Companies are considering data integration from an enterprise perspective, rather than just from a data warehousing viewpoint.
(2) He identifies three approaches to data integration in general, which he calls data consolidation, data federation, and data propagation. These seem intuitively obvious.
(3) He identifies three approaches to CDI, which in a rather confusing twist he refers to as data consolidation, data federation, and hybrid.
(4) He cites some statistics from his study, listing the three main issues for CDI implementation as data quality and security, lack of a business case and inadequate funding, and poor data integration infrastructure.

Overall, his piece is accurate, but offers little in the way of actionable recommendations. Mr. White seems to be on a mission to delineate the differences between various integration technologies, namely EAI, EII and ETL. I'm not convinced there is much confusion in the marketplace over this, so I don't see what gives rise to the burning need for clearer definitions.

Whenever you approach a fuzzy area like the intersecting and overlapping technologies behind customer data integration (which include not only the three technologies Mr. White focuses on, but also CDI, CRM, KM, MDM, metadata, data schemas and data models, batch processing frameworks and more), there is certainly a need to impose order on the chaos to facilitate analysis and draw conclusions. One should, however, approach such an exercise with caution, if not trepidation. The risk, of course, is that inaccurate or incomplete definitions can lead to incorrect conclusions.

That's exactly what happens here. The three approaches to CDI that he identifies are less approaches to the discipline of customer data integration, and more technologies for accessing and managing the combined data. Colin White does make one statement that I think is important: "MDM and CDI are often presented as technologies, but in reality they are business applications." And yet, having said that, he proceeds to define CDI approaches in terms of integration technology!

Applications are bought and sold on the basis of delivering business value. The technological underpinnings of an application are certainly important, but only to the extent that they are compliant with established standards and mesh with the overall architecture of your specific environment. And so it should be with CDI. If Alpha Bank is a .NET shop running an XML messaging bus to connect applications, and Beta Bank has a J2EE mainframe-based back office and uses SOAP messaging in a stateless communications bus, the same CDI application is unlikely to meet both needs. But CDI is possible in both places.

If CDI is to take its rightful place in the hierarchy of IT acronyms, we have to stop thinking of it as a technology - or even a set of technologies - and start thinking of it in terms of the business problems it solves. As discussed previously, CDI at the highest level is the aggregation of customer data from around the enterprise, the quality assurance of that data, and the distribution of the "golden copy" master data to the target applications. Aggregation, data quality, and distribution. Implicit in this threefold purpose are functions like matching, reconciliation, data quality, data enhancement, and the infrastructure for managing the data such as the data model, hierarchies, metadata and common objects.
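
To make the threefold purpose concrete, here is a minimal Python sketch of aggregation, data quality, and distribution as three separate steps. It is an illustration under stated assumptions, not a description of any vendor's product: the record layout, the source and target names, and the rule of matching on a normalized e-mail address are all hypothetical, and a real CDI system would use far richer matching logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerRecord:
    source: str  # hypothetical source system name
    name: str
    email: str


def aggregate(*feeds):
    """Step 1: gather records from every source system into one list."""
    return [record for feed in feeds for record in feed]


def assure_quality(records):
    """Step 2: normalize fields, then keep one record per e-mail address."""
    golden = {}
    for rec in records:
        key = rec.email.strip().lower()
        clean_name = " ".join(rec.name.split()).title()
        # Assumption: the first record seen for a key wins (deliberately simplistic).
        golden.setdefault(key, CustomerRecord(rec.source, clean_name, key))
    return list(golden.values())


def distribute(golden_records, targets):
    """Step 3: hand a copy of the golden records to each target application."""
    return {target: list(golden_records) for target in targets}


crm_feed = [CustomerRecord("crm", "jane  DOE", "Jane.Doe@example.com ")]
billing_feed = [
    CustomerRecord("billing", "Jane Doe", "jane.doe@example.com"),
    CustomerRecord("billing", "John Roe", "john.roe@example.com"),
]

golden_copy = assure_quality(aggregate(crm_feed, billing_feed))
deliveries = distribute(golden_copy, ["call_center", "web_portal"])
```

Here the two Jane Doe entries collapse into one golden record, and both hypothetical target applications receive the same two-record golden copy.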

In practice, the vast majority of CDI implementations I have seen or worked on have been what Mr. White characterizes as "hybrid" approaches. The reason is simple. A pure federated approach, while theoretically cheap to deploy, breaks down at high volumes due to the processor loads of running application logic in the middleware. (Note this only applies to "federation" as the term is used by Mr. White in his article, not necessarily to architectural federation of the CDI system, or federation of the data model itself.) A pure consolidated approach, while elegant and appealing to the architectural purist in us all, is extremely costly to deploy in a large-scale environment where hundreds or even thousands of systems touch the master data. Hybrid - as dictated by the demands and constraints of the edge applications - is the only practical approach. ("Hybrid" is a term that Siperian popularized in their marketing materials. But from a practical perspective, the distinction is meaningless in describing an approach to CDI, since nearly all CDI approaches are hybrid.)

Let's cleanly separate the application and the technology. Whether you use ETL, EII or EAI to access or distribute the data is a decision that is largely forced upon you by the constraints of the IT environment and the capabilities of the systems to which you must build interfaces. CDI can be implemented using any or all of these technologies - and others besides.

Viewing CDI as an application broadens your horizons. One approach to CDI that Mr. White missed entirely is to use an existing core application as the customer master. Siebel and SAP have hundreds of customers doing this successfully. Certainly it has drawbacks, and is not suitable for most large institutions, but it surely must be considered a valid approach. If your organization is primarily sales- or service-driven, and your customer data is already largely centralized in one application (or application suite) that meets your needs for customer data management, then implementing CDI is simply a matter of formalizing the processes around creating, reading, updating and writing to those customer records.

If I were to attempt to characterize CDI approaches, I would have drawn the lines a little differently. I see five primary approaches:
(1) core application as customer master
(2) "virtual" master record
(3) data warehousing of customer data
(4) discrete customer database with synchronization logic
(5) discrete customer database with integration to the unique master record
_______________

A brief overview and critique of each:
Using a core application as the customer master is the quickest and simplest form of CDI. The shortcomings are the common ones that you run into any time you attempt to use an application for a purpose other than the one for which it was developed - it simply does not do the job very well. CDI has a relatively short list of required functionality to be truly effective, but the functionality is complex. This includes capabilities such as matching and reconciliation of records to prevent duplication, managing access to the data, and privacy management. Modifications to core applications are typically difficult, expensive, and inflexible. Thus the core app as customer master is really only appropriate for smaller and relatively simple businesses that already have the bulk of their customer data in one centralized application.
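
As a rough illustration of why matching is harder than it sounds, here is a small Python sketch of fuzzy duplicate detection using the standard library's difflib. The 0.85 threshold and the idea of comparing a single "name plus address" string are assumptions made for the example; production matching engines typically compare individual fields with weighted rules.

```python
import re
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^a-z0-9 ]", " ", text.lower())
    return " ".join(text.split())


def is_probable_duplicate(a, b, threshold=0.85):
    """True when two customer strings are similar enough to flag for reconciliation.

    The threshold is a hypothetical tuning value, not an industry standard.
    """
    ratio = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
    return ratio >= threshold


same_person = is_probable_duplicate(
    "Jon Smith, 12 Main St.", "John Smith, 12 Main Street"
)
different_person = is_probable_duplicate(
    "Alice Wong, 9 Elm Rd", "John Smith, 12 Main Street"
)
```

Logic of this kind is what a core application usually lacks out of the box, which is why bolting it on tends to be expensive.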

The virtual master record is typically executed in the EAI layer. Most modern middleware product suites include sophisticated tools for managing metadata and common objects, enabling master-slave relationships to be established between disparate data sources. The BPM layer maintains synchronization routines that keep the data perfectly aligned. This is the ultimate in ease of deployment, as the strategy here is to "play the data where it lies." Unfortunately the business rules necessary to perform full CDI functionality are complex. The processor overhead is significant, which means that when you get into the millions of records performance degrades. Thus the virtual master record is really only a good fit for small to medium sized businesses.

Many companies have devoted lots of time and money to getting their data warehouses in order, and CDI seems like a logical fit. But data warehouses are typically used to store static data, such as transactions. A transaction is "perfect", and does not change over time. (There may be a correction, or a bust-and-rebill, or a modification to an order, but each of those is a "perfect" transaction as well.) Customer data is different. My phone number may change many times over the life of my relationship with a particular company. CDI is an application that manages data conflict. If two phone numbers for me disagree, CDI decides which is correct and makes the changes, or queues it up for human review. Additionally, data warehouses are typically optimized for access by analytics applications. You need to run reports on a deep repository of customer transactions that often spans many years. CDI master records, in contrast, are typically small and the system that serves them is optimized for query-response many millions of times per day. Different data, different databases. (Note also that the customer master usually sees the data warehouse as another target system, and writes the current record to the DW at periodic intervals.) There has been a lot of progress in data warehousing in recent years, with "active" data warehousing facilitating this management of dynamic data, but the sophisticated CDI application logic is not built in. Data warehousing serves a different purpose.
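
The phone number example can be sketched in a few lines of Python. This is one possible conflict rule, assumed purely for illustration: accept a value automatically only when the most recent claim and the most trusted source agree, and otherwise queue the conflict for a person to review. The source names and trust ranking are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PhoneClaim:
    source: str
    phone: str
    updated: date


# Hypothetical trust ranking: a higher number means a more trusted source.
SOURCE_TRUST = {"call_center": 3, "web_portal": 2, "legacy_billing": 1}


def resolve_phone(claims, review_queue):
    """Return the surviving phone number, or None if a human must decide."""
    distinct = {claim.phone for claim in claims}
    if len(distinct) == 1:
        return distinct.pop()  # every source agrees, so there is no conflict
    newest = max(claims, key=lambda c: c.updated)
    most_trusted = max(claims, key=lambda c: SOURCE_TRUST.get(c.source, 0))
    if newest.phone == most_trusted.phone:
        return newest.phone  # recency and trust point to the same value
    review_queue.append(list(claims))  # rules disagree: defer to a person
    return None


queue = []
clear_case = resolve_phone(
    [
        PhoneClaim("legacy_billing", "555-0100", date(2003, 1, 15)),
        PhoneClaim("call_center", "555-0199", date(2005, 8, 1)),
    ],
    queue,
)
unclear_case = resolve_phone(
    [
        PhoneClaim("call_center", "555-0199", date(2004, 6, 1)),
        PhoneClaim("legacy_billing", "555-0100", date(2005, 8, 1)),
    ],
    queue,
)
```

In the first case the newest claim also comes from the most trusted source, so it wins outright. In the second the two rules disagree, so nothing changes and the conflict lands in the review queue. A data warehouse has no natural place for this kind of decision logic.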

By far the most common approach to CDI is to establish a dedicated customer master, and use integration technologies to keep the edge applications in synch with the master. This ensures that a single, correct set of customer data exists at all times and can be archived and stored for security purposes. It enables snapshots to be taken to create historical records. It allows an audit trail of all changes to the customer record to be created, and tracked for compliance purposes. As a "customer hub", where all data input that affects the customer master records is centralized, it serves as a platform for executing certain functions that need total control, such as privacy management. It creates a layer of abstraction away from the source and target applications, so that synchronization logic can be executed separately from the maintenance of the golden copy database itself. Data can then be delivered to the target systems in whatever format or with whatever frequency the applications are capable of receiving it. Finally, it allows the management of the processes - both those internal to CDI, such as data quality and data enhancement, and those external to CDI, in which the customer master plays a role as a source system - separately from the customer master itself.
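
A toy version of such a customer hub, written in Python, shows how the golden copy, the audit trail, and the synchronization logic can be kept apart. Everything here is a hypothetical sketch: the class and method names are invented for the example, storage is an in-memory dictionary, and "synchronization" is just a callback per edge application.

```python
from datetime import datetime, timezone


class CustomerHub:
    """In-memory stand-in for a dedicated customer master (illustrative only)."""

    def __init__(self):
        self._golden = {}       # customer_id -> dict of current field values
        self._audit = []        # append-only list of change entries
        self._subscribers = []  # callbacks that push changes to edge applications

    def subscribe(self, callback):
        """Register an edge application to be notified of every change."""
        self._subscribers.append(callback)

    def update(self, customer_id, field, value, changed_by):
        """Apply a change to the golden copy, log it, then notify subscribers."""
        record = self._golden.setdefault(customer_id, {})
        old_value = record.get(field)
        if old_value == value:
            return False  # nothing changed, so nothing to audit or synchronize
        record[field] = value
        self._audit.append({
            "customer_id": customer_id,
            "field": field,
            "old": old_value,
            "new": value,
            "changed_by": changed_by,
            "at": datetime.now(timezone.utc),
        })
        for notify in self._subscribers:
            notify(customer_id, dict(record))  # each target gets its own copy
        return True

    def audit_trail(self, customer_id):
        """Return every logged change for one customer, oldest first."""
        return [e for e in self._audit if e["customer_id"] == customer_id]


hub = CustomerHub()
received = []
hub.subscribe(lambda cid, rec: received.append((cid, rec)))

hub.update("C-1", "phone", "555-0100", changed_by="call_center")
hub.update("C-1", "phone", "555-0199", changed_by="web_portal")
hub.update("C-1", "phone", "555-0199", changed_by="web_portal")  # no-op
```

Because every write passes through update, the audit trail and the outbound notifications cannot drift apart from the golden copy, which is the property the hub design is meant to provide.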

Some of the largest institutions shun synchronization. Their approach is similar to the one noted above, but with a goal of forcing the target applications to use the customer master directly. One large bank told me this year, "70-80% of our operational costs in running the customer master are related to synchronization." Another enterprise architect at a major brokerage firm offered this gem, "If you want to avoid the pain and cost of synchronization, it's simple. Just have one copy of the data." This is an approach that might make sense in an environment where you are consolidating multiple CDI systems, typically where a merger or acquisition has occurred. It can also make sense where core applications are being used to store customer data, and the number of integration touch points is small. These situations are relatively rare, and this approach can be risky. In complex environments, trying to force dozens or hundreds of applications to point to an external database - and engineering the appropriate failsafes if the customer master is unavailable - can be an expensive and time-consuming exercise.
_______________

While I am glad to see this article from Colin White, I think that his survey would have been more useful if it focused on the metrics of what is actually happening in the marketplace. Rather than reporting on how many people understand the difference between CDI and MDM, it would be really useful to begin to uncover some metrics for the costs, benefits and implementation methodologies - what is working, and what is not.

In my experience, companies typically have a pretty good idea of the costs of manual or incomplete data management. After all, they are paying the salaries of people who fix the errors and consolidate the records, and the costs of incorrect shipments, failed orders and marketing mailers to bad addresses are pretty explicit.

There are other costs, however. Some of these have to do with customer churn due to bad customer service or bad customer experience. Some have to do with business lost to competitors who have better customer management systems. And there is a whole class of benefits on the revenue side, from cross-selling and upselling the existing customer base, turning every customer interaction into a sales opportunity, winning the competitive bakeoff for new customers and stealing market share from your competitors, and managing the entire customer lifecycle. Arguably, these benefits dwarf the savings that can be obtained on the expense side of the ledger. But it shouldn't be arguable. And for that, we need more data.

Finally, CDI (and MDM for that matter) are relatively new concepts as "applications", and customers are left guessing at implementation costs, data cleansing costs, data loading costs, and best practices for dealing with intractable problems like massive batch loading. What data belongs in the customer master itself, versus simply storing a pointer to the location of that data? What is the tradeoff in the size of the customer record and the performance of the system? How does customization of the data model impact upgradeability of the application?

Vendors like Siebel, IBM, Oracle and SAP are scrambling to gather this type of information, but the industry would be well served to take the initiative in sharing knowledge, experience and best practices on its own. Those would be some useful statistics.

Copyright 2005 CDI Station. All Rights Reserved. Reproduction of this publication in any form without prior written permission is forbidden. The information contained herein has been obtained from sources believed to be reliable. CDI Station disclaims all warranties as to the accuracy, completeness or adequacy of such information. CDI Station shall have no liability for errors, omissions or inadequacies in the information contained herein or for interpretations thereof. The opinions expressed herein are subject to change without notice.