Data Cleansing Could Lead to Clearer View of Consumers

Among a business’s greatest assets is the information it collects, but only if it can sort through and present that information in a timely fashion.

Enterprises collect millions if not billions of bits of data on customers, products and suppliers. Beyond merely storing the data, they must decide which information is important and how to organize it for easy retrieval. Some hope eventually to mine the data for information such as buying patterns, or even to sell it.

“Data cleansing,” or the process of sorting through the mountain of data and removing old, inaccurate, badly formatted or duplicate information, can help, but it won’t solve the problem completely.
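To make that definition concrete, here is a minimal Python sketch of a cleansing pass that drops old and badly formatted records and collapses duplicates. The record fields, the sample values, the cutoff date and the crude "@" check are all assumptions made for illustration; a production system would use far stricter rules.

```python
from datetime import date

# Hypothetical customer records; field names and values are illustrative only.
records = [
    {"name": " Ann Lee ", "email": "Ann.Lee@Example.com", "updated": date(2006, 3, 1)},
    {"name": "ann lee", "email": "ann.lee@example.com", "updated": date(2006, 5, 9)},
    {"name": "Bob Ray", "email": "not-an-email", "updated": date(2006, 4, 2)},
    {"name": "Cy Dunn", "email": "cy@example.com", "updated": date(1998, 1, 15)},
]

def cleanse(rows, cutoff):
    """Drop stale or malformed rows, normalize the rest, keep the newest duplicate."""
    newest = {}
    for row in rows:
        email = row["email"].strip().lower()
        if row["updated"] < cutoff:   # old information
            continue
        if "@" not in email:          # badly formatted (deliberately crude check)
            continue
        # Collapse stray whitespace and standardize capitalization.
        name = " ".join(row["name"].split()).title()
        cleaned = {"name": name, "email": email, "updated": row["updated"]}
        kept = newest.get(email)
        # Records sharing a normalized email are treated as duplicates.
        if kept is None or cleaned["updated"] > kept["updated"]:
            newest[email] = cleaned
    return list(newest.values())

clean = cleanse(records, cutoff=date(2000, 1, 1))
```

With these sample values, the two Ann Lee entries collapse into one, and the malformed and outdated records are dropped.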

“The biggest problem with data cleansing — and it’s a perpetual problem — is that you have many different origins of information,” Eric Austvold, research director at AMR Research, told the E-Commerce Times.

“At the moment of creation of a record for a new product, new customer, new order, you want data instantly. With a customer, what you want to know is, is this customer already a customer?”

Cutting Through the Confusion

With common names, the system has to quickly distinguish among them to reach the correct one. Data cleansing helps front-line workers sort through multiple names and points out the conflicts that inevitably arise, he said. What if a customer has created more than one online account because he forgot his login? What if a customer moved, but the new address was entered incorrectly, so the account can’t be found? Trying to cleanse and integrate all that data can be a headache.
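One way to picture the "is this customer already a customer?" check is a fuzzy comparison of a new record against existing ones. The sketch below uses Python's standard difflib module; the customer records, the equal weighting of name and address, and the 0.85 threshold are assumptions chosen for illustration, not a description of any vendor's product.

```python
from difflib import SequenceMatcher

def normalize(text):
    """Lowercase, strip periods and commas, and collapse whitespace."""
    return " ".join(text.lower().replace(".", "").replace(",", "").split())

def similarity(a, b):
    """Return a 0..1 similarity ratio between two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def find_possible_duplicates(new, existing, threshold=0.85):
    """List (customer id, score) pairs that may be the same person as `new`."""
    matches = []
    for cust in existing:
        name_score = similarity(new["name"], cust["name"])
        addr_score = similarity(new["address"], cust["address"])
        score = (name_score + addr_score) / 2  # equal weighting is an assumption
        if score >= threshold:
            matches.append((cust["id"], round(score, 2)))
    return sorted(matches, key=lambda m: m[1], reverse=True)

# Hypothetical existing accounts and a newly keyed-in record with typos.
existing = [
    {"id": 101, "name": "John Smith", "address": "12 Maple Street, Springfield"},
    {"id": 102, "name": "John Smith", "address": "900 Harbor Road, Portland"},
    {"id": 103, "name": "Jane Smithers", "address": "45 Oak Avenue, Dayton"},
]
new = {"name": "Jon Smith", "address": "12 Mapel St., Springfield"}

matches = find_possible_duplicates(new, existing)
```

With these sample values, only customer 101 clears the threshold, even though customer 102 has an equally similar name. A match like this would be flagged for a front-line worker to confirm rather than merged automatically.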

Austvold is concerned not only with these everyday situations but also with more complicated ones, such as bank mergers, in which it can be very difficult to link the separate accounts of the same customer once they have become part of the same institution. Some of the difficulty stems from privacy and security regulations.

“The unfortunate part is that a bank account number is unique and that’s how banks sort,” he said. “Now, when two banks merge, there may be two unique identifiers based on unique customer numbers. In an ideal setting, they want a single view of the customer; they want to know everything about you in one screen.”

While data cleansing algorithms can correlate multiple records onto that screen, the information still has to be available and entered in a form the programs can understand. Even in less-regulated industries, it can be difficult to sort through the terabytes.

People Make Mistakes

Eighty percent of data integrity problems stem from human error in entering, formatting or even interpreting data, John Nelson, senior analyst with Data Mobility Group, said.

Nelson points out that enterprises must make conscious decisions about what data should be saved, for how long and how accessible each type of data should be.

“Information lifecycle management (ILM) is a way of compartmentalizing or stratifying data you’re saving. The data has various levels of importance to an organization or value to an organization. There’s stuff you’re only keeping because the government says you have to, but it has no business value. There’s stuff you want to keep and be able to access,” he said.

The first step toward getting information the instant you need it is the triage process Nelson describes. Enterprises could simply save everything they collect, but that is not a good idea.

“Data in some instances is baggage. You have to carry it around with you and it can slow you down,” he said. “There’s a price to be paid for having the default position of ‘We save everything,’ because if you keep it, you have to protect it.”

Once an enterprise has defined tiers of data from most useful to least and let go of what it can toss out completely, it can begin to integrate and cleanse what is left.
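The triage Nelson describes can be sketched as a simple rule that assigns each data set to a tier. The tier names, the two yes/no attributes and the sample data sets below are assumptions for illustration; real retention policies involve legal review and far more categories.

```python
from dataclasses import dataclass

@dataclass
class DataSet:
    name: str
    business_value: bool     # does the organization actively use it?
    legally_required: bool   # must it be retained under regulation?

def assign_tier(ds):
    """Map a data set to a hypothetical storage tier."""
    if ds.business_value:
        return "active"      # keep and make readily accessible
    if ds.legally_required:
        return "archive"     # keep only because the rules say so
    return "discard"         # baggage: let it go

# Hypothetical data sets.
datasets = [
    DataSet("current customer accounts", business_value=True, legally_required=True),
    DataSet("old tax filings", business_value=False, legally_required=True),
    DataSet("expired promotion logs", business_value=False, legally_required=False),
]

tiers = {ds.name: assign_tier(ds) for ds in datasets}
```

Only what lands in the "active" and "archive" tiers then needs to be protected, integrated and cleansed.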

“Data integration and cleansing go hand in hand,” Austvold said. “Sometimes companies say, ‘We thought we could reconcile at the point of contact with the customer, but we need integration.’ They’re trying to integrate in real time, but sometimes they don’t even know how many parts they have.” That holds especially true in financial institution mergers and in the health care industry, where each hospital visit generates a new account number.

Leveraging the Information

Still, the day is approaching when enterprises will move beyond grappling with real-time customer data integration and use technology to find patterns in spending habits, so that marketers can customize their campaigns.

We are nearing a time when data algorithms will be sophisticated enough to work out individual spending patterns by linking the records of a credit card user. For instance, if a customer uses a MasterCard in several different Target stores, the algorithm would be able to figure out that the credit card holder was buying boys’ pants and ping the computer to offer a personalized special while that customer was at the store.
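Stripped of scale and timing, the linking step in that scenario amounts to grouping purchases by card and counting categories. The following toy sketch assumes hypothetical transaction records, card labels and a two-purchase minimum; it says nothing about how a retailer or card network would do this at real volumes or in real time.

```python
from collections import Counter, defaultdict

# Hypothetical transactions from several stores, keyed by a card label.
transactions = [
    {"card": "card-A", "store": "Store 1", "category": "boys' pants"},
    {"card": "card-A", "store": "Store 2", "category": "boys' pants"},
    {"card": "card-A", "store": "Store 3", "category": "groceries"},
    {"card": "card-B", "store": "Store 1", "category": "electronics"},
]

def top_categories(rows, min_purchases=2):
    """Return each card's most frequent category if bought at least min_purchases times."""
    per_card = defaultdict(Counter)
    for row in rows:
        per_card[row["card"]][row["category"]] += 1
    result = {}
    for card, counts in per_card.items():
        category, n = counts.most_common(1)[0]
        if n >= min_purchases:   # ignore one-off purchases
            result[card] = category
    return result

offers = top_categories(transactions)
```

Here only card-A shows a repeated category across stores, so it alone would be a candidate for a personalized offer.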

“The biggest challenge is speed and the real-time nature of correlating the volume of data and pattern for what it means,” Austvold said. “We’re years away from it being a reality.”

For now, companies are focused on internal data, he said. Target, for instance, tracks the buying patterns of its customers. But that may change.

“Marketing guys are saying if Visa, MasterCard, American Express could syndicate their data and create a process by which a subscribing company can look at data and discern patterns, that would change the way we market products,” Austvold said.

We’re not there yet, but the day is coming when enterprises will have the power to truly harness the data they collect, instead of merely worrying about how, and if, to store it.
