Data Cleansing Made Simple

By Ancient1 on Friday, November 6, 2009

Filed Under: PC Management, Technology

I would like to offer a layman's explanation of a field I have specialized in for the past 8 years. Most clients I meet do not know or understand the concept of data cleansing. They tend to assume the job can be handled by staff manually reviewing the data that has built up over the past 10 years.

Data cleansing, also known as data scrubbing, is the process of ensuring that a set of data is correct and accurate. During data cleansing, records are checked for accuracy and consistency, and are either corrected or deleted as necessary. Data cleansing can occur within a single set of records, or across multiple sets of data that need to be merged or that will work together.

In its simplest form, data cleansing involves one or more people reading through a set of records and verifying their accuracy. Typos and spelling errors are corrected, mislabeled data is properly labeled and filed, and incomplete or missing entries are completed. Data cleansing operations often purge out-of-date or unrecoverable records, so that they do not take up space and slow down operations.

In more complex operations, data cleansing can be performed by computer programs. These programs check the data against a variety of rules and procedures chosen by the user. A data cleansing program could be set to delete all records that have not been updated within the last five years, correct any misspelled words, and delete any duplicate records. A more sophisticated program might be able to fill in a missing city based on a correct zip code, or convert the prices of all items in a database from US dollars to euros.
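To make the idea of user-defined rules concrete, here is a minimal sketch in Python of three of the rules mentioned above: dropping stale records, filling in a missing city from the zip code, and removing duplicates. The record layout, the `cleanse` function, and the tiny `ZIP_TO_CITY` table are all assumptions made up for illustration. They are not how DataFlux or any other commercial product works internally.

```python
from datetime import date, timedelta

# Hypothetical lookup table; a real program would use a full postal database.
ZIP_TO_CITY = {"10001": "New York", "60601": "Chicago"}


def cleanse(records, today, max_age_years=5):
    """Drop stale records, fill a missing city from the zip, drop duplicates."""
    cutoff = today - timedelta(days=365 * max_age_years)
    seen = set()
    cleaned = []
    for rec in records:
        # Rule 1: skip records not updated within the allowed age.
        if rec["updated"] < cutoff:
            continue
        rec = dict(rec)  # copy so the caller's data is left untouched
        # Rule 2: fill in a missing city when the zip code is known.
        if not rec.get("city"):
            rec["city"] = ZIP_TO_CITY.get(rec.get("zip"), "")
        # Rule 3: treat same name (ignoring case and spacing) + zip as a duplicate.
        key = (rec["name"].strip().lower(), rec.get("zip"))
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(rec)
    return cleaned


sample = [
    {"name": "Ann Lee", "zip": "10001", "city": "", "updated": date(2009, 1, 1)},
    {"name": " ann lee ", "zip": "10001", "city": "New York", "updated": date(2008, 6, 1)},
    {"name": "Bob Ray", "zip": "60601", "city": "Chicago", "updated": date(2007, 3, 1)},
    {"name": "Old Client", "zip": "60601", "city": "Chicago", "updated": date(2003, 1, 1)},
]
result = cleanse(sample, today=date(2009, 11, 6))
```

With this sample, the second Ann Lee entry is dropped as a duplicate and the 2003 record is dropped as stale. Real tools match duplicates far more loosely than this exact-key comparison, for example tolerating nicknames and misspellings.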

Data cleansing is very important to the efficiency of any data-dependent business. If some of the clients in a database do not have accurate phone numbers, your employees cannot easily contact them. If your clients’ email addresses are not formatted correctly, an automated email system will be unable to send out the latest coupons and special deals. The job of data cleansing is to ensure that the data within a system is correct, so that the system is able to use it. Inaccurate or incomplete records are not much use to anyone.
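As a rough illustration of the phone and email problem, the sketch below flags badly formatted email addresses and normalizes phone numbers to digits only. The function names are hypothetical, the email pattern is deliberately loose (it only checks for something@something.something), and the 10-digit phone length is an assumption that fits US numbers but not other countries.

```python
import re

# Loose sanity check, not a full email validator: text, "@", text, ".", text.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def is_plausible_email(value):
    """Return True if the value at least looks like an email address."""
    return bool(EMAIL_PATTERN.match(value.strip()))


def normalize_phone(value, expected_digits=10):
    """Strip punctuation; return the digits, or None if the length is wrong."""
    digits = re.sub(r"\D", "", value)
    return digits if len(digits) == expected_digits else None


good_email = is_plausible_email("ann@example.com")
bad_email = is_plausible_email("ann@example")
good_phone = normalize_phone("(212) 555-0147")
bad_phone = normalize_phone("555-0147")
```

A check like this only catches formatting mistakes. It cannot tell you whether a well-formed number or address actually belongs to the client.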

Whenever two systems of data need to work together, data cleansing is even more important. If a company has two branches that work with many of the same customers, the data in each branch needs to be complete and accurate, and the two branches also need to have matching data. If a customer updates her phone number with one branch, the data at the other branch needs to be updated with the same information to ensure the highest efficiency. Data cleansing works not only to make sure that data is accurate, but also that it is consistent between different records.
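The two-branch scenario can be sketched as follows. This is only one possible approach, assuming each branch stores its customers in a dictionary keyed by a shared customer ID and that every record carries a last-updated date. The `reconcile` function and the "newest record wins" rule are illustrative choices, not a description of any particular product.

```python
from datetime import date


def reconcile(branch_a, branch_b):
    """Merge two branches' records; report IDs whose phone numbers disagree."""
    merged = {}
    conflicts = []
    for cid in sorted(set(branch_a) | set(branch_b)):
        a, b = branch_a.get(cid), branch_b.get(cid)
        # Customer known to only one branch: keep that record as is.
        if a is None or b is None:
            merged[cid] = a if a is not None else b
            continue
        if a["phone"] != b["phone"]:
            conflicts.append(cid)
        # Assumed policy: the most recently updated record wins.
        merged[cid] = a if a["updated"] >= b["updated"] else b
    return merged, conflicts


branch_a = {
    "C1": {"phone": "2125550147", "updated": date(2009, 10, 1)},
    "C2": {"phone": "3125550110", "updated": date(2009, 5, 1)},
}
branch_b = {
    "C1": {"phone": "2125550199", "updated": date(2009, 11, 1)},
    "C3": {"phone": "4155550123", "updated": date(2009, 2, 1)},
}
merged, conflicts = reconcile(branch_a, branch_b)
```

Here customer C1 updated her number at the second branch, so the merged record carries the newer number and C1 is reported as a conflict for anyone who wants to review it by hand.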

One of the best products that I have seen and used so far is SAS DataFlux. There are other programs out there that do the same thing, such as Trillium. In my experience, though, working with DataFlux has proven successful.
