
DATA QUALITY AND DATA INTEGRATION

Assignment Overview

Here’s a recent posting:

Blog: Claudia Imhoff

[available at http://www.b-eye-network.com/blogs/imhoff/archives/2005/04/data_quality_or.php]

Data Quality or Data Integration – which is more difficult?

I read an interesting article in the Business Intelligence Pipeline Newsletter recently asking which was the more difficult challenge – assuring data quality or integrating data from across your organization. They have a voting booth set up so you can cast your vote for which you believe is the more difficult task. I have my own opinion as well.

I voted for assuring data quality and, at the time of my vote, it appeared that the majority of voters agreed with me. Why? In my opinion, it is because of the assuring part of the task.

Data integration seems to be a much more straightforward task with more mature technologies, methodology, and practical expertise among data integrators. Even the definition of data integration seems to be cut and dried. (Not always, but at least you have a solid standard to go from: a single version of the truth…)

I think we are still feeling our way through what it means to assure data quality. While there certainly is useful technology to help with data quality, so much of the assurance part is still heavily dependent on the human being (in this case, usually a business person) eyeballing the cleaned-up data to verify its “quality”. There don’t seem to be very clear, standard methodologies or processes to follow either. And what are the metrics of quality? When do we reach a state of “quality”? And what exactly does quality data even mean?

Without answers to these fundamental questions, it seems to me that we will continue to struggle with this challenge more so than with that faced by data integrators.

Your thoughts?

Yours in BI success,

Claudia

It’s a fair question being posed here. And there are probably even good answers to it. But is it really the right question? Are “data quality” and “data integration” really even measured on a common scale, where it’s possible to say that one has been achieved more than the other? Maybe they are like precision and accuracy in science, or validity and reliability in research methods: two separate properties, both necessary but not substitutable for each other? And what is the appropriate level of aggregation and measurement at which it makes sense to talk about quality and integration? Is it really possible to think of your “data” as having a certain level of “quality”, or is it possible to make such statements only about individual datums?

Review the required readings below:

Data Quality Quiz [available at http://searchcrm.techtarget.com/generic/0,295582,sid11_gci1049999,00.html]

Imhoff, C. Blog: Data Quality or Data Integration – which is more difficult? [available at http://www.b-eye-network.com/blogs/imhoff/archives/2005/04/data_quality_or.php]

Lindsey, E. (2011) Business value assessment versus data quality assessment [available at http://blogs.informatica.com/perspectives/2011/02/23/business-value-assessment-versus-data-quality-assessment/]

Top Ten Practices for Data Integration [available at http://www.informationweek.com/whitepaper/Business-Intelligence/Data-Quality/top-ten-best-practices-for-data-integration-wp1277997732572;jsessionid=KTYXJEDDFPDQFQE1GHPCKHWATMY32JVN?articleID=151500009]

Mills, R. (2010) Ten golden rules of business intelligence, CIO [available at http://www.cio.com.au/article/340702/ten_golden_rules_business_intelligence/]

Also, consult material from the Background Readings or other related materials you find yourself. You’ll probably want to search for more on data management on some of the institutional resource websites, of which there are a plethora, or maybe two plethora.

Case Assignment

When you’ve read through the articles and related material, scanned the websites, and thought about it carefully, please compose a short (5- to 7-page) paper on the topic noted above — that is: Are data quality and data integration two different things? Can we have one without the other?
