Blog: David Loshin

David Loshin

Welcome to my BeyeNETWORK Blog. This is going to be the place for us to exchange thoughts, ideas and opinions on all aspects of the information quality and data integration world. I intend this to be a forum for discussing changes in the industry, as well as how external forces influence the way we treat our information asset. The value of the blog will be greatly enhanced by your participation! I intend to introduce controversial topics here, and I fully expect that reader input will "spice it up." Here we will share ideas, vendor and client updates, problems, questions and, most importantly, your reactions. So keep coming back each week to see what is new on our Blog!

About the author

David is the President of Knowledge Integrity, Inc., a consulting and development company focusing on customized information management solutions including information quality solutions consulting, information quality training and business rules solutions. Loshin is the author of Master Data Management, Enterprise Knowledge Management: The Data Quality Approach and Business Intelligence: The Savvy Manager's Guide and is a frequent speaker on maximizing the value of information. David can be reached at loshin@knowledge-integrity.com or at (301) 754-6350.

Editor's note: More David Loshin articles, resources, news and events are available in the David Loshin Expert Channel on the BeyeNETWORK. Be sure to visit today!

Yesterday we recorded the monthly B-eye-network radio program, and one of the questions Shawn asked was about our summer reading suggestions. The two books I mentioned are not business intelligence books, nor are they even business books, but rather history books about stuff that happened 800-900 years ago. The first book is about the life of Mongol conqueror Genghis Khan, while the second details Marco Polo's travels in the service of Asian emperor (and grandson of Genghis) Kublai Khan. These two Khans reflect two archetypical business leaders.

From (my spin on) a historical perspective, Genghis is the entrepreneur - he identifies a clear business objective, develops a plan for executing against that objective, and replicates that execution over time and space to build a global enterprise enveloping multiple acquisitions. Genghis exploits the skills of his Mongol horde to instill fear in, overcome, and then embrace different cultures and regions as part of the Mongol Empire.

Kublai, on the other hand, tasks himself with crossing the chasm - developing a plan to effectively integrate those acquisitions into a cohesive operation. Brilliantly, instead of standing on ceremony in retaining his Mongol heritage, Kublai moves his headquarters to Cambaluc (now known as Beijing) in the center of the Chinese acquisition and begins to align the Mongol operation to Chinese techniques. He transitions from a nomadic lifestyle to one of techno-agriculture, enhances communication channels, standardizes paper money and monetary exchange, and engineers a hierarchical governing structure to manage the empire.

Different management styles for different types of environments, and different kinds of lessons. Enjoy!


Posted June 23, 2009 11:26 AM
Permalink | No Comments |

Sybase has just posted a white paper that I wrote on high performance analytics using column-oriented databases. Despite my recent (10+ year) foray into the data world, my original background is in high performance data parallel computing. Parallel columnar database systems reflect the best of both worlds I have lived in, and I hope you guys find the paper interesting!


Posted May 21, 2009 3:46 PM
Permalink | No Comments |

I recently read a white paper from IDC analysts Dan Vesset and Brian McDonough, called "Improving Organizational Performance Management Through Pervasive Business Intelligence," which observes that "...those organizations that rank themselves as more competitive within their industry tend to place greater importance on data governance." Clearly, data governance is a critical component of any enterprise activity intended to support organizational value drivers.

Also, a reminder: I am presenting at a series of upcoming live events, sponsored by Informatica, Teradata, and MicroStrategy. In these talks, I will be exploring the concept of "right-time" operational business intelligence, the business and value drivers, and the basic expectations for technology components that would enable expanded delivery of actionable intelligence to decision-makers across the organization.

The title of the event is "Turn Uncertainty to Advantage: Operational Business Intelligence is the Key," and you can register by clicking on the links below:

Milwaukee, May 6th

Kansas City, May 7th

Cincinnati, May 20th

Indianapolis, May 21st

Las Vegas, June 3rd


Posted May 5, 2009 6:01 AM
Permalink | No Comments |

Interested in pervasive business intelligence? I am presenting at a series of upcoming live events, sponsored by Informatica, Teradata, and MicroStrategy. In these talks, I will be exploring the concept of "right-time" operational business intelligence, the business and value drivers, and the basic expectations for technology components that would enable expanded delivery of actionable intelligence to decision-makers across the organization.

The title of the event is "Turn Uncertainty to Advantage: Operational Business Intelligence is the Key," and you can register by clicking on the links below:

St. Louis, May 5th

Milwaukee, May 6th

Kansas City, May 7th

Cincinnati, May 20th

Indianapolis, May 21st

Las Vegas, June 3rd


Posted April 29, 2009 9:21 AM
Permalink | 2 Comments |

Anyone going to Enterprise Data World in 2 weeks? I will be teaching a tutorial in partnership with industry sponsors discussing trends, architectures, and best practices for Master Data Management. I am looking forward to a great set of sessions. If you are attending the tutorial and have specific questions you'd like considered, email me (loshin@knowledge-integrity.com).


Posted March 26, 2009 12:44 PM
Permalink | No Comments |

I had a great opportunity to attend Monday's sessions of the Gartner BI Summit. Here are some quick reactions:

Regarding the Gartner Keynote by Kurt Schlegel and John Van Decker: Good high level overview, with what I think was a disproportionate focus on the current economic environment as a driver of specific actions. If there is true strategic business value in BI, the long term view should suggest that now is a good time to regroup thinking and establish a framework for justifying the investment, not to look for ways to cut costs. On the other hand, one interesting comment did address the emerging need for less concentration on canned reports in favor of enabling parameterized queries, as well as providing more capability for business users to perform more unconstrained data analysis. I thought those were good suggestions.

Regarding data integration: I sat in on Ted Friedman's talk on data integration tools (sitting in the back with SAS data integration guru Ken Hausman), and Ted gave a good overview of the landscape, with a nice amount of attention given to the "sharing" side of data integration, which (unfortunately) is largely ignored by many others who choose to focus on the "consolidation" aspects. Good job, Ted! 

Andrew White's MDM session did provide some explanatory material about the difference between analytic MDM and operational MDM, but did not go far enough to discuss the challenges inherent in transitioning an organization's application infrastructure to employ a single master repository. While the suggestions to "start small" are reasonable, my gut tells me that small pockets of siloed concept repositories used by few applications may address some business requirements but may not qualify as master repositories. It may be better for us to rewind that message a little bit and focus on the value proposition of what is promised by MDM before we go overboard in trying to implement it...

I did stroll around the exhibit floor and got a chance to visit with my friends Jake Zborowski and Donald Farmer from Microsoft, John Evans from Kalido, and Harriet Fryman of (now) IBM, while making some new friends at Greenplum and Aster.

Lastly: I was treated to an interesting product demo from Scott Davis from a company called Lyzasoft, providing a low-cost alternative for desktop business user-directed data analysis. One nice aspect is its integration of columnar data management along with indexing that essentially enables high-performance data access, even on a laptop.


Posted March 11, 2009 7:01 AM
Permalink | 1 Comment |

I am stepping a bit out of my area of comfort to reflect on some thoughts regarding today's inauguration pomp and circumstances. Last night I was leafing through James Surowiecki's The Wisdom of Crowds (which is a really good book; I highly recommend it), and a passage described criticism of the United States' early concepts of democracy, in which Europeans mocked the notion that the general public was empowered to vote for and elect the leaders of the nation. But, contrary to that haughty opinion, the power vested in the people by the United States Constitution has not only withstood the test of time; together with the Bill of Rights and the other accumulated amendments, it has allowed this glorious experiment in "forming a more perfect union" to thrive.

This day we experienced the transition of one presidency to another. Although the context of President Obama's assuming the nation's highest office is historic, while I watched the new president taking the oath of office, there were tears in my eyes. This was not just because of today's history, but it was compounded by the simple fact that I, along with every other citizen, live in a place where our constitutional rights allow us to make the creation of history a reality.
And in the spirit of patriotism that we all share on this inauguration day, it is each and every American citizen's duty to exercise those constitutional rights:

- To freely practice the religion of your choice;
- To not just speak freely, but scream loudly when criticism is warranted (or even if it is not!);
- To a free press that demands transparency from those elected few who lead our nation, as well as those millions who willingly serve the nation as employees in the public sector; and
- To join with others in a peaceable assembly to exercise these rights.

There are people who wish to quiet those who criticize the office of the president. But recall what George Washington said: "If the freedom of speech is taken away then dumb and silent we may be led, like sheep to the slaughter." And to those people willing to yield any part of their liberties to the government in the name of security, the Fourth Amendment to the Constitution grants the right for people to be "secure in their persons, houses, papers, and effects, against unreasonable searches and seizures." Recall the words of Benjamin Franklin: "Any society that would give up a little liberty to gain a little security will deserve neither and lose both."

There have been many changes in our collective lives over the years of the previous administration - turmoil, pain, loss, growth, success, failure, more success, even greater failure. Times not only change, but the speed of change seems to increase as well. Hopefully our new leaders will apply consideration and thoughtfulness as they plan programs to move us all forward into the future.


Posted January 20, 2009 8:36 PM
Permalink | No Comments |

Quick thought experiment: You are configuring a scorecard to report a rolled-up key performance indicator, or KPI. This scorecard starts out with a KPI that is based on a single measured metric, and you have a process in place to measure that metric, apply some weights to the raw score, and then present the resulting score in relation to previously reported scores for the same KPI.

As time progresses, the managers decide that the KPI can be improved by integrating a second measurement and weighted raw score. This is implemented, but here is the issue: with the additional measurement, the KPI is now a different indicator than it was before that measurement was integrated. So can the score associated with this new incarnation of the (same old) KPI be compared with the previously reported scores?
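
To make the question concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the metric names, the weights, the measured values, and the choice of a plain weighted sum as the roll-up. It shows the same underlying measurements reported under the old and the new KPI definition, plus one possible response (restating the earlier period under the new definition).

```python
def kpi_score(measurements, weights):
    """Roll up raw metric scores into one KPI as a weighted sum.

    Both arguments are dicts keyed by metric name; only metrics that
    appear in `weights` contribute to the score.
    """
    return sum(weights[name] * measurements[name] for name in weights)


# Hypothetical KPI definitions: version 1 uses one metric, version 2 adds a second.
v1_weights = {"on_time_delivery": 1.0}
v2_weights = {"on_time_delivery": 0.6, "order_accuracy": 0.4}

# Hypothetical raw scores; note that nothing changes between the two months.
last_month = {"on_time_delivery": 80.0, "order_accuracy": 95.0}
this_month = {"on_time_delivery": 80.0, "order_accuracy": 95.0}

reported_last_month = kpi_score(last_month, v1_weights)  # old definition
reported_this_month = kpi_score(this_month, v2_weights)  # new definition
restated_last_month = kpi_score(last_month, v2_weights)  # old period, new definition

print("as reported:", reported_last_month, "->", reported_this_month)
print("restated:   ", restated_last_month, "->", reported_this_month)
```

In this made-up case the reported score moves even though neither measurement changed, while the restated comparison shows no movement. Restating is only possible if the second measurement was being captured during the earlier periods, which the thought experiment does not guarantee.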


Posted January 5, 2009 9:59 AM
Permalink | No Comments |

It is currently the holiday break, which means two things. First, almost everybody is taking time off, which means that (second) there is a little bit of breathing room for us to sit and ponder issues pushed into the background during the rest of the year. One of those items has to do with data quality scorecards, data issue severity, and setting levels of acceptability for data quality scores.

Essentially, if you can determine some assertion that describes your expectation for quality within one of the commonly used dimensions, then you are also likely to be able to define a rule that can validate data against that assertion. Simple example: the last name field of a customer record may not be null. This assertion can be tested on a record by record basis, or I can even extract the entire set of violations from a database using a SQL query.

Either way, I can get a score, perhaps either a raw count of violations, or a ratio of violating records to the total number of records; there are certainly other approaches to formulating a "score," but this simple example is good enough for our question: how do you define a level of acceptability for this score?
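
The non-null last name example can be sketched in a few lines of Python using the standard sqlite3 module. The table name, column names, sample rows, and especially the 5% threshold are hypothetical, chosen only to show the two scores mentioned above (raw count and ratio); the threshold is a placeholder for whatever level of acceptability the business settles on, not a recommendation.

```python
import sqlite3

# In-memory database with a hypothetical customer table and made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, last_name TEXT)")
conn.executemany(
    "INSERT INTO customer (id, last_name) VALUES (?, ?)",
    [(1, "Smith"), (2, None), (3, "Jones"), (4, None), (5, "Lee")],
)

# Rule: the last name field of a customer record may not be null.
# Extract the entire set of violations with a single query.
violations = conn.execute(
    "SELECT id FROM customer WHERE last_name IS NULL ORDER BY id"
).fetchall()
total = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]

# Two simple scores: a raw count and a ratio of violating records to all records.
violation_count = len(violations)
violation_ratio = violation_count / total if total else 0.0

# Hypothetical acceptability threshold; choosing this value is the open question.
ACCEPTABLE_RATIO = 0.05
is_acceptable = violation_ratio <= ACCEPTABLE_RATIO

print(violation_count, violation_ratio, is_acceptable)
conn.close()
```

The code only computes the score and compares it to a number; where that number comes from (business impact, severity of the issue, downstream tolerance) is exactly what the question above is asking.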


Posted December 30, 2008 11:45 AM
Permalink | No Comments |

One good thing about being busy is that you get opportunities to streamline ideas through iteration. My interest in data profiling goes pretty far back, and the profiling process is one that is useful in a number of different usage scenarios. One of these is data quality assessment, especially in situations where not much is known about the data; profiling provides some insight into basic issues with the data.

But in situations where there is some business context regarding the data under consideration, undirected data profiling may not provide the level of focus that is needed. Providing reports on numerous nulls, outliers, duplicates, etc. may be overkill when the analyst already knows which data elements are relevant and which ones are not. In these kinds of situations, the analyst can instead concentrate on the statistical details associated with the critical data elements as a way to evaluate the extent to which data anomalies might impact the business.

So in some recent client interactions, instead of just throwing the data into the profiler and hoping that something good comes out, we narrowed the focus to just a handful of data elements and increased the scrutiny on the profiler results, sometimes refining the data sets, pulling different samples, segmenting the data to be profiled, and joining different data sets prior to profiling, all as a way to get more insight into the data instead of the typical reports telling me that yet another irrelevant data element is 99% null. The upshot is that a carefully planned process for driving the directed profiling process gave much more interesting results, both for us and for the client.
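
As a rough illustration of the difference between undirected and directed profiling, the Python sketch below profiles only a chosen handful of data elements and does so per segment. It is not the profiler or process used in the engagements described above; the function names, the record layout, the statistics collected, and the sample data are all assumptions.

```python
from collections import Counter, defaultdict


def profile(records, elements):
    """Null rate, distinct count and top values for the chosen elements only."""
    report = {}
    for element in elements:
        values = [r.get(element) for r in records]
        non_null = [v for v in values if v not in (None, "")]
        report[element] = {
            "null_rate": (1 - len(non_null) / len(values)) if values else 0.0,
            "distinct": len(set(non_null)),
            "top_values": Counter(non_null).most_common(3),
        }
    return report


def profile_by_segment(records, segment_key, elements):
    """Split the records by a segment value, then profile each segment."""
    segments = defaultdict(list)
    for r in records:
        segments[r.get(segment_key)].append(r)
    return {seg: profile(rows, elements) for seg, rows in segments.items()}


# Made-up records; "fax" stands in for an irrelevant, mostly-null element.
records = [
    {"region": "East", "postal_code": "20901", "phone": "301-555-0100", "fax": None},
    {"region": "East", "postal_code": None, "phone": "301-555-0101", "fax": None},
    {"region": "West", "postal_code": "94105", "phone": None, "fax": None},
    {"region": "West", "postal_code": "94105", "phone": None, "fax": None},
]

# The analyst names the critical data elements up front; "fax" is never reported.
critical = ["postal_code", "phone"]
by_region = profile_by_segment(records, "region", critical)

for region, report in by_region.items():
    print(region, report)
```

Segmenting first is what surfaces patterns that a whole-table null rate would blur: in this sample, missing phone numbers are confined to one region.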


Posted December 23, 2008 1:51 PM
Permalink | No Comments |