Data quality is a universal source of deep pain when establishing a trusted data platform in Data & AI projects. The more systems that are involved, the harder it gets to clean the data up, and that is before you even start accounting for how old the systems are, how up to speed the SMEs are, or how poor front-end validation was. There’s a host of potential problems. However, something tells me that the number of projects where the customer has said that it’s OK if the numbers are wrong is going to remain pretty small.
Scope, Cost, Time – Choose one. But not that one.
Data Quality is a project constraint
Many of you will be familiar with the Project Management Triangle, which dictates that you vary two of Scope, Cost or Time to fix the other. The end result is that Quality, sitting in the middle, gets affected. For most Data & AI projects, I have found cost and time tend to be the least negotiable, so scope gets restricted. Yet somehow Time and Cost get blown out anyway.
Whilst Data & AI is hardly unique in terms of cost and schedule overruns, there is one key driver which is neglected by traditional methods. Once again I lean on Larissa Moss’s Extreme Scoping approach, which calls out the reason: in a Data & AI project, Quality, and specifically Data Quality, is also fixed. The data must be complete and accurate for it to be usable, and there is no room for negotiation on this. Given that the data effort consumes around 80% of a Data & AI project’s budget, this becomes a significant concern.
How do we manage Data Quality as a constraint?
We have to get the business to accept that the traditional levers can’t be pulled in the way they are used to, and that requires end user education. The business needs to be made aware that Data Quality is a fixed constraint, one that they are imposing, albeit implicitly. The business has to accept that if Quality is not a variable, then the traditional “pick two to play with” becomes “prepare to vary all of them”. Larissa Moss refers to this as an “Information Age Mental Model”, which prioritises quality of output above all else.
Here is where strong leadership and clear communication come into play. Ultimately, if one part of the business demands a certain piece of information, the Data & AI project team will have to be clear with them: to obtain that data at the mandated quality, they must be prepared to bear the costs of doing so. That includes the cost of bringing it up to an enterprise-grade, reusable standard, so that it integrates with the whole solution for both past and future components of the system. This of course does not mean that an infinite budget is opened up to deal with each data item. Some data may not be worth the cost of acquisition. What it does mean is that the discussion about the costs can be more honest, and the consumer can be more aware of the drivers for the issues that will arise from trying to obtain their data.