In a data-driven organisation, data is an asset with a high value: incorrect data leads to wrong decisions as well as errors in communication and process execution. This can lead to costly remedial actions, lost revenue and impact on the organisation’s reputation. The objective of data migration is to ensure that the right data is available in a new environment.
In a data migration, data quality is critical. Will the new system work if there are errors in the data? What if the data in the current system, the source for the migration, is already polluted? Often, the source system is dated and the data has become polluted over the years; users may have learned to live with the imperfections, but that may no longer be possible in the new environment.
Data cleansing requires a lot of effort, which can get in the way of a smooth migration.
In this article, we show how we address data quality, data cleansing and enrichment to ensure correct and up-to-date data after migration.
Measuring data quality
To assess data quality, we need to identify errors or potential errors in the data through four types of checks:
- Syntax or input checks determine whether the content of a field contains an acceptable value. For example, a date must fall within a certain range.
- Business rules mainly concern combinations of fields. If a contract is to be cancelled, the end date must be filled in.
- Referential checks are about links. A loan requires linked collateral or securities.
- Pattern validation is about the logical sequence of events, often involving timestamps. For example, children cannot be older than their parents, or an application cannot be completed before it is submitted.
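As a sketch, the four types of checks might look like this in code. The record layout, field names and reference set are illustrative, not taken from any particular system:

```python
from datetime import date

# Hypothetical contract record; field names are illustrative.
record = {
    "start_date": date(2020, 1, 4),
    "end_date": None,
    "status": "cancelled",
    "collateral_id": "C-1001",
}
known_collateral = {"C-1001", "C-1002"}  # assumed reference set

errors = []

# 1. Syntax check: the date must fall within an acceptable range.
if not (date(1900, 1, 1) <= record["start_date"] <= date.today()):
    errors.append("start_date out of range")

# 2. Business rule: a cancelled contract must have an end date.
if record["status"] == "cancelled" and record["end_date"] is None:
    errors.append("cancelled contract without end_date")

# 3. Referential check: the linked collateral must exist.
if record["collateral_id"] not in known_collateral:
    errors.append("unknown collateral reference")

# 4. Pattern validation: events must occur in a logical order.
if record["end_date"] is not None and record["end_date"] < record["start_date"]:
    errors.append("end_date before start_date")

print(errors)
```

In practice such checks run over the whole data set and feed a report per data object, so the cleansing effort can be prioritised.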
In addition to incorrect or missing data, duplicate entries are a common error. There are methods to detect these, such as 'fuzzy logic' matching. Once the errors in the data have been identified, we can devise an effective cleansing plan.
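A minimal sketch of fuzzy duplicate detection, here using simple string similarity from the Python standard library; the names and the threshold are illustrative:

```python
from difflib import SequenceMatcher

# Illustrative customer names; in practice these come from the source system.
names = ["Jan Jansen", "J. Jansen", "Piet de Vries", "Jan Janssen"]

def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] based on matching character blocks."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Flag pairs above an assumed threshold as likely duplicates
# for an employee to review.
THRESHOLD = 0.75
duplicates = [
    (a, b)
    for i, a in enumerate(names)
    for b in names[i + 1:]
    if similarity(a, b) >= THRESHOLD
]
print(duplicates)
```

Dedicated matching libraries offer stronger techniques (phonetic matching, token-based scoring), but the principle is the same: score candidate pairs and let a person confirm the matches.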
Data cleansing
To clean data, we must first have the correct data. Sometimes, this data is present internally, in organisation documents or other systems. Sometimes, the correct data must be gathered from external sources, such as independent reference files. The Dutch postcode table is a well-known example of this.
Cleansing can be manual or automated. We define four variants.
- Looking up: the correct data is available internally but must be looked up manually, after which an employee enters the correct values.
- Finding out: the correct data is not available internally and must be tracked down, for example by calling the customer.
- Updating: the correct values are available internally and can be extracted automatically for use during migration.
- Enriching: external data sources are used, and the correct data can be added to the migration environment.
Updating or enriching missing and erroneous data is done during migration. The correct data is added to the migration environment as an additional data source. This way, the audit trail remains intact.
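A sketch of enrichment during migration, using a postcode table as the external reference. The field names are illustrative; the point is that the correction is applied in the migration environment while the original value is kept, so the audit trail remains intact:

```python
# Source records with a missing or incorrect city; the postcode table
# serves as the external reference (all field names are illustrative).
source = [
    {"id": 1, "postcode": "1012 JS", "city": ""},
    {"id": 2, "postcode": "3011 AD", "city": "Roterdam"},  # misspelled
]
postcode_table = {"1012 JS": "Amsterdam", "3011 AD": "Rotterdam"}

enriched = []
for rec in source:
    correct_city = postcode_table.get(rec["postcode"])
    if correct_city and rec["city"] != correct_city:
        # Keep the original value alongside the correction: the source
        # system stays untouched and the change remains traceable.
        enriched.append({
            **rec,
            "city": correct_city,
            "city_original": rec["city"],
            "city_source": "postcode_table",
        })
    else:
        enriched.append(dict(rec))
```

The same pattern works for internal updates: the corrected values are simply read from another internal system instead of an external file.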
Test data
Only correct data should be migrated. If data is incorrect, we filter out the data object in question and, if necessary, the related objects as well. The migrated data is then always correct and consistent.
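Filtering out a data object together with its related objects can be sketched as a cascade; the object types and fields below are illustrative:

```python
# Hypothetical objects: loans reference a customer. If a customer record
# is rejected, its loans are filtered out as well, so the migrated set
# stays internally consistent.
customers = [
    {"id": "C1", "name": "Jansen"},
    {"id": "C2", "name": ""},  # fails a syntax check: name is mandatory
]
loans = [
    {"id": "L1", "customer_id": "C1"},
    {"id": "L2", "customer_id": "C2"},
]

valid_customers = [c for c in customers if c["name"]]
valid_ids = {c["id"] for c in valid_customers}

# Cascade: only migrate loans whose customer survived the filter.
valid_loans = [loan for loan in loans if loan["customer_id"] in valid_ids]
```
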
Our experience is that data pollution is always worse than expected, and cleansing takes time: sometimes, only a small amount of data gets through all filters at the start of a migration project.
It is desirable, however, to have a sufficient amount of converted data at an early stage for testing purposes. For this reason, we can choose to detect but not filter incorrect data or temporarily replace incorrect data with dummy data. This way, we
have test data available for the target system from an early stage in the migration process.
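The dummy-data variant might look like this; the sentinel value and field names are assumptions for the sketch. Each substitution is logged so the record can still be cleansed later:

```python
# Instead of filtering a record with a bad field, temporarily replace the
# field with a clearly marked dummy value so the record can still serve
# as test data for the target system.
DUMMY_DATE = "9999-12-31"  # assumed, clearly recognisable sentinel
substitutions = []  # log of (record id, field) for later cleansing

def with_dummies(record: dict) -> dict:
    out = dict(record)
    if not out.get("end_date"):
        substitutions.append((out["id"], "end_date"))
        out["end_date"] = DUMMY_DATE
    return out

records = [
    {"id": 7, "end_date": None},          # incorrect: gets a dummy
    {"id": 8, "end_date": "2024-05-01"},  # correct: passed through as-is
]
test_data = [with_dummies(r) for r in records]
```
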
Often, data migration and the configuration of the target system are parallel projects, meaning that the migration team is working with a system
that’s still changing. To overcome this challenge, we perform continuous data migration between the source system and a test version of the new system. We also keep track of progress in terms of data cleansing and make sure the new system works
well with the migrated data.
Continuous improvement
Data quality is an issue with every migration, and paying close attention to it is key to success. But data quality also matters to organisations outside of a migration project. The costs of poor data are often hidden, yet they show up as employee frustration, customer dissatisfaction, unnecessary risks and missed opportunities. Data quality therefore requires continuous monitoring and improvement.
The way we address data quality during a data migration project can be applied at any
time. For example, we can use alerts to periodically or continuously check whether the data meets all criteria. We can also monitor trends: is data quality improving? At what points does data quality degrade? What is the cause of this? Such an analysis
helps to implement process improvements or system modifications to prevent errors.
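A minimal sketch of such periodic monitoring: track the share of records that pass all checks per run, and raise an alert when quality drops below a threshold or degrades compared with the previous run. The metric, labels and threshold are illustrative:

```python
# Quality metric per periodic run: share of records passing all checks.
history = []  # (run_label, pass_rate) per run, oldest first

def record_run(label: str, passed: int, total: int,
               threshold: float = 0.95) -> list[str]:
    """Log one measurement and return any alerts it triggers."""
    rate = passed / total
    alerts = []
    if rate < threshold:
        alerts.append(f"{label}: pass rate {rate:.0%} below threshold")
    if history and rate < history[-1][1]:
        alerts.append(f"{label}: quality degraded since {history[-1][0]}")
    history.append((label, rate))
    return alerts

record_run("2024-Q1", passed=940, total=1000)
alerts = record_run("2024-Q2", passed=900, total=1000)
```

The same history also answers the trend questions: plotting the pass rate over time shows whether quality is improving and where it degrades.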
Improving data quality is key to success in data migration, the acceptance of new systems, and the proper execution of daily business processes. In other
words, every effort made towards improving data quality pays off quickly.