Data migration is mostly a matter of working out the data mapping: determining, for each field in the source system, the corresponding field in the target system. That, at least, is a common misconception about data migrations. Few migrations can be handled in this way. In most migrations such an approach fails because there is no one-to-one mapping: the rules differ per product or customer type, or the mapping depends on certain statuses. In other words, the mapping and conversion rules apply to a certain selection of the data.
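To make this concrete, here is a minimal sketch, with hypothetical field and product names, of a mapping rule that only applies within a certain selection: one and the same source field is converted differently depending on the product type.

```python
# Minimal sketch (field and product names are hypothetical): the mapping of
# the interest_rate field depends on the selection the record belongs to.
def map_contract(record: dict) -> dict:
    if record["product_type"] == "LEGACY_SAVINGS":
        # legacy products store a yearly rate; the target expects a monthly rate
        rate = record["interest_rate"] / 12
    else:
        rate = record["interest_rate"]
    return {"contract_id": record["id"], "monthly_rate": rate}
```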
In this article we show how a complex data migration is set up. We show that selections are the core of a data migration: the sequence of selections makes the data to be migrated flow from source to target. We shift the focus from data mapping to data flow.
Product mapping
But first, we shift the focus from data to products or objects. Starting with a product mapping, we analyze how products are recorded in the source and target systems. By products, we mean functional data objects belonging to products, such as subscriptions, financial products or certain business services or cases. These data objects are the main selections for the dataflow. A fit-gap analysis indicates whether the current products fit into the target system. If not, choices must be made: adjust the business products, adjust the target system, or place these products out of scope.
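At its simplest, a product mapping is a table that records, for each source product, the target product and the fit-gap outcome. A minimal sketch, with hypothetical product names:

```python
# Hypothetical product mapping with the fit-gap outcome per source product.
PRODUCT_MAPPING = [
    {"source": "SavingsClassic", "target": "SavingsBasic", "fit_gap": "fit"},
    {"source": "SavingsPlus",    "target": "SavingsBasic", "fit_gap": "adjust product"},
    {"source": "BondOld",        "target": None,           "fit_gap": "out of scope"},
]

# The in-scope products define the main selections of the dataflow.
in_scope = [m for m in PRODUCT_MAPPING if m["fit_gap"] != "out of scope"]
```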
These kinds of choices have a big impact on business and IT. Product changes cannot be made overnight; often they can only be made in consultation with customers. A system change can be complex as well, and while it is underway the migration project faces a moving target.
It is therefore important to make these choices and determine the scope at the earliest possible stage. This involves not only current products but also, for example, transactional history. Which old transactions should be included? And if they belong to former products, will they fit into the target system? What happens to the history that is left behind? Will it be migrated to a separate system for historical data management? After all, history must be managed in accordance with the GDPR.
Tunnel, funnel or cone
Once the product mapping is clear and the scope-defining choices have been made, the main selections that will be used to realize the data migration are also clear. But how does the data flow from the source system to the target system? The data flow sometimes goes through a tunnel, sometimes through a funnel, and sometimes through a cone.
We speak of a tunnel in simple situations where the products to be converted fit the target system. The structure of the products does not change significantly, and the data mapping is usually one-to-one. This situation is often quickly recognizable because the number of objects and attributes in the source and the target do not differ much.
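In code, a tunnel is little more than a rename of fields. A minimal sketch, assuming hypothetical field names:

```python
# Tunnel: a near one-to-one mapping in which only the field names change.
FIELD_MAP = {"cust_no": "customer_id", "addr": "address", "bal": "balance"}

def tunnel(source_row: dict) -> dict:
    return {FIELD_MAP[key]: value for key, value in source_row.items() if key in FIELD_MAP}
```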
We speak of a funnel when several objects in the source need to be merged into one object in the target. For example, several business products from the source are merged into one product in the target. A funnel also occurs when there are multiple source systems from which data is merged into a single target. In a funnel, the data mapping depends on the source or source objects.
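A funnel can be sketched as an aggregation step: several source rows per key are consolidated into a single target object. The field names below are hypothetical:

```python
# Funnel: several source products per customer merge into one target product.
from collections import defaultdict

def funnel(source_rows: list[dict]) -> list[dict]:
    merged = defaultdict(lambda: {"balance": 0})
    for row in source_rows:
        target = merged[row["customer_id"]]
        target["customer_id"] = row["customer_id"]
        target["balance"] += row["balance"]  # balances are consolidated
    return list(merged.values())
```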
We speak of a cone when an object from the source must be broken down into different objects in the target. The breakdown often depends on a status or type. With a cone, too, the data mapping is not simply one-to-one.

Data mapping becomes even more complex when different forms of dataflow are mixed together. The dataflow approach helps us to untangle this complexity. Each object and each selection defines the functional rules that belong to it. If a change is needed, its impact can be read from the graphical representation of the dataflow. In a correct dataflow design, each rule is implemented in only one place.
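Returning to the cone, a minimal sketch, with hypothetical object and status names, of one source object splitting into several target objects:

```python
# Cone: one source object splits into different target objects, depending on
# status and on whether there are arrears (all names are hypothetical).
def cone(source_row: dict) -> list[dict]:
    targets = []
    if source_row["status"] == "ACTIVE":
        targets.append({"object": "contract", "id": source_row["id"]})
    else:
        targets.append({"object": "archived_contract", "id": source_row["id"]})
    if source_row.get("arrears", 0) > 0:
        # an extra target object for the collections process
        targets.append({"object": "dunning_case", "id": source_row["id"]})
    return targets
```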
Validation and conversion
Once we know how the data should flow for each object, we can work it out further. The main selections are split into sub-selections, and all the selections together carry out the data migration. During migration, the correct order of the selections is determined automatically.
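One way to determine that order automatically is to let every selection declare the selections it depends on and derive an execution order with a topological sort. A minimal sketch, with hypothetical selection names, using Python's standard graphlib module:

```python
from graphlib import TopologicalSorter

# Each selection lists the selections it depends on (names are hypothetical).
dependencies = {
    "active_customers": set(),
    "active_contracts": {"active_customers"},
    "contract_history": {"active_contracts"},
}

order = list(TopologicalSorter(dependencies).static_order())
print(order)  # ['active_customers', 'active_contracts', 'contract_history']
```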
Selections ensure proper scope, merges and splits. Operations can be performed on the data in selections for validation and conversion. Validation is about data quality: does the data within a selection meet the requirements of the target system, or additional business requirements? Data that does not comply is flagged. Here we distinguish between failure and labeling. Data that really should not go to the target system is filtered out; this is failure. Data that we know or suspect to be incorrect, but still want to transfer, is labeled. In an early test migration it is sometimes useful to filter little and label a lot, so that enough data is migrated for testing purposes. During the final migration, most validations will be set as filters; after all, we want to fill the target system with correct data only.
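The difference between failure and labeling can be captured in a single switch per validation. A minimal sketch, with a hypothetical validate helper:

```python
# A validation either filters a record out (failure) or lets it pass with a
# label; the as_filter switch toggles between the two behaviours.
def validate(records, check, label, as_filter=True):
    passed = []
    for record in records:
        if check(record):
            passed.append(record)
        elif as_filter:
            continue  # failure: the record does not reach the target system
        else:
            record.setdefault("labels", []).append(label)
            passed.append(record)  # labeled, but still migrated
    return passed

# Early test migration: label a lot. Final migration: switch to as_filter=True.
# records = validate(records, lambda r: r.get("email"), "missing_email", as_filter=False)
```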
Conversion rules are also applied to the data in selections. These are field operations that ensure the data meets the field requirements of the target system. Sometimes this involves splitting or merging fields, for example splitting an address line into a street name and a house number. Sometimes it involves translating codes: each system has its own codes, and these are converted using translation tables.
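Both kinds of conversion fit in a few lines. A minimal sketch, with hypothetical field names and codes:

```python
import re

# Hypothetical translation table from source status codes to target statuses.
CODE_TABLE = {"NL01": "ACTIVE", "NL02": "CLOSED"}

def split_address(line: str) -> tuple[str, str]:
    """Split an address line into street name and house number."""
    match = re.match(r"^(.*?)\s+(\d+\S*)$", line.strip())
    return (match.group(1), match.group(2)) if match else (line, "")

def convert(record: dict) -> dict:
    street, number = split_address(record["address_line"])
    return {"street": street, "house_number": number,
            "status": CODE_TABLE.get(record["status_code"], "UNKNOWN")}

print(convert({"address_line": "Stationsstraat 12b", "status_code": "NL01"}))
# -> {'street': 'Stationsstraat', 'house_number': '12b', 'status': 'ACTIVE'}
```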
Converted data
Together, the selections, validations and conversions make up the overall data migration, converting the source data into the target data. We use an automated process to put all selections and operations in the right order. A data flow diagram visualizes the entire data migration in a way that is clear to business specialists and provides insight into how the conversion flow is carried out.
The result of all the selections, validations and conversions is the converted data, which is then loaded into the target system. By making selections and dataflow, rather than data mapping, our focal point, we make a very complex migration simple and deliver it using an agile way of working. You eat an elephant one bite at a time; the same goes for a complex data migration.
A Dutch version of this blog is published on LinkedIn: Datamigratie: van datamapping naar dataflow | LinkedIn