I am currently working on a project that migrates data from SQL Server to Aurora Postgres. I have more than 5 tasks including several validation tasks as well. A total of 70+ tables were migrating with data in millions. Everything was working fine until we were planning to few more tables and create a new validation task for it.
The DMS task errored due to connection timeout to the source database. During our initial R&D, we faced similar issues and we were able to figure out the issue easily. It was mostly because of the following 2 reasons.
- The source database and DMS instance are under the different VPC (if VPC is enabled for source DB)
- Wrong DB password
So once we got a timeout issue we verified that it was not because of the above reasons.
Timeout Error Reason
We were familiar that DMS uses CDC from SQL Server to capture the data changes. It uses a function called fn_dblog() to read the logs and replicate the changes to the target. After a few weeks in production, we noticed that the logs in this function are huge and we are not able to query except the top N rows of the function. So I found out that this was the reason.
DMS task cannot resume the task from a certain time since the fn_dblog() query times out. So the only way to resume the task is to use the checkpoint. In order to do that stop the existing validation task, refresh the page, and copy the checkpoint detail under the task details. Paste this checkpoint in the new validation task in order for the task to resume.