AWS DMS is one of the popular services used for data migration, especially for huge data migration. For the past few month, I have been working on AWS DMS service and I had many hurdles then and there. In this blog, I will be explaining various steps involved in migrating the SQL server database to Aurora Postgres.
If you are someone new to AWS, feel free to click here to know about it.
Note: If you are only worried about bringing the existing data to Aurora Postgres and not concerned about regular sync-up of the database then this job is as simple as could imagine.
AWS DMS by default creates a table in the destination (if the table does not exist in the target) with minimal configurations required such as schema, tables (with appropriate columns and datatypes), etc. It does not create things like constraints, indexes, etc. In such cases, you need to consider using AWS Schema Conversion Tool (SCT) which converts table schema, indexes, stored procedures, etc. As far as stored procedures and functions are concerned SCT tool tries to convert as much as possible and comments out the unconverted script with comments related to that.
Note: SCT also allows to generation of a report after analyzing the source DB and gives details like how much percent of source schema can be converted automatically by AWS SCT.
DMS initial setup
The initial setup of DMS includes the following
- Create replication instance
- Create Endpoints for source and target
Create replication instance
Go to the DMS console and click on replication instance. Create a replication instance with your required configurations. Take a look at the DMS pricing to make decision-making easier. To start off with I would suggest using “dms.t3.medium” as the class type for testing purposes. Now our replication instance is ready. The next important step is to add the public IP of the DMS instance created into the security group of the source and target database.
Create Endpoints for source and target
In the DMS console, the first thing you need to do is create 2 endpoints. One connects to the source (SQL Server) and the other connects to the target. The endpoint setup is very user-friendly so just fill in the necessary details. Once you set it up click “Test Connection” to check if the configuration was proper and working. If you have trouble connecting make sure you have configured the security group properly.
Existing data migration
If you want to migrate only the existing data, then go ahead and create a database migration task. select the source and destination endpoint appropriately and make sure you select “Migrate existing data” in the Migration type.
The selection rule allows you to choose the schema and/or tables you want to include with, or exclude from, your migration task. You need to add at least one selection rule to proceed. The below selection rule allows every schema and table to be migrated.
This is an optional rule that DMS offers where you can change or transform the schema, table, or column names of some or all of the selected objects
Once you configured the above go ahead and create the task. The task will run seamlessly and you can find data being migrated.
The ongoing replication is where I had to spend lots of hours figuring out what all needs to be configured. But you are fortunate enough to be at the right place because I am going to tell you exactly what all steps I followed.
So the main step is to enable CDC in the SQL server database. By enabling the CDC DMS will be able to read the change logs and replicate them appropriately on the target. In order to enable CDC to execute the below scripts on SQL Server.
--enable CDC on database level use YOUR_DB_NAME; exec msdb.dbo.rds_cdc_enable_db ‘YOUR_DB_NAME’
In order to begin tracking the tables execute the following script.
use YOUR_DB_NAME exec sys.sp_cdc_enable_table @source_schema = N'dbo' , @source_name = N'TABLE_NAME' , @role_name = NULL
The above script has to be executed literally for every table which will take a long time. So I wrote a Stored Procedure that loops every table and executes this script
create or alter procedure sp_enable_disable_cdc_all_tables(@enable bit) as BEGIN TRY DECLARE @source_name varchar(400); declare @sql varchar(1000) DECLARE the_cursor CURSOR FAST_FORWARD FOR select name from sys.tables where schema_id = 1 and name not like '%VIEW%' and is_tracked_by_cdc = case WHEN @enable = 1 THEN 0 ELSE 1 END OPEN the_cursor FETCH NEXT FROM the_cursor INTO @source_name WHILE @@FETCH_STATUS = 0 BEGIN if @enable = 1 set @sql =' Use YOUR_DATABASE_NAME;EXEC sys.sp_cdc_enable_table @source_schema = N''dbo'',@source_name = '+@source_name+' , @role_name = N'''+'admin'+'''' else set @sql =' Use YOUR_DATABASE_NAME;EXEC sys.sp_cdc_disable_table @source_schema = N''dbo'',@source_name = '+@source_name+', @capture_instance =''all''' exec(@sql) FETCH NEXT FROM the_cursor INTO @source_name END CLOSE the_cursor DEALLOCATE the_cursor SELECT 'Successful' END TRY BEGIN CATCH CLOSE the_cursor DEALLOCATE the_cursor SELECT ERROR_NUMBER() AS ErrorNumber ,ERROR_MESSAGE() AS ErrorMessage; END CATCH --to enable all tables EXEC sp_enable_disable_cdc_all_tables 1 --to disable all tables EXEC sp_enable_disable_cdc_all_tables 0
Once this is done you need to update the polling interval which is responsible for how long the logs need to retain.
EXEC sys.sp_cdc_change_job @job_type = 'capture' ,@pollinginterval = 86400; GO exec sp_cdc_stop_job 'capture' exec sp_cdc_start_job 'capture'
Note that you need to stop and start the job in order for this to get impacted.
In order to confirm if CDC has been enabled for your tables just do some insert, update or delete to your table and run the following query to see if the changes are logged.
SELECT [Current LSN],[Begin Time],[Transaction Name] FROM fn_dblog(NULL, NULL) order by [Begin Time] desc
fn_dblog() is the function that DMS query under the hood in order to read the change logs. Now you are ready with the SQL configurations. Let’s create an ongoing replication task.
In the DMS console, click on the database migration task and click on create a task. Select the appropriate source and target database endpoint and the migration type as “Replicate data changes only”.
You can also select from which time or LSN (Log sequence number which can be retrieved from fn_dblog() function) you want to replicate changes.
Also, you need to create at least one selection rule as mentioned above and finally click on “Create task”. And that’s it. After a few minutes, you should start seeing your changes replicated to the target which is Aurora Postgres.
During the task creation process, you can find a checkbox that asks for “Turn on validation” as below.
You can also create a task that does only validation by doing the following
You can read the DMS Best practices to be more precise on the task that you create. I hope this blog saved you a lot of time.
Feel free to leave a comment down here. I would love to hear from you. Happy programming!!