The ultimate guide for migrating Microsoft SQL Server to Aurora Postgres using AWS Database Migration Service (DMS)

AWS DMS is one of the most popular services for data migration, especially for large-scale migrations. For the past few months, I have been working with AWS DMS and hit many hurdles along the way. In this blog, I will explain the steps involved in migrating a SQL Server database to Aurora Postgres.


Note: If you only want to bring the existing data into Aurora Postgres and are not concerned about ongoing sync-up of the database, then this job is as simple as it gets.

Schema Conversion

AWS DMS by default creates a table in the destination (if the table does not exist in the target) with only the minimal configuration required: the schema and the tables with appropriate columns and datatypes. It does not create things like constraints, secondary indexes, etc. In such cases, you need to consider using the AWS Schema Conversion Tool (SCT), which converts table schemas, indexes, stored procedures, etc. As far as stored procedures and functions are concerned, SCT tries to convert as much as possible and comments out any script it cannot convert, along with remarks explaining why.
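
Since DMS creates only this minimal structure, you typically apply the remaining objects on the target yourself (via the SCT-generated script or by hand) after the full load completes. Here is a sketch of the kind of DDL you might run on Aurora Postgres, with hypothetical table and column names:

--hypothetical example: objects DMS's full load does not create,
--applied manually on Aurora Postgres after the migration
alter table orders
    add constraint fk_orders_customer
    foreign key (customer_id) references customers (customer_id);

create index ix_orders_created_at on orders (created_at);

alter table orders
    alter column status set default 'NEW';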

Note: SCT also lets you generate a report after analyzing the source DB, with details like what percentage of the source schema can be converted automatically by AWS SCT.

DMS initial setup

The initial setup of DMS includes the following

  1. Create replication instance
  2. Create Endpoints for source and target

Create replication instance

Go to the DMS console and click on “Replication instances”. Create a replication instance with your required configuration. Take a look at the DMS pricing to make the decision easier. To start off, I would suggest “dms.t3.medium” as the instance class for testing purposes. Now our replication instance is ready. The next important step is to add the public IP of the DMS instance into the security groups of the source and target databases.
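
If you prefer the CLI, the equivalent call looks roughly like the following; the identifier, storage size, and accessibility flag are placeholders to adapt:

# sketch of the equivalent AWS CLI call; adjust identifiers and sizes
aws dms create-replication-instance \
    --replication-instance-identifier my-dms-instance \
    --replication-instance-class dms.t3.medium \
    --allocated-storage 50 \
    --publicly-accessible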

Create Endpoints for source and target

In the DMS console, the first thing you need to do is create two endpoints: one connects to the source (SQL Server) and the other connects to the target (Aurora Postgres). The endpoint setup is very user-friendly, so just fill in the necessary details. Once it is set up, click “Test Connection” to check that the configuration works. If you have trouble connecting, make sure you have configured the security group properly.
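
For reference, here is roughly how the same two endpoints look via the AWS CLI; the identifiers, hosts, and credentials are placeholders:

# sketch: source (SQL Server) and target (Aurora Postgres) endpoints
aws dms create-endpoint \
    --endpoint-identifier sqlserver-source \
    --endpoint-type source \
    --engine-name sqlserver \
    --server-name my-sqlserver-host \
    --port 1433 \
    --username admin \
    --password '***' \
    --database-name YOUR_DB_NAME

aws dms create-endpoint \
    --endpoint-identifier aurora-postgres-target \
    --endpoint-type target \
    --engine-name aurora-postgresql \
    --server-name my-aurora-cluster-endpoint \
    --port 5432 \
    --username postgres \
    --password '***' \
    --database-name YOUR_DB_NAME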

Existing data migration

If you want to migrate only the existing data, then go ahead and create a database migration task. Select the source and target endpoints appropriately and make sure you select “Migrate existing data” as the migration type.

Selection Rule

The selection rule lets you choose the schemas and/or tables you want to include in, or exclude from, your migration task. You need to add at least one selection rule to proceed. The selection rule below allows every schema and table to be migrated.
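
Expressed in DMS's JSON rule format (the console wizard generates the same structure), the catch-all rule looks like this, where “%” is a wildcard matching any name and the rule-name is just a placeholder:

{
  "rules": [
    {
      "rule-type": "selection",
      "rule-id": "1",
      "rule-name": "include-everything",
      "object-locator": {
        "schema-name": "%",
        "table-name": "%"
      },
      "rule-action": "include"
    }
  ]
}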

Transformation Rule

This is an optional rule type that DMS offers where you can change or transform the schema, table, or column names of some or all of the selected objects.
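
For example, SQL Server objects typically live in the dbo schema while Postgres conventionally uses public, so a rename rule like the following (added to the same "rules" array) is common in this particular migration; again, the rule-name is a placeholder:

{
  "rule-type": "transformation",
  "rule-id": "2",
  "rule-name": "dbo-to-public",
  "rule-target": "schema",
  "object-locator": {
    "schema-name": "dbo"
  },
  "rule-action": "rename",
  "value": "public"
}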

Once you have configured the above, go ahead and create the task. The task will run seamlessly, and you can watch the data being migrated.

Ongoing Replication

Ongoing replication is where I spent many hours figuring out everything that needs to be configured. But you are fortunate to be in the right place, because I am going to walk you through exactly the steps I followed.

The main step is to enable CDC (Change Data Capture) on the SQL Server database. With CDC enabled, DMS can read the change logs and replicate the changes appropriately on the target. To enable CDC, execute the script below on SQL Server. (The rds_cdc_enable_db procedure applies to RDS for SQL Server; on a self-managed server, use sys.sp_cdc_enable_db instead.)

--enable CDC on database level
use YOUR_DB_NAME;
exec msdb.dbo.rds_cdc_enable_db 'YOUR_DB_NAME';
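
To confirm the database-level flag took effect, you can check sys.databases:

--confirm CDC is enabled at the database level
select name, is_cdc_enabled
from sys.databases
where name = 'YOUR_DB_NAME';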

To begin tracking the individual tables, execute the following script.

use YOUR_DB_NAME;
exec sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name   = N'TABLE_NAME',
    @role_name     = NULL;

The above script has to be executed for every single table, which can take a long time. So I wrote a stored procedure that loops over every table and executes the script:



create or alter procedure sp_enable_disable_cdc_all_tables (@enable bit)
as
begin try
    declare @source_name sysname;
    declare @sql nvarchar(1000);

    -- pick every dbo table whose CDC state differs from the requested state
    declare the_cursor cursor fast_forward for
        select name
        from sys.tables
        where schema_id = schema_id('dbo')
          and name not like '%VIEW%'
          and is_tracked_by_cdc = case when @enable = 1 then 0 else 1 end;

    open the_cursor;
    fetch next from the_cursor into @source_name;

    while @@fetch_status = 0
    begin
        if @enable = 1
            -- replace YOUR_DATABASE_NAME with the actual database name
            set @sql = 'use YOUR_DATABASE_NAME; exec sys.sp_cdc_enable_table
                        @source_schema = N''dbo'',
                        @source_name   = N''' + @source_name + ''',
                        @role_name     = N''admin''';
        else
            -- ''all'' drops every capture instance created for the table
            set @sql = 'use YOUR_DATABASE_NAME; exec sys.sp_cdc_disable_table
                        @source_schema    = N''dbo'',
                        @source_name      = N''' + @source_name + ''',
                        @capture_instance = ''all''';

        exec (@sql);

        fetch next from the_cursor into @source_name;
    end

    close the_cursor;
    deallocate the_cursor;

    select 'Successful';
end try
begin catch
    -- release the cursor before reporting the error
    close the_cursor;
    deallocate the_cursor;

    select error_number()  as ErrorNumber,
           error_message() as ErrorMessage;
end catch


--to enable all tables
EXEC sp_enable_disable_cdc_all_tables 1

--to disable all tables
EXEC sp_enable_disable_cdc_all_tables 0
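
To double-check which tables are now being tracked, you can query sys.tables:

--list all tables currently tracked by CDC
select s.name as schema_name, t.name as table_name
from sys.tables t
join sys.schemas s on s.schema_id = t.schema_id
where t.is_tracked_by_cdc = 1;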

Once this is done, you need to update the polling interval of the CDC capture job, which effectively controls how long the change records are retained for DMS to read (86400 seconds = one day).

EXEC sys.sp_cdc_change_job @job_type = 'capture', @pollinginterval = 86400;
GO

EXEC sys.sp_cdc_stop_job 'capture';
EXEC sys.sp_cdc_start_job 'capture';
GO

Note that you need to stop and start the capture job for the change to take effect.
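
You can verify the new value by querying the CDC job table in msdb:

--inspect the current CDC job configuration
select job_type, pollinginterval, retention
from msdb.dbo.cdc_jobs;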

To confirm that CDC has been enabled for your tables, perform some inserts, updates, or deletes on a table and run the following query to see whether the changes are logged.

SELECT [Current LSN],[Begin Time],[Transaction Name] FROM fn_dblog(NULL, NULL)
order by [Begin Time] desc

fn_dblog() is the function that DMS queries under the hood to read the change logs. Now the SQL Server side is fully configured. Let’s create an ongoing replication task.

In the DMS console, go to “Database migration tasks” and click “Create task”. Select the appropriate source and target database endpoints and set the migration type to “Replicate data changes only”.

You can also select the time or LSN (log sequence number, which can be retrieved from the fn_dblog() function) from which you want to replicate changes.
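
For instance, to pick up from the current end of the capture, you can grab the highest LSN that CDC has processed; note that you may need to reformat the binary value as a hex string for the DMS console:

--current maximum LSN processed by CDC, usable as a custom start position
select sys.fn_cdc_get_max_lsn();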

Also, you need to create at least one selection rule as mentioned above, and finally click “Create task”. And that’s it. After a few minutes, you should start seeing your changes replicated to the target, which is Aurora Postgres.

Validation

During the task creation process, you will find a “Turn on validation” checkbox, which makes DMS compare the data on the source and target row by row after it is migrated.

You can also create a task that performs validation only.
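
If I recall the task-settings JSON correctly (treat this as a sketch and verify against the current DMS documentation), this is driven by the ValidationSettings section:

{
  "ValidationSettings": {
    "EnableValidation": true,
    "ValidationOnly": true
  }
}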

You can read the DMS best practices guide to fine-tune the tasks you create. I hope this blog saved you a lot of time.

Feel free to leave a comment down here. I would love to hear from you. Happy programming!!
