Django data migrations
July 2nd, 2008
I recently moved a decently sized database (400,000 records, 40 tables) from MSSQL to a Debian server running MySQL. I’m pretty meticulous when it comes to saving old data, so I wanted to preserve its integrity exactly. In short, I exported my old data, loaded it into a temporary project, and used Django to generate models from the old tables. Then I wrote a Python script that uses Django’s database API to loop through the old data and convert it into my new format. Super easy!
1) Exporting the data from the old database
I happened to use the MySQL Migration Toolkit, a rather wonderful tool. It wasn’t too difficult to use: I ended up with a Creates.sql and an Inserts.sql that loaded my old data perfectly. This is the only database-specific step (MySQL in this case), so if you’re using Postgres or something else, handle the export some other way and move on.
One important step: if any of your old table names will conflict with tables in your current Django project, rename them before exporting.
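One way to spot conflicts before exporting is to compare the old table names with the tables your Django project already uses. The sketch below assumes you already have both lists of names, for example copied from SHOW TABLES output. The function name find_renames and the old_ prefix are hypothetical choices and are not part of the Migration Toolkit.

```python
def find_renames(old_tables, project_tables, prefix="old_"):
    """Map each conflicting old table name to a new, unused name.

    Hypothetical helper: a conflicting name gets the prefix (repeated if
    necessary) until it clashes with nothing in either database.
    """
    project = set(project_tables)
    taken = project | set(old_tables)
    renames = {}
    for name in sorted(old_tables):
        if name in project:
            new_name = prefix + name
            # Keep prefixing until the candidate is unused everywhere
            while new_name in taken:
                new_name = prefix + new_name
            taken.add(new_name)
            renames[name] = new_name
    return renames


# Example: a hypothetical 'comments' table exists in both places
print(find_renames(["comments", "orders"], ["comments", "auth_user"]))
```

Each resulting pair can then be applied in the old database, for example with MySQL's RENAME TABLE, before running the export.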
2) Prepping Django Models
Load your data into a fresh temporary database, ‘project_temp’ for now. Then create a new Django project and point its database settings at this temporary database. Don’t run syncdb or anything; simply run python manage.py inspectdb > models.py
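Connecting the throwaway project to project_temp only requires its database settings. The sketch below is minimal and assumes a current Django release with a local MySQL server. Django releases before 1.2 used flat settings such as DATABASE_ENGINE and DATABASE_NAME instead of the DATABASES dict, and the user name and password here are placeholders.

```python
# settings.py of the temporary project (Django 1.2+ style).
# USER and PASSWORD are placeholders; substitute your own credentials.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "project_temp",  # the temporary database holding the old data
        "USER": "migration_user",
        "PASSWORD": "change-me",
        "HOST": "localhost",
    }
}
```

With this in place, inspectdb reads the existing tables in project_temp and writes one model class per table.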
Then create a new app ‘migration’ in your project (the real one, not this temporary one) and add it to your INSTALLED_APPS. Drop the generated models.py into the migration app, then use SQL to copy all of the tables from your temporary database into your Django project’s database. I didn’t have any conflicting table names, but you might; if so, you should have renamed them in step 1.
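When both databases live on the same MySQL server, the copy can be done with one CREATE TABLE ... LIKE statement and one INSERT ... SELECT statement per table. The sketch below only generates those statements. The target database name myproject and the table names are hypothetical. CREATE TABLE ... LIKE copies column definitions and indexes but not foreign key constraints.

```python
def copy_table_statements(tables, source_db="project_temp", target_db="myproject"):
    """Build MySQL statements copying each table's structure and rows
    from source_db into target_db (same server assumed)."""
    statements = []
    for table in tables:
        # Recreate the table's columns and indexes in the target database
        statements.append(
            f"CREATE TABLE `{target_db}`.`{table}` LIKE `{source_db}`.`{table}`;"
        )
        # Copy every row across unchanged
        statements.append(
            f"INSERT INTO `{target_db}`.`{table}` SELECT * FROM `{source_db}`.`{table}`;"
        )
    return statements


# Print the SQL so it can be reviewed, then piped into the mysql client
for stmt in copy_table_statements(["customers", "orders"]):
    print(stmt)
```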
If your old project has a user table, add a new field called ‘live’ to its model: a ForeignKey to your current Django User model with null=True, blank=True. Don’t forget to add the matching column by hand to your old data’s user table, because syncdb never alters a table that already exists.
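Django stores a ForeignKey field named live in a column called live_id, so that is the column the old user table needs. The sketch below generates the MySQL statements for it. The table names are assumptions: old_users is hypothetical, and auth_user is the default table for Django's User model. The constraint statement is optional and is only enforced on InnoDB tables, because MyISAM ignores foreign keys.

```python
def add_live_column_sql(old_user_table, user_table="auth_user"):
    """Return MySQL statements adding a nullable 'live' link column to the
    old user table (named live_id, as Django expects for a FK called 'live')."""
    return [
        # Nullable, so existing rows start out unlinked
        f"ALTER TABLE `{old_user_table}` ADD COLUMN `live_id` INT NULL;",
        # Optional: enforce the link (only effective on InnoDB tables)
        f"ALTER TABLE `{old_user_table}` ADD CONSTRAINT `{old_user_table}_live_fk` "
        f"FOREIGN KEY (`live_id`) REFERENCES `{user_table}` (`id`);",
    ]


for stmt in add_live_column_sql("old_users"):
    print(stmt)
```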
3) Migration app
In your migration app, create migrate.py: this is the script that loops through the old models and loads their data into the new ones. You may want to split it into several scripts, since it will probably get fairly long. Open the file and add the following at the top:
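import os
import sys
# Make the directory that contains your project importable
sys.path.append('/path/to/your/projects/')
# Tell Django which settings to use before any models are imported
os.environ['DJANGO_SETTINGS_MODULE'] = 'myproject.settings'
# On Django 1.7 and later, also run: import django; django.setup()
from myproject.migration.models import *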
4) Migration scripts
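Now that your migration app is set up with all the necessary files, you’re ready to start writing your migration script. The benefit of using Django rather than raw SQL to transform the data is that you can be confident your old data is valid against your new models. Every record passes through those models, so anything that doesn’t fit them (a broken relationship, a value the new schema rejects) fails loudly instead of slipping in silently.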
In general, I looped through my users, saving each one with get_or_create, and then updated the old user table’s ‘live’ FK to point to the newly created user. Then I looped through the rest of my models. Whenever I encountered a reference to a user’s former id, it was easy to hook up thanks to the link between the old and new user records. I used get_or_create in every case, so you can stop and restart the script without adding duplicate data to your database. If the script gets interrupted, any data that didn’t get added will be added on the next run.
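The two-pass pattern above can be sketched as a small program that runs on its own. To avoid needing a Django project, the sketch uses the standard library's sqlite3 module in place of Django's ORM. Its get_or_create is a hand-written stand-in that mirrors Django's QuerySet.get_or_create: it returns the existing row or inserts a new one, together with a created flag. All table and column names are hypothetical. Users are migrated first, and each new id is recorded in the old row's live_id column. Posts then find their new author through that link.

```python
import sqlite3


def get_or_create(conn, table, lookup, defaults=None):
    """Stand-in for Django's get_or_create: return (row_id, created).
    Table/column names are trusted constants here, never user input."""
    where = " AND ".join(f"{col} = ?" for col in lookup)
    row = conn.execute(
        f"SELECT id FROM {table} WHERE {where}", tuple(lookup.values())
    ).fetchone()
    if row:
        return row[0], False
    values = {**lookup, **(defaults or {})}
    cols = ", ".join(values)
    marks = ", ".join("?" for _ in values)
    cur = conn.execute(
        f"INSERT INTO {table} ({cols}) VALUES ({marks})", tuple(values.values())
    )
    return cur.lastrowid, True


def migrate_users(conn):
    # Pass 1: create each new user, then store its id on the old row
    old_users = conn.execute(
        "SELECT id, username, email FROM old_user ORDER BY id"
    ).fetchall()
    for old_id, username, email in old_users:
        new_id, _ = get_or_create(conn, "new_user", {"username": username},
                                  {"email": email})
        conn.execute("UPDATE old_user SET live_id = ? WHERE id = ?", (new_id, old_id))
    conn.commit()


def migrate_posts(conn):
    # Pass 2: translate each old author id into the new user via live_id
    rows = conn.execute(
        "SELECT p.title, u.live_id FROM old_post p "
        "JOIN old_user u ON u.id = p.author_id WHERE u.live_id IS NOT NULL"
    ).fetchall()
    for title, live_id in rows:
        get_or_create(conn, "new_post", {"author_id": live_id, "title": title})
    conn.commit()


def build_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE old_user (id INTEGER PRIMARY KEY, username TEXT, email TEXT,
                               live_id INTEGER NULL);
        CREATE TABLE old_post (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
        CREATE TABLE new_user (id INTEGER PRIMARY KEY, username TEXT, email TEXT);
        CREATE TABLE new_post (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
        INSERT INTO old_user (id, username, email) VALUES
            (17, 'alice', 'alice@example.com'), (42, 'bob', 'bob@example.com');
        INSERT INTO old_post (author_id, title) VALUES (42, 'Hello'), (17, 'Notes');
    """)
    return conn


conn = build_demo_db()
for _ in range(2):  # a rerun must not create duplicates
    migrate_users(conn)
    migrate_posts(conn)
```

Because every insert goes through get_or_create, the loop at the bottom can run twice and each new table still holds one row per old record. In a real Django script, the helper calls would become NewModel.objects.get_or_create(...).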
Keep running and rerunning your migrate script, fixing script logic and cleaning your old data until you’ve got a complete database.
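Between runs, one way to check progress is to list the old rows that still have no link. The sketch below assumes the same hypothetical old_user table with its live_id column. It again uses sqlite3 so it runs on its own, but the IS NULL query works unchanged in MySQL.

```python
import sqlite3


def report_unmigrated(conn):
    """Return the ids of old users that still have no linked new user."""
    rows = conn.execute(
        "SELECT id FROM old_user WHERE live_id IS NULL ORDER BY id"
    ).fetchall()
    return [row[0] for row in rows]


# Tiny demo database: user 42 has not been migrated yet
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE old_user (id INTEGER PRIMARY KEY, username TEXT, live_id INTEGER);
    INSERT INTO old_user VALUES (17, 'alice', 1), (42, 'bob', NULL);
""")
print("old users still unmigrated:", report_unmigrated(conn))
```

An empty list means every old user has a counterpart. The same idea extends to other tables by comparing old and new row counts.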
