Django Under the Hood 09: Django’s Migration System - From Model Changes to ALTER TABLE

Part 9 of the “Django Under the Hood” series: deep dives into Django’s internals, edge cases, and the mechanics that separate production-grade applications from tutorial code.

python manage.py makemigrations
Migrations for 'myapp':
  myapp/migrations/0003_auto_20240115_1423.py
    - Add field published_at to article
    - Alter field title on article

How does Django know what changed?

You didn’t tell it. You just modified your model class. Yet Django correctly detected that you added a field and changed another, and it generated the migration operations that will later be translated into the SQL needed to transform your database schema.

This isn’t magic. It’s a sophisticated diff algorithm that compares your current model definitions against a reconstructed state from previous migrations. Understanding this system prevents migration disasters: operations that lose data, migrations that can’t be reversed, and the dreaded “conflicting migrations” in team environments.

Let’s trace from makemigrations to ALTER TABLE.

The Migration File

# myapp/migrations/0003_auto_20240115_1423.py
from django.db import migrations, models

class Migration(migrations.Migration):
    dependencies = [
        ('myapp', '0002_article_author'),
    ]

    operations = [
        migrations.AddField(
            model_name='article',
            name='published_at',
            field=models.DateTimeField(null=True),
        ),
        migrations.AlterField(
            model_name='article',
            name='title',
            field=models.CharField(max_length=255),  # Was 100
        ),
    ]

A migration is:

  • Dependencies: What must run before this
  • Operations: Changes to apply

The migration file is Python code, but it’s also a data structure that Django can inspect, reverse, and combine.

makemigrations: The Detection Pipeline

python manage.py makemigrations

Step 1: Load Current Models (ProjectState)

Django builds a representation of your current models:

# django/db/migrations/state.py
class ProjectState:
    def __init__(self, models=None):
        self.models = models or {}  # {(app_label, model_name): ModelState}
    
    @classmethod
    def from_apps(cls, apps):
        """Build state from current installed apps."""
        state = cls()
        for app_config in apps.get_app_configs():
            for model in app_config.get_models():
                state.add_model(ModelState.from_model(model))
        return state

ModelState captures everything about a model:

class ModelState:
    def __init__(self, app_label, name, fields, options=None, bases=None, managers=None):
        self.app_label = app_label
        self.name = name
        self.fields = dict(fields)  # {field_name: field_instance}
        self.options = options or {}
        self.bases = bases or (models.Model,)
        self.managers = managers or []

Step 2: Reconstruct State from Migrations (Historical State)

Django replays all existing migrations to build what the database should look like:

# django/db/migrations/loader.py
class MigrationLoader:
    def __init__(self, connection):
        self.migrated_apps = set()
        self.disk_migrations = {}
        self.load_disk()
    
    def load_disk(self):
        """Load migration files from disk."""
        for app_config in apps.get_app_configs():
            migrations_dir = os.path.join(app_config.path, 'migrations')
            
            for name in os.listdir(migrations_dir):
                if name.endswith('.py') and not name.startswith('_'):
                    module = import_module(f'{app_config.name}.migrations.{name[:-3]}')
                    self.disk_migrations[(app_config.label, name[:-3])] = module.Migration
    
    def project_state(self, nodes=None):
        """Replay migrations to get historical state."""
        state = ProjectState()
        
        for migration in self.get_migration_plan(nodes):
            state = migration.mutate_state(state)
        
        return state
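
The replay idea can be sketched without Django at all. The toy below is a minimal illustration, not Django’s implementation: the state is a plain dict, and the CreateModel and AddField classes are hypothetical stand-ins that only know how to update that dict.

```python
class CreateModel:
    def __init__(self, model, fields):
        self.model, self.fields = model, fields

    def state_forwards(self, state):
        # A new model starts with a copy of its declared fields.
        state[self.model] = dict(self.fields)


class AddField:
    def __init__(self, model, name, definition):
        self.model, self.name, self.definition = model, name, definition

    def state_forwards(self, state):
        state[self.model][self.name] = self.definition


def replay(migrations):
    """Apply every operation of every migration, in order, to an empty state."""
    state = {}
    for operations in migrations:
        for operation in operations:
            operation.state_forwards(state)
    return state


history = [
    [CreateModel("article", {"id": "AutoField", "title": "CharField(100)"})],
    [AddField("article", "author", "ForeignKey(auth.User)")],
]
historical_state = replay(history)
```

No database is touched here: the historical state comes purely from the migration files.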

Step 3: Diff the States (Autodetector)

The autodetector compares current state vs historical state:

# django/db/migrations/autodetector.py
class MigrationAutodetector:
    def __init__(self, from_state, to_state):
        self.from_state = from_state  # Historical (from migrations)
        self.to_state = to_state      # Current (from models)
    
    def changes(self, graph, ...):
        """Detect all changes between states."""
        self.generated_operations = {}
        
        # Order matters - detect in dependency order
        self._detect_changes()
        
        return self._build_migration_list()
    
    def _detect_changes(self):
        # 1. Renamed models (checked first, so a rename is not
        #    mistaken for a delete plus a create)
        self.generate_renamed_models()
        
        # 2. Deleted and new models
        self.generate_deleted_models()
        self.generate_created_models()
        
        # 3. Field changes (renames before removals and additions)
        self.generate_renamed_fields()
        self.generate_removed_fields()
        self.generate_added_fields()
        self.generate_altered_fields()
        
        # 4. Index/constraint changes
        self.generate_added_indexes()
        self.generate_removed_indexes()
        self.generate_added_constraints()
        self.generate_removed_constraints()
        
        # 5. Other changes
        self.generate_altered_options()
        self.generate_altered_managers()
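
As a rough sketch of what such a diff involves, the function below compares two states held as plain dicts. It assumes each field is reduced to a comparable string, and diff_states is a hypothetical helper. The real autodetector also handles renames, dependencies, and much more.

```python
def diff_states(old, new):
    """Return (operation, model, field) tuples that turn `old` into `new`."""
    changes = []
    for model in sorted(new.keys() - old.keys()):
        changes.append(("CreateModel", model, None))
    for model in sorted(old.keys() - new.keys()):
        changes.append(("DeleteModel", model, None))
    for model in sorted(old.keys() & new.keys()):
        old_fields, new_fields = old[model], new[model]
        for name in sorted(new_fields.keys() - old_fields.keys()):
            changes.append(("AddField", model, name))
        for name in sorted(old_fields.keys() - new_fields.keys()):
            changes.append(("RemoveField", model, name))
        for name in sorted(old_fields.keys() & new_fields.keys()):
            if old_fields[name] != new_fields[name]:
                changes.append(("AlterField", model, name))
    return changes


old = {"article": {"title": "CharField(100)"}}
new = {"article": {"title": "CharField(255)",
                   "published_at": "DateTimeField(null=True)"}}
changes = diff_states(old, new)
```

For this input the result lists an AddField for published_at and an AlterField for title, the same two changes reported in the makemigrations output at the top of the article.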

The Diffing Logic

How does Django know a field was renamed vs deleted and added?

def generate_renamed_fields(self):
    """Detect field renames by comparing old/new fields."""
    for app_label, model_name in self.kept_model_keys:
        old_model = self.from_state.models[app_label, model_name]
        new_model = self.to_state.models[app_label, model_name]
        
        old_fields = set(old_model.fields.keys())
        new_fields = set(new_model.fields.keys())
        
        removed = old_fields - new_fields
        added = new_fields - old_fields
        
        # Try to match removed → added by field properties.
        # Iterate over sorted copies: the sets are modified inside the loop.
        for old_name in sorted(removed):
            old_field = old_model.fields[old_name]
            
            for new_name in sorted(added):
                new_field = new_model.fields[new_name]
                
                if self.fields_match(old_field, new_field):
                    # Likely a rename - ask user to confirm
                    if self.questioner.ask_rename(model_name, old_name, new_name):
                        self.add_operation(
                            app_label,
                            operations.RenameField(model_name, old_name, new_name)
                        )
                        removed.discard(old_name)
                        added.discard(new_name)
                        break

This is why makemigrations asks questions:

Did you rename article.name to article.title? [y/N]

Django detects a removed field and an added field with similar properties, and asks if it’s a rename.
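
A stripped-down version of that matching step might look like the following. It is only a sketch: fields are compared as plain strings, and confirm is a hypothetical callback standing in for the interactive prompt.

```python
def detect_renames(old_fields, new_fields, confirm):
    """Pair each removed field with an added field that has the same definition."""
    removed = sorted(old_fields.keys() - new_fields.keys())
    added = sorted(new_fields.keys() - old_fields.keys())
    renames = []
    for old_name in removed:
        for new_name in added:
            same_definition = old_fields[old_name] == new_fields[new_name]
            if same_definition and confirm(old_name, new_name):
                renames.append((old_name, new_name))
                added.remove(new_name)  # Safe: we leave the inner loop right away
                break
    return renames


old_fields = {"name": "CharField(100)", "body": "TextField"}
new_fields = {"title": "CharField(100)", "body": "TextField"}

accepted = detect_renames(old_fields, new_fields, lambda old, new: True)
declined = detect_renames(old_fields, new_fields, lambda old, new: False)
```

When the prompt is declined, no rename is recorded and the change falls back to a remove plus an add, which drops the column’s data.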

Operations: The Building Blocks

Each operation knows how to:

  1. Modify the project state (for future diffs)
  2. Generate forward SQL
  3. Generate reverse SQL (for rollback)

# django/db/migrations/operations/fields.py
class AddField(FieldOperation):
    def __init__(self, model_name, name, field):
        self.model_name = model_name
        self.name = name
        self.field = field
    
    def state_forwards(self, app_label, state):
        """Update ProjectState to include this field."""
        state.models[app_label, self.model_name].fields[self.name] = self.field.clone()
    
    def database_forwards(self, app_label, schema_editor, from_state, to_state):
        """Apply to database."""
        to_model = to_state.apps.get_model(app_label, self.model_name)
        field = to_model._meta.get_field(self.name)
        schema_editor.add_field(to_model, field)
    
    def database_backwards(self, app_label, schema_editor, from_state, to_state):
        """Reverse the operation."""
        from_model = from_state.apps.get_model(app_label, self.model_name)
        field = from_model._meta.get_field(self.name)
        schema_editor.remove_field(from_model, field)
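
To make the forward/backward pairing concrete, here is a small sketch that emits SQL strings instead of talking to a database. AddColumn and migration_sql are hypothetical names, and the column definitions are illustrative. The point it shows is that a rollback undoes operations in reverse order.

```python
class AddColumn:
    def __init__(self, table, column, definition):
        self.table, self.column, self.definition = table, column, definition

    def forwards(self):
        return f"ALTER TABLE {self.table} ADD COLUMN {self.column} {self.definition}"

    def backwards(self):
        return f"ALTER TABLE {self.table} DROP COLUMN {self.column}"


def migration_sql(operations, reverse=False):
    """Forward SQL in listed order; reverse SQL undoes the last operation first."""
    if reverse:
        return [op.backwards() for op in reversed(operations)]
    return [op.forwards() for op in operations]


ops = [
    AddColumn("myapp_article", "published_at", "timestamp NULL"),
    AddColumn("myapp_article", "view_count", "integer NULL"),
]
forward = migration_sql(ops)
backward = migration_sql(ops, reverse=True)
```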

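Operation Types

Django ships operations for each kind of change, including:

  • Models: CreateModel, DeleteModel, RenameModel, AlterModelOptions
  • Fields: AddField, RemoveField, AlterField, RenameField
  • Indexes and constraints: AddIndex, RemoveIndex, AddConstraint, RemoveConstraint
  • Special: RunPython, RunSQL, SeparateDatabaseAndState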

migrate: The Execution Pipeline

python manage.py migrate

Step 1: Build the Migration Graph

Migrations form a directed acyclic graph (DAG):

# django/db/migrations/graph.py
class MigrationGraph:
    def __init__(self):
        self.nodes = {}  # (app, name) → Migration
        self.dependencies = {}  # (app, name) → [(app, name), ...]
    
    def add_node(self, key, migration):
        self.nodes[key] = migration
        self.dependencies[key] = []
    
    def add_dependency(self, migration, child, parent):
        self.dependencies[child].append(parent)

Visualized:

myapp/0001_initial
         ↓
myapp/0002_add_author ← auth/0001_initial
         ↓
myapp/0003_add_published_at
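
One way to turn such a graph into an execution order is a depth-first walk that emits every dependency before the migration that needs it. The sketch below assumes the graph is a plain dict of child to parents and does no cycle detection. The function name forwards_plan is reused from the article purely for illustration.

```python
def forwards_plan(dependencies, target):
    """List `target` and everything it depends on, dependencies first."""
    plan, seen = [], set()

    def visit(node):
        if node in seen:
            return
        seen.add(node)
        for parent in dependencies.get(node, []):
            visit(parent)
        plan.append(node)  # Appended only after all of its parents

    visit(target)
    return plan


dependencies = {
    ("myapp", "0001_initial"): [],
    ("auth", "0001_initial"): [],
    ("myapp", "0002_add_author"): [("myapp", "0001_initial"),
                                   ("auth", "0001_initial")],
    ("myapp", "0003_add_published_at"): [("myapp", "0002_add_author")],
}
plan = forwards_plan(dependencies, ("myapp", "0003_add_published_at"))
```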

Step 2: Determine What to Run

# django/db/migrations/executor.py
class MigrationExecutor:
    def __init__(self, connection):
        self.connection = connection
        self.loader = MigrationLoader(connection)
        self.recorder = MigrationRecorder(connection)
    
    def migration_plan(self, targets):
        """Determine which migrations to run."""
        applied = set(self.recorder.applied_migrations())
        
        plan = []
        for target in targets:
            for migration in self.loader.graph.forwards_plan(target):
                if migration not in applied:
                    plan.append((migration, False))  # False = forward
                    applied.add(migration)  # Don't plan it again for another target
        
        return plan

Step 3: Execute Each Migration

def migrate(self, targets, plan=None, fake=False):
    if plan is None:
        plan = self.migration_plan(targets)
    
    for migration, backwards in plan:
        if backwards:
            self.unapply_migration(migration, fake=fake)
        else:
            self.apply_migration(migration, fake=fake)

def apply_migration(self, migration_key, fake=False):
    migration = self.loader.get_migration(*migration_key)
    
    if fake:
        # Just record it as applied
        self.recorder.record_applied(*migration_key)
        return
    
    # Get state before this migration
    state = self.loader.project_state(migration_key, at_end=False)
    
    # Apply operations
    with self.connection.schema_editor(atomic=migration.atomic) as schema_editor:
        state = migration.apply(state, schema_editor)
    
    # Record as applied
    self.recorder.record_applied(*migration_key)

Step 4: Schema Editor Generates SQL

# django/db/backends/base/schema.py
class BaseDatabaseSchemaEditor:
    def add_field(self, model, field):
        """Generate ALTER TABLE ADD COLUMN."""
        
        # Get column definition
        definition, params = self.column_sql(model, field)
        
        # Build SQL
        sql = self.sql_create_column % {
            'table': self.quote_name(model._meta.db_table),
            'column': self.quote_name(field.column),
            'definition': definition,
        }
        
        self.execute(sql, params)
    
    sql_create_column = "ALTER TABLE %(table)s ADD COLUMN %(column)s %(definition)s"

Each database backend subclasses the schema editor and overrides only the SQL that differs for that vendor. For ADD COLUMN, the statement has the same shape on PostgreSQL and MySQL:

# PostgreSQL
sql_create_column = "ALTER TABLE %(table)s ADD COLUMN %(column)s %(definition)s"

# MySQL
sql_create_column = "ALTER TABLE %(table)s ADD COLUMN %(column)s %(definition)s"

# SQLite (more complex - limited ALTER TABLE support, so changes often require table recreation)
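
Even with a shared template, the rendered statement differs per backend because identifier quoting and column types differ. The sketch below fills the template by hand. The two quoting helpers are simplified stand-ins for each backend’s quote_name (double quotes on PostgreSQL, backticks on MySQL), and the column definitions are illustrative.

```python
SQL_CREATE_COLUMN = "ALTER TABLE %(table)s ADD COLUMN %(column)s %(definition)s"


def quote_postgresql(name):
    return '"%s"' % name


def quote_mysql(name):
    return "`%s`" % name


def add_column_sql(quote_name, table, column, definition):
    """Fill the shared template using a backend-specific quoting function."""
    return SQL_CREATE_COLUMN % {
        "table": quote_name(table),
        "column": quote_name(column),
        "definition": definition,
    }


postgres_sql = add_column_sql(
    quote_postgresql, "myapp_article", "published_at", "timestamp with time zone NULL"
)
mysql_sql = add_column_sql(
    quote_mysql, "myapp_article", "published_at", "datetime(6) NULL"
)
```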

The Migration Recorder

# django/db/migrations/recorder.py
class MigrationRecorder:
    """Tracks which migrations have been applied."""
    
    class Migration(models.Model):
        app = models.CharField(max_length=255)
        name = models.CharField(max_length=255)
        applied = models.DateTimeField(default=now)
        
        class Meta:
            db_table = 'django_migrations'
    
    def applied_migrations(self):
        return {(m.app, m.name) for m in self.Migration.objects.all()}
    
    def record_applied(self, app, name):
        self.Migration.objects.create(app=app, name=name)
    
    def record_unapplied(self, app, name):
        self.Migration.objects.filter(app=app, name=name).delete()

The django_migrations table is the source of truth for which migrations have been applied:

SELECT * FROM django_migrations;
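
The bookkeeping itself is simple enough to imitate with the standard library. The sketch below uses an in-memory SQLite table with the same column names. The helper functions are hypothetical and only show how "pending" falls out of comparing the table against the files on disk.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE django_migrations ("
    "id INTEGER PRIMARY KEY, app TEXT, name TEXT, applied TEXT)"
)


def record_applied(app, name):
    conn.execute(
        "INSERT INTO django_migrations (app, name, applied) "
        "VALUES (?, ?, datetime('now'))",
        (app, name),
    )


def applied_migrations():
    rows = conn.execute("SELECT app, name FROM django_migrations")
    return {(app, name) for app, name in rows}


record_applied("myapp", "0001_initial")
record_applied("myapp", "0002_add_author")

on_disk = [
    ("myapp", "0001_initial"),
    ("myapp", "0002_add_author"),
    ("myapp", "0003_add_published_at"),
]
applied = applied_migrations()
pending = [key for key in on_disk if key not in applied]
```

Nothing in this comparison looks at the actual schema, which is why a row recorded for a migration that never ran leaves the two out of sync.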

Squashing Migrations

python manage.py squashmigrations myapp 0001 0010

Squashing combines multiple migrations into one:

# django/core/management/commands/squashmigrations.py
class Command(BaseCommand):
    def handle(self, *args, **options):
        # Load migrations to squash
        migrations_to_squash = self.get_migrations_to_squash(...)
        
        # Combine all operations
        operations = []
        for migration in migrations_to_squash:
            operations.extend(migration.operations)
        
        # Optimize: remove redundant operations
        optimizer = MigrationOptimizer()
        operations = optimizer.optimize(operations, app_label)
        
        # Generate squashed migration
        new_migration = Migration(
            f'0001_squashed_0010_{...}',
            operations=operations,
            replaces=[(m.app_label, m.name) for m in migrations_to_squash],  # (app_label, name) pairs
        )

The Optimizer

# django/db/migrations/optimizer.py
class MigrationOptimizer:
    def optimize(self, operations, app_label):
        """Remove redundant operations."""
        result = list(operations)
        
        # Keep optimizing until no changes
        while True:
            new_result = self.optimize_inner(result, app_label)
            if new_result == result:
                break
            result = new_result
        
        return result
    
    def optimize_inner(self, operations, app_label):
        """Single optimization pass."""
        new_operations = []
        
        for operation in operations:
            # Try to merge with an earlier operation
            merged = False
            
            for j in range(len(new_operations) - 1, -1, -1):
                # reduce() is called on the earlier operation and returns
                # a list of replacement operations when the two combine
                result = new_operations[j].reduce(operation, app_label)
                
                if isinstance(result, list):
                    # Operations can be merged
                    new_operations = new_operations[:j] + result + new_operations[j+1:]
                    merged = True
                    break
            
            if not merged:
                new_operations.append(operation)
        
        return new_operations

Example optimizations:

# Before optimization
AddField('title')
AlterField('title')
AlterField('title')

# After optimization
AddField('title')  # Final state only

# Before optimization
AddField('temp')
RemoveField('temp')

# After optimization
(nothing)  # Field never existed
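
Both collapses can be reproduced with a toy optimizer. This is a sketch under heavy simplification: operations are plain tuples for a single model, ("add", field, definition), ("alter", field, definition) or ("remove", field), and optimize is a hypothetical function, not Django’s MigrationOptimizer.

```python
def optimize(operations):
    """Collapse add/alter/remove tuples that act on the same field."""
    result = []
    for op in operations:
        kind, field = op[0], op[1]
        # Index of the most recent earlier operation on the same field, if any.
        index = next(
            (i for i in range(len(result) - 1, -1, -1) if result[i][1] == field),
            None,
        )
        earlier = result[index] if index is not None else None

        if earlier and earlier[0] == "add" and kind == "alter":
            result[index] = ("add", field, op[2])  # Add with the final definition
        elif earlier and earlier[0] == "add" and kind == "remove":
            del result[index]                      # The field never existed
        elif earlier and earlier[0] == "alter" and kind == "alter":
            result[index] = op                     # Only the last alter matters
        else:
            result.append(op)
    return result


collapsed_alters = optimize([
    ("add", "title", "CharField(100)"),
    ("alter", "title", "CharField(200)"),
    ("alter", "title", "CharField(255)"),
])
cancelled = optimize([
    ("add", "temp", "IntegerField"),
    ("remove", "temp"),
])
```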

RunPython: Custom Operations

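from django.db import migrations
from django.utils.text import slugify

def populate_slugs(apps, schema_editor):
    Article = apps.get_model('myapp', 'Article')
    for article in Article.objects.all():
        article.slug = slugify(article.title)
        article.save(update_fields=['slug'])

def reverse_slugs(apps, schema_editor):
    pass  # Nothing to undo; defined so the migration can still be reversed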

class Migration(migrations.Migration):
    operations = [
        migrations.RunPython(populate_slugs, reverse_slugs),
    ]

Critical: Use apps.get_model(), not direct imports:

# WRONG - uses current model definition
from myapp.models import Article

def populate_slugs(apps, schema_editor):
    for article in Article.objects.all():  # Might have fields that don't exist yet!
        ...

# RIGHT - uses historical model state
def populate_slugs(apps, schema_editor):
    Article = apps.get_model('myapp', 'Article')
    for article in Article.objects.all():  # Has only fields from this migration point
        ...

Common Issues

Issue 1: Conflicting Migrations

CommandError: Conflicting migrations detected; multiple leaf nodes in the migration graph

Two developers created migrations from the same parent:

0002_add_author
         ↓
0003_feature_a (developer 1)
0003_feature_b (developer 2)  ← Conflict!

✅ Fix: Merge migrations

python manage.py makemigrations --merge

This creates a merge migration:

class Migration(migrations.Migration):
    dependencies = [
        ('myapp', '0003_feature_a'),
        ('myapp', '0003_feature_b'),
    ]
    operations = []  # Just merges the graph
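
The conflict check boils down to counting leaf nodes: migrations that nothing else depends on. The sketch below assumes a single app and a plain dict of migration name to parents. The name 0004_merge is hypothetical.

```python
def leaf_nodes(dependencies):
    """Return the migrations that no other migration lists as a parent."""
    parents = {parent for deps in dependencies.values() for parent in deps}
    return sorted(node for node in dependencies if node not in parents)


dependencies = {
    "0002_add_author": [],
    "0003_feature_a": ["0002_add_author"],
    "0003_feature_b": ["0002_add_author"],
}
before_merge = leaf_nodes(dependencies)  # Two leaves: a conflict

# A merge migration depends on both branches and becomes the single leaf.
dependencies["0004_merge"] = ["0003_feature_a", "0003_feature_b"]
after_merge = leaf_nodes(dependencies)
```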

Issue 2: Cannot Add Non-Nullable Field

django.db.utils.IntegrityError: column "published_at" contains null values

Cause: Adding a required field to a table with existing data.

✅ Fix: Add with default, or allow null then backfill:

# Option 1: Provide default (needs: from django.utils import timezone)
migrations.AddField(
    model_name='article',
    name='published_at',
    field=models.DateTimeField(default=timezone.now),
)

# Option 2: Allow null, backfill, then make required
migrations.AddField(
    model_name='article',
    name='published_at',
    field=models.DateTimeField(null=True),
),
migrations.RunPython(backfill_published_at),
migrations.AlterField(
    model_name='article',
    name='published_at',
    field=models.DateTimeField(),
),
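
Option 2 references backfill_published_at without defining it. A minimal sketch of such a function follows. Stamping every existing row with the current UTC time is an assumption made for illustration; a real backfill would pick a value that makes sense for the data.

```python
import datetime


def backfill_published_at(apps, schema_editor):
    # Use the historical model, as recommended for RunPython functions.
    Article = apps.get_model('myapp', 'Article')
    # Assumption: the current UTC time is an acceptable value for old rows.
    now = datetime.datetime.now(datetime.timezone.utc)
    # One UPDATE statement instead of loading and saving each row.
    Article.objects.filter(published_at__isnull=True).update(published_at=now)
```

Because it issues a single UPDATE, this avoids loading every row into Python, though on very large tables a batched update may still be preferable.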

Issue 3: Migration Timeout on Large Tables

Adding a column to a table with millions of rows can lock the table.

✅ Fix: Use AddField with db_default (Django 5.0+):

migrations.AddField(
    model_name='article',
    name='view_count',
    field=models.IntegerField(db_default=0),  # Database-level default
)

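Or, for indexes on PostgreSQL, use the concurrent operations in django.contrib.postgres, which build the index with CREATE INDEX CONCURRENTLY:

from django.contrib.postgres.operations import AddIndexConcurrently

class Migration(migrations.Migration):
    atomic = False  # CONCURRENTLY cannot run inside a transaction

    operations = [
        AddIndexConcurrently(
            model_name='article',
            # The index name here is illustrative
            index=models.Index(fields=['title'], name='article_title_idx'),
        ),
    ]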

Issue 4: Circular Dependencies

# app1/models.py
class Author(models.Model):
    favorite_article = models.ForeignKey('app2.Article', ...)

# app2/models.py
class Article(models.Model):
    author = models.ForeignKey('app1.Author', ...)

✅ Fix: Use string references and ensure migrations are ordered correctly, or break the cycle with nullable fields.

Issue 5: State vs Database Mismatch

python manage.py migrate --fake myapp 0003

This marks the migration as applied without running it, which is dangerous if the database schema doesn’t actually match!

✅ Fix: Use --fake-initial for initial migrations only, or manually sync state:

# See current state
python manage.py showmigrations

# Reset app migrations (nuclear option)
python manage.py migrate myapp zero --fake
python manage.py migrate myapp --fake

Best Practices

1. One logical change per migration

# Good: separate migrations
# 0003_add_published_at.py
# 0004_add_view_count.py

# Bad: unrelated changes together
# 0003_add_published_at_and_view_count.py

2. Always test rollback

python manage.py migrate myapp 0002  # Rollback
python manage.py migrate myapp       # Forward again

3. Use --plan before migrating production

python manage.py migrate --plan

4. Squash periodically

Keep migration count manageable for new developers.

5. Never edit applied migrations

If it’s in production, create a new migration to fix it.

What’s Next

This was the migration system — from model diff to ALTER TABLE.

Next and final in the series: Test Client and Request Factory Mechanics — how Django simulates HTTP requests, the test client internals, and why some tests don’t behave like real requests.

Series: Django Under the Hood

  1. Request Lifecycle
  2. ORM Query Compiler
  3. Connection Management
  4. Signal Dispatch
  5. Template Engine
  6. Form Pipeline
  7. Authentication Chain
  8. Static Files
  9. Migration System Deep Dive ← You are here
  10. Test Client
