Software Development

Configure a Read Replica Database in Django

When and How to Configure a Read Replica Database in Django


Image by Petrebels on Unsplash

When and How to Configure a Read Replica Database in Django

In software engineering, one of the most crucial steps is the choice of architecture, so much so that any error in the choice may result in malfunctioning of the website that can adversely affect the business performance.

Usually, the issues related to system performance, especially for large and complex systems, are critical to maintain and diagnose since they may be related to distinct software environments. These issues may arise from multiple factors including — hardware, software, or network related issues.

The following article will discuss issues related to database performance and approaches that can be adopted to mitigate them using read-replicas. My current experience is based on the App Engine application with PostgreSQL replication deployed on the Google Cloud Platform.

Performance issues within Databases

Under most common scenarios, the system can get slaggy due to the issues related to the database performance. There are multiple factors that can contribute to this, the most common one being choosing the wrong schema design. Therefore, if not designed properly while keeping in mind the expected operations, the database will end up performing poorly. On the contrary, a schema that is well designed will not necessarily perform up to the mark as multiple other factors can affect its performance, such as the number of connections, stress or load handling capability, or the volume of data to be processed by the database. For instance, when a single database is handling a large volume of users (consider a few thousands), it will experience a large number of connections hit, and it will not necessarily have the resources to cater these hits in real-time. Therefore, in order to mitigate such cases, the concept of replica is introduced.

Database Replicas

The term “replication” with respect to software and hardware is referred to as the process of utilizing multiple copies (or replicas) of a specific resource to improve/enhance performance, service availability, and fault tolerance. With respect to replication in database systems, it’s usually considered that multiple servers are handling the same data. Multiple configurations and procedures are available to incorporate replication within databases:

  • Read or Write commands are permitted within all instances.
  • Multiple read-only instances along with a master instance (where read/write commands are permitted). Note that, in such a type of schema, data replication and synchronization can be executed both synchronous and asynchronous manners.

Dealing with Read Replica Configurations in Django

Before configuring and using read replicas in Django application, the below two points should be taken into consideration:

  • The read replicas instances should be declared as databases within the Django app.
  • A router needs to be configured to choose the corresponding read replicas.

Procedure to Declare Read Replicas Instance as Databases in Django

The method of configuration and declaration of read replicas as databases in Django involves the setting up of the Django settings file. As mentioned above, we will be setting up a default single master instance (read/write permitted server) database, while the read replicas will be declared as additional databases. For instance, considering two replicas have been configured, the settings.py file should appear as the below:

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "HOST": "/cloudsql/project-id:region:master-instance-id",
        "NAME": "pg_master",
        "USER": "pg_user",
        "PASSWORD": "master_password",
    },
    "replica_1": {
        "ENGINE": "django.db.backends.postgresql", 
        "HOST": "/cloudsql/project-id:region:replica1-instance-id",
        "NAME": "pg_replica1",
        "USER": "pg_user",
        "PASSWORD": "replica1_password",
    },
    "replica_2": {
        "ENGINE": "django.db.backends.postgresql",
        "HOST": "/cloudsql/project-id:region:replica2-instance-id",
        "NAME": "pg_replica2",
        "USER": "pg_user",
        "PASSWORD": "replica2_password",
    }
}

Once the configuration is completed the two replicas should be ready for use through the Django app. We can reconfirm this through the method named “using” the QuerySet API. For instance, it’s possible to check the accessibility of the Lead model from both the master and the replicas databases using the below command:

# Here the leads are retrieved from the master instance:
Lead.objects.all()
# Here the leads are retrieved from the replica instance 1:
Lead.objects.using('replica_1').all()
# Here the leads are retrieved from the replica instance 2:
Lead.objects.using('replica_2').all()

It is also possible to set up a given database as usable while saving an object through the “save” method by passing the “using” argument. Note that while saving the object within one of the read replicas, an error will be encountered since the instances allow only read operations, not write.

Router configuration to help reach a specific read replica when required

Once the replicas instances are configured within the Django app and accessibility is ensured, the software developer designed the procedure in such a way that the read-only operation is executed on the appropriate corresponding replica. Therefore, this approach by default hinders scalability as it is more of a manual procedure of configuring the selection of the replica instance. This implies that future addition of more replicas will demand revising the code and manually configuring the routing procedure.

However, another smart approach to the manual configuration is the use of Django’ database router. In this configuration, it is possible to create a customized router which automatically selects the default instance to execute write operation, and alternatively choses a randomly selected replica to execute the read operation. Below is the custom router configuration that shows how the procedure works:

class DatabaseRouter:
    def db_for_read(self, model, **hints):
        return random.choice(['replica_1', 'replica_2'])
    def db_for_write(self, model, **hints):
        return 'default'
    def allow_relation(self, obj1, obj2, **hints):
        return True
    def allow_migrate(self, db, label, model=None, **hints):
        return True

Following the above procedure, now we need to update the “setting.py” file by setting the default database routing:

DATABASE_ROUTERS = ['app.router.DatabaseRouter']

Once the changes are made effective, all read operations within the app will be routed towards the replicas and the write operations will be directed towards the master instance.

Conclusion

To summarize our discussion, the procedure to turn on read replicas instances within Django is quite straight-forward. Furthermore, it helps to enhance the performance of the application as well as the overall system functioning. The minute changes within the configuration files and settings on top of our original code is all you need to get things going in your system’s favor.