The Issue:
Sometime back, there was an issue with a client’s application that used MongoDB for data storage. Due to high volume of data handled, they were facing performance issues. Generally with such high volume of data, the CPU gets utilised heavily due to high I/O operations. As a result, it will affect the performance of the app. While the obvious solution in such a situation is to scale the database, vertical scaling was already tried from the client’s end. But the issue persisted nevertheless. That is when we stepped in to scale the database horizontally and so I had to setup a MongoDB Sharding in order to resolve this.
In this article, I am going to explain how I went about setting up MongoDB sharding in an Ubuntu machine.
Ok, first, What is this Sharding?
Sharding, in simple terms, is the process of writing data across different servers. With sharding, we can add more servers to support data growth, and the demands of read and write operations.
Next, the basic definition of MongoDB
And then, MongoDB Sharding
Config Servers
– Config servers store the metadata of the clusters dataset to the shards. A production sharding must have exactly three config servers. These will organize the data for Query router.
Shards (Replica set)
– Shards are responsible to store the actual data. Each shard is composed of multiple replica set, and we can define a replica set as containing one primary and one or more secondary set. As a result, shards provide high availability through replica set. The primary set alone can do write operations and secondary can do only Read operations. Furthermore, when the primary replica set goes down then one of the secondary replica set will act as master.
Query Routes (mongos)
– Application and direct mongo operations can be done through Query routers. For each request it will communicate with shards then returns the results to clients. Also, the shard setup may have multiple Query router to split the request load.
Setup Prerequisites
We require the following servers for Mongodb Sharding setup:
Query server – Server A
Config server & Shards / Replica Set – Server B
Config server & Shards / Replica Set – Server C
Config server & Shards / Replica Set – Server D
STEP 1 – Install mongoDB
$ apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 7F0CEB10
$ echo "deb http://repo.mongodb.org/apt/ubuntu "$(lsb_release -sc)"/mongodb-org/3.2 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-3.2.list
$ sudo apt-get update
$ sudo apt-get install -y mongodb-org
$ echo "mongodb-org hold" | sudo dpkg --set-selections $ echo "mongodb-org-server hold" | sudo dpkg --set-selections $ echo "mongodb-org-shell hold" | sudo dpkg --set-selections $ echo "mongodb-org-mongos hold" | sudo dpkg --set-selections $ echo "mongodb-org-tools hold" | sudo dpkg --set-selections
STEP 2 – MongoDB Sharding Config Server Setup
In order to setup config server run the following steps in the Servers B, C and D
$ mkdir -p /var/mongodb/mongo-metadata
$ mongod --configsvr --dbpath /var/mongodb/mongo-metadata --port 27019
STEP 3 – Mongo Shards Replica set setup
In order to setup shards run the following steps in the Servers B, C and D
$ mkdir -p /var/mongodb/store/rs1 $ mkdir -p /var/mongodb/store/rs2 $ mkdir -p /var/mongodb/store/rs3
$ mongod --shardsvr --replSet rs1 --dbpath /var/mongodb/store/rs1 --port 27000 $ mongod --shardsvr --replSet rs2 --dbpath /var/mongodb/store/rs2 --port 27001 $ mongod --shardsvr --replSet rs1 --dbpath /var/mongodb/store/rs3 --port 27001
$ mongo --port 27000 mongo> rs.initiate() mongo> rs.add("shard.host.com:27001") mongo> rs.add("shard.host.com:27002")
STEP 4 – MongoDB Sharding Query Routers setup
Before continuing this step check if all three config servers are running and listening connections. Then, we need to stop Mongodb process.
In order to stop Mongodb service:
$ sudo service mongodb stop
In order to start the query router service:
# Using this same (following) command you can start multiple query servers
$ mongos --configdb config0.host.com:27019,config1.host.com:27019,config2.host.com:27019
STEP 5 – Add our shards servers in Query Router
Now that we have our config servers, query routers and replica set server configured, we can add the shard servers to our query routers.
In order to add shard servers to the mongo cluster run the below steps in Server A
$ mongo --host query0.example.com --port 27017 mongo> sh.addShard( "rep_set_name/rep_set_member:27017" )
$ mongo --port 27000 mongo> rs.status()
Now, select the Primary members name.
STEP 6 – Enable Sharding for a Database and Collection
$ mongo --host query0.example.com --port 27017
Stop the sharding balancer
Before enabling sharding we have to stop the balancer service from mongo console.
mongo> sh.stopBalancer()
Enable sharding to the database
mongo> use current_DB mongo> sh.enableSharding("#{current_DB}")
Enable sharding to the collections
mongo> use current_DB mongo> sh.shardCollection( "current_DB.test_collection", { "_id" : "shard key" } )
Specifically notable is that the shard key that you choose here for sharding the collection will have impact on the performance.
Conclusion
By the end of this blog, you should also be able to implement your own MongoDB sharding configuration and thus set it up on an Ubuntu machine.
Sharding definitely helped boost the application’s performance and high availability of the database. Not only that, but it also allows the client to scale up their database horizontally by adding as many shards according to the increase in volume of data.
Follow Agira Tech Blog for more interesting blogs and queries.