Lecture Notes Of Day 21: MongoDB Sharding (Day 21)

Rashmi Mishra

Objective:

Understand the concept of sharding in MongoDB for horizontal scaling.

Outcome:

By the end of this lecture, students will be able to set up and manage a sharded MongoDB cluster.


1. Introduction to MongoDB Sharding

As data grows in size, a single server may not be sufficient to handle the workload. MongoDB provides sharding as a method for horizontal scaling to distribute data across multiple machines efficiently.

1.1 What is Sharding?

Sharding is the process of distributing data across multiple servers to:

  • Improve read and write performance
  • Increase storage capacity
  • Support high availability (each shard is deployed as a replica set)
  • Handle large-scale applications

1.2 Why Do We Need Sharding?

Without sharding, every data-bearing server (a standalone instance, or each member of a replica set) must hold the entire data set, which can lead to:

  • Performance bottlenecks due to high traffic.
  • Increased response time for queries.
  • Storage limitations of a single machine.

By implementing sharding, MongoDB distributes the load across multiple servers, allowing it to handle more queries efficiently.


2. Components of a Sharded Cluster

A MongoDB sharded cluster consists of three main components:

1.  Shards

    • Store actual data in a distributed manner.
    • Each shard is a replica set to ensure high availability.

2.  Config Servers

    • Store metadata and configuration settings of the cluster.
    • Help the cluster track which shard contains which data.

3.  Query Routers (mongos)

    • Handle client requests and direct queries to the appropriate shard.
    • Act as an interface between the application and the sharded cluster.
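
To make the division of labour concrete, here is a minimal sketch in plain JavaScript (runnable with Node.js, no MongoDB needed). It is not how mongos is implemented. The `chunks` table, the `age` field, and the `targetShards` function are hypothetical names chosen for illustration: the table stands in for the metadata kept by the config servers, and the function mimics the routing decision a query router makes.

```javascript
// Hypothetical metadata, similar in spirit to what config servers track:
// which shard owns which range of values. Ranges are [min, max).
const chunks = [
  { min: 1, max: 31, shard: "shard1" },
  { min: 31, max: 61, shard: "shard2" },
  { min: 61, max: 101, shard: "shard3" },
];

// Decide which shards a query must visit, as a router would.
function targetShards(query, keyField, chunkTable) {
  const allShards = [...new Set(chunkTable.map((c) => c.shard))];
  // The query does not mention the key field: every shard must be asked.
  if (!(keyField in query)) return allShards;
  const value = query[keyField];
  const owner = chunkTable.find((c) => value >= c.min && value < c.max);
  // This toy table does not cover every possible value, so owner may be missing.
  return owner ? [owner.shard] : [];
}

// Targeted query: only the shard owning age 45 is contacted.
console.log(targetShards({ age: 45 }, "age", chunks));
// Broadcast query: no key field, so all three shards are contacted.
console.log(targetShards({ name: "Asha" }, "age", chunks));
```

The first call reaches one shard and the second reaches all three, which is why queries that can be routed to a single shard tend to scale better.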


3. How Sharding Works

MongoDB divides data across shards using a shard key.

3.1 Shard Key

A shard key is a field (or a combination of fields) used to determine the distribution of documents in a collection. It should:

  • Appear in the most common queries, so that mongos can route each one to a specific shard instead of broadcasting it to all shards.
  • Have high cardinality (many unique values) to distribute data evenly.
  • Avoid hotspots (uneven distribution leading to performance issues).
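
One rough way to compare candidate shard keys is to measure them on a sample of documents. The sketch below is plain JavaScript (runnable with Node.js). The `keyStats` helper and the sample documents are hypothetical, and the two numbers it reports are only a first-pass indicator, not a complete evaluation of a shard key.

```javascript
// Summarize how well a candidate field would spread documents.
function keyStats(docs, field) {
  const counts = new Map();
  for (const doc of docs) {
    const value = doc[field];
    counts.set(value, (counts.get(value) || 0) + 1);
  }
  const largest = Math.max(...counts.values());
  return {
    field,
    distinctValues: counts.size,         // cardinality of the field
    largestShare: largest / docs.length, // fraction of docs sharing the most common value
  };
}

// Hypothetical sample documents.
const sampleUsers = [
  { user_id: 101, country: "IN" },
  { user_id: 102, country: "IN" },
  { user_id: 103, country: "IN" },
  { user_id: 104, country: "US" },
];

console.log(keyStats(sampleUsers, "user_id")); // 4 distinct values, largest share 0.25
console.log(keyStats(sampleUsers, "country")); // 2 distinct values, largest share 0.75
```

In this sample, `user_id` has more distinct values and no dominant value, so it would be the more promising candidate of the two. A field where one value dominates is a likely hotspot.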

3.2 Sharding Methods

MongoDB supports two types of sharding:

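1.  Range-Based Sharding

    • Data is distributed based on contiguous ranges of shard key values.
    • Example: If the shard key is age, documents are distributed as:
        • Shard 1: { age: 1 - 30 }
        • Shard 2: { age: 31 - 60 }
        • Shard 3: { age: 61 - 100 }
    • Advantage: A query on a range of shard key values usually touches only the shards that own that range.
    • Disadvantage: If most reads or writes target a specific range (for example, a shard key whose values only ever increase), the shards holding that range are overloaded.

2.  Hash-Based Sharding

    • Data is distributed by applying a hash function to the shard key value.
    • Spreads documents evenly across shards when the shard key has high cardinality.
    • Example: If the shard key is _id, each document is placed according to the hash of its _id. The placement looks random but is deterministic: the same value always maps to the same shard.
    • Disadvantage: Range queries on the shard key usually have to be sent to all shards, because adjacent values hash to unrelated locations.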
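
The difference between the two methods can be illustrated with a small simulation in plain JavaScript (runnable with Node.js). This is a sketch under stated assumptions, not MongoDB's algorithm: `rangeShard` hard-codes the age ranges from the example above, `fnv1a` is an illustrative hash that is not the one MongoDB uses, and picking a shard with a modulo is a simplification of how hashed values are actually assigned. The workloads are made up.

```javascript
const shardNames = ["shard1", "shard2", "shard3"];

// Range-based placement using the age ranges from the example above.
function rangeShard(age) {
  if (age <= 30) return "shard1";
  if (age <= 60) return "shard2";
  return "shard3";
}

// Illustrative 32-bit FNV-1a hash; NOT the hash function MongoDB uses.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// Hash-based placement (simplified): the same key always maps to the same shard.
function hashShard(key) {
  return shardNames[fnv1a(String(key)) % shardNames.length];
}

// Count how many keys land on each shard under a placement function.
function countByShard(keys, place) {
  const counts = { shard1: 0, shard2: 0, shard3: 0 };
  for (const key of keys) counts[place(key)] += 1;
  return counts;
}

// Hypothetical workload: 300 users, all aged 20 to 29.
const ages = Array.from({ length: 300 }, (_, i) => 20 + (i % 10));
console.log("range on age:   ", countByShard(ages, rangeShard));

// Hypothetical high-cardinality key: 300 consecutive user ids.
const userIds = Array.from({ length: 300 }, (_, i) => 1000 + i);
console.log("hash on user id:", countByShard(userIds, hashShard));
```

With the range-based function, all 300 documents land on shard1 because every age falls in the first range. The hash-based counts depend on the hash function, so run the script to see how the ids spread.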


4. Setting Up a Sharded Cluster

4.1 Prerequisites

  • Install MongoDB on multiple servers or instances.
  • Ensure network connectivity between all nodes.

4.2 Steps to Configure a Sharded Cluster

1.  Start Config Servers
Config servers store metadata. Start them using:

bash
mongod --configsvr --replSet configReplSet --port 27019 --dbpath /data/configdb

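2.  Start Shards (Replica Sets)
Start one mongod per shard, each in its own terminal. The commands below start the two shards that are added in step 5:

bash
mongod --shardsvr --replSet shard1ReplSet --port 27018 --dbpath /data/shard1
mongod --shardsvr --replSet shard2ReplSet --port 27020 --dbpath /data/shard2

3.  Initiate the Replica Sets
Connect to one member of each replica set (the config server replica set and every shard) and initialize it:

javascript
rs.initiate()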

4.  Start Query Router (mongos)

bash
mongos --configdb configReplSet/localhost:27019 --port 27017

5.  Add Shards to the Cluster

javascript
sh.addShard("shard1ReplSet/localhost:27018")
sh.addShard("shard2ReplSet/localhost:27020")

6.  Enable Sharding for a Database

javascript
sh.enableSharding("mydatabase")

7.  Choose a Shard Key and Shard a Collection

javascript
sh.shardCollection("mydatabase.users", { "user_id": "hashed" })
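
Step 3 calls rs.initiate() with no arguments, which leaves the replica set configuration to MongoDB's defaults. If you prefer to state the members explicitly, the sketch below (plain JavaScript, runnable with Node.js) builds one configuration document per replica set and prints the corresponding rs.initiate(...) call. The `replSetConfig` helper is hypothetical. The replica set names and ports are the ones used in the steps above, and single-member sets on localhost are an assumption that suits a local test setup only.

```javascript
// Build the configuration document that rs.initiate() accepts.
function replSetConfig(name, hosts, isConfigServer = false) {
  const config = {
    _id: name,
    members: hosts.map((host, index) => ({ _id: index, host })),
  };
  // Config server replica sets are flagged with configsvr: true.
  if (isConfigServer) config.configsvr = true;
  return config;
}

// Names and ports taken from the commands in the steps above.
const configs = [
  replSetConfig("configReplSet", ["localhost:27019"], true),
  replSetConfig("shard1ReplSet", ["localhost:27018"]),
  replSetConfig("shard2ReplSet", ["localhost:27020"]),
];

for (const cfg of configs) {
  // Each printed line can be pasted into a shell connected to that replica set.
  console.log(`rs.initiate(${JSON.stringify(cfg)})`);
}
```

Listing the hosts explicitly keeps the member addresses consistent with the addresses later passed to sh.addShard().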

5. Advantages of Sharding

  • Scalability: Supports large datasets and high traffic.
  • High Availability: Data is partitioned across shards, and each shard's replica set keeps redundant copies of its portion.
  • Improved Query Performance: Distributes queries across multiple servers.
  • Load Balancing: Spreads workload across multiple nodes.

6. Challenges of Sharding

  • Complexity in Setup: Requires multiple configurations.
  • Shard Key Selection: Choosing a poor shard key can lead to hotspots, and changing it later is costly.
  • Operational Overhead: Monitoring and maintaining multiple nodes.

7. Summary

  • Sharding is a horizontal scaling technique used in MongoDB.
  • It consists of Shards, Config Servers, and Query Routers.
  • Uses Range-Based or Hash-Based sharding.
  • Setting up a sharded cluster involves multiple configuration steps.
  • Provides scalability and load balancing, with high availability coming from running each shard as a replica set.

8. Assignment

1.  What is sharding, and why is it important in MongoDB?

2.  Explain the difference between Range-Based and Hash-Based sharding.

3.  List the steps required to configure a MongoDB sharded cluster.

4.  What are some challenges faced while implementing sharding?



