Lecture Notes Of Day 12:
Data Modeling in MongoDB
Objective:
- Learn how to design schemas for MongoDB collections.
- Understand the difference between relational databases and NoSQL databases
in terms of data modeling.
- Master how to structure data in MongoDB efficiently based on use cases.
Outcome:
By the end of this session, students will be able to:
- Design and create MongoDB schemas for various applications.
- Understand key MongoDB data modeling techniques such as embedding and
referencing.
- Apply best practices in data modeling for scalability and performance.
Introduction to Data Modeling in MongoDB
In traditional relational databases such as MySQL, PostgreSQL, or Oracle, data
is structured into tables with predefined columns, and relationships between
tables are expressed with foreign keys. This model is based on the principles
of normalization. MongoDB is a NoSQL document database, and its approach to
data modeling is quite different. MongoDB stores data as documents within
collections. Documents look like JSON objects and are stored internally in
BSON, a binary representation of JSON.
By default, MongoDB does not enforce a fixed schema. Documents in the same
collection can have different fields, which gives you flexibility. Even so,
designing an efficient data model is essential for performance, scalability,
and ease of use.
Key Concepts in MongoDB Data Modeling
1. Document: A document in MongoDB is a set of key-value pairs. It is roughly
equivalent to a row in a relational database table.
   o Example: { "name": "Alice", "age": 25, "email": "alice@example.com" }
2. Collection: A collection is a group of MongoDB documents. It is roughly
equivalent to a table in a relational database.
   o Example: A collection called users can contain multiple documents
   representing different users.
3. Schema-less Nature: Unlike relational tables, MongoDB collections do not
require a schema. However, you can enforce some structure using tools like
Mongoose (in Node.js) or other ODM (Object Data Modeling) libraries in
different programming languages.
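As a rough illustration of these three concepts, the sketch below models a collection as a plain Python list of dictionaries instead of talking to a real MongoDB server. The second user, the REQUIRED_FIELDS rule, and the missing_fields helper are hypothetical names introduced for this example; an ODM would normally perform this kind of check for you.

```python
# A "collection" modeled as a list of dict "documents" (no real database involved).
users = [
    {"name": "Alice", "age": 25, "email": "alice@example.com"},
    # Hypothetical second user: same collection, different fields.
    {"name": "Bob", "nickname": "bobby"},
]

REQUIRED_FIELDS = {"name", "email"}  # hypothetical application-level rule


def missing_fields(document):
    """Return the required fields that a document lacks."""
    return REQUIRED_FIELDS - set(document)


for user in users:
    print(user["name"], "is missing:", sorted(missing_fields(user)))
```

Both documents are accepted by the list, just as a collection without validation would accept them; the structure check lives entirely in application code.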
Types of Data Models in MongoDB
MongoDB allows you to design your data model in two major ways: embedded
(denormalized) models and referenced (normalized) models.
1. Embedded (Denormalized) Data Model
In this model, related data is stored within a single document. This model is
often preferred when the data is frequently accessed together and doesn’t
change often. It suits 1:1 relationships and 1:N relationships in which the
"many" side is small and belongs to a single parent.
Advantages:
- Faster read operations, as all the data needed for a query is in a single
document.
- Reduces the need for multiple queries and joins, improving performance.
Example:
Consider a blog system with posts and comments. Instead of having a separate
collection for comments and using references (the MongoDB counterpart of
foreign keys), we can embed comments within the post document.
{
  "_id": 1,
  "title": "Introduction to MongoDB",
  "author": "John Doe",
  "comments": [
    { "user": "Alice", "comment": "Great post!", "date": "2025-01-28" },
    { "user": "Bob", "comment": "Very helpful!", "date": "2025-01-29" }
  ]
}
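To make the read-side advantage concrete, here is a minimal Python sketch that treats the post document above as a dictionary. The posts dictionary stands in for a collection indexed by _id, and render_post is a hypothetical helper; neither is a MongoDB API.

```python
post = {
    "_id": 1,
    "title": "Introduction to MongoDB",
    "author": "John Doe",
    "comments": [
        {"user": "Alice", "comment": "Great post!", "date": "2025-01-28"},
        {"user": "Bob", "comment": "Very helpful!", "date": "2025-01-29"},
    ],
}

posts = {post["_id"]: post}  # stand-in for a collection indexed by _id


def render_post(post_id):
    """A single lookup returns the post together with all of its comments."""
    doc = posts[post_id]
    lines = [f'{doc["title"]} by {doc["author"]}']
    lines += [f'  {c["user"]} ({c["date"]}): {c["comment"]}' for c in doc["comments"]]
    return lines


print("\n".join(render_post(1)))
```

Because the comments travel with the post, no second lookup is needed to display the page.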
2. Referenced (Normalized) Data Model
In this model, documents store references (usually ObjectId values) to other
documents. This is closer to the relational model: the data is separated into
different collections, and links (references) are created between them.
Advantages:
- Data is normalized, reducing redundancy.
- Easier to maintain when data changes frequently (each piece of data is
updated in one place, with no embedded copies to keep in sync).
Example:
Consider a students collection and a courses collection. A student can enroll
in multiple courses, and a course can have multiple students. Instead of
embedding the course details in the student document, we store references to
the courses in the student document.
// student document
{
  "_id": 1,
  "name": "Alice",
  "age": 22,
  "courses": [
    ObjectId("603d3b2f8f1b2c3d420b8477"),
    ObjectId("603d3b3c8f1b2c3d420b8478")
  ]
}
// course document
{
  "_id": ObjectId("603d3b2f8f1b2c3d420b8477"),
  "course_name": "Database Systems",
  "credits": 3
}
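The following Python sketch shows what following those references looks like in application code. It assumes plain strings in place of ObjectId values and a dictionary in place of the courses collection. The second course ("Operating Systems") is invented so that both references resolve, and the helper names are hypothetical.

```python
# Stand-in for the courses collection, keyed by _id (strings replace ObjectId here).
courses = {
    "603d3b2f8f1b2c3d420b8477": {"course_name": "Database Systems", "credits": 3},
    # Hypothetical second course, invented so both references resolve.
    "603d3b3c8f1b2c3d420b8478": {"course_name": "Operating Systems", "credits": 4},
}

student = {
    "_id": 1,
    "name": "Alice",
    "age": 22,
    "courses": ["603d3b2f8f1b2c3d420b8477", "603d3b3c8f1b2c3d420b8478"],
}


def enrolled_courses(student_doc):
    """Resolve each stored reference with an extra lookup (an application-level join)."""
    return [courses[course_id] for course_id in student_doc["courses"]]


def total_credits(student_doc):
    """Sum the credits of every course the student references."""
    return sum(course["credits"] for course in enrolled_courses(student_doc))


# Renaming a course touches exactly one document; every student sees the change.
courses["603d3b2f8f1b2c3d420b8477"]["course_name"] = "Database Systems I"
print([c["course_name"] for c in enrolled_courses(student)], total_credits(student))
```

The trade-off is visible in enrolled_courses: reading a student's courses costs extra lookups, while an update to a course happens in one place.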
Best Practices for MongoDB Data Modeling
1. Understand the Access Pattern: Before designing a schema, understand how
your application will query the data. If certain data is frequently accessed
together, an embedded model may be more efficient.
2. Avoid Joins: While MongoDB supports $lookup for performing joins in the
aggregation pipeline, it’s generally more efficient to model data so that
common queries do not need joins.
3. Use References When Data Changes Frequently: If certain data, like user
details, changes frequently and is used in many places, consider referencing
it rather than embedding it.
4. Avoid Embedding Large Data: Embedding large or unbounded arrays or
documents can lead to performance issues as the document grows. MongoDB
documents have a size limit of 16 MB.
5. Schema Validation: Although MongoDB doesn’t enforce schemas by default, you
can define validation rules on a collection (for example, with a $jsonSchema
validator) to ensure that inserted and updated documents follow a given
structure.
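Practices 4 and 5 can be sketched in plain Python. The dictionary below has the shape of a $jsonSchema validator, but the violations function is only a simplified stand-in written for this example, not MongoDB's validator, and it handles just the required and bsonType keywords. The JSON-based size estimate is likewise an assumption; the real limit applies to the BSON encoding, which differs in size.

```python
import json

MAX_DOCUMENT_BYTES = 16 * 1024 * 1024  # the 16 MB document size limit

# Shape of a $jsonSchema validator for a hypothetical users collection.
user_validator = {
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["name", "email"],
        "properties": {
            "name": {"bsonType": "string"},
            "email": {"bsonType": "string"},
            "age": {"bsonType": "int"},
        },
    }
}

# Simplified mapping from BSON type names to Python types (assumption for this sketch).
PYTHON_TYPES = {"string": str, "int": int, "object": dict}


def violations(document, validator):
    """List rule violations; supports only 'required' and 'bsonType'."""
    schema = validator["$jsonSchema"]
    problems = [f"missing {field}" for field in schema["required"] if field not in document]
    for field, rule in schema["properties"].items():
        expected = PYTHON_TYPES[rule["bsonType"]]
        if field in document and not isinstance(document[field], expected):
            problems.append(f"{field} is not {rule['bsonType']}")
    return problems


def approximate_size(document):
    """JSON length in bytes: only a rough proxy for the real BSON size."""
    return len(json.dumps(document).encode("utf-8"))


print(violations({"name": "Bob", "age": "old"}, user_validator))
print(approximate_size({"name": "Alice"}) < MAX_DOCUMENT_BYTES)
```

In a real deployment the validator document would be attached to the collection so that the server rejects non-conforming writes; the Python check here only mirrors the idea.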
Data Modeling Use Cases
1. User Profile Data: For storing user profile data, an embedded data model
can work well. The user’s profile information can be stored along with their
posts, comments, or preferences within a single document, as long as those
embedded arrays stay small enough to remain well below the document size
limit.
2. E-Commerce System: In an e-commerce system, you may use an embedded model
for product data that is frequently accessed together (e.g., product details
with ratings and reviews). However, customer data, orders, and inventory could
be in separate collections with references to reduce data duplication.
3. Social Media: Social media applications often require both embedded and
referenced models. For example, posts may have embedded comments, but user
followers and friends could be referenced through user IDs.
Example: Designing a MongoDB Schema for a Blogging Application
Let’s consider a blogging platform that has users, posts, and comments. Here’s
how you might model the schema:
1. Users Collection (Referenced Model):
   o Fields: name, email, password (stored as a hash), profile_picture, bio
2. Posts Collection (Embedded Model):
   o Fields: title, content, author (reference to Users), comments (array of
   embedded comments)
3. Comments (embedded in Posts, not a separate collection):
   o Fields: user (reference to Users), content, date
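One way to picture this schema is with sample documents and a function that assembles a post for display. This is a sketch under assumptions: the IDs ("u1", "u2", "p1"), the sample values, and the post_view helper are all hypothetical, and dictionaries stand in for the two collections.

```python
# Stand-ins for the users and posts collections, keyed by hypothetical string IDs.
users = {
    "u1": {"name": "John Doe", "email": "john@example.com", "bio": "Writes about databases."},
    "u2": {"name": "Alice", "email": "alice@example.com", "bio": "Reader."},
}

posts = {
    "p1": {
        "title": "Introduction to MongoDB",
        "content": "MongoDB stores data as documents.",
        "author": "u1",  # reference to users
        "comments": [  # embedded; each comment references its author
            {"user": "u2", "content": "Great post!", "date": "2025-01-28"},
        ],
    }
}


def post_view(post_id):
    """Read the post once, then resolve user references to display names."""
    post = posts[post_id]
    return {
        "title": post["title"],
        "author": users[post["author"]]["name"],
        "comments": [(users[c["user"]]["name"], c["content"]) for c in post["comments"]],
    }


print(post_view("p1"))
```

The comments arrive with the post, while user names are looked up separately, so a user who changes their name is updated in one document only.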
Conclusion
Data modeling in MongoDB is an
important step in building scalable, efficient applications. By understanding
the different techniques like embedding and referencing, and choosing the right
approach based on your use case, you can create data models that optimize your
application's performance and ease of maintenance. MongoDB’s flexibility gives
you the power to design data models that fit your application's needs, but
careful consideration of access patterns and data structures is key to success.
Exercise
1. Design a Schema for a Library System:
   o Create collections for books, authors, and borrowers. Determine which
   fields should be embedded and which should be referenced.
2. Design a Schema for an E-commerce System:
   o Model products, categories, orders, and users using a mix of embedded and
   referenced data models.
3. Real-life Use Case Design:
   o Choose a real-world system (e.g., a restaurant reservation system, an
   online marketplace) and design a data model using MongoDB.
Exercise 1: Design a Schema for a Library System
Problem Statement:
Create collections for books, authors, and borrowers. Determine which fields
should be embedded and which should be referenced.
Solution:
1. Authors Collection (Referenced Model):
   o Each author can have multiple books, and author data rarely changes.
   o It’s good practice to store the author's information in a separate
   collection and reference it from the books collection, so that it is not
   repeated in every book.
{
  "_id": ObjectId("60d5f7e2bcf86f431bc8a122"),
  "name": "J.K. Rowling",
  "dob": "1965-07-31",
  "bio": "British author, best known for the Harry Potter series."
}
2. Books Collection (Embedded and Referenced Model):
   o A book document holds its own fields directly (e.g., title, genre), but it
   references the author to avoid redundant data.
   o We embed an array of borrowers, which records who has borrowed the book
   and when. Each entry references a borrower by ID.
{
  "_id": ObjectId("60d5f7e2bcf86f431bc8a123"),
  "title": "Harry Potter and the Philosopher's Stone",
  "author": ObjectId("60d5f7e2bcf86f431bc8a122"), // Reference to author
  "genre": "Fantasy",
  "publication_year": 1997,
  "borrowers": [
    {
      "borrower_id": ObjectId("60d5f7e2bcf86f431bc8a124"),
      "borrowed_on": "2025-01-01",
      "due_date": "2025-01-15"
    }
  ]
}
3. Borrowers Collection (Referenced Model):
   o Borrowers are stored in a separate collection, as one borrower can borrow
   many books.
   o We store the IDs of the borrowed books in the borrower document instead
   of copying the book details, to avoid duplication.
{
  "_id": ObjectId("60d5f7e2bcf86f431bc8a124"),
  "name": "Alice",
  "email": "alice@example.com",
  "borrowed_books": [
    ObjectId("60d5f7e2bcf86f431bc8a123") // Reference to books
  ]
}
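In this solution a loan is recorded twice: in the book's borrowers array and in the borrower's borrowed_books array. The Python sketch below illustrates why both sides must be updated together. The short string IDs, the lend helper, and the books_match_borrowers check are hypothetical and use dictionaries in place of collections.

```python
# Stand-ins for the books and borrowers collections (hypothetical string IDs).
books = {
    "b1": {"title": "Harry Potter and the Philosopher's Stone", "author": "a1", "borrowers": []},
}
borrowers = {
    "r1": {"name": "Alice", "email": "alice@example.com", "borrowed_books": []},
}


def lend(book_id, borrower_id, borrowed_on, due_date):
    """Record a loan on both sides of the two-way reference."""
    books[book_id]["borrowers"].append(
        {"borrower_id": borrower_id, "borrowed_on": borrowed_on, "due_date": due_date}
    )
    borrowers[borrower_id]["borrowed_books"].append(book_id)


def books_match_borrowers():
    """Check that every loan stored on a book also appears on the borrower."""
    for book_id, book in books.items():
        for entry in book["borrowers"]:
            if book_id not in borrowers[entry["borrower_id"]]["borrowed_books"]:
                return False
    return True


lend("b1", "r1", "2025-01-01", "2025-01-15")
print(books_match_borrowers())
```

If only one of the two appends in lend ran, the two collections would disagree, which is the maintenance cost of storing the relationship in both places.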
Exercise 2: Design a Schema for an E-commerce System
Problem Statement:
Model products, categories, orders, and users using a mix of embedded and
referenced data models.
Solution:
1. Categories Collection (Referenced Model):
   o Categories are referenced by products. This lets you manage categories
   without duplicating their information across products.
{
  "_id": ObjectId("60d5f7e2bcf86f431bc8a130"),
  "name": "Electronics",
  "description": "Devices and gadgets"
}
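2. Products Collection (Embedded and Referenced Model):
   o Product data, including price and description, is stored in the products
   collection. Each product references a category and embeds customer reviews,
   which are usually read together with the product. If the number of reviews
   can grow without bound, a separate reviews collection is the safer choice.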
{
  "_id": ObjectId("60d5f7e2bcf86f431bc8a131"),
  "name": "Smartphone",
  "category": ObjectId("60d5f7e2bcf86f431bc8a130"), // Reference to categories collection
  "price": 699,
  "description": "Latest model with high-end features.",
  "reviews": [
    { "user": "Alice", "rating": 4, "comment": "Great phone!" },
    { "user": "Bob", "rating": 5, "comment": "Excellent value for money!" }
  ]
}
3. Orders Collection (Embedded and Referenced Model):
   o Each order embeds its line items (a product reference and a quantity) and
   references the customer. We also store the order status and date for future
   queries.
{
  "_id": ObjectId("60d5f7e2bcf86f431bc8a132"),
  "user": ObjectId("60d5f7e2bcf86f431bc8a135"), // Reference to users collection
  "products": [
    { "product_id": ObjectId("60d5f7e2bcf86f431bc8a131"), "quantity": 1 }
  ],
  "order_date": "2025-01-15",
  "status": "Shipped"
}
4. Users Collection (Referenced Model):
   o Store user details and reference their orders for easy access.
{
  "_id": ObjectId("60d5f7e2bcf86f431bc8a135"),
  "name": "Alice",
  "email": "alice@example.com",
  "password": "hashed_password",
  "orders": [
    ObjectId("60d5f7e2bcf86f431bc8a132") // Reference to orders collection
  ]
}
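A short Python sketch can show how the mixed model is read back. It assumes dictionaries in place of the products and orders collections and short string IDs in place of ObjectId values; order_total and average_rating are hypothetical helpers, not part of any MongoDB driver.

```python
# Stand-ins for the products and orders collections (hypothetical string IDs).
products = {
    "p1": {
        "name": "Smartphone",
        "category": "c1",
        "price": 699,
        "reviews": [
            {"user": "Alice", "rating": 4, "comment": "Great phone!"},
            {"user": "Bob", "rating": 5, "comment": "Excellent value for money!"},
        ],
    }
}

orders = {
    "o1": {
        "user": "u1",
        "products": [{"product_id": "p1", "quantity": 1}],
        "order_date": "2025-01-15",
        "status": "Shipped",
    }
}


def order_total(order_id):
    """Resolve each line item's product reference and sum price times quantity."""
    items = orders[order_id]["products"]
    return sum(products[item["product_id"]]["price"] * item["quantity"] for item in items)


def average_rating(product_id):
    """Embedded reviews are available without any extra lookup."""
    reviews = products[product_id]["reviews"]
    return sum(review["rating"] for review in reviews) / len(reviews)


print(order_total("o1"), average_rating("p1"))
```

Note that order_total uses the product's current price. If past orders must keep the price paid at purchase time, one option is to copy the price into each line item when the order is created; that is a design choice beyond the schema shown here.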
Exercise 3: Real-life Use Case Design: Restaurant Reservation System
Problem Statement:
Design a data model for a restaurant reservation system, which includes
customers, tables, reservations, and menu items.
Solution:
1. Customers Collection (Referenced Model):
   o Store customer details. A customer can make multiple reservations.
{
  "_id": ObjectId("60d5f7e2bcf86f431bc8a140"),
  "name": "John Doe",
  "email": "johndoe@example.com",
  "phone_number": "123-456-7890"
}
2. Tables Collection (Referenced Model):
   o The restaurant has multiple tables, and each table has a seating
   capacity. This data is stored in the tables collection.
{
  "_id": ObjectId("60d5f7e2bcf86f431bc8a141"),
  "table_number": 1,
  "seating_capacity": 4
}
3. Menu Collection (Embedded Model):
   o Each menu item (dish) is a document in the menu collection, with its
   ingredients embedded as an array, as the menu doesn't change frequently. A
   dish belongs to a category like "starter," "main course," or "dessert."
{
  "_id": ObjectId("60d5f7e2bcf86f431bc8a142"),
  "name": "Spaghetti",
  "category": "Main Course",
  "price": 15.99,
  "ingredients": ["Pasta", "Tomato sauce", "Olive oil", "Cheese"]
}
4. Reservations Collection (Referenced Model):
   o A reservation references the customer and the table by ID and stores the
   time and status of the reservation.
{
  "_id": ObjectId("60d5f7e2bcf86f431bc8a143"),
  "customer": ObjectId("60d5f7e2bcf86f431bc8a140"), // Reference to customer collection
  "table": ObjectId("60d5f7e2bcf86f431bc8a141"), // Reference to table collection
  "reservation_time": "2025-02-01T18:00:00",
  "status": "Confirmed"
}
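The reservation document only becomes useful once its references are resolved. The Python sketch below assumes dictionaries in place of the customers and tables collections and short string IDs in place of ObjectId values. The describe and is_table_free helpers are hypothetical, and the availability check compares exact start times only, which is a simplification.

```python
from datetime import datetime

# Stand-ins for the customers and tables collections (hypothetical string IDs).
customers = {"c1": {"name": "John Doe", "email": "johndoe@example.com"}}
tables = {"t1": {"table_number": 1, "seating_capacity": 4}}

reservations = [
    {
        "customer": "c1",
        "table": "t1",
        "reservation_time": "2025-02-01T18:00:00",
        "status": "Confirmed",
    }
]


def describe(reservation):
    """Resolve the customer and table references into a readable summary."""
    customer = customers[reservation["customer"]]
    table = tables[reservation["table"]]
    return f'{customer["name"]}, table {table["table_number"]}, {reservation["reservation_time"]}'


def is_table_free(table_id, when):
    """True if no confirmed reservation starts at exactly this time for the table."""
    return not any(
        r["table"] == table_id
        and r["status"] == "Confirmed"
        and datetime.fromisoformat(r["reservation_time"]) == when
        for r in reservations
    )


print(describe(reservations[0]))
print(is_table_free("t1", datetime(2025, 2, 1, 20, 0)))
```

A production system would compare time ranges rather than exact start times and would store the time as a date value instead of a string; both are left out to keep the sketch short.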
Conclusion
These exercises help demonstrate
how MongoDB’s flexible data model can be adapted to real-world use cases. By
using a mix of embedded and referenced models, you can design efficient and
scalable schemas for applications, whether for a library system, an e-commerce
platform, or a restaurant reservation system. Always tailor your schema design
to the specific requirements and access patterns of your application for
optimal performance.
