Lecture Notes Of Day 12: Data Modeling in MongoDB

Rashmi Mishra

Objective:

  • Learn how to design schema for MongoDB collections.
  • Understand the difference between relational databases and NoSQL databases in terms of data modeling.
  • Master how to structure data in MongoDB efficiently based on use cases.

Outcome:

By the end of this session, students will be able to:

  • Design and create MongoDB schemas for various applications.
  • Understand key MongoDB data modeling techniques like embedding and referencing.
  • Apply best practices in data modeling for scalability and performance.

Introduction to Data Modeling in MongoDB

In traditional relational databases like MySQL, PostgreSQL, or Oracle, data is structured into tables with predefined columns, and relationships between tables are expressed through foreign keys. This model is based on the principles of normalization. MongoDB is a NoSQL database, and its approach to data modeling is quite different. MongoDB stores data as documents within collections; documents are written as JSON-like objects and stored in BSON, a binary encoding of JSON.

By default, MongoDB does not enforce a fixed schema. This means you can store different fields in different documents within the same collection. Even with this flexibility, designing an efficient data model is essential for performance, scalability, and ease of use.


Key Concepts in MongoDB Data Modeling

1.  Document: A document in MongoDB is a set of key-value pairs. It is equivalent to a row in a relational database table.

o    Example: { "name": "Alice", "age": 25, "email": "alice@example.com" }

2.  Collection: A collection is a group of MongoDB documents. It is equivalent to a table in relational databases.

o    Example: A collection called users can contain multiple documents representing different users.

3.  Schema-less Nature: Unlike relational databases, MongoDB collections do not require a schema. However, you can enforce some structure using tools like Mongoose (in Node.js) or other ODM (Object Data Modeling) libraries in different programming languages.
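
The three concepts above can be sketched with plain Python dictionaries and a list standing in for documents and a collection. This is only an illustration, not driver code: the second user, the `phone` field, and the helper names `field_names` and `missing_fields` are assumptions made for the example.

```python
# A collection modeled as a list of documents (plain dicts).
# Documents in the same collection may carry different fields.
users = [
    {"name": "Alice", "age": 25, "email": "alice@example.com"},
    # Hypothetical second user: no "age" field, but an extra "phone" field.
    {"name": "Bob", "email": "bob@example.com", "phone": "555-0100"},
]

def field_names(collection):
    """Return the set of all field names used across a collection."""
    names = set()
    for doc in collection:
        names.update(doc.keys())
    return names

def missing_fields(doc, required):
    """List the required fields a document lacks (an application-side check)."""
    return [f for f in required if f not in doc]

print(sorted(field_names(users)))
print(missing_fields(users[1], ["name", "email", "age"]))
```

The second helper hints at why ODM libraries exist: when the database does not enforce structure, the application has to.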


Types of Data Models in MongoDB

MongoDB allows you to design your data model in two major ways: embedded (denormalized) models and referenced (normalized) models.

1. Embedded (Denormalized) Data Model

In this model, related data is stored within a single document. It is often preferred when the data is frequently accessed together and doesn’t change often. It suits one-to-one relationships and one-to-many relationships where the "many" side is small and bounded.

Advantages:

  • Faster read operations, as all the data needed for a query is in a single document.
  • Reduces the need for multiple queries and joins, improving performance.

Example:

Consider a blog system with posts and comments. Instead of having a separate collection for comments and using references (foreign keys), we can embed comments within the post document.

{
    "_id": 1,
    "title": "Introduction to MongoDB",
    "author": "John Doe",
    "comments": [
        { "user": "Alice", "comment": "Great post!", "date": "2025-01-28" },
        { "user": "Bob", "comment": "Very helpful!", "date": "2025-01-29" }
    ]
}
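
To see why embedding makes reads cheap, here is a minimal sketch that treats the post document above as a plain Python dict. The helper names `add_comment` and `comments_by` and the user "Carol" are assumptions for illustration; a real application would issue the equivalent update through a MongoDB driver.

```python
# An embedded post document, modeled as a plain dict.
post = {
    "_id": 1,
    "title": "Introduction to MongoDB",
    "author": "John Doe",
    "comments": [
        {"user": "Alice", "comment": "Great post!", "date": "2025-01-28"},
        {"user": "Bob", "comment": "Very helpful!", "date": "2025-01-29"},
    ],
}

def add_comment(post, user, comment, date):
    """Append a comment to the embedded array (comparable to a $push update)."""
    post["comments"].append({"user": user, "comment": comment, "date": date})

def comments_by(post, user):
    """Return the comment texts written by one user; no second lookup is needed."""
    return [c["comment"] for c in post["comments"] if c["user"] == user]

# "Carol" is a made-up user for this example.
add_comment(post, "Carol", "Thanks for sharing!", "2025-01-30")
print(len(post["comments"]))
print(comments_by(post, "Alice"))
```

Everything needed to render the post lives in one document, which is the trade-off embedding buys: fewer lookups in exchange for a document that grows with every comment.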

2. Referenced (Normalized) Data Model

In this model, documents store references (usually ObjectIDs) to other documents. This is more like the relational model, where the data is separated into different collections, and links (references) are created between them.

Advantages:

  • Data is normalized, reducing redundancy.
  • Easier to maintain when data changes frequently (no need to update embedded documents).

Example:

Consider a students collection and a courses collection. A student can enroll in multiple courses, and a course can have multiple students. Instead of embedding the course details in the student document, we store references to the courses in the student document.

// student document
{
    "_id": 1,
    "name": "Alice",
    "age": 22,
    "courses": [ObjectId("603d3b2f8f1b2c3d420b8477"), ObjectId("603d3b3c8f1b2c3d420b8478")]
}

// course document
{
    "_id": ObjectId("603d3b2f8f1b2c3d420b8477"),
    "course_name": "Database Systems",
    "credits": 3
}
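
With references, the application (or a $lookup stage) has to resolve the IDs in a second step. The sketch below does that in plain Python, using strings in place of ObjectId values. The second course's name and credits are not given in the example above and are invented here, as are the helper names `resolve_courses` and `total_credits`.

```python
# Courses keyed by their _id (strings stand in for ObjectId values).
courses_by_id = {
    "603d3b2f8f1b2c3d420b8477": {"course_name": "Database Systems", "credits": 3},
    # Assumed details for the second referenced course.
    "603d3b3c8f1b2c3d420b8478": {"course_name": "Operating Systems", "credits": 4},
}

student = {
    "_id": 1,
    "name": "Alice",
    "age": 22,
    "courses": ["603d3b2f8f1b2c3d420b8477", "603d3b3c8f1b2c3d420b8478"],
}

def resolve_courses(student, courses_by_id):
    """Follow each stored reference to its course document (an application-side join)."""
    return [courses_by_id[cid] for cid in student["courses"] if cid in courses_by_id]

def total_credits(student, courses_by_id):
    """Sum the credits of every course the student references."""
    return sum(c["credits"] for c in resolve_courses(student, courses_by_id))

print([c["course_name"] for c in resolve_courses(student, courses_by_id)])
print(total_credits(student, courses_by_id))
```

Because each course is stored once, renaming it or changing its credits touches a single document, at the cost of the extra lookup shown here.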


Best Practices for MongoDB Data Modeling

1.  Understand the Access Pattern: Before designing a schema, understand how your application will query the data. If certain data is frequently accessed together, an embedded model may be more efficient.

2.  Avoid Joins: While MongoDB supports $lookup for performing joins, it’s generally more efficient to model data to avoid joins.

3.  Use References When Data Changes Frequently: If certain data, like user details, changes frequently, consider referencing it rather than embedding it.

4.  Avoid Embedding Large Data: Embedding large or unbounded arrays can hurt performance as the document grows, and a single MongoDB document cannot exceed 16 MB.

5.  Schema Validation: MongoDB does not enforce a schema by default, but you can attach validation rules to a collection (for example, a $jsonSchema validator) so that inserted and updated documents must follow a defined structure.
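
As a rough illustration of the schema validation point, the sketch below writes a validator in the shape MongoDB accepts (a `$jsonSchema` document) as a Python dict, then checks documents against it with a small hand-written function. The checker is an assumption made for teaching purposes: it covers only `required`, two `bsonType` values, and `minimum`, and is not MongoDB's own validator. The field choices are also illustrative.

```python
# A validator document in $jsonSchema form, written as a Python dict.
user_validator = {
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["name", "email"],
        "properties": {
            "name": {"bsonType": "string"},
            "email": {"bsonType": "string"},
            "age": {"bsonType": "int", "minimum": 0},
        },
    }
}

# Simplified mapping from BSON type names to Python types (illustrative only).
PYTHON_TYPES = {"string": str, "int": int}

def violations(doc, validator):
    """Return a list of rule violations; an empty list means the document passes."""
    schema = validator["$jsonSchema"]
    problems = [
        f"missing required field: {field}"
        for field in schema.get("required", [])
        if field not in doc
    ]
    for field, rules in schema.get("properties", {}).items():
        if field not in doc:
            continue
        value = doc[field]
        expected = PYTHON_TYPES.get(rules.get("bsonType"))
        if expected is not None and not isinstance(value, expected):
            problems.append(f"{field}: expected {rules['bsonType']}")
        elif "minimum" in rules and value < rules["minimum"]:
            problems.append(f"{field}: below minimum {rules['minimum']}")
    return problems

print(violations({"name": "Alice", "email": "alice@example.com", "age": 25}, user_validator))
print(violations({"name": "Bob", "age": "old"}, user_validator))
```

In a real deployment the same validator document would be attached to the collection so the server rejects non-conforming writes, rather than being checked in application code.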


Data Modeling Use Cases

1.  User Profile Data: An embedded data model can work well for user profiles. Small, bounded data such as preferences or contact details can be stored within the user document; data that grows without bound, such as a user’s posts and comments, is usually better kept in separate collections.

2.  E-Commerce System: In an e-commerce system, you may have an embedded model for products that are frequently accessed together (e.g., product details with ratings and reviews). However, customer data, orders, and inventory could be in separate collections with references to reduce data duplication.

3.  Social Media: Social media applications often require both embedded and referenced models. For example, posts may have embedded comments, but user followers and friends could be referenced through user IDs.


Example: Designing a MongoDB Schema for a Blogging Application

Let’s consider a blogging platform that has users, posts, and comments. Here’s how you might model the schema:

1.  Users Collection (Referenced Model):

o    Fields: name, email, password (stored as a hash), profile_picture, bio

2.  Posts Collection (Embedded Model):

o    Fields: title, content, author (reference to Users), comments (array of embedded comments)

3.  Comments (embedded in Posts, not a separate collection):

o    Fields: user (reference to Users), content, date
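
The blogging schema above mixes both techniques: comments are embedded in the post, while the post's author and each comment's user are references. A minimal Python sketch, with invented IDs (`u1`, `u2`, `p1`), sample field values, and a hypothetical helper `render_post`, shows how a page view would combine them:

```python
# Users keyed by _id; short strings stand in for ObjectId values.
users_by_id = {
    "u1": {"name": "John Doe", "email": "john@example.com"},
    "u2": {"name": "Alice", "email": "alice@example.com"},
}

# A post with a referenced author and embedded comments.
post = {
    "_id": "p1",
    "title": "Introduction to MongoDB",
    "content": "MongoDB stores data as documents.",
    "author": "u1",  # reference to Users
    "comments": [
        {"user": "u2", "content": "Great post!", "date": "2025-01-28"},
    ],
}

def render_post(post, users_by_id):
    """Build a display view: comments come with the post, names need a user lookup."""
    return {
        "title": post["title"],
        "author_name": users_by_id[post["author"]]["name"],
        "comments": [
            {"user_name": users_by_id[c["user"]]["name"], "content": c["content"]}
            for c in post["comments"]
        ],
    }

view = render_post(post, users_by_id)
print(view["author_name"])
print(view["comments"])
```

If a user changes their name, only the user document changes; the posts and comments keep pointing at the same ID.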


Conclusion

Data modeling in MongoDB is an important step in building scalable, efficient applications. By understanding the different techniques like embedding and referencing, and choosing the right approach based on your use case, you can create data models that optimize your application's performance and ease of maintenance. MongoDB’s flexibility gives you the power to design data models that fit your application's needs, but careful consideration of access patterns and data structures is key to success.


Exercise

1.  Design a Schema for a Library System:

o    Create collections for books, authors, and borrowers. Determine which fields should be embedded and which should be referenced.

2.  Design a Schema for an E-commerce System:

o    Model products, categories, orders, and users using a mix of embedded and referenced data models.

3.  Real-life Use Case Design:

o    Choose a real-world system (e.g., a restaurant reservation system, an online marketplace) and design a data model using MongoDB.


Exercise 1: Design a Schema for a Library System

Problem Statement:

Create collections for books, authors, and borrowers. Determine which fields should be embedded and which should be referenced.

Solution:

1.  Authors Collection (Referenced Model):

o    Each author can have multiple books, and the author's details are shared by all of those books.

o    Storing the author's information in a separate collection and referencing it from the books collection avoids repeating it in every book document.

{
    "_id": ObjectId("60d5f7e2bcf86f431bc8a122"),
    "name": "J.K. Rowling",
    "dob": "1965-07-31",
    "bio": "British author, best known for the Harry Potter series."
}

2.  Books Collection (Embedded and Referenced Model):

o    A book can have a lot of information embedded (e.g., title, genre), but it should reference the author to avoid redundant data.

o    We will embed an array of borrowers, which shows which users have borrowed the book.

{
    "_id": ObjectId("60d5f7e2bcf86f431bc8a123"),
    "title": "Harry Potter and the Philosopher's Stone",
    "author": ObjectId("60d5f7e2bcf86f431bc8a122"),  // Reference to author
    "genre": "Fantasy",
    "publication_year": 1997,
    "borrowers": [
        { "borrower_id": ObjectId("60d5f7e2bcf86f431bc8a124"), "borrowed_on": "2025-01-01", "due_date": "2025-01-15" }
    ]
}

3.  Borrowers Collection (Referenced Model):

o    Borrowers should be stored in a separate collection, as one borrower can borrow many books.

o    We store the IDs of the borrowed books in the borrower document rather than copies of the book documents, which avoids duplicating book data.

{
    "_id": ObjectId("60d5f7e2bcf86f431bc8a124"),
    "name": "Alice",
    "email": "alice@example.com",
    "borrowed_books": [
        ObjectId("60d5f7e2bcf86f431bc8a123")  // Reference to books
    ]
}
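
In this library design the loan is recorded on both sides: in the book's `borrowers` array and in the borrower's `borrowed_books` array. The sketch below, using plain dicts with strings in place of ObjectId values, shows that a loan then means two writes that must stay in step. The helper names `borrow` and `is_consistent` are assumptions for illustration.

```python
book = {
    "_id": "60d5f7e2bcf86f431bc8a123",
    "title": "Harry Potter and the Philosopher's Stone",
    "borrowers": [],
}
borrower = {
    "_id": "60d5f7e2bcf86f431bc8a124",
    "name": "Alice",
    "borrowed_books": [],
}

def borrow(book, borrower, borrowed_on, due_date):
    """Record a loan on both documents; in MongoDB this would be two updates."""
    book["borrowers"].append(
        {"borrower_id": borrower["_id"], "borrowed_on": borrowed_on, "due_date": due_date}
    )
    borrower["borrowed_books"].append(book["_id"])

def is_consistent(book, borrower):
    """True when both sides agree on whether this borrower holds this book."""
    in_book = any(b["borrower_id"] == borrower["_id"] for b in book["borrowers"])
    in_borrower = book["_id"] in borrower["borrowed_books"]
    return in_book == in_borrower

borrow(book, borrower, "2025-01-01", "2025-01-15")
print(is_consistent(book, borrower))
```

Two-way references make both "who has this book?" and "what has Alice borrowed?" fast to answer, but if one of the two writes fails the documents disagree. Keeping the reference on one side only, or wrapping both updates in a transaction, are common ways to limit that risk.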


Exercise 2: Design a Schema for an E-commerce System

Problem Statement:

Model products, categories, orders, and users using a mix of embedded and referenced data models.

Solution:

1.  Categories Collection (Referenced Model):

o    Categories will be referenced by products. This helps in managing categories without duplicating information across products.

{
    "_id": ObjectId("60d5f7e2bcf86f431bc8a130"),
    "name": "Electronics",
    "description": "Devices and gadgets"
}

2.  Products Collection (Embedded and Referenced Model):

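o    Product data, including price and description, is stored in its own collection. Each product references a category and embeds customer reviews. Embedding reviews works while the number of reviews per product stays small; if reviews grow large or change often, a separate reviews collection is the safer choice.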

{
    "_id": ObjectId("60d5f7e2bcf86f431bc8a131"),
    "name": "Smartphone",
    "category": ObjectId("60d5f7e2bcf86f431bc8a130"),  // Reference to categories collection
    "price": 699,
    "description": "Latest model with high-end features.",
    "reviews": [
        { "user": "Alice", "rating": 4, "comment": "Great phone!" },
        { "user": "Bob", "rating": 5, "comment": "Excellent value for money!" }
    ]
}

3.  Orders Collection (Embedded and Referenced Model):

o    Each order embeds its line items (a product reference and a quantity for each) and references the customer. We also store the order date and status for future queries.

{
    "_id": ObjectId("60d5f7e2bcf86f431bc8a132"),
    "user": ObjectId("60d5f7e2bcf86f431bc8a135"),  // Reference to users collection
    "products": [
        { "product_id": ObjectId("60d5f7e2bcf86f431bc8a131"), "quantity": 1 }
    ],
    "order_date": "2025-01-15",
    "status": "Shipped"
}

4.  Users Collection (Referenced Model):

o    Store user details and reference their orders for easy access.

{
    "_id": ObjectId("60d5f7e2bcf86f431bc8a135"),
    "name": "Alice",
    "email": "alice@example.com",
    "password": "hashed_password",
    "orders": [
        ObjectId("60d5f7e2bcf86f431bc8a132")  // Reference to orders collection
    ]
}
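
One consequence of referencing products from an order is that the order total has to be computed by looking up each product. A minimal sketch in plain Python, with strings in place of ObjectId values and a hypothetical helper `order_total`:

```python
# Products keyed by _id (only the fields needed for pricing are shown).
products_by_id = {
    "60d5f7e2bcf86f431bc8a131": {"name": "Smartphone", "price": 699},
}

order = {
    "_id": "60d5f7e2bcf86f431bc8a132",
    "user": "60d5f7e2bcf86f431bc8a135",
    "products": [
        {"product_id": "60d5f7e2bcf86f431bc8a131", "quantity": 1},
    ],
    "order_date": "2025-01-15",
    "status": "Shipped",
}

def order_total(order, products_by_id):
    """Resolve each line item's product reference and sum price * quantity."""
    return sum(
        products_by_id[item["product_id"]]["price"] * item["quantity"]
        for item in order["products"]
    )

print(order_total(order, products_by_id))
```

Note that this total uses the product's current price. If prices can change after an order is placed, one option is to copy the price at purchase time into each line item, which is a deliberate piece of duplication.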


Exercise 3: Real-life Use Case Design: Restaurant Reservation System

Problem Statement:

Design a data model for a restaurant reservation system, which includes customers, tables, reservations, and menu items.

Solution:

1.  Customers Collection (Referenced Model):

o    Store customer details. A customer can make multiple reservations.

{
    "_id": ObjectId("60d5f7e2bcf86f431bc8a140"),
    "name": "John Doe",
    "email": "johndoe@example.com",
    "phone_number": "123-456-7890"
}

2.  Tables Collection (Referenced Model):

o    The restaurant has multiple tables, and each table has a seating capacity. This data is stored in the tables collection.

{
    "_id": ObjectId("60d5f7e2bcf86f431bc8a141"),
    "table_number": 1,
    "seating_capacity": 4
}

3.  Menu Collection (Embedded Model):

o    Each dish is a document in the menu collection, with its ingredients embedded as an array, as the menu doesn't change frequently. A dish belongs to a category like "starter," "main course," or "dessert."

{
    "_id": ObjectId("60d5f7e2bcf86f431bc8a142"),
    "name": "Spaghetti",
    "category": "Main Course",
    "price": 15.99,
    "ingredients": ["Pasta", "Tomato sauce", "Olive oil", "Cheese"]
}

4.  Reservations Collection (Referenced Model):

o    A reservation references the customer and the table by ID and stores the time and status of the reservation.

{
    "_id": ObjectId("60d5f7e2bcf86f431bc8a143"),
    "customer": ObjectId("60d5f7e2bcf86f431bc8a140"),  // Reference to customer collection
    "table": ObjectId("60d5f7e2bcf86f431bc8a141"),    // Reference to table collection
    "reservation_time": "2025-02-01T18:00:00",
    "status": "Confirmed"
}
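
The access pattern that matters most for this model is probably "is this table free at this time?". The sketch below answers it over reservation documents held as plain dicts, with strings in place of ObjectId values. The two-hour slot length and the helper names `overlaps` and `is_table_free` are assumptions; the notes above do not specify how long a reservation lasts.

```python
from datetime import datetime, timedelta

SLOT = timedelta(hours=2)  # assumed length of one reservation

reservations = [
    {
        "_id": "60d5f7e2bcf86f431bc8a143",
        "customer": "60d5f7e2bcf86f431bc8a140",
        "table": "60d5f7e2bcf86f431bc8a141",
        "reservation_time": "2025-02-01T18:00:00",
        "status": "Confirmed",
    },
]

def overlaps(start_a, start_b, slot=SLOT):
    """Two slots of equal length overlap when each starts before the other ends."""
    return start_a < start_b + slot and start_b < start_a + slot

def is_table_free(reservations, table_id, requested_time):
    """Check a requested ISO-format start time against confirmed reservations."""
    requested = datetime.fromisoformat(requested_time)
    for r in reservations:
        if r["table"] != table_id or r["status"] != "Confirmed":
            continue
        if overlaps(requested, datetime.fromisoformat(r["reservation_time"])):
            return False
    return True

table = "60d5f7e2bcf86f431bc8a141"
print(is_table_free(reservations, table, "2025-02-01T19:00:00"))
print(is_table_free(reservations, table, "2025-02-01T20:00:00"))
```

Since every availability check filters by table and time, those two fields are natural candidates for an index in a real collection, and storing `reservation_time` as a date type rather than a string would let the database do the range comparison.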


Conclusion

These exercises help demonstrate how MongoDB’s flexible data model can be adapted to real-world use cases. By using a mix of embedded and referenced models, you can design efficient and scalable schemas for applications, whether for a library system, an e-commerce platform, or a restaurant reservation system. Always tailor your schema design to the specific requirements and access patterns of your application for optimal performance.

