Lecture Notes Of Day 13: MongoDB Relationships (Embedding vs. Referencing)

Rashmi Mishra

Objective:

To understand how to model relationships in MongoDB. By the end of this session, students should be able to decide when to embed documents and when to reference them, considering factors like data size, query performance, and data consistency.

Outcome:

  • Students will be able to model relationships using embedding and referencing techniques in MongoDB.
  • Students will be able to identify which approach is more suitable based on different use cases.

Introduction to Relationships in MongoDB:

In MongoDB, how data is structured largely determines how efficiently it can be queried. Relational databases model relationships with foreign keys and joins across normalized tables. MongoDB is a document-oriented NoSQL database, and it models relationships either inside a single document or between documents.

MongoDB supports two main ways of representing relationships between data:

1.   Embedding

2.   Referencing

Both methods have their advantages and disadvantages, depending on the data and the type of query that is needed.


1. Embedding:

Embedding refers to the practice of storing related data within the same document. This is useful when the related data is tightly coupled and accessed together frequently.

Use Case for Embedding:

  • When to Embed:
    • When you have a small amount of related data that will not change frequently.
    • When related data is always retrieved together with its parent document.
    • For one-to-many relationships where the "many" part is small and not likely to grow significantly.

Advantages of Embedding:

  • Performance: Faster reads since all related data is stored in a single document and can be retrieved in one query.
  • Atomic Operations: A write to a single document is atomic, so a parent and its embedded data can be updated together in one operation.
  • Simplified Schema: Embedding allows for simpler schema designs, as related data is kept together.

Disadvantages of Embedding:

  • Data Duplication: If the same data is embedded in multiple documents, there is redundancy, which can lead to storage inefficiencies.
  • Document Size Limitation: MongoDB limits a single document to 16 MB. An embedded array that keeps growing can eventually exceed this limit.
  • Hard to Update: If the same embedded data is copied into many documents and later needs to change, every copy must be updated.
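
The "Hard to Update" point can be made concrete with a minimal in-memory sketch. Plain Python dicts stand in for MongoDB documents here, and the `posts` list and `rename_author` helper are hypothetical names chosen for illustration, not part of any MongoDB API.

```python
# In-memory sketch: plain dicts stand in for MongoDB documents (assumption).
posts = [
    {"_id": 1, "title": "Post A", "author": {"id": 7, "name": "Alice"}},
    {"_id": 2, "title": "Post B", "author": {"id": 7, "name": "Alice"}},
    {"_id": 3, "title": "Post C", "author": {"id": 8, "name": "Bob"}},
]


def rename_author(documents, author_id, new_name):
    """Update every embedded copy of the author; return how many documents changed."""
    touched = 0
    for doc in documents:
        if doc["author"]["id"] == author_id:
            doc["author"]["name"] = new_name
            touched += 1
    return touched


# One logical change touches every document that embeds a copy of the author.
changed = rename_author(posts, 7, "Alice Smith")
print(changed)
```

The more documents embed the same data, the more writes a single logical change requires, which is the cost the bullet above describes.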

Example:

A blog post and its comments could be stored in a single document, as the comments are closely related to the blog post and are typically retrieved together.

{
  "_id": 1,
  "title": "MongoDB for Beginners",
  "content": "This is a blog post about MongoDB.",
  "comments": [
    { "author": "Alice", "text": "Great post!" },
    { "author": "Bob", "text": "Very helpful, thanks!" }
  ]
}
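
As a rough illustration of why this layout reads quickly, the sketch below keeps the same document in a Python dict that stands in for a collection. The `find_post` and `add_comment` helpers are hypothetical. In MongoDB itself, appending to the array would typically use the `$push` update operator.

```python
import copy

# In-memory sketch: a dict keyed by _id stands in for the "posts" collection (assumption).
posts = {
    1: {
        "_id": 1,
        "title": "MongoDB for Beginners",
        "content": "This is a blog post about MongoDB.",
        "comments": [
            {"author": "Alice", "text": "Great post!"},
            {"author": "Bob", "text": "Very helpful, thanks!"},
        ],
    }
}


def find_post(post_id):
    """One lookup returns the post together with all of its embedded comments."""
    return copy.deepcopy(posts[post_id])


def add_comment(post_id, author, text):
    """Append a comment to the embedded array; only one document is modified."""
    posts[post_id]["comments"].append({"author": author, "text": text})


add_comment(1, "Carol", "Bookmarked.")
post = find_post(1)
print(post["title"], len(post["comments"]))
```

Both the read and the write touch exactly one document, which is where the performance and atomicity advantages of embedding come from.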


2. Referencing:

Referencing involves storing data in separate documents and linking them together using references (such as object IDs). This is typically used for more complex relationships or when the related data changes frequently.

Use Case for Referencing:

  • When to Reference:
    • When data is large or changes frequently.
    • When you have many-to-many relationships, or one-to-many relationships where the "many" side is large or keeps growing.
    • When you want to avoid data duplication, such as when different documents need to reference the same piece of data.
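
A many-to-many relationship can be sketched with an array of references on one side. The example below is an in-memory illustration only: the `author_ids` field, the second author, and the second book are hypothetical, and dicts keyed by `_id` stand in for collections.

```python
# In-memory sketch: dicts keyed by _id stand in for two collections (assumption).
authors = {
    1: {"_id": 1, "name": "John Doe"},
    2: {"_id": 2, "name": "Jane Roe"},
}
books = {
    100: {"_id": 100, "title": "Learn MongoDB", "author_ids": [1, 2]},
    101: {"_id": 101, "title": "Schema Design Notes", "author_ids": [2]},
}


def authors_of(book_id):
    """Follow the references stored on the book to the author documents."""
    return [authors[a]["name"] for a in books[book_id]["author_ids"]]


def books_by(author_id):
    """Reverse direction: find books whose reference array contains the author."""
    return sorted(b["title"] for b in books.values() if author_id in b["author_ids"])


print(authors_of(100))
print(books_by(2))
```

Each author exists once, however many books point at it, so nothing is duplicated.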

Advantages of Referencing:

  • Avoid Redundancy: No duplication of data since you store references to the related data rather than embedding it.
  • Better for Larger Data: Suitable for large datasets that might exceed document size limits.
  • Flexibility: If the referenced data changes, you only need to update it in one place.

Disadvantages of Referencing:

  • Multiple Queries Required: Retrieving related data takes either several queries (first the parent document, then the referenced documents) or an aggregation that uses $lookup.
  • Complexity: Queries and updates become more complex, since the application must keep the references valid itself and use techniques like $lookup to join data from different collections.

Example:

Consider a system where authors and books are stored in separate collections. A book document might reference the author by using the author's ID.

Authors Collection:

{
  "_id": 1,
  "name": "John Doe"
}

Books Collection:

{
  "_id": 100,
  "title": "Learn MongoDB",
  "author_id": 1
}

To retrieve a book along with its author's details, you would perform two queries, one for the book and one for the author, or combine them in a single aggregation using $lookup.
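
The retrieval described above can be sketched in memory. Lists of dicts stand in for the two collections, and `find_one` and `lookup` are hypothetical helpers written for this illustration. `lookup` only imitates the left-outer-join behaviour of the `$lookup` aggregation stage and is not the real implementation.

```python
# In-memory sketch: lists of dicts stand in for the two collections (assumption).
authors = [{"_id": 1, "name": "John Doe"}]
books = [{"_id": 100, "title": "Learn MongoDB", "author_id": 1}]


def find_one(collection, **criteria):
    """Return the first document whose fields equal the given criteria, else None."""
    for doc in collection:
        if all(doc.get(k) == v for k, v in criteria.items()):
            return doc
    return None


# Query 1 fetches the book; query 2 follows author_id to the author document.
book = find_one(books, _id=100)
author = find_one(authors, _id=book["author_id"])
print(book["title"], "by", author["name"])


def lookup(left, right, local_field, foreign_field, as_field):
    """Left outer join in the spirit of $lookup: attach matching docs as an array."""
    joined = []
    for doc in left:
        matches = [r for r in right if r.get(foreign_field) == doc.get(local_field)]
        joined.append({**doc, as_field: matches})
    return joined


books_with_authors = lookup(books, authors, "author_id", "_id", "author")
print(books_with_authors[0]["author"][0]["name"])
```

The joined result carries the author as an array field, and the stored book document itself is left unchanged.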


Embedding vs. Referencing:

When to Embed:

  • The data is closely related and will always be queried together.
  • The embedded data is small and bounded, so the 16 MB document size limit is not a concern.
  • You need fast read performance for related data.

When to Reference:

  • The data is large or frequently updated.
  • The related data might change independently.
  • You need to minimize duplication and avoid redundancy.
  • You are working with many-to-many relationships.
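
The two checklists can be condensed into a small helper. This is only a rough heuristic sketch: the function name, its parameters, and the order of the checks are assumptions made for illustration, and real schema decisions usually weigh more factors than these four.

```python
def suggest_model(always_read_together, child_count_bounded,
                  shared_across_parents, changes_independently):
    """Return 'embed' or 'reference' using the checklists above as a rough guide."""
    if shared_across_parents or changes_independently:
        return "reference"  # avoids duplicated copies that must be kept in sync
    if always_read_together and child_count_bounded:
        return "embed"  # one read returns everything and the size stays bounded
    return "reference"  # fallback when growth is unbounded or access is separate


# Scenario: a few comments that are always shown with their blog post.
print(suggest_model(True, True, False, False))
# Scenario: data that is updated on its own schedule.
print(suggest_model(False, False, False, True))
```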

Examples and Use Cases:

Example 1: Embedding (Blog Post with Comments)

  • Scenario: A blog post is usually read with all its comments together.
  • Recommended: Embed comments within the blog post document.

Example 2: Referencing (Users and Orders)

  • Scenario: A user can place multiple orders, and the orders may contain different items. The orders might be stored in a separate collection.
  • Recommended: Use referencing to link the user to each order (e.g., user_id in the order document).
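
A minimal in-memory sketch of this layout follows. The sample users, orders, and the `orders_for` helper are hypothetical, and lists of dicts stand in for the two collections.

```python
# In-memory sketch: lists of dicts stand in for "users" and "orders" (assumption).
users = [{"_id": 1, "name": "Alice"}, {"_id": 2, "name": "Bob"}]
orders = [
    {"_id": 500, "user_id": 1, "items": ["keyboard", "mouse"]},
    {"_id": 501, "user_id": 2, "items": ["monitor"]},
    {"_id": 502, "user_id": 1, "items": ["webcam"]},
]


def orders_for(user_id):
    """Select every order whose user_id reference points at the given user."""
    return [o for o in orders if o["user_id"] == user_id]


alice_orders = orders_for(1)
print([o["_id"] for o in alice_orders])
```

Because the reference lives on the order, the user document stays the same size no matter how many orders are placed.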

Example 3: Embedding vs. Referencing in E-commerce

  • Embedding: Store product information (e.g., price, description, images) inside an order document if the product info does not change often.
  • Referencing: Store product information in a separate collection and reference it in the order document when the product information is frequently updated.
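
The difference between the two options shows up when the product changes after an order is created. The sketch below is an in-memory illustration with hypothetical sample data, where a dict keyed by `_id` stands in for the products collection.

```python
import copy

# In-memory sketch: a dict keyed by _id stands in for "products" (assumption).
products = {10: {"_id": 10, "name": "Keyboard", "price": 50}}

# Embedded variant: the order carries its own copy of the product fields.
order_embedded = {"_id": 900, "product": copy.deepcopy(products[10])}

# Referenced variant: the order stores only the product's _id.
order_referenced = {"_id": 901, "product_id": 10}

# The catalog price changes after both orders were created.
products[10]["price"] = 60

embedded_price = order_embedded["product"]["price"]
referenced_price = products[order_referenced["product_id"]]["price"]
print(embedded_price, referenced_price)
```

The embedded copy keeps the value it had when the order was written, while the referenced order always sees the current catalog entry. Which behaviour is wanted depends on the application.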

Best Practices:

1.   Keep the Data Small for Embedding: Embed data only if it is small and will be frequently accessed together.

2.   Use References for Larger or Frequently Changing Data: For data that is large or changes often, use references to avoid duplication and keep a single consistent copy.

3.   Monitor Query Performance: Embedding data may be faster for reads, but referencing data can scale better as the database grows.


Conclusion:

Deciding whether to embed or reference data in MongoDB depends on your application's needs. Embedding is often simpler and faster for read-heavy workloads with small data, while referencing provides more flexibility for complex relationships and larger datasets. Always consider the size of the data, query patterns, and the need for data consistency when choosing your approach.

