Lecture Notes Of Day 13:
MongoDB Relationships (Embedding vs. Referencing)
Objective:
To understand how to model relationships in MongoDB. By the end of this session, students should be able to decide when to embed documents and when to reference them, considering factors like data size, query performance, and data consistency.
Outcome:
- Students will be able to model relationships using embedding and referencing techniques in MongoDB.
- Students will be able to identify which approach is more suitable based on different use cases.
Introduction to Relationships in MongoDB:
In MongoDB, the way data is stored and structured plays a key role in determining the efficiency of your database queries. Unlike traditional relational databases, which link tables through foreign keys and joins, MongoDB is a NoSQL document database and uses different strategies to manage relationships between data.
MongoDB supports two main ways of representing relationships between data:
1. Embedding
2. Referencing
Both methods have their advantages and disadvantages, depending on the data and the type of query that is needed.
1. Embedding:
Embedding refers to the practice of storing related data within the same document. This is useful when the related data is tightly coupled and accessed together frequently.
Use Case for Embedding:
- When to Embed:
  - When you have a small amount of related data that will not change frequently.
  - When related data will always be retrieved together (i.e., it is rarely needed on its own).
  - For one-to-many relationships where the "many" part is small and not likely to grow significantly.
Advantages of Embedding:
- Performance: Faster reads since all related data is stored in a single document and can be retrieved in one query.
- Atomic Operations: Updates to a single document are atomic, meaning you can update all parts of the document in one operation.
- Simplified Schema: Embedding allows for simpler schema designs, as related data is kept together.
Disadvantages of Embedding:
- Data Duplication: If the same data is embedded in multiple documents, there is redundancy, which can lead to storage inefficiencies.
- Document Size Limitation: MongoDB has a 16 MB document size limit. Embedding an array that keeps growing can eventually produce documents that exceed this limit.
- Hard to Update: If the embedded data needs to be updated independently, you may need to modify many documents.
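To get a feel for the 16 MB limit mentioned above, the following Python sketch estimates how many short comments could be embedded in one post before the document approaches that size. It is a rough illustration only: it uses the UTF-8 length of the JSON text as a stand-in for the real BSON size (an assumption, since BSON adds type and length bytes), and the helper name approx_size is hypothetical.

```python
import json

# MongoDB's per-document limit: 16 MB (16 * 1024 * 1024 bytes).
MAX_DOC_BYTES = 16 * 1024 * 1024


def approx_size(doc):
    """Rough size proxy: UTF-8 length of the JSON text.

    Assumption: this only approximates the BSON size MongoDB actually checks.
    """
    return len(json.dumps(doc).encode("utf-8"))


post = {"_id": 1, "title": "MongoDB for Beginners", "comments": []}
comment = {"author": "Alice", "text": "Great post!"}

# Space left in the document, divided by the size of one short comment.
headroom = MAX_DOC_BYTES - approx_size(post)
max_comments = headroom // approx_size(comment)

print(f"Roughly {max_comments} short comments fit before the limit is reached.")
```

Longer comments, or comments that carry replies and metadata, would reach the limit far sooner, which is why an unbounded "many" side is usually a reason to reference instead.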
Example:
A blog post and its comments could be stored in a single document, as the comments are closely related to the blog post and are typically retrieved together.
```json
{
  "_id": 1,
  "title": "MongoDB for Beginners",
  "content": "This is a blog post about MongoDB.",
  "comments": [
    { "author": "Alice", "text": "Great post!" },
    { "author": "Bob", "text": "Very helpful, thanks!" }
  ]
}
```
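A brief Python sketch of how the embedded document above might be used. It keeps the post in a plain dictionary so that it runs without a server. The filter and update documents are written the way they would be passed to a driver (with PyMongo, for example, find_one and update_one). The collection name posts, the helper apply_push, and the third comment are assumptions made for illustration.

```python
# In-memory stand-in for a hypothetical "posts" collection, keyed by _id.
posts = {
    1: {
        "_id": 1,
        "title": "MongoDB for Beginners",
        "content": "This is a blog post about MongoDB.",
        "comments": [
            {"author": "Alice", "text": "Great post!"},
            {"author": "Bob", "text": "Very helpful, thanks!"},
        ],
    }
}

# Documents a driver would receive, e.g. with PyMongo:
#   db.posts.find_one(post_filter)
#   db.posts.update_one(post_filter, add_comment)
post_filter = {"_id": 1}
add_comment = {"$push": {"comments": {"author": "Carol", "text": "Nice overview."}}}


def apply_push(doc, update):
    """Hypothetical helper that mimics $push on an in-memory document."""
    for field, value in update["$push"].items():
        doc.setdefault(field, []).append(value)


# One lookup returns the post together with all of its comments.
post = posts[post_filter["_id"]]

# Adding a comment touches only this one document.
apply_push(post, add_comment)

print(post["title"], "has", len(post["comments"]), "comments")
```

Because the comments live inside the post, both the read and the write involve a single document, which is where the read-performance and atomicity advantages of embedding come from.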
2. Referencing:
Referencing involves storing data in separate documents and linking them together using references (such as object IDs). This is typically used for more complex relationships or when the related data changes frequently.
Use Case for Referencing:
- When to Reference:
  - When data is large or changes frequently.
  - When you have many-to-many relationships, or one-to-many relationships where the "many" side is large or keeps growing.
  - When you want to avoid data duplication, such as when different documents need to reference the same piece of data.
Advantages of Referencing:
- Avoid Redundancy: No duplication of data since you store references to the related data rather than embedding it.
- Better for Larger Data: Suitable for large datasets that might exceed document size limits if embedded.
- Flexibility: If the referenced data changes, you only need to update it in one place.
Disadvantages of Referencing:
- Multiple Queries Required: To retrieve related data, the application typically needs more than one query: first to get the parent document and then to retrieve the referenced documents.
- Complexity: Queries and updates become more complex since you need to manage the relationships yourself, or use techniques like $lookup to join data from different collections.
Example:
Consider a system where authors and books are stored in separate collections. A book document might reference the author by using the author's ID.
Authors Collection:
```json
{
  "_id": 1,
  "name": "John Doe"
}
```
Books Collection:
```json
{
  "_id": 100,
  "title": "Learn MongoDB",
  "author_id": 1
}
```
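To retrieve a book along with its author's details, you would perform two queries: one for the book and one for the author. Alternatively, a single aggregation with a $lookup stage can join the two collections on the server.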
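The retrieval described above can be sketched in Python. The first part performs the two lookups by hand against in-memory lists standing in for the authors and books collections (the collection names and the find_one helper are assumptions). The second part shows, as an alternative, a $lookup aggregation pipeline that asks the server to perform the join in one request. The pipeline is only built here, not executed.

```python
# In-memory stand-ins for the two collections (names assumed).
authors = [{"_id": 1, "name": "John Doe"}]
books = [{"_id": 100, "title": "Learn MongoDB", "author_id": 1}]


def find_one(collection, query):
    """Minimal stand-in for a driver's find_one: exact match on every key."""
    for doc in collection:
        if all(doc.get(key) == value for key, value in query.items()):
            return doc
    return None


# Approach 1: two queries, joined in application code.
book = find_one(books, {"_id": 100})
author = find_one(authors, {"_id": book["author_id"]})
book_with_author = {**book, "author": author}

# Approach 2: one aggregation with $lookup (server-side join).
# With PyMongo this would be passed as db.books.aggregate(pipeline).
pipeline = [
    {"$match": {"_id": 100}},
    {
        "$lookup": {
            "from": "authors",          # collection to join with
            "localField": "author_id",  # field in the book document
            "foreignField": "_id",      # field in the author document
            "as": "author",             # name of the output array field
        }
    },
]

print(book_with_author["title"], "by", book_with_author["author"]["name"])
```

Note that $lookup places the matching documents in an array, so with the pipeline the author field would be a one-element list rather than a single embedded document.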
Embedding vs. Referencing:
When to Embed:
- The data is closely related and will always be queried together.
- The embedded data is small, and the size limit is not an issue.
- You need fast read performance for related data.
When to Reference:
- The data is large or frequently updated.
- The related data might change independently.
- You need to minimize duplication and avoid redundancy.
- You are working with many-to-many relationships.
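The checklist above can be condensed into a small decision helper. This is a hypothetical sketch: the function name, its parameters, and the order in which the rules are checked are simplifications chosen for illustration, not an official MongoDB rule.

```python
def choose_model(always_read_together, small_and_bounded,
                 changes_independently, many_to_many):
    """Return "embed" or "reference" using the rules of thumb listed above.

    Hypothetical helper: real schema design also weighs query patterns
    and consistency needs that four booleans cannot capture.
    """
    # Any of these conditions points toward separate, referenced documents.
    if many_to_many or changes_independently or not small_and_bounded:
        return "reference"
    # Small, stable data that is always read with its parent can be embedded.
    if always_read_together:
        return "embed"
    # Otherwise default to the more flexible option.
    return "reference"


# Blog post comments: read with the post, small, not shared.
print(choose_model(True, True, False, False))

# Books and authors: author details change independently of a book.
print(choose_model(False, True, True, False))
```

Treat the result as a starting point for discussion rather than a verdict; borderline cases usually come down to how the application actually queries the data.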
Examples and Use Cases:
Example 1: Embedding (Blog Post with Comments)
- Scenario: A blog post is usually read with all its comments together.
- Recommended: Embed comments within the blog post document.
Example 2: Referencing (Users and Orders)
- Scenario: A user can place multiple orders, and the orders may contain different items. The orders might be stored in a separate collection.
- Recommended: Use referencing to link the user to each order (e.g., user_id in the order document).
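A minimal Python sketch of the users-and-orders layout in Example 2. The sample data, the items field, and the find helper are assumptions made for illustration; the lists stand in for the two collections so the code runs without a server.

```python
# In-memory stand-ins for the "users" and "orders" collections (sample data).
users = [{"_id": 1, "name": "Alice"}]
orders = [
    {"_id": 501, "user_id": 1, "items": ["keyboard", "mouse"]},
    {"_id": 502, "user_id": 1, "items": ["monitor"]},
    {"_id": 503, "user_id": 2, "items": ["laptop"]},
]

# Filter document as it would be sent to the server,
# e.g. with PyMongo: db.orders.find(orders_filter)
orders_filter = {"user_id": 1}


def find(collection, query):
    """Minimal stand-in for a driver's find: exact match on every key."""
    return [
        doc for doc in collection
        if all(doc.get(key) == value for key, value in query.items())
    ]


# Each order points back to its user, so the user document never grows.
user_orders = find(orders, orders_filter)

print([order["_id"] for order in user_orders])
```

Storing user_id on the order (rather than a list of order IDs on the user) keeps the user document at a fixed size no matter how many orders are placed.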
Example 3: Embedding vs. Referencing in E-commerce
- Embedding: Store product information (e.g., price, description, images) inside an order document if the product info does not change often.
- Referencing: Store product information in a separate collection and reference it in the order document when the product information is frequently updated.
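The difference between the two options in Example 3 shows up when product data changes. The Python sketch below uses in-memory dictionaries with made-up sample values (the product, its prices, and the field names are assumptions) to compare what each kind of order sees after a price update.

```python
# In-memory stand-in for a "products" collection, keyed by _id (sample data).
products = {10: {"_id": 10, "name": "Keyboard", "price": 50}}

# Embedding: the order carries its own copy of the product fields.
embedded_order = {"_id": 1, "product": dict(products[10])}

# Referencing: the order stores only the product's _id.
referenced_order = {"_id": 2, "product_id": 10}

# The product's price is later updated in one place.
products[10]["price"] = 60

# The embedded copy still holds the value it was created with.
embedded_price = embedded_order["product"]["price"]

# The referenced order resolves the product at read time and sees the update.
referenced_price = products[referenced_order["product_id"]]["price"]

print("embedded:", embedded_price, "referenced:", referenced_price)
```

With embedding, every order that copied the product would have to be rewritten to reflect the change; with referencing, the single update is visible everywhere, at the cost of an extra lookup on each read.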
Best Practices:
1. Keep the Data Small for Embedding: Embed data only if it is small and will be frequently accessed together.
2. Use References for Larger or Frequently Changing Data: For data that is large or changes often, use references to avoid duplication and ensure consistency.
3. Monitor Query Performance: Embedding data may be faster for reads, but referencing data can scale better as the database grows.
Conclusion:
Deciding whether to embed or reference data in MongoDB depends on your application's needs. Embedding is often simpler and faster for read-heavy workloads with small data, while referencing provides more flexibility for complex relationships and larger datasets. Always consider the size of the data, query patterns, and the need for data consistency when choosing your approach.
