Lecture Notes Of Day 10:
Aggregation Framework – Part 1 (Basics)
Objective:
The objective of this session is
to introduce the aggregation framework in MongoDB, which is a powerful tool for
transforming and combining data from multiple documents. By the end of the
session, students will be able to use the aggregate() method to perform basic
aggregation operations like sum, average, and grouping.
Outcome:
Students will be able to:
- Understand
what aggregation is and why it's useful in MongoDB.
- Use
the aggregate() method to perform basic aggregations such as calculating
the sum, average, and grouping data.
What is
Aggregation?
Aggregation in MongoDB refers to
the process of transforming data from multiple documents into a summarized
result. This is similar to SQL’s GROUP BY, SUM(), AVG(), and other aggregate
functions. The aggregation framework in MongoDB allows us to perform complex
data processing tasks such as filtering, grouping, sorting, and performing
calculations.
Unlike simple queries that fetch
documents, aggregation operations can manipulate data in ways that help answer
complex questions.
Why Use
Aggregation?
Aggregation is especially useful
for tasks such as:
- Summarizing
data (e.g., total sales, average price).
- Grouping
data based on common fields (e.g., group customers by region).
- Calculating
new fields or values based on the data.
MongoDB provides the aggregate()
method, which allows us to perform these tasks.
Basic
Structure of the aggregate() Method
The aggregate() method processes
data in stages. Each stage performs a specific operation and passes the result
to the next stage. These stages are defined using an array of pipeline
operators.
Syntax:
javascript
CopyEdit
db.collection.aggregate([
{ stage1 },
{ stage2 },
{ stage3 },
...
])
- db.collection
is the MongoDB collection on which the aggregation is being performed.
- Each
stage is represented by an object, which can contain various operators to
define what to do with the data.
Stages in
the Aggregation Pipeline:
1. $match –
Filters documents based on a specified condition (similar to the find()
method).
2. $group – Groups
documents together based on a specified field and applies aggregation operators
(like sum, avg, etc.).
3. $project –
Reshapes the documents by adding or removing fields.
4. $sort – Sorts
documents in ascending or descending order.
5. $limit – Limits
the number of documents returned.
6. $skip – Skips
a specified number of documents.
Each stage can work together to
perform more complex operations.
Aggregation
Examples:
1. Sum:
To calculate the total sum of a
numeric field across all documents, use the $group stage with the $sum
operator.
Example: Suppose
we have a collection named sales with the following documents:
json
CopyEdit
{ "_id":
1, "product": "A", "amount": 100 }
{ "_id":
2, "product": "B", "amount": 200 }
{ "_id":
3, "product": "A", "amount": 150 }
To calculate the total sales
amount, we use the following aggregation:
javascript
CopyEdit
db.sales.aggregate([
{
$group: {
_id: null, // Grouping all documents together
(no specific field)
totalAmount: { $sum: "$amount"
} // Summing the 'amount' field
}
}
])
Output:
json
CopyEdit
{ "_id":
null, "totalAmount": 450 }
2. Average:
To calculate the average of a
numeric field across all documents, use the $avg operator.
Example:
javascript
CopyEdit
db.sales.aggregate([
{
$group: {
_id: null, // Grouping all documents together
averageAmount: { $avg: "$amount"
} // Calculating the average of 'amount'
}
}
])
Output:
json
CopyEdit
{ "_id":
null, "averageAmount": 150 }
3. Group
by Field (Product):
You can group documents by a
field (e.g., product) and then apply aggregation functions to each group.
Example: To
calculate the total sales for each product, you can use the $group stage:
javascript
CopyEdit
db.sales.aggregate([
{
$group: {
_id: "$product", // Grouping by 'product' field
totalAmount: { $sum: "$amount"
} // Summing the 'amount' field for each
group
}
}
])
Output:
json
CopyEdit
[
{ "_id": "A", "totalAmount":
250 },
{ "_id": "B", "totalAmount":
200 }
]
Important
Aggregation Operators:
Here are some common aggregation
operators that can be used in the pipeline stages:
- $sum:
Adds up the values of a field.
- $avg:
Calculates the average value of a field.
- $max:
Finds the maximum value in a field.
- $min:
Finds the minimum value in a field.
- $push:
Creates an array of values from a field.
- $addToSet:
Creates a set (unique values) from a field.
- $count:
Returns the count of documents that pass through the pipeline.
Summary:
In this lesson, we learned the
basics of MongoDB’s aggregation framework and how to use the aggregate() method
to perform basic operations such as summing values, averaging values, and
grouping data. Aggregation is a powerful tool for data analysis and
summarization, and MongoDB provides a flexible and efficient way to process
large datasets.
In the next session, we will dive
deeper into more advanced aggregation techniques and use cases.
Exercises:
1. Sum
Exercise:
o
Create a collection orders with fields like orderId,
product, quantity, and price.
o
Calculate the total revenue by summing the product
of quantity and price for all orders.
2. Average
Exercise:
o
Create a collection students with fields like studentId,
name, and marks.
o
Calculate the average marks of all students.
3. Group
Exercise:
o
Create a collection employees with fields like employeeId,
department, and salary.
o
Group the employees by department and calculate the
total salary for each department.
1. Sum
Exercise:
Problem: Create a
collection orders with fields like orderId, product, quantity, and price. Then
calculate the total revenue by summing the product of quantity and price for
all orders.
Step 1:
Create the orders Collection
We can insert some sample data
into the orders collection.
javascript
CopyEdit
db.orders.insertMany([
{ orderId: 1, product: "Laptop", quantity:
2, price: 50000 },
{ orderId: 2, product: "Phone", quantity:
3, price: 20000 },
{ orderId: 3, product: "Tablet", quantity:
5, price: 15000 }
]);
Step 2:
Calculate the Total Revenue
To calculate the total revenue,
we need to multiply quantity and price for each order and then sum the results
using the aggregation framework.
javascript
CopyEdit
db.orders.aggregate([
{
$project: {
totalRevenue: { $multiply: ["$quantity",
"$price"] } // Multiply
quantity and price
}
},
{
$group: {
_id: null, // Grouping all
documents together
totalRevenue: { $sum: "$totalRevenue"
} // Summing up the total revenue
}
}
])
Expected Output:
json
CopyEdit
{ "_id":
null, "totalRevenue": 260000 }
The total revenue from all orders
is 260000.
2.
Average Exercise:
Problem: Create a
collection students with fields like studentId, name, and marks. Then calculate
the average marks of all students.
Step 1:
Create the students Collection
Insert sample student data:
javascript
CopyEdit
db.students.insertMany([
{ studentId: 1, name: "John", marks:
85 },
{ studentId: 2, name: "Emma", marks:
92 },
{ studentId: 3, name: "Sophia", marks:
78 },
{ studentId: 4, name: "Liam", marks:
88 }
]);
Step 2:
Calculate the Average Marks
To calculate the average marks of
all students, we use the $avg operator in the $group stage.
javascript
CopyEdit
db.students.aggregate([
{
$group: {
_id: null, // Grouping all documents
together
averageMarks: { $avg: "$marks"
} // Calculating the average marks
}
}
])
Expected Output:
json
CopyEdit
{ "_id":
null, "averageMarks": 85.25 }
The average marks of all students
is 85.25.
3. Group
Exercise:
Problem: Create a
collection employees with fields like employeeId, department, and salary. Group
the employees by department and calculate the total salary for each department.
Step 1:
Create the employees Collection
Insert sample employee data:
javascript
CopyEdit
db.employees.insertMany([
{ employeeId: 1, department: "Sales",
salary: 50000 },
{ employeeId: 2, department: "Sales",
salary: 60000 },
{ employeeId: 3, department: "IT", salary:
70000 },
{ employeeId: 4, department: "HR", salary:
40000 },
{ employeeId: 5, department: "IT", salary:
75000 }
]);
Step 2:
Group by Department and Calculate Total Salary
To group the employees by
department and calculate the total salary for each department, use the $group
stage with the $sum operator.
javascript
CopyEdit
db.employees.aggregate([
{
$group: {
_id: "$department", // Grouping by 'department'
totalSalary: { $sum: "$salary"
} // Summing the salaries for each
department
}
}
])
Expected Output:
json
CopyEdit
[
{ "_id": "Sales", "totalSalary":
110000 },
{ "_id": "IT", "totalSalary":
145000 },
{ "_id": "HR", "totalSalary":
40000 }
]
The total salary for each
department:
- Sales:
110000
- IT:
145000
- HR:
40000
Summary:
1. Sum
Exercise: We calculated the total revenue from the orders collection by
multiplying quantity and price and summing the results.
2. Average
Exercise: We calculated the average marks of all students from the students
collection.
3. Group
Exercise: We grouped employees by department and calculated the total salary for
each department in the employees collection.
These exercises demonstrate the
use of MongoDB's aggregation framework to perform essential aggregation
operations like summing, averaging, and grouping data.
