Lecture Notes Of Day 10: Aggregation Framework – Part 1 (Basics)

Rashmi Mishra
0

Lecture Notes Of Day 10: Aggregation Framework – Part 1 (Basics)

Objective:

The objective of this session is to introduce the aggregation framework in MongoDB, which is a powerful tool for transforming and combining data from multiple documents. By the end of the session, students will be able to use the aggregate() method to perform basic aggregation operations like sum, average, and grouping.

Outcome:

Students will be able to:

  • Understand what aggregation is and why it's useful in MongoDB.
  • Use the aggregate() method to perform basic aggregations such as calculating the sum, average, and grouping data.

What is Aggregation?

Aggregation in MongoDB refers to the process of transforming data from multiple documents into a summarized result. This is similar to SQL’s GROUP BY, SUM(), AVG(), and other aggregate functions. The aggregation framework in MongoDB allows us to perform complex data processing tasks such as filtering, grouping, sorting, and performing calculations.

Unlike simple queries that fetch documents, aggregation operations can manipulate data in ways that help answer complex questions.

Why Use Aggregation?

Aggregation is especially useful for tasks such as:

  • Summarizing data (e.g., total sales, average price).
  • Grouping data based on common fields (e.g., group customers by region).
  • Calculating new fields or values based on the data.

MongoDB provides the aggregate() method, which allows us to perform these tasks.


Basic Structure of the aggregate() Method

The aggregate() method processes data in stages. Each stage performs a specific operation and passes the result to the next stage. These stages are defined using an array of pipeline operators.

Syntax:

javascript

CopyEdit

db.collection.aggregate([

  { stage1 },

  { stage2 },

  { stage3 },

  ...

])

  • db.collection is the MongoDB collection on which the aggregation is being performed.
  • Each stage is represented by an object, which can contain various operators to define what to do with the data.

Stages in the Aggregation Pipeline:

1.   $match – Filters documents based on a specified condition (similar to the find() method).

2.   $group – Groups documents together based on a specified field and applies aggregation operators (like sum, avg, etc.).

3.   $project – Reshapes the documents by adding or removing fields.

4.   $sort – Sorts documents in ascending or descending order.

5.   $limit – Limits the number of documents returned.

6.   $skip – Skips a specified number of documents.

Each stage can work together to perform more complex operations.


Aggregation Examples:

1. Sum:

To calculate the total sum of a numeric field across all documents, use the $group stage with the $sum operator.

Example: Suppose we have a collection named sales with the following documents:

json

CopyEdit

{ "_id": 1, "product": "A", "amount": 100 }

{ "_id": 2, "product": "B", "amount": 200 }

{ "_id": 3, "product": "A", "amount": 150 }

To calculate the total sales amount, we use the following aggregation:

javascript

CopyEdit

db.sales.aggregate([

  {

    $group: {

      _id: null,           // Grouping all documents together (no specific field)

      totalAmount: { $sum: "$amount" } // Summing the 'amount' field

    }

  }

])

Output:

json

CopyEdit

{ "_id": null, "totalAmount": 450 }

2. Average:

To calculate the average of a numeric field across all documents, use the $avg operator.

Example:

javascript

CopyEdit

db.sales.aggregate([

  {

    $group: {

      _id: null,          // Grouping all documents together

      averageAmount: { $avg: "$amount" }  // Calculating the average of 'amount'

    }

  }

])

Output:

json

CopyEdit

{ "_id": null, "averageAmount": 150 }

3. Group by Field (Product):

You can group documents by a field (e.g., product) and then apply aggregation functions to each group.

Example: To calculate the total sales for each product, you can use the $group stage:

javascript

CopyEdit

db.sales.aggregate([

  {

    $group: {

      _id: "$product",          // Grouping by 'product' field

      totalAmount: { $sum: "$amount" }  // Summing the 'amount' field for each group

    }

  }

])

Output:

json

CopyEdit

[

  { "_id": "A", "totalAmount": 250 },

  { "_id": "B", "totalAmount": 200 }

]


Important Aggregation Operators:

Here are some common aggregation operators that can be used in the pipeline stages:

  • $sum: Adds up the values of a field.
  • $avg: Calculates the average value of a field.
  • $max: Finds the maximum value in a field.
  • $min: Finds the minimum value in a field.
  • $push: Creates an array of values from a field.
  • $addToSet: Creates a set (unique values) from a field.
  • $count: Returns the count of documents that pass through the pipeline.

Summary:

In this lesson, we learned the basics of MongoDB’s aggregation framework and how to use the aggregate() method to perform basic operations such as summing values, averaging values, and grouping data. Aggregation is a powerful tool for data analysis and summarization, and MongoDB provides a flexible and efficient way to process large datasets.

In the next session, we will dive deeper into more advanced aggregation techniques and use cases.


Exercises:

1.   Sum Exercise:

o    Create a collection orders with fields like orderId, product, quantity, and price.

o    Calculate the total revenue by summing the product of quantity and price for all orders.

2.   Average Exercise:

o    Create a collection students with fields like studentId, name, and marks.

o    Calculate the average marks of all students.

3.   Group Exercise:

o    Create a collection employees with fields like employeeId, department, and salary.

o    Group the employees by department and calculate the total salary for each department.


1. Sum Exercise:

Problem: Create a collection orders with fields like orderId, product, quantity, and price. Then calculate the total revenue by summing the product of quantity and price for all orders.

Step 1: Create the orders Collection

We can insert some sample data into the orders collection.

javascript

CopyEdit

db.orders.insertMany([

  { orderId: 1, product: "Laptop", quantity: 2, price: 50000 },

  { orderId: 2, product: "Phone", quantity: 3, price: 20000 },

  { orderId: 3, product: "Tablet", quantity: 5, price: 15000 }

]);

Step 2: Calculate the Total Revenue

To calculate the total revenue, we need to multiply quantity and price for each order and then sum the results using the aggregation framework.

javascript

CopyEdit

db.orders.aggregate([

  {

    $project: {

      totalRevenue: { $multiply: ["$quantity", "$price"] }  // Multiply quantity and price

    }

  },

  {

    $group: {

      _id: null,                       // Grouping all documents together

      totalRevenue: { $sum: "$totalRevenue" }  // Summing up the total revenue

    }

  }

])

Expected Output:

json

CopyEdit

{ "_id": null, "totalRevenue": 260000 }

The total revenue from all orders is 260000.


2. Average Exercise:

Problem: Create a collection students with fields like studentId, name, and marks. Then calculate the average marks of all students.

Step 1: Create the students Collection

Insert sample student data:

javascript

CopyEdit

db.students.insertMany([

  { studentId: 1, name: "John", marks: 85 },

  { studentId: 2, name: "Emma", marks: 92 },

  { studentId: 3, name: "Sophia", marks: 78 },

  { studentId: 4, name: "Liam", marks: 88 }

]);

Step 2: Calculate the Average Marks

To calculate the average marks of all students, we use the $avg operator in the $group stage.

javascript

CopyEdit

db.students.aggregate([

  {

    $group: {

      _id: null,                // Grouping all documents together

      averageMarks: { $avg: "$marks" }  // Calculating the average marks

    }

  }

])

Expected Output:

json

CopyEdit

{ "_id": null, "averageMarks": 85.25 }

The average marks of all students is 85.25.


3. Group Exercise:

Problem: Create a collection employees with fields like employeeId, department, and salary. Group the employees by department and calculate the total salary for each department.

Step 1: Create the employees Collection

Insert sample employee data:

javascript

CopyEdit

db.employees.insertMany([

  { employeeId: 1, department: "Sales", salary: 50000 },

  { employeeId: 2, department: "Sales", salary: 60000 },

  { employeeId: 3, department: "IT", salary: 70000 },

  { employeeId: 4, department: "HR", salary: 40000 },

  { employeeId: 5, department: "IT", salary: 75000 }

]);

Step 2: Group by Department and Calculate Total Salary

To group the employees by department and calculate the total salary for each department, use the $group stage with the $sum operator.

javascript

CopyEdit

db.employees.aggregate([

  {

    $group: {

      _id: "$department",           // Grouping by 'department'

      totalSalary: { $sum: "$salary" }  // Summing the salaries for each department

    }

  }

])

Expected Output:

json

CopyEdit

[

  { "_id": "Sales", "totalSalary": 110000 },

  { "_id": "IT", "totalSalary": 145000 },

  { "_id": "HR", "totalSalary": 40000 }

]

The total salary for each department:

  • Sales: 110000
  • IT: 145000
  • HR: 40000

Summary:

1.   Sum Exercise: We calculated the total revenue from the orders collection by multiplying quantity and price and summing the results.

2.   Average Exercise: We calculated the average marks of all students from the students collection.

3.   Group Exercise: We grouped employees by department and calculated the total salary for each department in the employees collection.

These exercises demonstrate the use of MongoDB's aggregation framework to perform essential aggregation operations like summing, averaging, and grouping data.


Tags

Post a Comment

0Comments

Post a Comment (0)

About Me