GroupBy Function in Pandas: Complete Guide with Examples

Pandas is one of the most powerful Python libraries used for data analysis and manipulation. Among its many features, the GroupBy function is one of the most important and frequently used tools for summarizing, aggregating, and analyzing data.

Data Analysts and Data Scientists use GroupBy extensively to perform operations such as calculating totals, averages, counts, and other statistics for different categories of data.

In this guide, you'll learn how the Pandas GroupBy function works, including its syntax, common aggregation methods, worked examples, and real-world applications.


What is GroupBy in Pandas?

The GroupBy function allows you to split data into groups based on one or more columns and then perform operations on each group.

The process has three steps:

Split: divide the data into groups.
Apply: perform a calculation or transformation on each group.
Combine: merge the results into a new dataset.

This concept is commonly known as:

Split → Apply → Combine
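To make the three steps concrete, here is a minimal sketch that performs split, apply, and combine by hand and checks the result against groupby(). The Team/Points dataset is made up for illustration only.

```python
import pandas as pd

# Hypothetical example data, used only to illustrate the three steps
df = pd.DataFrame({
    "Team": ["X", "Y", "X", "Y", "Z"],
    "Points": [10, 20, 30, 40, 50],
})

# Split: one sub-DataFrame per unique Team value
pieces = {team: part for team, part in df.groupby("Team")}

# Apply: compute a statistic on each piece independently
totals = {team: part["Points"].sum() for team, part in pieces.items()}

# Combine: gather the per-group results into a single Series
manual = pd.Series(totals, name="Points")
manual.index.name = "Team"

# groupby() performs all three steps in one expression
built_in = df.groupby("Team")["Points"].sum()

print(manual)
print(manual.equals(built_in))  # True
```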

Why Use GroupBy?

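GroupBy helps analysts summarize data by category. For example, it can calculate totals, averages, counts, and other statistics for each group. It is one of the most frequently used functions in data analytics.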


Importing Pandas

Before using GroupBy, import Pandas.

import pandas as pd

Sample Dataset

Let's create a sample sales dataset.

import pandas as pd

data = {
    "Department": ["Sales", "Sales", "IT", "IT", "HR"],
    "Employee": ["A", "B", "C", "D", "E"],
    "Salary": [50000, 60000, 70000, 80000, 45000]
}

df = pd.DataFrame(data)

print(df)

Output:

  Department Employee Salary
0      Sales        A  50000
1      Sales        B  60000
2         IT        C  70000
3         IT        D  80000
4         HR        E  45000

Basic GroupBy Syntax

df.groupby("column_name")

Example:

df.groupby("Department")

This returns a DataFrameGroupBy object, not a table of results. Nothing is displayed until you apply an operation such as sum() or mean().
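You can inspect the GroupBy object before aggregating. The sketch below uses the sample dataset from above. It relies on the standard ngroups attribute, the get_group() method, and iteration over the object. The variable name grouped is just an illustrative choice.

```python
import pandas as pd

data = {
    "Department": ["Sales", "Sales", "IT", "IT", "HR"],
    "Employee": ["A", "B", "C", "D", "E"],
    "Salary": [50000, 60000, 70000, 80000, 45000],
}
df = pd.DataFrame(data)

grouped = df.groupby("Department")

# The GroupBy object itself is not a table of results
print(type(grouped).__name__)   # DataFrameGroupBy

# Number of groups, and the rows belonging to one of them
print(grouped.ngroups)          # 3
print(grouped.get_group("IT"))

# Iterating yields (group key, sub-DataFrame) pairs, sorted by key by default
for name, group in grouped:
    print(name, len(group))
```

The loop prints HR 1, IT 2, and Sales 2. This shows that the data has been split but not yet aggregated.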


GroupBy with Sum

Calculate total salary by department.

df.groupby("Department")["Salary"].sum()

Output:

Department
HR        45000
IT       150000
Sales    110000

GroupBy with Mean

Calculate average salary by department.

df.groupby("Department")["Salary"].mean()

Output:

Department
HR       45000.0
IT       75000.0
Sales    55000.0

GroupBy with Count

Count employees in each department.

df.groupby("Department")["Employee"].count()

Output:

Department
HR       1
IT       2
Sales    2

Common Aggregation Functions

Pandas supports several aggregation methods.

Function    Purpose
sum()       Total
mean()      Average
count()     Number of non-missing values
max()       Maximum value
min()       Minimum value
median()    Median value
std()       Standard deviation
var()       Variance

GroupBy with Multiple Aggregations

You can apply multiple calculations simultaneously.

df.groupby("Department")["Salary"].agg(
    ["sum", "mean", "max", "min"]
)

Output:

               sum    mean    max    min
Department
HR           45000  45000.0  45000  45000
IT          150000  75000.0  80000  70000
Sales       110000  55000.0  60000  50000
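A related pattern is named aggregation, which pandas has supported since version 0.25. Each keyword argument to agg() names an output column and maps it to a (source column, function) pair. This lets you aggregate several columns and choose readable labels in one call. The output names total_salary, avg_salary, and headcount below are arbitrary choices for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Department": ["Sales", "Sales", "IT", "IT", "HR"],
    "Employee": ["A", "B", "C", "D", "E"],
    "Salary": [50000, 60000, 70000, 80000, 45000],
})

# Named aggregation: each keyword becomes an output column,
# defined as (source column, aggregation function)
summary = df.groupby("Department").agg(
    total_salary=("Salary", "sum"),
    avg_salary=("Salary", "mean"),
    headcount=("Employee", "count"),
)

print(summary)
```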

GroupBy Multiple Columns

You can group data using multiple columns.

Example:

df.groupby(
    ["Department", "Employee"]
)["Salary"].sum()

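The result has a MultiIndex with one level per grouping column (Department, then Employee). In this sample every employee belongs to exactly one department, so each group contains a single row.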
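The sketch below shows one way to work with the resulting MultiIndex, using the same sample data. A tuple selects one (Department, Employee) pair. A single outer label selects every employee in a department. The variable name by_two is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Department": ["Sales", "Sales", "IT", "IT", "HR"],
    "Employee": ["A", "B", "C", "D", "E"],
    "Salary": [50000, 60000, 70000, 80000, 45000],
})

by_two = df.groupby(["Department", "Employee"])["Salary"].sum()

# One index level per grouping column
print(by_two.index.names)        # ['Department', 'Employee']

# Select a single (Department, Employee) combination with a tuple
print(by_two.loc[("IT", "C")])   # 70000

# Select every employee in one department using the outer level
print(by_two.loc["IT"])
```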


Reset Index After GroupBy

By default, GroupBy results use the grouping column as the index. Call reset_index() to turn it back into a regular column.

df.groupby("Department")["Salary"].sum().reset_index()

Output:

  Department  Salary
0         HR   45000
1         IT  150000
2      Sales  110000
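An alternative is to pass as_index=False to groupby(). This keeps the grouping column as a regular column from the start. The sketch below compares the two approaches on the sample data. The variable names via_reset and via_as_index are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Department": ["Sales", "Sales", "IT", "IT", "HR"],
    "Employee": ["A", "B", "C", "D", "E"],
    "Salary": [50000, 60000, 70000, 80000, 45000],
})

# Option 1: aggregate, then move the index back into a column
via_reset = df.groupby("Department")["Salary"].sum().reset_index()

# Option 2: ask groupby not to use the keys as the index in the first place
via_as_index = df.groupby("Department", as_index=False)["Salary"].sum()

print(via_as_index)
print(via_reset.equals(via_as_index))  # True
```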

GroupBy with Max

Find the highest salary in each department.

df.groupby("Department")["Salary"].max()

Output:

Department
HR       45000
IT       80000
Sales    60000
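max() returns the highest value but not the employee who earns it. A minimal sketch for that follow-up question uses idxmax(). For each group, it returns the index label of the row holding the maximum, and those labels can be passed to df.loc. The names top_rows and top_earners are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Department": ["Sales", "Sales", "IT", "IT", "HR"],
    "Employee": ["A", "B", "C", "D", "E"],
    "Salary": [50000, 60000, 70000, 80000, 45000],
})

# idxmax() gives, per department, the row label of the maximum salary
top_rows = df.groupby("Department")["Salary"].idxmax()

# Use those labels to pull the full rows (employee name included)
top_earners = df.loc[top_rows]
print(top_earners)
```

If two employees tie for the maximum, idxmax() returns only the first one it encounters.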

GroupBy with Min

Find the lowest salary in each department.

df.groupby("Department")["Salary"].min()

Output:

Department
HR       45000
IT       70000
Sales    50000

GroupBy with Median

Calculate median salary.

df.groupby("Department")["Salary"].median()
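The median, standard deviation, and variance from the aggregation table can be computed together with agg(), as sketched below on the sample data. By default, std() and var() use the sample formula (ddof=1). A department with a single employee, such as HR here, therefore gets NaN for both.

```python
import pandas as pd

df = pd.DataFrame({
    "Department": ["Sales", "Sales", "IT", "IT", "HR"],
    "Employee": ["A", "B", "C", "D", "E"],
    "Salary": [50000, 60000, 70000, 80000, 45000],
})

# median, sample standard deviation, and sample variance per department
stats = df.groupby("Department")["Salary"].agg(["median", "std", "var"])

# HR has only one row, so its std and var are NaN (ddof=1 divides by n - 1 = 0)
print(stats)
```

For IT and Sales, the two salaries are 5000 above and below the mean. This gives a variance of 50,000,000 and a standard deviation of about 7071.07.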

GroupBy with Custom Aggregation

Apply custom functions.

Example:

df.groupby("Department")["Salary"].agg(
    lambda x: x.max() - x.min()
)

Output:

Department
HR          0
IT      10000
Sales   10000

This calculates salary range per department.
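A lambda produces a column labeled "<lambda>" when it is mixed with other functions. A named function can make the output easier to read. The sketch below passes a hypothetical helper called salary_range alongside built-in aggregations. pandas uses the function's name as its column label.

```python
import pandas as pd

df = pd.DataFrame({
    "Department": ["Sales", "Sales", "IT", "IT", "HR"],
    "Employee": ["A", "B", "C", "D", "E"],
    "Salary": [50000, 60000, 70000, 80000, 45000],
})

def salary_range(s: pd.Series):
    """Spread between the highest and lowest salary in one group."""
    return s.max() - s.min()

# Built-in names and custom functions can be mixed in one agg() call;
# a named function's __name__ becomes its column label
result = df.groupby("Department")["Salary"].agg(["min", "max", salary_range])
print(result)
```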


Filtering Groups

Filter groups based on conditions.

Example:

df.groupby("Department").filter(
    lambda x: x["Salary"].mean() > 50000
)

Output:

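filter() keeps the original rows of every group that meets the condition. The rows for IT (mean salary 75000) and Sales (mean salary 55000) remain with their original index labels, while HR (mean salary 45000) is dropped.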
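The same rows can also be selected with a boolean mask built from transform(), which broadcasts each group's mean back onto its rows. The sketch below checks that both approaches agree on the sample data. The mask version is an alternative technique, not a replacement for filter().

```python
import pandas as pd

df = pd.DataFrame({
    "Department": ["Sales", "Sales", "IT", "IT", "HR"],
    "Employee": ["A", "B", "C", "D", "E"],
    "Salary": [50000, 60000, 70000, 80000, 45000],
})

# filter(): keep every row of each group whose mean salary exceeds 50000
kept = df.groupby("Department").filter(lambda g: g["Salary"].mean() > 50000)

# Equivalent boolean mask: each row gets its department's mean salary
dept_mean = df.groupby("Department")["Salary"].transform("mean")
masked = df[dept_mean > 50000]

print(kept)
print(kept.equals(masked))  # True
```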


Transforming Data Using GroupBy

Transform applies operations while preserving original rows.

Example:

df["Dept_Avg"] = df.groupby(
    "Department"
)["Salary"].transform("mean")

print(df)

Output:

  Department Employee Salary Dept_Avg
0      Sales        A  50000  55000.0
1      Sales        B  60000  55000.0
2         IT        C  70000  75000.0
3         IT        D  80000  75000.0
4         HR        E  45000  45000.0
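Because transform() returns one value per original row, you can combine it with row-level columns. The sketch below computes each employee's share of their department's total salary. The column name Share_of_Dept is a hypothetical choice.

```python
import pandas as pd

df = pd.DataFrame({
    "Department": ["Sales", "Sales", "IT", "IT", "HR"],
    "Employee": ["A", "B", "C", "D", "E"],
    "Salary": [50000, 60000, 70000, 80000, 45000],
})

# transform("sum") repeats each department's total on every row of that department
dept_total = df.groupby("Department")["Salary"].transform("sum")

# Row-level ratio: each employee's share of their department's payroll
df["Share_of_Dept"] = (df["Salary"] / dept_total).round(3)
print(df)
```

Employee E is the only member of HR, so E's share is 1.0.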

Real-World Example: Sales Analysis

Dataset:

sales = {
    "Region": ["North", "South", "North", "West"],
    "Sales": [1000, 1500, 1200, 1800]
}

df = pd.DataFrame(sales)

Calculate sales by region.

df.groupby("Region")["Sales"].sum()

Output:

Region
North    2200
South    1500
West     1800

This is commonly used in business reporting.
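Reports usually rank regions rather than listing them alphabetically. A minimal sketch chains sort_values() and reset_index() onto the aggregation to produce a report-ready table from the same regional dataset. The name report is illustrative.

```python
import pandas as pd

sales = {
    "Region": ["North", "South", "North", "West"],
    "Sales": [1000, 1500, 1200, 1800],
}
df = pd.DataFrame(sales)

# Total per region, largest first, as a regular DataFrame
report = (
    df.groupby("Region")["Sales"]
    .sum()
    .sort_values(ascending=False)
    .reset_index()
)
print(report)
```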


GroupBy in Data Analytics

Data Analysts frequently use GroupBy for:

Sales Reports

Revenue by region, city, or product.


Customer Analytics

Customer count by segment.


HR Analytics

Employee salary analysis.


Marketing Analytics

Campaign performance summaries.


Financial Analysis

Profit and expense reporting.


Difference Between GroupBy and Pivot Table

GroupBy                          Pivot Table
More flexible                    Easier reporting
Code-oriented                    Business-friendly
Better for complex operations    Better for summaries

Both are widely used in Data Analytics.
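For a simple summary, the two approaches produce the same numbers. The sketch below computes total salary per department with both groupby() and pd.pivot_table() on the sample data. Note that pivot_table() defaults to aggfunc="mean", so the sum has to be requested explicitly.

```python
import pandas as pd

df = pd.DataFrame({
    "Department": ["Sales", "Sales", "IT", "IT", "HR"],
    "Employee": ["A", "B", "C", "D", "E"],
    "Salary": [50000, 60000, 70000, 80000, 45000],
})

# GroupBy version: total salary per department (a Series)
gb = df.groupby("Department")["Salary"].sum()

# pivot_table version of the same summary (a DataFrame with a Salary column)
pt = pd.pivot_table(df, values="Salary", index="Department", aggfunc="sum")

print(pt)
print(gb.to_dict() == pt["Salary"].to_dict())  # True
```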


Common GroupBy Interview Questions

What does GroupBy do?

It splits data into groups and performs operations on each group.


What is the Split-Apply-Combine strategy?

It is the conceptual model behind GroupBy:

Split → Apply → Combine


Can GroupBy use multiple columns?

Yes.

Example:

df.groupby(
    ["Department", "Employee"]
)

What is agg()?

agg() applies one or more aggregation functions in a single call, for example agg(["sum", "mean", "max"]).


What is transform()?

transform() performs a group-level calculation and returns a result with the same index as the original data, so it can be assigned directly as a new column.


Advantages of GroupBy

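GroupBy lets you summarize data by category in a few lines of code. You can compute several metrics at once with agg(), keep row-level detail with transform(), and drop entire groups with filter(). It is one of the most powerful features of Pandas.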


Best Practices for Using GroupBy

Use Meaningful Aggregations

Choose functions relevant to business goals.


Reset Index When Needed

Turns the grouping index back into regular columns, which makes results easier to read and export.


Use agg() for Multiple Metrics

Reduces repetitive code.


Combine with Visualization

Use GroupBy results in Power BI, Tableau, or Matplotlib charts.


Career Relevance of GroupBy

GroupBy is widely used by data analysts, data scientists, and business intelligence professionals.

Questions about GroupBy come up frequently in data analytics interviews.


Final Thoughts

The GroupBy function is one of the most important tools in Pandas for data analysis and reporting. Whether you're calculating sales totals, customer counts, employee salaries, or marketing performance metrics, GroupBy makes data aggregation simple and efficient.

Mastering GroupBy is essential for anyone pursuing a career in Data Analytics, Data Science, Business Intelligence, or Machine Learning.
