
Pandas is one of the most powerful Python libraries used for data analysis and manipulation. Among its many features, the GroupBy function is one of the most important and frequently used tools for summarizing, aggregating, and analyzing data.
Data Analysts and Data Scientists use GroupBy extensively to perform operations such as calculating totals, averages, counts, and other statistics for different categories of data.
In this guide, you'll learn how to use the Pandas GroupBy function, including its syntax, aggregation methods, worked examples, and real-world applications.
The GroupBy function, called in code as the groupby() method of a DataFrame, allows you to split data into groups based on one or more columns and then perform operations on each group.
The process generally follows three steps:
Split: divide the data into groups.
Apply: perform calculations or transformations on each group.
Combine: merge the results into a new dataset.
This concept is commonly known as:
Split → Apply → Combine
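To make the three steps concrete, here is a minimal sketch that performs them by hand and compares the result with the built-in aggregation. The small DataFrame and the variable names (pieces, totals, manual, builtin) are assumptions made for illustration only.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({
    "Department": ["Sales", "Sales", "IT", "IT", "HR"],
    "Salary": [50000, 60000, 70000, 80000, 45000],
})

# Split: iterating over a GroupBy object yields (key, sub-DataFrame) pairs.
pieces = {name: group for name, group in df.groupby("Department")}

# Apply: compute one value for each piece.
totals = {name: group["Salary"].sum() for name, group in pieces.items()}

# Combine: assemble the per-group values into a single Series.
manual = pd.Series(totals, name="Salary").sort_index()

# The built-in aggregation performs all three steps in one call.
builtin = df.groupby("Department")["Salary"].sum()

print(manual)
print(manual.to_dict() == builtin.to_dict())
```

In practice you would rarely write the loop yourself; the sketch is only meant to show what a single groupby call is doing for you.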
GroupBy helps analysts:
Summarize large datasets
Generate business reports
Calculate KPIs
Analyze trends
Perform aggregations efficiently
Create dashboard-ready data
It is one of the most important functions in Data Analytics.
Before using GroupBy, import Pandas.
import pandas as pd
Let's create a sample employee salary dataset.
import pandas as pd
# One row per employee, with department and salary
data = {
    "Department": ["Sales", "Sales", "IT", "IT", "HR"],
    "Employee": ["A", "B", "C", "D", "E"],
    "Salary": [50000, 60000, 70000, 80000, 45000]
}
df = pd.DataFrame(data)
print(df)
Output:
  Department Employee  Salary
0      Sales        A   50000
1      Sales        B   60000
2         IT        C   70000
3         IT        D   80000
4         HR        E   45000
The basic syntax is:
df.groupby("column_name")
Example:
df.groupby("Department")
This returns a DataFrameGroupBy object. No results are computed or displayed until an operation such as sum() or mean() is applied.
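Although the grouped object shows nothing on its own, you can still inspect it. The sketch below uses the ngroups attribute and the size() and get_group() methods of the GroupBy object; the variable names grouped and it_rows are just illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Department": ["Sales", "Sales", "IT", "IT", "HR"],
    "Employee": ["A", "B", "C", "D", "E"],
    "Salary": [50000, 60000, 70000, 80000, 45000],
})

grouped = df.groupby("Department")

# Number of distinct groups found in the column.
print(grouped.ngroups)

# Number of rows in each group.
print(grouped.size())

# The rows that belong to a single group.
it_rows = grouped.get_group("IT")
print(it_rows)
```

This kind of quick inspection can be handy for checking that the grouping keys are what you expect before running an aggregation.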
Calculate total salary by department.
df.groupby("Department")["Salary"].sum()
Output:
Department
HR        45000
IT       150000
Sales    110000
Calculate average salary by department.
df.groupby("Department")["Salary"].mean()
Output:
Department
HR       45000.0
IT       75000.0
Sales    55000.0
Count employees in each department.
df.groupby("Department")["Employee"].count()
Output:
Department
HR       1
IT       2
Sales    2
Pandas supports several aggregation methods.
| Function | Purpose |
|---|---|
| sum() | Total |
| mean() | Average |
| count() | Number of Records |
| max() | Maximum Value |
| min() | Minimum Value |
| median() | Median Value |
| std() | Standard Deviation |
| var() | Variance |
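The statistics in the table that are not demonstrated elsewhere in this guide follow the same calling pattern. Here is a short sketch on the same sample data; the variable names are illustrative. Note that std() and var() in pandas compute the sample statistic by default (ddof=1), so a group with a single row, such as HR here, produces NaN.

```python
import pandas as pd

df = pd.DataFrame({
    "Department": ["Sales", "Sales", "IT", "IT", "HR"],
    "Salary": [50000, 60000, 70000, 80000, 45000],
})

# Select the column once and reuse the grouped object.
salary_by_dept = df.groupby("Department")["Salary"]

medians = salary_by_dept.median()
std_devs = salary_by_dept.std()    # sample standard deviation (ddof=1)
variances = salary_by_dept.var()   # sample variance (ddof=1)

print(medians)
print(std_devs)
print(variances)
```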
You can apply multiple calculations simultaneously.
df.groupby("Department")["Salary"].agg(
    ["sum", "mean", "max", "min"]
)
Output:
               sum     mean    max    min
Department
HR           45000  45000.0  45000  45000
IT          150000  75000.0  80000  70000
Sales       110000  55000.0  60000  50000
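When the summary feeds a report, it can help to give each output column its own name. One way to do that is pandas named aggregation, where each keyword argument to agg() is a (column, function) pair. The output column names below (total_salary, avg_salary, headcount) are made up for this sketch.

```python
import pandas as pd

df = pd.DataFrame({
    "Department": ["Sales", "Sales", "IT", "IT", "HR"],
    "Employee": ["A", "B", "C", "D", "E"],
    "Salary": [50000, 60000, 70000, 80000, 45000],
})

# Each keyword becomes an output column: name=(source column, function).
summary = df.groupby("Department").agg(
    total_salary=("Salary", "sum"),
    avg_salary=("Salary", "mean"),
    headcount=("Employee", "count"),
)

print(summary)
```

Named aggregation also lets you aggregate different columns with different functions in a single call, which the list form cannot do.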
You can group data using multiple columns.
Example:
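df.groupby(
    ["Department", "Employee"]
)["Salary"].sum()
This creates nested grouping levels: the result is indexed by a MultiIndex with Department as the outer level and Employee as the inner level.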
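To see what the nested levels look like, here is a small sketch. It uses a hypothetical Level column (not part of the sample dataset above) so that each department contains more than one row per combination.

```python
import pandas as pd

# Hypothetical data: the "Level" column is an assumption for this example.
staff = pd.DataFrame({
    "Department": ["Sales", "Sales", "IT", "IT", "IT"],
    "Level": ["Junior", "Senior", "Junior", "Junior", "Senior"],
    "Salary": [50000, 60000, 70000, 72000, 90000],
})

by_dept_level = staff.groupby(["Department", "Level"])["Salary"].sum()

# The result is indexed by two levels: Department, then Level.
print(by_dept_level.index.nlevels)

# Look up one combination with a tuple of keys.
print(by_dept_level.loc[("IT", "Junior")])

# unstack() moves the inner level into columns for a wider layout.
wide = by_dept_level.unstack("Level")
print(wide)
```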
By default, GroupBy results use the grouping columns as the index.
Use reset_index() to convert them back into regular columns.
df.groupby("Department")["Salary"].sum().reset_index()
Output:
  Department  Salary
0         HR   45000
1         IT  150000
2      Sales  110000
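An alternative to calling reset_index() afterwards is to pass as_index=False to groupby(), which keeps the grouping column as a regular column from the start. A minimal sketch comparing the two approaches (variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "Department": ["Sales", "Sales", "IT", "IT", "HR"],
    "Salary": [50000, 60000, 70000, 80000, 45000],
})

# Keep "Department" as a column instead of using it as the index.
flat = df.groupby("Department", as_index=False)["Salary"].sum()

# The two-step version from the text, for comparison.
via_reset = df.groupby("Department")["Salary"].sum().reset_index()

print(flat)
print(flat.values.tolist() == via_reset.values.tolist())
```

Which form to use is mostly a matter of taste; as_index=False saves a step when you know up front that you want a flat table.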
Find the highest salary in each department.
df.groupby("Department")["Salary"].max()
Output:
Department
HR       45000
IT       80000
Sales    60000
Find the lowest salary in each department.
df.groupby("Department")["Salary"].min()
Output:
Department
HR       45000
IT       70000
Sales    50000
Calculate the median salary in each department.
df.groupby("Department")["Salary"].median()
You can also pass a custom function to agg().
Example:
df.groupby("Department")["Salary"].agg(
    lambda x: x.max() - x.min()
)
Output:
Department
HR           0
IT       10000
Sales    10000
This calculates the salary range (maximum minus minimum) for each department.
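For anything longer than a one-liner, a named function can be easier to read and reuse than a lambda. The sketch below is equivalent to the lambda version; the function name salary_range is an assumption for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Department": ["Sales", "Sales", "IT", "IT", "HR"],
    "Salary": [50000, 60000, 70000, 80000, 45000],
})


def salary_range(values):
    """Return the gap between the highest and lowest value in a group."""
    return values.max() - values.min()


# agg() calls the function once per group, passing that group's Salary values.
ranges = df.groupby("Department")["Salary"].agg(salary_range)

print(ranges)
```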
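filter() keeps or drops entire groups based on a condition.
Example:
df.groupby("Department").filter(
    lambda x: x["Salary"].mean() > 50000
)
Output:
Only the rows for the IT and Sales departments remain. HR is dropped because its average salary (45000) is not above 50000.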
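Here is a runnable sketch of that filter on the sample data, showing which rows survive. The variable name high_paying is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Department": ["Sales", "Sales", "IT", "IT", "HR"],
    "Employee": ["A", "B", "C", "D", "E"],
    "Salary": [50000, 60000, 70000, 80000, 45000],
})

# The lambda receives each group's sub-DataFrame and returns True or False.
# Groups that return False are removed entirely.
high_paying = df.groupby("Department").filter(
    lambda group: group["Salary"].mean() > 50000
)

print(high_paying)
```

Note that the result keeps the original rows and their original index labels; nothing is aggregated.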
transform() applies a group-level operation and returns a result aligned to the original rows, so it can be assigned directly as a new column.
Example:
df["Dept_Avg"] = df.groupby(
    "Department"
)["Salary"].transform("mean")
Output:
  Department Employee  Salary  Dept_Avg
0      Sales        A   50000   55000.0
1      Sales        B   60000   55000.0
2         IT        C   70000   75000.0
3         IT        D   80000   75000.0
4         HR        E   45000   45000.0
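Because the transformed values line up with the original rows, they can be combined with other columns directly. As a sketch, the snippet below computes how far each salary is from its department average; the column name Diff_From_Avg is made up for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "Department": ["Sales", "Sales", "IT", "IT", "HR"],
    "Employee": ["A", "B", "C", "D", "E"],
    "Salary": [50000, 60000, 70000, 80000, 45000],
})

# One average per row, repeated for every row in the same department.
dept_avg = df.groupby("Department")["Salary"].transform("mean")

# Row-wise comparison against the group-level value.
df["Diff_From_Avg"] = df["Salary"] - dept_avg

print(df)
```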
Dataset:
sales = {
    "Region": ["North", "South", "North", "West"],
    "Sales": [1000, 1500, 1200, 1800]
}
df = pd.DataFrame(sales)
Calculate total sales by region.
df.groupby("Region")["Sales"].sum()
Output:
Region
North    2200
South    1500
West     1800
This is commonly used in business reporting.
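For a report, the regional totals are often ranked and expressed as a share of the overall figure. A small sketch of that follow-up step, with illustrative variable names (by_region, ranked, share):

```python
import pandas as pd

sales = pd.DataFrame({
    "Region": ["North", "South", "North", "West"],
    "Sales": [1000, 1500, 1200, 1800],
})

by_region = sales.groupby("Region")["Sales"].sum()

# Rank regions from highest to lowest total.
ranked = by_region.sort_values(ascending=False)

# Each region's percentage of overall sales, rounded to one decimal place.
share = (ranked / ranked.sum() * 100).round(1)

print(ranked)
print(share)
```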
Data Analysts frequently use GroupBy for:
Revenue by region, city, or product.
Customer count by segment.
Employee salary analysis.
Campaign performance summaries.
Profit and expense reporting.
| GroupBy | Pivot Table |
|---|---|
| More Flexible | Easier Reporting |
| Code-Oriented | Business-Friendly |
| Better for Complex Operations | Better for Summaries |
Both are widely used in Data Analytics.
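For simple summaries the two approaches can produce the same numbers. The sketch below computes the average salary per department both ways on the sample data; it is meant as an illustration of the overlap, not as a recommendation of one over the other.

```python
import pandas as pd

df = pd.DataFrame({
    "Department": ["Sales", "Sales", "IT", "IT", "HR"],
    "Salary": [50000, 60000, 70000, 80000, 45000],
})

# GroupBy version: returns a Series indexed by Department.
grouped = df.groupby("Department")["Salary"].mean()

# Pivot table version: returns a DataFrame with a Salary column.
pivoted = df.pivot_table(index="Department", values="Salary", aggfunc="mean")

print(grouped)
print(pivoted)
print(pivoted["Salary"].to_dict() == grouped.to_dict())
```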
GroupBy splits data into groups and performs operations on each group.
The process used internally by GroupBy:
Split → Apply → Combine
Yes, GroupBy can group by multiple columns.
Example:
df.groupby(
    ["Department", "Employee"]
)
agg() allows multiple aggregation functions to be applied simultaneously.
transform() performs group-level calculations while preserving the original DataFrame shape.
Benefits include:
Fast Data Summarization
Easy Reporting
Flexible Aggregations
Efficient Data Analysis
Better Business Insights
GroupBy is one of the most powerful features of Pandas.
Choose aggregation functions relevant to business goals.
Resetting the index with reset_index() makes results easier to read.
Using agg() for multiple calculations reduces repetitive code.
Use GroupBy results in Power BI, Tableau, or Matplotlib charts.
GroupBy is widely used by:
Data Analysts
Data Scientists
Business Analysts
Machine Learning Engineers
Financial Analysts
Questions about GroupBy come up frequently in Data Analytics interviews.
The GroupBy function is one of the most important tools in Pandas for data analysis and reporting. Whether you're calculating sales totals, customer counts, employee salaries, or marketing performance metrics, GroupBy makes data aggregation simple and efficient.
Mastering GroupBy is essential for anyone pursuing a career in Data Analytics, Data Science, Business Intelligence, or Machine Learning.