# GroupBy Function in Pandas Python


## Introduction to the GroupBy Function in pandas

The groupby function and aggregation are among the most frequently used operations in pandas for data analysis, especially during exploratory data analysis (EDA), where comparing summary statistics across groups of data is common.

For example, suppose you have data about cities and want to analyze the total population of each state, the average population of the cities in each state, and the population of each city within its state. In that case, you use groupby and aggregation functions to calculate values for all rows that share the same state or city.

### Grouping analysis can be thought of as having three parts:

1. Splitting the data into groups (e.g. groups of customer segments, product categories, etc.)

2. Applying a function to each group (e.g. mean or total sales of each customer segment)

3. Combining the results into a data structure showing the summary statistics
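To see what groupby automates, the three steps can be written out by hand. The sketch below uses a small made-up sales table (the column names `segment` and `sales` are illustrative assumptions, not data from this article) and checks that the manual split, apply and combine gives the same result as a single `groupby` call.

```python
import pandas as pd

# Small illustrative data set (hypothetical values)
sales = pd.DataFrame({
    'segment': ['retail', 'corporate', 'retail', 'home', 'corporate'],
    'sales': [120, 300, 80, 50, 200],
})

# 1. Split: one sub-frame per distinct segment
pieces = {seg: sales[sales['segment'] == seg] for seg in sales['segment'].unique()}

# 2. Apply: compute the mean sales of each piece
means = {seg: piece['sales'].mean() for seg, piece in pieces.items()}

# 3. Combine: collect the per-group results into one Series, sorted like groupby's output
manual = pd.Series(means, name='sales').sort_index()
manual.index.name = 'segment'

# groupby performs all three steps in one call
automatic = sales.groupby('segment')['sales'].mean()

print(manual)
print(manual.equals(automatic))
```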

## Applying Functions to Groups in pandas

After the data is split, each group can be processed in one of four ways:

• Aggregation
• Transformation
• Filtration
• Applying our own function
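A compact sketch of all four operations on one small table, assuming hypothetical `state`, `city` and `pop` columns. The last call passes a function you write yourself to `apply`; here it is a hypothetical helper, `population_range`, that returns the difference between the largest and smallest city in each state.

```python
import pandas as pd

df = pd.DataFrame({
    'state': ['A', 'A', 'B', 'B', 'B'],
    'city': ['a1', 'a2', 'b1', 'b2', 'b3'],
    'pop': [10, 30, 5, 15, 40],
})
grp = df.groupby('state')['pop']

# Aggregation: one value per group
totals = grp.sum()

# Transformation: one value per original row (distance from the group mean)
deviation = grp.transform(lambda s: s - s.mean())

# Filtration: keep only the rows of groups whose total exceeds 50
big_states = df.groupby('state').filter(lambda g: g['pop'].sum() > 50)

# Applying our own function (hypothetical helper for illustration)
def population_range(s):
    """Difference between the largest and smallest value in a group."""
    return s.max() - s.min()

ranges = grp.apply(population_range)

print(totals, deviation, big_states, ranges, sep='\n\n')
```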

## Methods of GroupBy Function in pandas

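The following data frame is used to demonstrate the groupby and aggregation methods.

Code:-

```
import numpy as np
import pandas as pd

population = pd.DataFrame({
    'State': ['Maharashtra', 'Maharashtra', 'Maharashtra',
              'Uttar Pradesh', 'Uttar Pradesh',
              'Madhya Pradesh', 'Madhya Pradesh', 'Madhya Pradesh',
              'Tamil Nadu', 'Tamil Nadu'],
    'Cities': ['Nagpur', 'Nagpur', 'Mumbai',
               'Lucknow', 'Kanpur',
               'Bhopal', 'Indore', 'Indore',
               'Chennai', 'Chennai'],
    'Female Population': np.random.randint(100000, 500000, 10),
    'Male Population': np.random.randint(100000, 500000, 10),
    'Total Population': np.random.randint(200000, 700000, 10),
    'literacy_rate_total': np.abs(np.random.randn(10) * 40)})

# np.random.randint() generates random integers for the population columns
# np.random.randn() generates random numbers from a standard normal distribution

population  # display the data frame
```

Output:- a data frame with 10 rows and the columns State, Cities, Female Population, Male Population, Total Population and literacy_rate_total. The numeric values are random, so they change on every run.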
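Because `np.random.randint()` and `np.random.randn()` draw new numbers on every run, the outputs in this article cannot be reproduced exactly. Below is a sketch of a seeded variant using NumPy's `default_rng` generator; the seed value 42 is an arbitrary assumption, and the numbers it produces will not match the outputs shown in this article.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)  # arbitrary seed: same numbers on every run

states = (['Maharashtra'] * 3 + ['Uttar Pradesh'] * 2
          + ['Madhya Pradesh'] * 3 + ['Tamil Nadu'] * 2)
cities = ['Nagpur', 'Nagpur', 'Mumbai', 'Lucknow', 'Kanpur',
          'Bhopal', 'Indore', 'Indore', 'Chennai', 'Chennai']

population = pd.DataFrame({
    'State': states,
    'Cities': cities,
    # Generator.integers excludes the upper bound, like np.random.randint
    'Female Population': rng.integers(100000, 500000, 10),
    'Male Population': rng.integers(100000, 500000, 10),
    'Total Population': rng.integers(200000, 700000, 10),
    'literacy_rate_total': np.abs(rng.standard_normal(10) * 40),
})

print(population)
```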

### Groupby on the basis of a single categorical column

Example:- Use the above population data and group it by state.

```
# Returns a GroupBy object grouped by State; no aggregation is performed yet
population.groupby('State')
```

Output:-

```
<pandas.core.groupby.generic.DataFrameGroupBy object at 0x00000211A6521CF8>
```

To see each group and the index labels of its rows, use the `.groups` attribute:

```
population.groupby('State').groups
```

Output:-

```
{'Madhya Pradesh': Int64Index([5, 6, 7], dtype='int64'),
 'Maharashtra': Int64Index([0, 1, 2], dtype='int64'),
 'Tamil Nadu': Int64Index([8, 9], dtype='int64'),
 'Uttar Pradesh': Int64Index([3, 4], dtype='int64')}
```

Apply an aggregation function to calculate the total population of each state. There are a number of aggregation functions, such as sum, mean, count, max and min.

```
# numeric_only=True skips the text column 'Cities' instead of concatenating its strings
population.groupby('State').sum(numeric_only=True)
```

Output:-

```
               Female Population  Male Population  Total Population  literacy_rate_total
State
Maharashtra               783701          1154190           1551671           105.044499
Tamil Nadu                776934           355226            998136            45.691927
Uttar Pradesh             666343           865744            876842           137.275400
```

The output shows the sum of every numeric column for each state.
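The other aggregation functions are called the same way. A small sketch on hypothetical data with one missing value, showing that `count()` skips missing values while `size()` counts every row of a group:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'State': ['X', 'X', 'Y', 'Y', 'Y'],
    'Total Population': [100, 200, 50, np.nan, 150],
})
grp = df.groupby('State')['Total Population']

print(grp.mean())   # NaN is ignored: X -> 150.0, Y -> 100.0
print(grp.max())
print(grp.min())
print(grp.count())  # non-missing values per group: X -> 2, Y -> 2
print(grp.size())   # all rows per group: X -> 2, Y -> 3
```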

### Groupby on the basis of two categorical columns

Example:

```
# Returns a GroupBy object grouped by State and, within each state, by Cities
population.groupby(['State', 'Cities'])
```

Output:

`<pandas.core.groupby.generic.DataFrameGroupBy object at 0x00000211A6521CF8>`
To see the groups and the index labels of their rows, use the `.groups` attribute. With two grouping columns, each key is a (State, Cities) tuple; for example, ('Maharashtra', 'Nagpur') maps to rows 0 and 1.

```
population.groupby(['State', 'Cities']).groups
```

Apply an aggregation function to calculate the mean of every numeric column for each city within its state:

```
mean_pop = population.groupby(['State', 'Cities']).mean()
mean_pop
```

The output shows each city's average female population, male population, total population and literacy_rate_total.

Output:-

```
                        Female Population  Male Population  Total Population  literacy_rate_total
State          Cities
Madhya Pradesh Bhopal            190726.0         290946.0          321898.0             6.824701
               Indore            293987.0         296969.0          466876.5            19.592425
Maharashtra    Mumbai            186423.0         325910.0          680144.0            36.866375
               Nagpur            298639.0         414140.0          435763.5            34.089062
Tamil Nadu     Chennai           388467.0         177613.0          499068.0            22.845964
Uttar Pradesh  Kanpur            290262.0         425102.0          214901.0            35.262222
               Lucknow           376081.0         440642.0          661941.0           102.013178
```
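Grouping by two columns gives the result a two-level row index (a MultiIndex). A sketch with a hypothetical mini data set showing two common ways to work with it: `.loc` on the outer level, and `reset_index()` to turn the index levels back into ordinary columns; passing `as_index=False` to `groupby` produces the same flat layout directly.

```python
import pandas as pd

df = pd.DataFrame({
    'State': ['S1', 'S1', 'S1', 'S2'],
    'Cities': ['c1', 'c1', 'c2', 'c3'],
    'Total Population': [100, 300, 50, 80],
})

mean_pop = df.groupby(['State', 'Cities']).mean()

# Select all cities of one state through the outer index level
print(mean_pop.loc['S1'])

# Turn the index levels back into ordinary columns
flat = mean_pop.reset_index()
print(flat)

# as_index=False keeps State and Cities as columns from the start
flat2 = df.groupby(['State', 'Cities'], as_index=False).mean()
print(flat2)
```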

### Loop over GroupBy groups

A GroupBy object can be iterated over: each iteration yields the name of a group together with a data frame that contains the rows of that group.

Example:-

```
# Create groups according to State
grp = population.groupby('State')

# Print each group name, then the rows that belong to that group
for name, group in grp:
    print(name)
    print(group)
    print()
```

Output:-

```
Madhya Pradesh
            State  Cities  Female Population  Male Population  \
5  Madhya Pradesh  Bhopal             190726           290946
6  Madhya Pradesh  Indore                ...              ...
7  Madhya Pradesh  Indore                ...              ...

   Total Population  literacy_rate_total
5            321898             6.824701
6            628995            26.194098
7            304758            12.990752

Maharashtra
         State  Cities  Female Population  Male Population  Total Population  \
0  Maharashtra  Nagpur             334934           357959            508852
1  Maharashtra  Nagpur             262344           470321            362675
2  Maharashtra  Mumbai             186423           325910            680144

   literacy_rate_total
0            33.795318
1            34.382807
2            36.866375

Tamil Nadu
        State   Cities  Female Population  Male Population  Total Population  \
8  Tamil Nadu  Chennai             394035           109944            515960
9  Tamil Nadu  Chennai             382899           245282            482176

   literacy_rate_total
8            17.705373
9            27.986555

Uttar Pradesh
           State   Cities  Female Population  Male Population  \
3  Uttar Pradesh  Lucknow             376081           440642
4  Uttar Pradesh   Kanpur             290262           425102

   Total Population  literacy_rate_total
3            661941           102.013178
4            214901            35.262222
```
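Because iteration yields `(name, group)` pairs, the groups can be collected into an ordinary dictionary, for example to process each state separately later. A sketch on a hypothetical mini table. Grouping by the plain string `'State'` gives string names; with a one-element list such as `['State']`, pandas 2.x yields one-element tuples instead.

```python
import pandas as pd

df = pd.DataFrame({
    'State': ['S2', 'S1', 'S2', 'S1', 'S1'],
    'Total Population': [10, 20, 30, 40, 50],
})

# Groups are visited in sorted key order (sort=True is the default)
frames = {name: group for name, group in df.groupby('State')}

for name, group in frames.items():
    # Each group keeps the original row labels of its rows
    print(name, list(group.index), group['Total Population'].sum())
```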

Example:-

```
# Create groups according to State and its Cities
grp = population.groupby(['State', 'Cities'])

# Each group name is now a (State, Cities) tuple
for name, group in grp:
    print(name)
    print(group)
    print()
```

Output:-

```
('Madhya Pradesh', 'Bhopal')
            State  Cities  Female Population  Male Population  \
5  Madhya Pradesh  Bhopal             190726           290946

   Total Population  literacy_rate_total
5            321898             6.824701

('Madhya Pradesh', 'Indore')
            State  Cities  Female Population  Male Population  \
6  Madhya Pradesh  Indore                ...              ...
7  Madhya Pradesh  Indore                ...              ...

   Total Population  literacy_rate_total
6            628995            26.194098
7            304758            12.990752

('Maharashtra', 'Mumbai')
         State  Cities  Female Population  Male Population  Total Population  \
2  Maharashtra  Mumbai             186423           325910            680144

   literacy_rate_total
2            36.866375

('Maharashtra', 'Nagpur')
         State  Cities  Female Population  Male Population  Total Population  \
0  Maharashtra  Nagpur             334934           357959            508852
1  Maharashtra  Nagpur             262344           470321            362675

   literacy_rate_total
0            33.795318
1            34.382807

('Tamil Nadu', 'Chennai')
        State   Cities  Female Population  Male Population  Total Population  \
8  Tamil Nadu  Chennai             394035           109944            515960
9  Tamil Nadu  Chennai             382899           245282            482176

   literacy_rate_total
8            17.705373
9            27.986555

('Uttar Pradesh', 'Kanpur')
           State  Cities  Female Population  Male Population  \
4  Uttar Pradesh  Kanpur             290262           425102

   Total Population  literacy_rate_total
4            214901            35.262222

('Uttar Pradesh', 'Lucknow')
           State   Cities  Female Population  Male Population  \
3  Uttar Pradesh  Lucknow             376081           440642

   Total Population  literacy_rate_total
3            661941           102.013178
```
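Since each group name is a `(State, Cities)` tuple, it can be unpacked directly in the `for` statement. A sketch with hypothetical data:

```python
import pandas as pd

df = pd.DataFrame({
    'State': ['S1', 'S1', 'S2'],
    'Cities': ['c1', 'c2', 'c3'],
    'Total Population': [100, 200, 300],
})

rows = []
# Unpack the tuple key into its two parts
for (state, city), group in df.groupby(['State', 'Cities']):
    rows.append(f"{state}/{city}: {len(group)} row(s)")

print("\n".join(rows))
```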

### Selecting groups

To select a particular group from a GroupBy object, use the `get_group()` method.

Example:- Select the Maharashtra group

Code:-

```
# Select a single group by its name
grp = population.groupby('State')
grp.get_group('Maharashtra')
```

Output:-

```
         State  Cities  Female Population  Male Population  Total Population  literacy_rate_total
0  Maharashtra  Nagpur             334934           357959            508852            33.795318
1  Maharashtra  Nagpur             262344           470321            362675            34.382807
2  Maharashtra  Mumbai             186423           325910            680144            36.866375
```
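`get_group()` returns the same rows, with the same index labels, as a boolean filter on the grouping column. A sketch on a hypothetical mini table that checks the two approaches agree:

```python
import pandas as pd

df = pd.DataFrame({
    'State': ['Maharashtra', 'Tamil Nadu', 'Maharashtra'],
    'Cities': ['Nagpur', 'Chennai', 'Mumbai'],
})

via_groupby = df.groupby('State').get_group('Maharashtra')
via_mask = df[df['State'] == 'Maharashtra']

print(via_groupby)
print(via_groupby.equals(via_mask))
```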

Example:- Select a group defined by two columns

Code:-

```
# With two grouping columns, pass the group name as a (State, Cities) tuple
grp = population.groupby(['State', 'Cities'])
grp.get_group(('Uttar Pradesh', 'Lucknow'))
```

Output:-

```
           State   Cities  Female Population  Male Population  Total Population  literacy_rate_total
3  Uttar Pradesh  Lucknow             376081           440642            661941           102.013178
```
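If the requested name is not one of the groups, `get_group()` raises a `KeyError`. A small sketch with hypothetical data and a hypothetical helper, `safe_get_group`, that returns `None` in that case instead:

```python
import pandas as pd

df = pd.DataFrame({
    'State': ['Uttar Pradesh', 'Uttar Pradesh'],
    'Cities': ['Lucknow', 'Kanpur'],
    'Total Population': [661941, 214901],
})
grp = df.groupby(['State', 'Cities'])

def safe_get_group(grouped, key):
    """Hypothetical helper: return the group for key, or None if it does not exist."""
    try:
        return grouped.get_group(key)
    except KeyError:
        return None

print(safe_get_group(grp, ('Uttar Pradesh', 'Lucknow')))
print(safe_get_group(grp, ('Uttar Pradesh', 'Agra')))  # not a group in df
```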

## Applying Functions to Groups

• Aggregation: calculates summary statistics for each group, for example the sum, average or minimum value.
• Transformation: performs a group-specific computation and returns an object indexed like the original. For example, filling the null values in each group with a value calculated from that group.
• Filtration: keeps or discards whole groups according to a group-wise computation that evaluates to True or False. For example, filtering the data according to each group's sum or mean.
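The transformation and filtration examples from the list can be sketched as follows, on a hypothetical table with two missing values. `transform('mean')` returns the group mean for every original row, so `fillna()` can use it directly; `filter()` then keeps only the states whose mean exceeds an arbitrary threshold of 150.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'State': ['S1', 'S1', 'S1', 'S2', 'S2'],
    'Total Population': [100.0, np.nan, 300.0, 50.0, np.nan],
})

# Transformation: each row receives the mean of its own state (NaN is ignored)
state_mean = df.groupby('State')['Total Population'].transform('mean')
df['Total Population'] = df['Total Population'].fillna(state_mean)

# Filtration: keep the rows of states whose mean population is above 150
large = df.groupby('State').filter(lambda g: g['Total Population'].mean() > 150)

print(df)
print(large)
```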

## Aggregation

Example:- Calculate the sum, mean and minimum of the female population of each state

Code:-

```
grp = population.groupby('State')

# Select one column, then compute several statistics for it
grp['Female Population'].agg(['sum', 'mean', 'min'])
```

Output:-

```
                sum           mean     min
State
Maharashtra  783701  261233.666667  186423
```
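Since pandas 0.25, `agg()` also accepts named aggregation: keyword arguments of the form `new_name=(column, function)`, which set the output column names directly. A sketch on hypothetical data (the names `female_total`, `female_mean` and `female_min` are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'State': ['S1', 'S1', 'S2'],
    'Female Population': [300, 100, 250],
})

summary = df.groupby('State').agg(
    female_total=('Female Population', 'sum'),
    female_mean=('Female Population', 'mean'),
    female_min=('Female Population', 'min'),
)
print(summary)
```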

Example:- Apply different aggregation functions to different columns of the data frame

Code:-

```
# Pass a dictionary that maps each column to the aggregation applied to it
grp = population.groupby('State')
grp.agg({'Female Population': 'sum', 'Male Population': 'sum', 'literacy_rate_total': 'min'})
```

Output:-

```
                Female Population  Male Population  literacy_rate_total
State
Madhya Pradesh             778700           884884             6.824701
Maharashtra                783701          1154190            33.795318
Tamil Nadu                 776934           355226            17.705373
Uttar Pradesh              666343           865744            35.262222
```
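A dictionary value can also be a list of functions; the result then has two-level column labels of the form `(column, function)`. A sketch with hypothetical numbers:

```python
import pandas as pd

df = pd.DataFrame({
    'State': ['S1', 'S1', 'S2'],
    'Male Population': [400, 200, 300],
    'literacy_rate_total': [30.0, 50.0, 40.0],
})

result = df.groupby('State').agg({
    'Male Population': ['sum', 'max'],
    'literacy_rate_total': 'min',
})
print(result)
print(result.columns.tolist())
```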

## Transformation

The `transform()` method returns an object that is indexed the same way as the original data (and has the same size), rather than one row per group.

Example:- Perform some group-specific computation
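One possible group-specific computation, sketched on hypothetical data: each city's population as a share of its state's total. `transform('sum')` broadcasts each state's total back to every row of that state, so the division lines up row by row.

```python
import pandas as pd

df = pd.DataFrame({
    'State': ['S1', 'S1', 'S2', 'S2'],
    'Cities': ['c1', 'c2', 'c3', 'c4'],
    'Total Population': [100, 300, 200, 200],
})

state_total = df.groupby('State')['Total Population'].transform('sum')
df['share_of_state'] = df['Total Population'] / state_total

# Unlike an aggregation, the transformed Series has one entry per original row
print(len(state_total) == len(df))
print(df)
```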

## Filtration

Example:- Keep only the cities that occur two or more times

Code:-

```
grp = population.groupby('Cities')

# filter() keeps every row of each group for which the function returns True,
# here the cities that have at least two rows
grp.filter(lambda x: len(x) >= 2)
```

Output:-

```
         State  Cities  Female Population  Male Population  Total Population  literacy_rate_total
0  Maharashtra  Nagpur             334934           357959            508852            33.795318
1  Maharashtra  Nagpur             262344           470321            362675            34.382807