How does this grouper function work, and how does the lambda function work in df.groupby()?
Data: https://github.com/codebasics/py/blob/master/pandas/7_group_by/weather_by_cities.csv
         day      city  temperature  windspeed   event
0   1/1/2017  new york           32          6    Rain
1   1/2/2017  new york           36          7   Sunny
2   1/3/2017  new york           28         12    Snow
3   1/4/2017  new york           33          7   Sunny
4   1/1/2017    mumbai           90          5   Sunny
5   1/2/2017    mumbai           85         12     Fog
6   1/3/2017    mumbai           87         15     Fog
7   1/4/2017    mumbai           92          5    Rain
8   1/1/2017     paris           45         20   Sunny
9   1/2/2017     paris           50         13  Cloudy
10  1/3/2017     paris           54          8  Cloudy
11  1/4/2017     paris           42         10  Cloudy
import pandas as pd

df = pd.read_csv("weather_by_cities.csv")

def grouper(df, idx, col):
    # Return a group label for the row at index label idx, based on df[col]
    if 80 <= df[col].loc[idx] <= 90:
        return '80-90'
    elif 50 <= df[col].loc[idx] <= 60:
        return '50-60'
    else:
        return 'others'

# groupby calls the lambda once per index label; the returned string is the group key
g = df.groupby(lambda x: grouper(df, x, 'temperature'))
g

for key, d in g:
    print("Group by Key: {}\n".format(key))
    print(d)
Basically, the code groups the dataframe based on temperature ranges.
When groupby is given a function, pandas calls it once for each index label of the dataframe. Here the lambda passes each label on to the grouper function, which returns a key ('80-90', '50-60', or 'others') for that row. groupby then collects the rows that share a key into smaller dataframes.
The for-loop iterates over the groups and prints each key and its subset.
The grouper function takes in a dataframe, an index label (here the same as the row number, because the dataframe uses the default index), and a column name. It then returns a string based on the value found in that column at that index label.
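To see what grouper does on its own, it can be called directly for a few index labels. This is a minimal sketch that assumes an inline copy of the temperature column (values taken from the data above) instead of reading the CSV file:

```python
import pandas as pd

# Inline copy of the temperature column from the data above (assumption: used
# here only so the example runs without the CSV file).
df = pd.DataFrame(
    {"temperature": [32, 36, 28, 33, 90, 85, 87, 92, 45, 50, 54, 42]}
)

def grouper(df, idx, col):
    # Look up one value by index label and map it to a range label.
    if 80 <= df[col].loc[idx] <= 90:
        return '80-90'
    elif 50 <= df[col].loc[idx] <= 60:
        return '50-60'
    else:
        return 'others'

print(grouper(df, 4, "temperature"))  # temperature 90 -> '80-90'
print(grouper(df, 7, "temperature"))  # temperature 92 -> 'others'
print(grouper(df, 9, "temperature"))  # temperature 50 -> '50-60'
```

Note that grouper only returns a label for a single row; it does not split the dataframe itself.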
A lambda function is an inline, anonymous function in Python. In this case, instead of grouping by the categories of a column, as most people are used to, the code groups by the result of the lambda function, which pandas evaluates on each index label.
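One way to picture this is to compute the keys by hand, one per index label, and group by them. The sketch below assumes an inline copy of two columns of the data, and the names keys, by_keys, and by_lambda are only illustrative. It is a conceptual equivalent, not a description of how pandas implements groupby internally:

```python
import pandas as pd

# Inline copy of two columns from the data above (assumption: avoids the CSV).
df = pd.DataFrame({
    "city": ["new york"] * 4 + ["mumbai"] * 4 + ["paris"] * 4,
    "temperature": [32, 36, 28, 33, 90, 85, 87, 92, 45, 50, 54, 42],
})

def grouper(df, idx, col):
    if 80 <= df[col].loc[idx] <= 90:
        return '80-90'
    elif 50 <= df[col].loc[idx] <= 60:
        return '50-60'
    else:
        return 'others'

# Call grouper once per index label and keep the results aligned with the index.
keys = pd.Series(
    [grouper(df, idx, "temperature") for idx in df.index],
    index=df.index,
)

by_keys = df.groupby(keys)  # group by the precomputed keys
by_lambda = df.groupby(lambda x: grouper(df, x, "temperature"))  # original form

# Both approaches put the same number of rows under each key.
print(by_keys.size())
print(by_lambda.size())
```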
Here is the result of the for loop iterating over the groupby object g:
Group by Key: 50-60

         day      city  temperature  windspeed   event
9   1/2/2017     paris           50         13  Cloudy
10  1/3/2017     paris           54          8  Cloudy
Group by Key: 80-90

         day      city  temperature  windspeed   event
4   1/1/2017    mumbai           90          5   Sunny
5   1/2/2017    mumbai           85         12     Fog
6   1/3/2017    mumbai           87         15     Fog
Group by Key: others

         day      city  temperature  windspeed   event
0   1/1/2017  new york           32          6    Rain
1   1/2/2017  new york           36          7   Sunny
2   1/3/2017  new york           28         12    Snow
3   1/4/2017  new york           33          7   Sunny
7   1/4/2017    mumbai           92          5    Rain
8   1/1/2017     paris           45         20   Sunny
11  1/4/2017     paris           42         10  Cloudy
It turns out that there are 3 possible outcomes of the function: '50-60', '80-90', and 'others'. The resulting groupby object has the 3 different keys and the associated dataframes in the result.
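The keys and their sub-dataframes can also be inspected without a loop. The following sketch again assumes an inline copy of two columns of the data; ngroups, groups, size(), and get_group() are standard attributes and methods of a pandas groupby object:

```python
import pandas as pd

# Inline copy of two columns from the data above (assumption: avoids the CSV).
df = pd.DataFrame({
    "city": ["new york"] * 4 + ["mumbai"] * 4 + ["paris"] * 4,
    "temperature": [32, 36, 28, 33, 90, 85, 87, 92, 45, 50, 54, 42],
})

def grouper(df, idx, col):
    if 80 <= df[col].loc[idx] <= 90:
        return '80-90'
    elif 50 <= df[col].loc[idx] <= 60:
        return '50-60'
    else:
        return 'others'

g = df.groupby(lambda x: grouper(df, x, "temperature"))

print(g.ngroups)             # number of distinct keys
print(sorted(g.groups))      # the keys themselves
print(g.size())              # number of rows under each key
print(g.get_group("80-90"))  # the sub-dataframe for a single key
```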