
Can you help me understand this pandas code?

How does this grouper function work, and how does the lambda function work inside df.groupby()?
Data: https://github.com/codebasics/py/blob/master/pandas/7_group_by/weather_by_cities.csv

         day      city  temperature  windspeed   event
0   1/1/2017  new york           32          6    Rain
1   1/2/2017  new york           36          7   Sunny
2   1/3/2017  new york           28         12    Snow
3   1/4/2017  new york           33          7   Sunny
4   1/1/2017    mumbai           90          5   Sunny
5   1/2/2017    mumbai           85         12     Fog
6   1/3/2017    mumbai           87         15     Fog
7   1/4/2017    mumbai           92          5    Rain
8   1/1/2017     paris           45         20   Sunny
9   1/2/2017     paris           50         13  Cloudy
10  1/3/2017     paris           54          8  Cloudy
11  1/4/2017     paris           42         10  Cloudy


import pandas as pd
df = pd.read_csv("weather_by_cities.csv")

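def grouper(df, idx, col):
    # idx is a single index label; return the name of the temperature bucket for that row
    if 80 <= df[col].loc[idx] <= 90:
        return '80-90'
    elif 50 <= df[col].loc[idx] <= 60:
        return '50-60'
    else:
        return 'others'

# groupby calls the lambda once per index label and groups rows by the returned string
g = df.groupby(lambda x: grouper(df, x, 'temperature'))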
g


for key, d in g:
    print("Group by Key: {}\n".format(key))
    print(d)

Basically, the code groups the dataframe based on temperature ranges.

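  1. groupby calls the lambda function once for each index label of the dataframe, and the lambda passes that label on to the grouper function.

  2. The grouper function, in turn, looks up the temperature at that index label and returns the key of the group the row belongs to. It does not split the dataframe itself; groupby collects the rows that share a key into smaller dataframes.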

  3. The for-loop iterates over the groups and prints the key and the subset.

The grouper function takes in a dataframe, an index label (here the default integer index, so effectively the row number), and a column name. It then returns a string based on the value of that column at that index label.
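To see what grouper returns on its own, it can be called directly for each index label, outside of groupby. The sketch below is only an illustration: it rebuilds the temperature column in memory instead of reading the CSV, assuming the same row order and default integer index as in the question.

```python
import pandas as pd

# Temperature column from the question's data, rebuilt in memory
# (assumption: same row order and default 0..11 index as the CSV).
df = pd.DataFrame({
    "temperature": [32, 36, 28, 33, 90, 85, 87, 92, 45, 50, 54, 42],
})

def grouper(df, idx, col):
    # Look up one cell by index label and map it to a bucket name.
    value = df[col].loc[idx]
    if 80 <= value <= 90:
        return '80-90'
    elif 50 <= value <= 60:
        return '50-60'
    else:
        return 'others'

# One label per row: this is the list of keys that groupby ends up grouping on.
labels = [grouper(df, idx, "temperature") for idx in df.index]
print(labels)
```

Rows 4 to 6 come back as '80-90' and rows 9 and 10 as '50-60'; row 7 (92) falls outside both ranges and is labelled 'others'.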

A lambda function is an anonymous inline function in Python. In this case, instead of passing groupby a column name and grouping on the categories of that column, as most people are used to, the code passes a function. pandas calls that function on each index label and groups the rows by the values it returns.
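As a rough comparison, the same grouping can be written without a lambda by first mapping the temperature column to labels and grouping on the resulting Series. This is a sketch rather than the original author's approach: temp_label is a hypothetical helper introduced here, and the data is rebuilt in memory from the question's table.

```python
import pandas as pd

# Subset of the question's data, rebuilt in memory (assumption: default integer index).
df = pd.DataFrame({
    "city": ["new york"] * 4 + ["mumbai"] * 4 + ["paris"] * 4,
    "temperature": [32, 36, 28, 33, 90, 85, 87, 92, 45, 50, 54, 42],
})

def temp_label(value):
    # Hypothetical helper: maps one temperature value to a bucket name.
    if 80 <= value <= 90:
        return "80-90"
    if 50 <= value <= 60:
        return "50-60"
    return "others"

# Grouping by a callable: pandas calls it once per index label.
by_callable = df.groupby(lambda idx: temp_label(df.loc[idx, "temperature"]))

# Grouping by a Series of labels that is aligned with the dataframe's index.
by_series = df.groupby(df["temperature"].map(temp_label))

sizes_callable = by_callable.size().to_dict()
sizes_series = by_series.size().to_dict()
print(sizes_callable)
print(sizes_series)
```

Both variants put the same rows into the same groups. The Series version avoids one .loc lookup per row, which may matter on larger dataframes.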

Here is the output of the for loop as it iterates over the groupby object g:

Group by Key: 50-60

         day   city  temperature  windspeed   event
9   1/2/2017  paris           50         13  Cloudy
10  1/3/2017  paris           54          8  Cloudy
Group by Key: 80-90

        day    city  temperature  windspeed   event
4  1/1/2017  mumbai           90          5   Sunny
5  1/2/2017  mumbai           85         12     Fog
6  1/3/2017  mumbai           87         15     Fog
Group by Key: others

         day      city  temperature  windspeed   event
0   1/1/2017  new york           32          6    Rain
1   1/2/2017  new york           36          7   Sunny
2   1/3/2017  new york           28         12    Snow
3   1/4/2017  new york           33          7   Sunny
7   1/4/2017    mumbai           92          5    Rain
8   1/1/2017     paris           45         20   Sunny
11  1/4/2017     paris           42         10  Cloudy

The function has 3 possible return values: '50-60', '80-90', and 'others'. The resulting groupby object therefore has 3 keys, each associated with the dataframe of rows that produced it.
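The keys and the rows behind them can also be inspected without a loop, through the groups attribute and get_group method of the groupby object. The snippet below is a minimal sketch on an in-memory copy of the temperature column; bucket is a hypothetical stand-in for the grouper function from the question.

```python
import pandas as pd

# Temperature column from the question's data (assumption: default integer index).
df = pd.DataFrame({
    "temperature": [32, 36, 28, 33, 90, 85, 87, 92, 45, 50, 54, 42],
})

def bucket(idx):
    # Hypothetical stand-in for grouper: index label -> bucket name.
    value = df.loc[idx, "temperature"]
    if 80 <= value <= 90:
        return "80-90"
    if 50 <= value <= 60:
        return "50-60"
    return "others"

g = df.groupby(bucket)

# groups maps each key to the index labels of the rows in that group.
members = {key: list(index) for key, index in g.groups.items()}
print(members)

# get_group returns the sub-dataframe for a single key.
hot = g.get_group("80-90")
print(hot)
```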


 