Applying function to column in grouped pandas dataframe and returning output as a new column

I have a weather dataset consisting of multiple columns:

StationID, altitude, datetime, longitude, latitude, rainfall

I have multiple stations, each identified by its ID. The rainfall column holds accumulated rainfall amounts. For example, for station X over 10 days, I could have (in mm/day):

station X, 0 0 0 1 5 6 6 8 8 15

For station Y, I could have:

station Y, 0 1 14 14 14 15 18 18 18 20

But what I need are intensity values, that is, each day's amount minus the previous day's. This would give me the following values for stations X and Y (the first value is set to 0):

station X, 0 0 0 1 4 1 0 2 0 7

station Y, 0 1 13 0 0 1 3 0 0 2

I created a function that takes a time series and computes this difference:

def intensity(ts):
    ts2 = [0]
    for i in range(0,len(ts[:-1])):
        ts2.append((ts[i+1]-ts[i]))
    return ts2

test = [1,2,3,4,5,10,10,10,20,25]
intensity(test)

Now, my question is: how can I apply this function to the 'rainfall' column in my dataframe for each station group, i.e.:

dfg = df.groupby('StationID')

and then assign the output to a new column in the dataframe (e.g. a 'rain_intensity' column)?

I think you need:

print (df.groupby('StationID')['rainfall'].apply(intensity))

A better option is diff, replacing the leading NaN of each group with 0 via fillna and then, if necessary, converting to int:

print (df.groupby('StationID')['rainfall'].diff().fillna(0))
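As a quick check that diff followed by fillna(0) gives the same numbers as the hand-written loop, here is a minimal sketch on the test list from the question, without any grouping. The names loop_result and diff_result are only illustrative.

```python
import pandas as pd

def intensity(ts):
    # Loop from the question: the first value is 0, then day-to-day differences
    ts2 = [0]
    for i in range(len(ts) - 1):
        ts2.append(ts[i + 1] - ts[i])
    return ts2

test = [1, 2, 3, 4, 5, 10, 10, 10, 20, 25]

loop_result = intensity(test)

# diff() leaves NaN in the first position; fillna(0) replaces it,
# and astype(int) undoes the float conversion caused by the NaN
diff_result = pd.Series(test).diff().fillna(0).astype(int).tolist()

print(loop_result)
print(diff_result)
```

Both lists should come out identical, which is why the built-in diff can stand in for the custom function.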

Sample:

import pandas as pd

df = pd.DataFrame({'StationID': ['station X'] * 10 + ['station Y'] * 10,
                   'rainfall': [0, 0, 0, 1, 5, 6, 6, 8, 8, 15, 0, 1, 14, 14, 14, 15, 18, 18, 18, 20]})

print (df)
    StationID  rainfall
0   station X         0
1   station X         0
2   station X         0
3   station X         1
4   station X         5
5   station X         6
6   station X         6
7   station X         8
8   station X         8
9   station X        15
10  station Y         0
11  station Y         1
12  station Y        14
13  station Y        14
14  station Y        14
15  station Y        15
16  station Y        18
17  station Y        18
18  station Y        18
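19  station Y        20

def intensity(ts):
    # Work on a plain list so positional indexing is independent of the group's index
    ts = ts.tolist()
    # The first value of each group has no previous day, so it starts at 0
    ts2 = [0]
    for i in range(len(ts) - 1):
        ts2.append(ts[i + 1] - ts[i])
    return pd.Series(ts2)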

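# Custom function: reset_index(drop=True) only lines up because the rows are already grouped by station
df['diff1'] = df.groupby('StationID')['rainfall'].apply(intensity).reset_index(drop=True)
# Built-in: diff keeps the original row index, so no realignment is needed
df['diff2'] = df.groupby('StationID')['rainfall'].diff().fillna(0).astype(int)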

print (df)
    StationID  rainfall  diff1  diff2
0   station X         0      0      0
1   station X         0      0      0
2   station X         0      0      0
3   station X         1      1      1
4   station X         5      4      4
5   station X         6      1      1
6   station X         6      0      0
7   station X         8      2      2
8   station X         8      0      0
9   station X        15      7      7
10  station Y         0      0      0
11  station Y         1      1      1
12  station Y        14     13     13
13  station Y        14      0      0
14  station Y        14      0      0
15  station Y        15      1      1
16  station Y        18      3      3
17  station Y        18      0      0
18  station Y        18      0      0
19  station Y        20      2      2
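One caveat with the apply plus reset_index(drop=True) route is that it relies on the rows already being grouped by station, as they are in the sample. If the stations are interleaved, a sketch like the following may be safer: the custom function returns a Series carrying the group's own index and is passed to transform, which places each result back on its original row. The interleaved data here is hypothetical and is not taken from the question.

```python
import pandas as pd

def intensity(ts):
    # Same differences as before, but the result keeps the group's original index
    values = ts.tolist()
    out = [0]
    for i in range(len(values) - 1):
        out.append(values[i + 1] - values[i])
    return pd.Series(out, index=ts.index)

# Hypothetical data with the two stations interleaved row by row
df = pd.DataFrame({
    'StationID': ['station X', 'station Y'] * 4,
    'rainfall': [0, 0, 1, 1, 5, 14, 6, 14],
})

# transform returns a result aligned with the original rows
df['diff1'] = df.groupby('StationID')['rainfall'].transform(intensity)
# groupby diff is aligned with the original rows as well
df['diff2'] = df.groupby('StationID')['rainfall'].diff().fillna(0).astype(int)

print(df)
```

Both columns should agree here too. Note that either approach takes the rows of each station in the order they appear, so a dataset that is not in chronological order would need sorting on its datetime column first.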
