Groupby and subtract columns in pandas

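I have time series data with 4 columns. I want to group by the columns FisherID, DateFishing, and Total_Catch and sum the Weight column. I also want to subtract the values in the Weight column from the values in the Total_Catch column and store the result in a new column named DIFF. Finally, I want to show only the rows where DIFF is above 0.1.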

Here is my code.

df["DIFF"]=df.groupby(["FisherID", "DateFishing", "Total_Catch"]) ["Weight"].sum()-["Total_Catch"]>=0.1

My data:

FisherID    DateFishing Total_Catch Weight
1            24-Oct-11      0.9      0.2
1            24-Oct-11      0.9      0.264
1            24-Oct-11      0.9      0.37
2            25-Oct-11      0.7      0.144
2            27-Oct-11      8.2      0.084
2            27-Oct-11      8.2      0.45
3            27-Oct-11      8.2      0.61
3            27-Oct-11      8.2      7
3            29-Oct-11      0.64    0.184

I think you are looking for groupby + transform:

df['Sum'] = df.groupby(
    ["FisherID", "DateFishing", "Total_Catch"]
)["Weight"].transform('sum')

Then, find Diff by subtracting the Weight column from Total_Catch:

df['Diff'] = (df['Total_Catch'] - df['Weight'])

df

   FisherID DateFishing  Total_Catch  Weight    Sum   Diff
0         1   24-Oct-11         0.90   0.200  0.834  0.700
1         1   24-Oct-11         0.90   0.264  0.834  0.636
2         1   24-Oct-11         0.90   0.370  0.834  0.530
3         2   25-Oct-11         0.70   0.144  0.144  0.556
4         2   27-Oct-11         8.20   0.084  0.534  8.116
5         2   27-Oct-11         8.20   0.450  0.534  7.750
6         3   27-Oct-11         8.20   0.610  7.610  7.590
7         3   27-Oct-11         8.20   7.000  7.610  1.200
8         3   29-Oct-11         0.64   0.184  0.184  0.456
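
The reason `transform` fits here, rather than a plain `sum()`, is that an aggregated sum has one row per group and is indexed by the group keys, whereas `transform('sum')` returns a result aligned with the original rows, so it can be assigned directly as a column. A minimal sketch, rebuilding the sample data from the question by hand (the names `keys`, `agg`, and `per_row` are only illustrative):

```python
import pandas as pd

# Sample data typed in from the question.
df = pd.DataFrame({
    "FisherID": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "DateFishing": ["24-Oct-11"] * 3 + ["25-Oct-11"] + ["27-Oct-11"] * 4 + ["29-Oct-11"],
    "Total_Catch": [0.9, 0.9, 0.9, 0.7, 8.2, 8.2, 8.2, 8.2, 0.64],
    "Weight": [0.2, 0.264, 0.37, 0.144, 0.084, 0.45, 0.61, 7.0, 0.184],
})
keys = ["FisherID", "DateFishing", "Total_Catch"]

# One value per group, indexed by the group keys.
agg = df.groupby(keys)["Weight"].sum()

# One value per original row, indexed like df.
per_row = df.groupby(keys)["Weight"].transform("sum")

print(len(agg), len(per_row))
print(per_row.index.equals(df.index))
```

This is likely why the attempt in the question does not line up with the rows of `df`: the aggregated result carries a different index from the frame it is being assigned to.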

Alternatively, if you are trying to subtract the grouped sum of Weight from Total_Catch, use:

df['Diff'] = df["Total_Catch"] - df.groupby(
    ["FisherID", "DateFishing", "Total_Catch"]
)["Weight"].transform('sum')

df

   FisherID DateFishing  Total_Catch  Weight   Diff
0         1   24-Oct-11         0.90   0.200  0.066
1         1   24-Oct-11         0.90   0.264  0.066
2         1   24-Oct-11         0.90   0.370  0.066
3         2   25-Oct-11         0.70   0.144  0.556
4         2   27-Oct-11         8.20   0.084  7.666
5         2   27-Oct-11         8.20   0.450  7.666
6         3   27-Oct-11         8.20   0.610  0.590
7         3   27-Oct-11         8.20   7.000  0.590
8         3   29-Oct-11         0.64   0.184  0.456
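
As a self-contained check of this second option, the sketch below rebuilds the sample data and compares the computed `Diff` against the values in the table above. Since the columns are floats, the raw differences may carry tiny rounding noise, so the comparison uses a tolerance (`np.allclose`) rather than exact equality. The names `keys`, `expected`, and `matches` are illustrative.

```python
import numpy as np
import pandas as pd

# Sample data typed in from the question.
df = pd.DataFrame({
    "FisherID": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "DateFishing": ["24-Oct-11"] * 3 + ["25-Oct-11"] + ["27-Oct-11"] * 4 + ["29-Oct-11"],
    "Total_Catch": [0.9, 0.9, 0.9, 0.7, 8.2, 8.2, 8.2, 8.2, 0.64],
    "Weight": [0.2, 0.264, 0.37, 0.144, 0.084, 0.45, 0.61, 7.0, 0.184],
})
keys = ["FisherID", "DateFishing", "Total_Catch"]

# Total_Catch minus the per-group sum of Weight, broadcast back to every row.
df["Diff"] = df["Total_Catch"] - df.groupby(keys)["Weight"].transform("sum")

# Values copied from the output table shown in the answer.
expected = [0.066, 0.066, 0.066, 0.556, 7.666, 7.666, 0.59, 0.59, 0.456]
matches = np.allclose(df["Diff"], expected)
print(matches)
```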

Querying rows

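This section builds on the result of the second option. Note that all of the options below apply a boolean mask to the dataframe. If all you want is the mask itself, don't apply it to the dataframe. Just evaluate the condition and print it: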

df.Diff > 0.1

0    False
1    False
2    False
3     True
4     True
5     True
6     True
7     True
8     True
Name: Diff, dtype: bool

If you want to extract all the rows that satisfy the condition, there are several options.

df.query

df.query('Diff > 0.1')

   FisherID DateFishing  Total_Catch  Weight   Diff
3         2   25-Oct-11         0.70   0.144  0.556
4         2   27-Oct-11         8.20   0.084  7.666
5         2   27-Oct-11         8.20   0.450  7.666
6         3   27-Oct-11         8.20   0.610  0.590
7         3   27-Oct-11         8.20   7.000  0.590
8         3   29-Oct-11         0.64   0.184  0.456

boolean indexing

df[df.Diff > 0.1]

   FisherID DateFishing  Total_Catch  Weight   Diff
3         2   25-Oct-11         0.70   0.144  0.556
4         2   27-Oct-11         8.20   0.084  7.666
5         2   27-Oct-11         8.20   0.450  7.666
6         3   27-Oct-11         8.20   0.610  0.590
7         3   27-Oct-11         8.20   7.000  0.590
8         3   29-Oct-11         0.64   0.184  0.456

df.eval

df[df.eval('Diff > 0.1')]

   FisherID DateFishing  Total_Catch  Weight   Diff
3         2   25-Oct-11         0.70   0.144  0.556
4         2   27-Oct-11         8.20   0.084  7.666
5         2   27-Oct-11         8.20   0.450  7.666
6         3   27-Oct-11         8.20   0.610  0.590
7         3   27-Oct-11         8.20   7.000  0.590
8         3   29-Oct-11         0.64   0.184  0.456

df.where + dropna

df.where(df.Diff > 0.1).dropna(how='all')

   FisherID DateFishing  Total_Catch  Weight   Diff
3       2.0   25-Oct-11         0.70   0.144  0.556
4       2.0   27-Oct-11         8.20   0.084  7.666
5       2.0   27-Oct-11         8.20   0.450  7.666
6       3.0   27-Oct-11         8.20   0.610  0.590
7       3.0   27-Oct-11         8.20   7.000  0.590
8       3.0   29-Oct-11         0.64   0.184  0.456
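
Note that FisherID is shown as 2.0 and 3.0 in this output: `where` fills the masked-out rows with NaN before `dropna` removes them, and that step turns the integer column into floats. If the original dtype matters, it can be cast back afterwards. A minimal sketch on a cut-down frame (the names `kept` and `restored` are illustrative):

```python
import pandas as pd

# Cut-down frame with an integer column and the Diff column.
df = pd.DataFrame({
    "FisherID": [1, 2, 3],
    "Diff": [0.066, 0.556, 0.59],
})

# Masked-out rows become NaN, which upcasts the integer column to float.
kept = df.where(df["Diff"] > 0.1).dropna(how="all")
print(kept["FisherID"].dtype)

# Cast back to the original integer dtype once the NaN rows are gone.
restored = kept.astype({"FisherID": df["FisherID"].dtype})
print(restored["FisherID"].dtype)
```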

np.where + df.iloc

import numpy as np

df.iloc[np.where(df.Diff > 0.1)[0]]

   FisherID DateFishing  Total_Catch  Weight   Diff
3         2   25-Oct-11         0.70   0.144  0.556
4         2   27-Oct-11         8.20   0.084  7.666
5         2   27-Oct-11         8.20   0.450  7.666
6         3   27-Oct-11         8.20   0.610  0.590
7         3   27-Oct-11         8.20   7.000  0.590
8         3   29-Oct-11         0.64   0.184  0.456
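
The index-preserving options above (`query`, boolean indexing, `eval`, and `np.where` with `iloc`) should all select the same rows. A small sketch that checks this on a cut-down frame holding only the FisherID and Diff columns, with values copied from the table above (the variable names are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "FisherID": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "Diff": [0.066, 0.066, 0.066, 0.556, 7.666, 7.666, 0.59, 0.59, 0.456],
})

by_mask = df[df["Diff"] > 0.1]                      # boolean indexing
by_query = df.query("Diff > 0.1")                   # df.query
by_eval = df[df.eval("Diff > 0.1")]                 # df.eval builds the mask
by_iloc = df.iloc[np.where(df["Diff"] > 0.1)[0]]    # positional indices

# DataFrame.equals compares values and dtypes.
same = (
    by_mask.equals(by_query)
    and by_mask.equals(by_eval)
    and by_mask.equals(by_iloc)
)
print(same)
print(list(by_mask.index))
```

Which one to use is mostly a matter of taste; plain boolean indexing is the most common form, while `query` can read more cleanly for longer conditions.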

Note that these results keep the index of the original df. If you want to reset the index, use reset_index:

df[df.Diff > 0.1].reset_index(drop=True)

   FisherID DateFishing  Total_Catch  Weight   Diff
0         2   25-Oct-11         0.70   0.144  0.556
1         2   27-Oct-11         8.20   0.084  7.666
2         2   27-Oct-11         8.20   0.450  7.666
3         3   27-Oct-11         8.20   0.610  0.590
4         3   27-Oct-11         8.20   7.000  0.590
5         3   29-Oct-11         0.64   0.184  0.456
