Groupby and subtract columns in pandas
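I have time-series data with 4 columns. I want to group by the columns FisherID, DateFishing and Total_Catch, and sum the Weight column. I also want to subtract the values in the Weight column from the values in the Total_Catch column and keep the result in a new column named DIFF. Finally, I would like to display only the DIFF values that are higher than 0.1.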
Here is my code:
df["DIFF"]=df.groupby(["FisherID", "DateFishing", "Total_Catch"]) ["Weight"].sum()-["Total_Catch"]>=0.1
My data:
FisherID DateFishing Total_Catch Weight
1 24-Oct-11 0.9 0.2
1 24-Oct-11 0.9 0.264
1 24-Oct-11 0.9 0.37
2 25-Oct-11 0.7 0.144
2 27-Oct-11 8.2 0.084
2 27-Oct-11 8.2 0.45
3 27-Oct-11 8.2 0.61
3 27-Oct-11 8.2 7
3 29-Oct-11 0.64 0.184
I think you're looking for groupby + transform:
df['Sum'] = df.groupby(
["FisherID", "DateFishing", "Total_Catch"]
)["Weight"].transform('sum')
Then find Diff by subtracting the Weight column from Total_Catch:
df['Diff'] = (df['Total_Catch'] - df['Weight'])
df
FisherID DateFishing Total_Catch Weight Sum Diff
0 1 24-Oct-11 0.90 0.200 0.834 0.700
1 1 24-Oct-11 0.90 0.264 0.834 0.636
2 1 24-Oct-11 0.90 0.370 0.834 0.530
3 2 25-Oct-11 0.70 0.144 0.144 0.556
4 2 27-Oct-11 8.20 0.084 0.534 8.116
5 2 27-Oct-11 8.20 0.450 0.534 7.750
6 3 27-Oct-11 8.20 0.610 7.610 7.590
7 3 27-Oct-11 8.20 7.000 7.610 1.200
8 3 29-Oct-11 0.64 0.184 0.184 0.456
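For context on why transform is used here rather than a plain sum: an aggregating sum() returns one row per group, indexed by the grouping keys, so it does not line up with the rows of df, whereas transform('sum') broadcasts each group's total back onto every original row. The attempt in the question also subtracts the list ["Total_Catch"] rather than the column df["Total_Catch"]. A minimal sketch of the difference, rebuilding the sample data from the question (the names keys, per_group and per_row are only illustrative):

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "FisherID":    [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "DateFishing": ["24-Oct-11"] * 3 + ["25-Oct-11"] + ["27-Oct-11"] * 4 + ["29-Oct-11"],
    "Total_Catch": [0.9, 0.9, 0.9, 0.7, 8.2, 8.2, 8.2, 8.2, 0.64],
    "Weight":      [0.2, 0.264, 0.37, 0.144, 0.084, 0.45, 0.61, 7.0, 0.184],
})
keys = ["FisherID", "DateFishing", "Total_Catch"]

# Aggregation: one value per group, indexed by the grouping keys.
per_group = df.groupby(keys)["Weight"].sum()

# Transform: one value per original row, indexed like df.
per_row = df.groupby(keys)["Weight"].transform("sum")

print(len(per_group), len(per_row))
print(per_row.index.equals(df.index))
```

Because per_row shares df's index, it can be assigned directly as a new column or subtracted from another column of df.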
Or, if you're trying to subtract the grouped Weight sum from Total_Catch, use:
df['Diff'] = df["Total_Catch"] -df.groupby(["FisherID", \
"DateFishing", "Total_Catch"])["Weight"].transform('sum')
df
FisherID DateFishing Total_Catch Weight Diff
0 1 24-Oct-11 0.90 0.200 0.066
1 1 24-Oct-11 0.90 0.264 0.066
2 1 24-Oct-11 0.90 0.370 0.066
3 2 25-Oct-11 0.70 0.144 0.556
4 2 27-Oct-11 8.20 0.084 7.666
5 2 27-Oct-11 8.20 0.450 7.666
6 3 27-Oct-11 8.20 0.610 0.590
7 3 27-Oct-11 8.20 7.000 0.590
8 3 29-Oct-11 0.64 0.184 0.456
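As a cross-check on the result above, the same Diff can be reached by a different route: aggregate to one row per group, then merge the totals back onto the rows. This is an alternative technique, not the approach used in this answer, and the names sums and merged are only illustrative. A sketch, assuming the sample data from the question:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "FisherID":    [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "DateFishing": ["24-Oct-11"] * 3 + ["25-Oct-11"] + ["27-Oct-11"] * 4 + ["29-Oct-11"],
    "Total_Catch": [0.9, 0.9, 0.9, 0.7, 8.2, 8.2, 8.2, 8.2, 0.64],
    "Weight":      [0.2, 0.264, 0.37, 0.144, 0.084, 0.45, 0.61, 7.0, 0.184],
})
keys = ["FisherID", "DateFishing", "Total_Catch"]

# Aggregate to one row per group, keeping the keys as ordinary columns.
sums = (
    df.groupby(keys, as_index=False)["Weight"]
      .sum()
      .rename(columns={"Weight": "Sum"})
)

# Attach each group's total to its rows; how="left" keeps df's row order.
merged = df.merge(sums, on=keys, how="left")
merged["Diff"] = merged["Total_Catch"] - merged["Sum"]

print(merged[["FisherID", "DateFishing", "Diff"]].round(3))
```

The transform version is shorter and avoids the intermediate frame; the merge version can be handy when the per-group totals are also needed on their own.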
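This section builds on the result of the second option. Note that all of the row-extraction options further down apply a boolean mask to the DataFrame. If all you want is the mask itself, don't apply it to the DataFrame. Just evaluate the condition and print it: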
df.Diff > 0.1
0 False
1 False
2 False
3 True
4 True
5 True
6 True
7 True
8 True
Name: Diff, dtype: bool
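The mask is an ordinary boolean Series, so it can be counted, inverted, or stored as a column without filtering anything. A small sketch, assuming the Diff values from the second option (the column name Above is hypothetical):

```python
import pandas as pd

# Diff values copied from the second option's output.
df = pd.DataFrame({
    "Diff": [0.066, 0.066, 0.066, 0.556, 7.666, 7.666, 0.590, 0.590, 0.456],
})

mask = df["Diff"] > 0.1

n_above = int(mask.sum())  # True counts as 1, so this is the number of matching rows
below = df[~mask]          # ~ inverts the mask: rows with Diff <= 0.1
df["Above"] = mask         # keep the flag next to the data instead of filtering

print(n_above)
print(below.index.tolist())
```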
If you want to extract all the rows that satisfy the condition, there are several options.
df.query
df.query('Diff > 0.1')
FisherID DateFishing Total_Catch Weight Diff
3 2 25-Oct-11 0.70 0.144 0.556
4 2 27-Oct-11 8.20 0.084 7.666
5 2 27-Oct-11 8.20 0.450 7.666
6 3 27-Oct-11 8.20 0.610 0.590
7 3 27-Oct-11 8.20 7.000 0.590
8 3 29-Oct-11 0.64 0.184 0.456
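If the cutoff lives in a Python variable rather than being hard-coded, query can reference it with the @ prefix. A minimal sketch, assuming the Diff values from the second option; the variable name threshold is only illustrative:

```python
import pandas as pd

# FisherID and Diff values copied from the second option's output.
df = pd.DataFrame({
    "FisherID": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "Diff": [0.066, 0.066, 0.066, 0.556, 7.666, 7.666, 0.590, 0.590, 0.456],
})

threshold = 0.1

# @threshold refers to the local Python variable, not to a column.
result = df.query("Diff > @threshold")

print(result)
```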
boolean indexing
df[df.Diff > 0.1]
FisherID DateFishing Total_Catch Weight Diff
3 2 25-Oct-11 0.70 0.144 0.556
4 2 27-Oct-11 8.20 0.084 7.666
5 2 27-Oct-11 8.20 0.450 7.666
6 3 27-Oct-11 8.20 0.610 0.590
7 3 27-Oct-11 8.20 7.000 0.590
8 3 29-Oct-11 0.64 0.184 0.456
df.eval
df[df.eval('Diff > 0.1')]
FisherID DateFishing Total_Catch Weight Diff
3 2 25-Oct-11 0.70 0.144 0.556
4 2 27-Oct-11 8.20 0.084 7.666
5 2 27-Oct-11 8.20 0.450 7.666
6 3 27-Oct-11 8.20 0.610 0.590
7 3 27-Oct-11 8.20 7.000 0.590
8 3 29-Oct-11 0.64 0.184 0.456
df.where and dropna
df.where(df.Diff > 0.1).dropna(how='all')
FisherID DateFishing Total_Catch Weight Diff
3 2.0 25-Oct-11 0.70 0.144 0.556
4 2.0 27-Oct-11 8.20 0.084 7.666
5 2.0 27-Oct-11 8.20 0.450 7.666
6 3.0 27-Oct-11 8.20 0.610 0.590
7 3.0 27-Oct-11 8.20 7.000 0.590
8 3.0 29-Oct-11 0.64 0.184 0.456
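Notice that FisherID shows up as 2.0 and 3.0 in this output. where fills the non-matching rows with NaN before dropna removes them, and an integer column has to be upcast to float to hold NaN. A sketch of the effect and one way to undo it, assuming the Diff values from the second option (the names filtered and restored are only illustrative):

```python
import pandas as pd

# FisherID and Diff values copied from the second option's output.
df = pd.DataFrame({
    "FisherID": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "Diff": [0.066, 0.066, 0.066, 0.556, 7.666, 7.666, 0.590, 0.590, 0.456],
})

# Rows failing the condition become all-NaN, then get dropped.
filtered = df.where(df["Diff"] > 0.1).dropna(how="all")
print(filtered.dtypes)  # FisherID has been upcast to float64

# Cast back once the NaN rows are gone.
restored = filtered.astype({"FisherID": "int64"})
print(restored.dtypes)
```

The other options in this answer select rows directly and leave the dtypes untouched.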
np.where and df.iloc (requires import numpy as np)
df.iloc[np.where(df.Diff > 0.1)[0]]
FisherID DateFishing Total_Catch Weight Diff
3 2 25-Oct-11 0.70 0.144 0.556
4 2 27-Oct-11 8.20 0.084 7.666
5 2 27-Oct-11 8.20 0.450 7.666
6 3 27-Oct-11 8.20 0.610 0.590
7 3 27-Oct-11 8.20 7.000 0.590
8 3 29-Oct-11 0.64 0.184 0.456
Note that these results keep the index of the original df. If you want to reset the index, use reset_index:
df[df.Diff > 0.1].reset_index(drop=True)
FisherID DateFishing Total_Catch Weight Diff
0 2 25-Oct-11 0.70 0.144 0.556
1 2 27-Oct-11 8.20 0.084 7.666
2 2 27-Oct-11 8.20 0.450 7.666
3 3 27-Oct-11 8.20 0.610 0.590
4 3 27-Oct-11 8.20 7.000 0.590
5 3 29-Oct-11 0.64 0.184 0.456
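The drop=True argument matters here: without it, reset_index keeps the old labels as a new column named index. A short sketch of both behaviours, assuming the Diff values from the second option (the names kept and dropped are only illustrative):

```python
import pandas as pd

# Diff values copied from the second option's output.
df = pd.DataFrame({
    "Diff": [0.066, 0.066, 0.066, 0.556, 7.666, 7.666, 0.590, 0.590, 0.456],
})
rows = df[df["Diff"] > 0.1]

kept = rows.reset_index()              # old labels move into an "index" column
dropped = rows.reset_index(drop=True)  # old labels are discarded

print(kept.columns.tolist())
print(dropped.columns.tolist())
```

Keeping the old labels can be useful if the filtered rows need to be traced back to their positions in the original df.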