Groupby and subtract columns in pandas

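I have time series data with 4 columns. I want to group by the columns FisherID, DateFishing and Total_Catch, and sum the Weight column. I also want to subtract the values in the Weight column from the values in the Total_Catch column and store the result in a new column named DIFF. Finally, I want to display only the values in DIFF that are above 0.1.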

Here is my code.

df["DIFF"]=df.groupby(["FisherID", "DateFishing", "Total_Catch"]) ["Weight"].sum()-["Total_Catch"]>=0.1

My data:

FisherID    DateFishing Total_Catch Weight
1            24-Oct-11      0.9      0.2
1            24-Oct-11      0.9      0.264
1            24-Oct-11      0.9      0.37
2            25-Oct-11      0.7      0.144
2            27-Oct-11      8.2      0.084
2            27-Oct-11      8.2      0.45
3            27-Oct-11      8.2      0.61
3            27-Oct-11      8.2      7
3            29-Oct-11      0.64    0.184
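
To make the snippets in the answer reproducible, the table above can be loaded into a DataFrame roughly as follows. This is only a sketch: it assumes DateFishing is kept as plain strings (the question does not say whether the column is parsed as dates), and the name df simply follows the question's own code.

```python
import pandas as pd

# Sample data copied from the table in the question.
# Assumption: DateFishing is left as strings rather than parsed dates.
df = pd.DataFrame({
    "FisherID": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "DateFishing": ["24-Oct-11", "24-Oct-11", "24-Oct-11", "25-Oct-11",
                    "27-Oct-11", "27-Oct-11", "27-Oct-11", "27-Oct-11",
                    "29-Oct-11"],
    "Total_Catch": [0.9, 0.9, 0.9, 0.7, 8.2, 8.2, 8.2, 8.2, 0.64],
    "Weight": [0.2, 0.264, 0.37, 0.144, 0.084, 0.45, 0.61, 7.0, 0.184],
})
print(df)
```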

I think you are looking for groupby + transform

df['Sum'] = df.groupby(
    ["FisherID", "DateFishing", "Total_Catch"]
)["Weight"].transform('sum')
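
The reason transform fits here, rather than the plain sum() used in the question, is probably easiest to see side by side. The sketch below uses a cut-down three-row version of the data; the names keys, per_group and per_row are illustrative only and are not part of the original answer.

```python
import pandas as pd

# Small illustrative subset of the question's data.
df = pd.DataFrame({
    "FisherID": [1, 1, 2],
    "DateFishing": ["24-Oct-11", "24-Oct-11", "25-Oct-11"],
    "Total_Catch": [0.9, 0.9, 0.7],
    "Weight": [0.2, 0.264, 0.144],
})
keys = ["FisherID", "DateFishing", "Total_Catch"]

# sum() collapses each group to a single row, indexed by the group keys,
# so the result no longer lines up with the rows of df.
per_group = df.groupby(keys)["Weight"].sum()

# transform('sum') broadcasts each group's sum back onto every row of that
# group, so the result has the same index as df and can be assigned directly.
per_row = df.groupby(keys)["Weight"].transform("sum")

print(len(per_group), len(per_row))
print(per_row.index.equals(df.index))
```

Because per_row shares the index of df, it can be stored as a new column or used in row-wise arithmetic with Total_Catch without any merge step.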

Then find Diff by subtracting the Weight column from Total_Catch

df['Diff'] = (df['Total_Catch'] - df['Weight'])

df

   FisherID DateFishing  Total_Catch  Weight    Sum   Diff
0         1   24-Oct-11         0.90   0.200  0.834  0.700
1         1   24-Oct-11         0.90   0.264  0.834  0.636
2         1   24-Oct-11         0.90   0.370  0.834  0.530
3         2   25-Oct-11         0.70   0.144  0.144  0.556
4         2   27-Oct-11         8.20   0.084  0.534  8.116
5         2   27-Oct-11         8.20   0.450  0.534  7.750
6         3   27-Oct-11         8.20   0.610  7.610  7.590
7         3   27-Oct-11         8.20   7.000  7.610  1.200
8         3   29-Oct-11         0.64   0.184  0.184  0.456

Alternatively, if you are trying to subtract the grouped Weight sum from Total_Catch, use:

df['Diff'] = df["Total_Catch"] - df.groupby(
    ["FisherID", "DateFishing", "Total_Catch"]
)["Weight"].transform('sum')

df

   FisherID DateFishing  Total_Catch  Weight   Diff
0         1   24-Oct-11         0.90   0.200  0.066
1         1   24-Oct-11         0.90   0.264  0.066
2         1   24-Oct-11         0.90   0.370  0.066
3         2   25-Oct-11         0.70   0.144  0.556
4         2   27-Oct-11         8.20   0.084  7.666
5         2   27-Oct-11         8.20   0.450  7.666
6         3   27-Oct-11         8.20   0.610  0.590
7         3   27-Oct-11         8.20   7.000  0.590
8         3   29-Oct-11         0.64   0.184  0.456

Querying rows

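This section builds on the result of the second option. Note that all of the options below apply a boolean mask to the DataFrame. If all you want is the mask itself, don't apply it to the DataFrame. Just evaluate the condition and print it: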

df.Diff > 0.1

0    False
1    False
2    False
3     True
4     True
5     True
6     True
7     True
8     True
Name: Diff, dtype: bool

If you want to extract all the matching rows, there are several options.

df.query

df.query('Diff > 0.1')

   FisherID DateFishing  Total_Catch  Weight   Diff
3         2   25-Oct-11         0.70   0.144  0.556
4         2   27-Oct-11         8.20   0.084  7.666
5         2   27-Oct-11         8.20   0.450  7.666
6         3   27-Oct-11         8.20   0.610  0.590
7         3   27-Oct-11         8.20   7.000  0.590
8         3   29-Oct-11         0.64   0.184  0.456

boolean indexing

df[df.Diff > 0.1]

   FisherID DateFishing  Total_Catch  Weight   Diff
3         2   25-Oct-11         0.70   0.144  0.556
4         2   27-Oct-11         8.20   0.084  7.666
5         2   27-Oct-11         8.20   0.450  7.666
6         3   27-Oct-11         8.20   0.610  0.590
7         3   27-Oct-11         8.20   7.000  0.590
8         3   29-Oct-11         0.64   0.184  0.456

df.eval

df[df.eval('Diff > 0.1')]

   FisherID DateFishing  Total_Catch  Weight   Diff
3         2   25-Oct-11         0.70   0.144  0.556
4         2   27-Oct-11         8.20   0.084  7.666
5         2   27-Oct-11         8.20   0.450  7.666
6         3   27-Oct-11         8.20   0.610  0.590
7         3   27-Oct-11         8.20   7.000  0.590
8         3   29-Oct-11         0.64   0.184  0.456

df.where + dropna

df.where(df.Diff > 0.1).dropna(how='all')

   FisherID DateFishing  Total_Catch  Weight   Diff
3       2.0   25-Oct-11         0.70   0.144  0.556
4       2.0   27-Oct-11         8.20   0.084  7.666
5       2.0   27-Oct-11         8.20   0.450  7.666
6       3.0   27-Oct-11         8.20   0.610  0.590
7       3.0   27-Oct-11         8.20   7.000  0.590
8       3.0   29-Oct-11         0.64   0.184  0.456
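
FisherID shows up as 2.0 and 3.0 in this output because where fills the rejected rows with NaN, which upcasts the integer column to float before dropna removes those rows. If the original dtypes matter, one possible fix is to cast back afterwards. The sketch below assumes a reduced two-column frame, and the names filtered and restored are hypothetical.

```python
import pandas as pd

# Reduced illustrative frame: one integer column and the Diff column.
df = pd.DataFrame({
    "FisherID": [1, 2, 3],
    "Diff": [0.066, 0.556, 0.590],
})

# where() replaces non-matching rows with NaN, turning FisherID into float64.
filtered = df.where(df.Diff > 0.1).dropna(how="all")
print(filtered.dtypes)

# Cast each column back to the dtype it had in the original frame.
restored = filtered.astype(df.dtypes.to_dict())
print(restored.dtypes)
```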

np.where + df.iloc

df.iloc[np.where(df.Diff > 0.1)[0]]

   FisherID DateFishing  Total_Catch  Weight   Diff
3         2   25-Oct-11         0.70   0.144  0.556
4         2   27-Oct-11         8.20   0.084  7.666
5         2   27-Oct-11         8.20   0.450  7.666
6         3   27-Oct-11         8.20   0.610  0.590
7         3   27-Oct-11         8.20   7.000  0.590
8         3   29-Oct-11         0.64   0.184  0.456
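
As a quick sanity check, the options that preserve dtypes (query, boolean indexing, eval and np.where + iloc) should all select exactly the same rows. The sketch below verifies this on a reduced frame; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Reduced illustrative frame with the Diff column already computed.
df = pd.DataFrame({
    "FisherID": [1, 2, 3],
    "Diff": [0.066, 0.556, 0.590],
})

by_mask = df[df.Diff > 0.1]                     # boolean indexing
by_query = df.query("Diff > 0.1")               # df.query
by_eval = df[df.eval("Diff > 0.1")]             # df.eval produces the mask
by_iloc = df.iloc[np.where(df.Diff > 0.1)[0]]   # positional selection

# DataFrame.equals compares values, dtypes and index labels.
for other in (by_query, by_eval, by_iloc):
    assert by_mask.equals(other)
print(by_mask)
```

Which one to use is mostly a matter of taste; plain boolean indexing is probably the most common choice.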

Note that these results keep the index of the original df. If you want to reset the index, use reset_index

df[df.Diff > 0.1].reset_index(drop=True)

   FisherID DateFishing  Total_Catch  Weight   Diff
0         2   25-Oct-11         0.70   0.144  0.556
1         2   27-Oct-11         8.20   0.084  7.666
2         2   27-Oct-11         8.20   0.450  7.666
3         3   27-Oct-11         8.20   0.610  0.590
4         3   27-Oct-11         8.20   7.000  0.590
5         3   29-Oct-11         0.64   0.184  0.456
