Column aggregates filtered by row values with pandas DataFrame

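Is there a better (faster) way to do this?

For each person on a given day, I want to find the total amount sold that day at the place where that person was: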

    day     name    sold    place
0   mon     Ben     2       1
1   mon     Amy     6       0
2   mon     Sue     7       1
3   mon     John    9       0
4   tues    Ben     9       1
5   tues    Amy     4       0
6   tues    Sue     10      1
7   tues    John    5       0
8   wed     Ben     8       0
9   wed     Amy     3       0
10  wed     Sue     10      1
11  wed     John    3       0

The result should look like this:

    day     name    sold    place   sold_at_same_place
0   mon     Ben     2       1       9
1   mon     Amy     6       0       15
2   mon     Sue     7       1       9
3   mon     John    9       0       15
4   tues    Ben     9       1       19
5   tues    Amy     4       0       9
6   tues    Sue     10      1       19
7   tues    John    5       0       9
8   wed     Ben     8       0       14
9   wed     Amy     3       0       14
10  wed     Sue     10      1       10
11  wed     John    3       0       14

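In case it's not clear: on Monday, the total sold at place 1 is 2 + 7 = 9. Because Ben was at that place, his sold_at_same_place is 9. Amy's sold_at_same_place for Monday is 15, because she was at place 0 (6 + 9 = 15).

This is what I came up with: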
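The arithmetic above can be checked with a plain groupby. This is only a minimal sketch: it assumes Monday's rows are rebuilt by hand as a DataFrame named `df`, with the column names taken from the table in the question.

```python
import pandas as pd

# Monday's rows from the question's table, rebuilt by hand for illustration.
df = pd.DataFrame({
    "day": ["mon", "mon", "mon", "mon"],
    "name": ["Ben", "Amy", "Sue", "John"],
    "sold": [2, 6, 7, 9],
    "place": [1, 0, 1, 0],
})

# One total per (day, place) pair.
totals = df.groupby(["day", "place"])["sold"].sum()
print(totals)
```

Here `totals` has one entry per (day, place) pair, so Ben and Sue share the place 1 total and Amy and John share the place 0 total.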

  1. Get the daily totals for each place value:

     def sold_by_day_filter(df, col_name, field_value):
         """
         sums sold by day filtering the `col_name` on `field_value`
         """
         subset = pd.DataFrame(df[df[col_name] == field_value])
         aggregated_subset = pd.DataFrame(
             {str(field_value): subset.groupby(['day'])['sold'].sum()}
         ).reset_index()
         return aggregated_subset

  2. Join each of those totals onto the original dataset:

     for val in df['place'].unique():
         df = pd.merge(df, sold_by_day_filter(df, 'place', val), on='day')

    Now the dataset looks like this:

           day  name  sold  place   1   0
      0    mon   Ben     2      1   9  15
      1    mon   Amy     6      0   9  15
      2    mon   Sue     7      1   9  15
      3    mon  John     9      0   9  15
      4   tues   Ben     9      1  19   9
      5   tues   Amy     4      0  19   9
      6   tues   Sue    10      1  19   9
      7   tues  John     5      0  19   9
      8    wed   Ben     8      0  10  14
      9    wed   Amy     3      0  10  14
      10   wed   Sue    10      1  10  14
      11   wed  John     3      0  10  14

  3. Fill sold_at_same_place from the temporary column that matches each row's place value:

     df['sold_at_same_place'] = \
         df.apply(lambda row: row[str(row['place'])], axis=1)

  4. Drop the temporary columns ('1' and '0'):

     fields_to_drop = [str(field) for field in df['place'].unique()]
     df.drop(fields_to_drop, axis=1, inplace=True)

So this works, but I feel there is probably a simpler way to do this with pandas. Any suggestions are appreciated!
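One simplification that stays close to the four steps above is to compute the totals per (day, place) once and merge them back on both keys, which avoids the per-place temporary columns and the row-wise `apply`. This is a sketch rather than the asker's method: the names `totals` and `result` are made up for illustration, and the data is the question's table rebuilt by hand.

```python
import pandas as pd

# The question's table, rebuilt by hand for illustration.
df = pd.DataFrame({
    "day": ["mon"] * 4 + ["tues"] * 4 + ["wed"] * 4,
    "name": ["Ben", "Amy", "Sue", "John"] * 3,
    "sold": [2, 6, 7, 9, 9, 4, 10, 5, 8, 3, 10, 3],
    "place": [1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0],
})

# One row per (day, place) with the summed sales, renamed to the target column.
totals = (
    df.groupby(["day", "place"], as_index=False)["sold"]
      .sum()
      .rename(columns={"sold": "sold_at_same_place"})
)

# A left merge on both keys attaches the matching total to every original row.
result = df.merge(totals, on=["day", "place"], how="left")
print(result)
```

The left merge keeps the rows of `df` in their original order, so `result` lines up with the expected output shown in the question.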

I think this is a use case for transform:

>>> df["sold_at_same_place"] = df.groupby(["day", "place"])["sold"].transform("sum")
>>> df
     day  name  sold  place  sold_at_same_place
0    mon   Ben     2      1                   9
1    mon   Amy     6      0                  15
2    mon   Sue     7      1                   9
3    mon  John     9      0                  15
4   tues   Ben     9      1                  19
5   tues   Amy     4      0                   9
6   tues   Sue    10      1                  19
7   tues  John     5      0                   9
8    wed   Ben     8      0                  14
9    wed   Amy     3      0                  14
10   wed   Sue    10      1                  10
11   wed  John     3      0                  14

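transform takes the result of the groupby operation and broadcasts it back to the original index, so every row receives the total for its own (day, place) group.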
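To make the broadcasting concrete, the sketch below compares a plain aggregation with `transform` on the same grouping. It assumes the question's table rebuilt by hand, and the names `per_group` and `per_row` are made up for illustration.

```python
import pandas as pd

# The question's table, rebuilt by hand for illustration.
df = pd.DataFrame({
    "day": ["mon"] * 4 + ["tues"] * 4 + ["wed"] * 4,
    "name": ["Ben", "Amy", "Sue", "John"] * 3,
    "sold": [2, 6, 7, 9, 9, 4, 10, 5, 8, 3, 10, 3],
    "place": [1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0],
})

grouped = df.groupby(["day", "place"])["sold"]

# Aggregation: one value per (day, place) group, indexed by the group keys.
per_group = grouped.sum()

# Transform: the same totals, repeated so there is one value per original row.
per_row = grouped.transform("sum")

print(len(per_group), len(per_row))
df["sold_at_same_place"] = per_row
```

Because `per_row` shares the index of `df`, it can be assigned directly as a new column without any merge.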

