Fastest way of searching and updating values for every row in a Pandas DataFrame
I have a DataFrame consisting of a transaction id, customer name, and amount spent, as below:
id | name | amount
1 | Jennifer | 598
2 | Jennifer | 765
3 | Matt | 134
4 | George | 390
5 | Jennifer | 554
6 | Matt | 75
7 | Matt | 830
8 | Matt | 20
9 | Bob | 786
10 | Bob | 280
11 | Sam | 236
12 | Sam | 226
13 | Bob | 720
14 | Bob | 431
15 | Jennifer | 802
16 | Ann | 668
17 | Sam | 376
18 | Jennifer | 891
19 | Ann | 569
20 | Jennifer | 452
Now I want to create a new column named "amount1", which holds the amount each customer spent on their previous purchase, so the result would look like this:
id | name | amount | amount1
1 | Jennifer | 598 |
2 | Jennifer | 765 | 598
3 | Matt | 134 |
4 | George | 390 |
5 | Jennifer | 554 | 765
6 | Matt | 75 | 134
7 | Matt | 830 | 75
8 | Matt | 20 | 830
9 | Bob | 786 |
10 | Bob | 280 | 786
11 | Sam | 236 |
12 | Sam | 226 | 236
13 | Bob | 720 | 280
14 | Bob | 431 | 720
15 | Jennifer | 802 | 554
16 | Ann | 668 |
17 | Sam | 376 | 226
18 | Jennifer | 891 | 802
19 | Ann | 569 | 668
20 | Jennifer | 452 | 891
It simply iterates over every row, searches all of that customer's previous purchases, and fills "amount1" with the most recent one. I have tried the code below, but I have about 200k rows of data and it takes hours to run. What is the most efficient way to do this?
df['amount1'] = np.nan
for index, row in df.iterrows():
    purchase_id = row['id']
    customer_name = row['name']
    # Look up all earlier purchases by the same customer
    amt = df.query('id < @purchase_id and name == @customer_name')['amount'].values
    if len(amt) > 0:
        df.loc[index, 'amount1'] = amt[-1]
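For reference, here is that row-by-row approach as a self-contained sketch on a small frame (names and amounts assumed for illustration). It is O(n²), which is why it takes hours on 200k rows, but it is useful as a correctness baseline:

```python
import numpy as np
import pandas as pd

# Small sample in the same shape as the question's data (assumed values)
df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'name': ['Jennifer', 'Jennifer', 'Matt', 'Jennifer', 'Matt'],
    'amount': [598, 765, 134, 554, 75],
})

# O(n^2) baseline: for each row, query all earlier purchases by the
# same customer and take the most recent one.
df['amount1'] = np.nan
for index, row in df.iterrows():
    purchase_id = row['id']
    customer_name = row['name']
    amt = df.query('id < @purchase_id and name == @customer_name')['amount'].values
    if len(amt) > 0:
        df.loc[index, 'amount1'] = amt[-1]
```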
Use `GroupBy.shift`, which moves each customer's amounts down one row within their group in a single vectorized operation:

df['amount1'] = df.groupby(['name'])['amount'].shift()
print (df)
id name amount amount1
0 1 Jennifer 598 NaN
1 2 Jennifer 765 598.0
2 3 Matt 134 NaN
3 4 George 390 NaN
4 5 Jennifer 554 765.0
5 6 Matt 75 134.0
6 7 Matt 830 75.0
7 8 Matt 20 830.0
8 9 Bob 786 NaN
9 10 Bob 280 786.0
10 11 Sam 236 NaN
11 12 Sam 226 236.0
12 13 Bob 720 280.0
13 14 Bob 431 720.0
14 15 Jennifer 802 554.0
15 16 Ann 668 NaN
16 17 Sam 376 226.0
17 18 Jennifer 891 802.0
18 19 Ann 569 668.0
19 20 Jennifer 452 891.0
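To see what the group-wise shift does, here is a minimal self-contained example restricted to one customer (a subset of the question's data): each row receives that customer's previous `amount`, and the first row per customer gets NaN.

```python
import pandas as pd

# Bob's purchases only, in transaction order
df = pd.DataFrame({
    'id': [9, 10, 13, 14],
    'name': ['Bob', 'Bob', 'Bob', 'Bob'],
    'amount': [786, 280, 720, 431],
})

# shift() within each group moves values down one row per customer,
# so every row sees that customer's previous purchase.
df['amount1'] = df.groupby('name')['amount'].shift()
```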
If you need to shift only positive amount
values, use:
s = df['amount'].where(df['amount'] > 0)
df['amount1'] = s.groupby(df['name']).shift()
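A small sketch of this variant, with an assumed negative amount (e.g. a refund) added for illustration. `where` masks non-positive amounts to NaN before shifting, so they are never carried forward as a "last purchase" value; note that the row immediately after the masked one receives NaN, since `shift` moves the masked value as-is.

```python
import pandas as pd

# Assumed sample: Ann has a refund (negative amount) between purchases
df = pd.DataFrame({
    'name': ['Ann', 'Ann', 'Ann'],
    'amount': [668, -50, 569],
})

# Mask out non-positive amounts, then shift within each customer group
s = df['amount'].where(df['amount'] > 0)
df['amount1'] = s.groupby(df['name']).shift()
```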