Fastest way to search and update values for every row in a Pandas DataFrame

I have a dataframe consisting of transaction id, customer name and money spent, which looks like this:

id |  name      |    amount 
1  | Jennifer   |     598
2  | Jennifer   |     765
3  |  Matt      |     134
4  |  George    |     390
5  |  Jennifer  |     554
6  |  Matt      |     75
7  |  Matt      |     830
8  |  Matt      |     20
9  |  Bob       |     786
10 |  Bob       |     280
11 |  Sam       |     236
12 |  Sam       |     226
13 |  Bob       |     720
14 |  Bob       |     431
15 |  Jennifer  |     802
16 |  Ann       |     668
17 |  Sam       |     376
18 |  Jennifer  |     891
19 |  Ann       |     569
20 |  Jennifer  |     452

Now I want to create a new column called "amount1", holding the amount each customer spent the last time they made a purchase. The result should look like this:

id  | name     |   amount   |     amount1
1   | Jennifer |    598     |
2   | Jennifer |    765     |      598
3   | Matt     |    134     |
4   | George   |    390     |
5   | Jennifer |    554     |      765
6   | Matt     |    75      |      134
7   | Matt     |    830     |      75
8   | Matt     |    20      |      830
9   | Bob      |    786     |   
10  | Bob      |    280     |      786
11  | Sam      |    236     |   
12  | Sam      |    226     |      236
13  | Bob      |    720     |      280
14  | Bob      |    431     |      720
15  | Jennifer |    802     |      554
16  | Ann      |    668     |   
17  | Sam      |    376     |      226
18  | Jennifer |    891     |      802
19  | Ann      |    569     |      668
20  | Jennifer |    452     |      891

It just iterates over every row, searches all of that customer's previous purchase records, and fills 'amount1' with the most recent one.

I have tried the code below, but I have about 200k rows of data and it takes a few hours to run. What is the most efficient way to do this?

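import numpy as np
import pandas as pd

# df is the DataFrame shown above (columns: id, name, amount), ordered by id
df['amount1'] = np.nan

for index, row in df.iterrows():
    purchase_id = row['id']
    customer_name = row['name']
    # all earlier purchases by the same customer (scans the whole frame on every row)
    amt = df.query('id < @purchase_id and name == @customer_name')['amount'].values
    if len(amt) > 0:
        # the last match is the most recent purchase, given rows sorted by id
        df.loc[index, 'amount1'] = amt[-1]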
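The loop is slow mainly because each `df.query` call scans every row of the frame, so the total work grows roughly with the square of the row count (about 200k × 200k comparisons here), on top of the per-row overhead of `iterrows`. The same "remember the previous purchase" idea only needs each customer's most recent amount, which a dictionary can track in a single pass. The sketch below is an alternative illustration, not the asker's code; the helper `previous_amounts` is hypothetical, and it assumes rows are in purchase order once sorted by `id`.

```python
import numpy as np
import pandas as pd

# A few rows from the example table (id is assumed to reflect purchase order).
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "name": ["Jennifer", "Jennifer", "Matt", "George", "Jennifer", "Matt"],
    "amount": [598, 765, 134, 390, 554, 75],
})

def previous_amounts(names, amounts):
    """Hypothetical helper: for each row, the same customer's previous amount (NaN if none)."""
    last_seen = {}  # customer name -> most recent amount seen so far
    result = []
    for name, amount in zip(names, amounts):
        result.append(last_seen.get(name, np.nan))  # value before this purchase
        last_seen[name] = amount                    # this purchase becomes the latest
    return np.array(result, dtype=float)

# "Previous" only means "most recent" if rows are in purchase order.
df = df.sort_values("id")
df["amount1"] = previous_amounts(df["name"], df["amount"])
print(df)
```

This is linear in the number of rows, but the vectorized `groupby` approach in pandas is usually simpler and faster still.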

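Use DataFrameGroupBy.shift, which shifts each customer's amounts down by one row within their group (so it assumes the rows are already in purchase order, e.g. sorted by id):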

df['amount1'] = df.groupby('name')['amount'].shift()
print(df)
    id      name  amount  amount1
0    1  Jennifer     598      NaN
1    2  Jennifer     765    598.0
2    3      Matt     134      NaN
3    4    George     390      NaN
4    5  Jennifer     554    765.0
5    6      Matt      75    134.0
6    7      Matt     830     75.0
7    8      Matt      20    830.0
8    9       Bob     786      NaN
9   10       Bob     280    786.0
10  11       Sam     236      NaN
11  12       Sam     226    236.0
12  13       Bob     720    280.0
13  14       Bob     431    720.0
14  15  Jennifer     802    554.0
15  16       Ann     668      NaN
16  17       Sam     376    226.0
17  18  Jennifer     891    802.0
18  19       Ann     569    668.0
19  20  Jennifer     452    891.0
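Because `shift` works on row position inside each group rather than on `id` values, the result is only "the previous purchase" when the rows are in purchase order. A minimal self-contained sketch, assuming a hypothetical input whose rows are out of order: sort by `id` first, then shift. The shifted Series keeps the original index labels, so assigning it back aligns row by row.

```python
import pandas as pd

# Hypothetical input where rows are NOT in purchase order.
df = pd.DataFrame({
    "id": [5, 1, 3, 2, 6],
    "name": ["Jennifer", "Jennifer", "Matt", "Jennifer", "Matt"],
    "amount": [554, 598, 134, 765, 75],
})

# shift() moves values by position within each group, so put rows in id order first.
df = df.sort_values("id")
df["amount1"] = df.groupby("name")["amount"].shift()
print(df)
```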

If you need to shift only positive amount values, use:

s = df['amount'].where(df['amount'] > 0)
df['amount1'] = s.groupby(df['name']).shift()
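To see what the `where` variant does, here is a small sketch on hypothetical data containing one non-positive amount (a -50 row is assumed purely for illustration). Non-positive amounts become NaN before the shift, so the row that follows one gets NaN rather than an earlier positive amount.

```python
import pandas as pd

# Hypothetical data: one customer with a non-positive amount (e.g. a refund of -50).
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "name": ["Bob", "Bob", "Bob", "Bob"],
    "amount": [786, -50, 280, 720],
})

# Mask non-positive amounts as NaN, then shift each customer's values down one row.
s = df["amount"].where(df["amount"] > 0)
df["amount1"] = s.groupby(df["name"]).shift()
print(df)
```

Here row 3 gets NaN because the row before it was masked. If the goal is instead "the last positive amount before this row", one option is to forward-fill within each group before shifting: `s.groupby(df['name']).ffill().groupby(df['name']).shift()`.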
