
Fastest way to replace multiple values of a pandas dataframe with values from another dataframe

I am trying to replace multiple rows of a pandas dataframe with values from another dataframe.

Suppose I have 10,000 rows of customer_id in my dataframe df1, and I want to replace some of these customer_id values with 3,000 new values from df2.

For the sake of illustration, let's generate the dataframes (below).

Say the 10 rows in df1 represent the 10,000 rows, and the 3 rows in df2 represent the 3,000 values.

import numpy as np
import pandas as pd
np.random.seed(42)

# Create df1 with unique values
arr1 = np.arange(100,200,10)
np.random.shuffle(arr1)
df1 = pd.DataFrame(data=arr1, 
                   columns=['customer_id'])

# Create df2 for new unique_values
df2 = pd.DataFrame(data = [1800, 1100, 1500],
                   index = [180, 110, 150], # this is customer_id column on df1
                   columns = ['customer_id_new'])

I want to replace 180 with 1800, 110 with 1100, and 150 with 1500.

I know we can do the following:

# Replace multiple values
replace_values = {180: 1800, 110: 1100, 150: 1500}
df1_replaced = df1.replace({'customer_id': replace_values})

And it works fine if I only have a few values...

But if I have thousands of values to replace, how can I do this without typing out each one by hand?

EDIT: To clarify, I don't need to use replace. Anything that replaces those values in df1 with values from df2 in the fastest, most efficient way is fine.

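You can pass df2['customer_id_new'] straight to replace. The Series index supplies the old values and the Series values supply the replacements:

df1['customer_id'] = df1['customer_id'].replace(df2['customer_id_new'])

Alternatively, you can do it in place. Note that on pandas versions with copy-on-write enabled, calling an inplace method on df1['customer_id'] may leave df1 unchanged, so the assignment form above is the safer choice.

df1['customer_id'].replace(df2['customer_id_new'], inplace=True)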

You can try this, using map with a pd.Series:

df1['customer_id'] = df1['customer_id'].map(df2.squeeze()).fillna(df1['customer_id'])

or

df1['customer_id'] = df1['customer_id'].map(df2['customer_id_new']).fillna(df1['customer_id'])

Output:

   customer_id
0       1800.0
1       1100.0
2       1500.0
3        100.0
4        170.0
5        120.0
6        190.0
7        140.0
8        130.0
9        160.0
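
The float values in this output come from the intermediate NaN entries that map produces for ids with no match in df2. If the integer dtype matters, one option is to cast back after fillna. This is a minimal sketch, not part of the original answer: the names original_dtype and mapped are introduced here for illustration, and the cast assumes the ids are small enough to round-trip through float64 exactly (below 2**53).

```python
import numpy as np
import pandas as pd

# Rebuild the example dataframes from the question
np.random.seed(42)
arr1 = np.arange(100, 200, 10)
np.random.shuffle(arr1)
df1 = pd.DataFrame(data=arr1, columns=["customer_id"])
df2 = pd.DataFrame(
    data=[1800, 1100, 1500],
    index=[180, 110, 150],  # old customer_id values
    columns=["customer_id_new"],
)

# Remember the integer dtype before mapping (assumption: ids fit in float64 exactly)
original_dtype = df1["customer_id"].dtype

# map() yields NaN for ids missing from df2, which forces a float column
mapped = df1["customer_id"].map(df2["customer_id_new"])

# Fill the gaps with the original ids, then cast back to the integer dtype
df1["customer_id"] = mapped.fillna(df1["customer_id"]).astype(original_dtype)

print(df1.dtypes)
```

If some ids could legitimately be missing after the replacement, a nullable integer dtype such as "Int64" may be a better fit than a plain cast.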

Going with your original method using replace, you can simplify it with to_dict, which builds the mapping dictionary for you so you don't have to type it out:

df1["customer_id"] = df1["customer_id"].replace(df2["customer_id_new"].to_dict())

>>> df1
   customer_id
0         1800
1         1100
2         1500
3          100
4          170
5          120
6          190
7          140
8          130
9          160
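
Since the question asks for the fastest option, it may be worth timing the candidates at the scale described (10,000 rows, 3,000 replacements) rather than guessing. The sketch below is only an illustration: the data is synthetic, the names big_df1, big_df2, and mapping are hypothetical, and a single perf_counter measurement is a rough indicator at best. No timings are quoted here because they depend on the pandas version and the machine.

```python
import time

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic data at the scale described in the question (hypothetical values)
ids = rng.permutation(np.arange(10_000))
big_df1 = pd.DataFrame({"customer_id": ids})
old_ids = rng.choice(ids, size=3_000, replace=False)
big_df2 = pd.DataFrame({"customer_id_new": old_ids + 1_000_000}, index=old_ids)

mapping = big_df2["customer_id_new"]

# Approach 1: replace with a dictionary built by to_dict
start = time.perf_counter()
via_replace = big_df1["customer_id"].replace(mapping.to_dict())
replace_seconds = time.perf_counter() - start

# Approach 2: map, fill unmatched ids, restore the integer dtype
start = time.perf_counter()
via_map = (
    big_df1["customer_id"]
    .map(mapping)
    .fillna(big_df1["customer_id"])
    .astype(big_df1["customer_id"].dtype)
)
map_seconds = time.perf_counter() - start

print(f"replace: {replace_seconds:.4f}s, map: {map_seconds:.4f}s")
print("same result:", via_replace.equals(via_map))
```

For steadier numbers, repeating each measurement (for example with timeit) would be more reliable than a single run.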

In my opinion, apart from trying the useful answers above, you could try parallelising the work on your dataframe if you have a multi-core processor.

For example:

import pandas as pd, numpy as np, seaborn as sns
from multiprocessing import Pool

num_partitions = 10  # number of partitions to split the dataframe into
num_cores = 4  # number of cores on your machine

iris = pd.DataFrame(sns.load_dataset('iris'))  # sample dataframe to experiment with

def parallelize_dataframe(df, func):
    # Split the dataframe into chunks, apply func to each chunk in a
    # worker process, then concatenate the results back together.
    df_split = np.array_split(df, num_partitions)
    pool = Pool(num_cores)
    df = pd.concat(pool.map(func, df_split))
    pool.close()
    pool.join()
    return df

In place of the 'func' parameter, pass a function that applies your replace step to a single chunk. Please let me know if it helps. If you hit any error, leave a comment.

Thanks!
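
To make "pass your replace method" concrete, here is one possible way to wire the question's replacement into a pool of workers. It is a sketch under assumptions, not the answerer's code: replace_chunk and split_rows are hypothetical helpers introduced here, the chunks are cut with positional slicing instead of np.array_split (a choice made for this sketch), and the partition and core counts are arbitrary defaults.

```python
from functools import partial
from multiprocessing import Pool

import numpy as np
import pandas as pd


def replace_chunk(chunk, mapping):
    """Apply the old-id -> new-id mapping to one chunk (hypothetical helper)."""
    out = chunk.copy()
    out["customer_id"] = out["customer_id"].replace(mapping)
    return out


def split_rows(df, num_partitions):
    """Cut df into roughly equal consecutive row blocks (hypothetical helper)."""
    bounds = np.linspace(0, len(df), num_partitions + 1, dtype=int)
    return [df.iloc[start:stop] for start, stop in zip(bounds[:-1], bounds[1:])]


def parallelize_dataframe(df, func, num_partitions=4, num_cores=2):
    """Run func on each chunk in a worker process and concatenate the results."""
    with Pool(num_cores) as pool:
        return pd.concat(pool.map(func, split_rows(df, num_partitions)))


if __name__ == "__main__":
    df1 = pd.DataFrame({"customer_id": np.arange(100, 200, 10)})
    df2 = pd.DataFrame(
        {"customer_id_new": [1800, 1100, 1500]}, index=[180, 110, 150]
    )
    # partial binds the mapping so the pool only has to pass each chunk
    func = partial(replace_chunk, mapping=df2["customer_id_new"].to_dict())
    result = parallelize_dataframe(df1, func)
    print(result)
```

The worker function is defined at module level so it can be pickled, and the pool is started under the `__main__` guard, which multiprocessing needs on platforms that spawn fresh interpreters. For a dataframe of only 10,000 rows, the cost of starting processes and copying chunks may well exceed any gain, so it is worth timing against the single-process versions first.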

