
Python Pandas update a dataframe value from another dataframe

I have two dataframes in Python. I want to update rows in the first dataframe using matching values from the second dataframe. The second dataframe serves as an override.

Here is an example with sample data and code:

DataFrame 1:

   Code      Name  Value
0     1  Company1    200
1     2  Company2    300
2     3  Company3    400

DataFrame 2:

   Code      Name  Value
0     2  Company2   1000

I want to update dataframe 1 based on matching Code and Name. In this example, dataframe 1 should be updated as below:

   Code      Name  Value
0     1  Company1    200
1     2  Company2   1000
2     3  Company3    400

Note: the row with Code = 2 and Name = Company2 is updated with the value 1000 (coming from dataframe 2).

import pandas as pd

data1 = {
         'Code': [1, 2, 3],
         'Name': ['Company1', 'Company2', 'Company3'],
         'Value': [200, 300, 400],

    }
df1 = pd.DataFrame(data1, columns= ['Code','Name','Value'])

data2 = {
         'Code': [2],
         'Name': ['Company2'],
         'Value': [1000],
    }

df2 = pd.DataFrame(data2, columns= ['Code','Name','Value'])

Any pointers or hints?

Using DataFrame.update, which aligns on indices ( https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.update.html ):

>>> df1.set_index('Code', inplace=True)
>>> df1.update(df2.set_index('Code'))
>>> df1.reset_index()  # to recover the initial structure

   Code      Name   Value
0     1  Company1   200.0
1     2  Company2  1000.0
2     3  Company3   400.0
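
Since the question matches on both Code and Name, here is a minimal sketch of the same idea with a composite index. It assumes the key pairs in df2 are unique; the names keys, indexed and result are only illustrative. The final astype is there because update may leave Value as float, depending on the pandas version.

```python
import pandas as pd

df1 = pd.DataFrame({'Code': [1, 2, 3],
                    'Name': ['Company1', 'Company2', 'Company3'],
                    'Value': [200, 300, 400]})
df2 = pd.DataFrame({'Code': [2], 'Name': ['Company2'], 'Value': [1000]})

keys = ['Code', 'Name']               # columns that identify a row

indexed = df1.set_index(keys)         # new frame; df1 itself is left untouched
indexed.update(df2.set_index(keys))   # in-place overwrite where the (Code, Name) labels match

result = indexed.reset_index()        # back to the original column layout
result['Value'] = result['Value'].astype('int64')  # undo a possible float upcast
```

Rows of df2 whose (Code, Name) pair does not exist in df1 are simply ignored by update.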

You can use concat + drop_duplicates:

pd.concat([df1,df2]).drop_duplicates(['Code','Name'],keep='last').sort_values('Code')
Out[1280]: 
   Code      Name  Value
0     1  Company1    200
0     2  Company2   1000
2     3  Company3    400
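
The output above keeps the original index labels, so 0 shows up twice. A small sketch of the same approach that also rebuilds a clean index could look like this (the names keys and result are illustrative, not part of the answer):

```python
import pandas as pd

df1 = pd.DataFrame({'Code': [1, 2, 3],
                    'Name': ['Company1', 'Company2', 'Company3'],
                    'Value': [200, 300, 400]})
df2 = pd.DataFrame({'Code': [2], 'Name': ['Company2'], 'Value': [1000]})

keys = ['Code', 'Name']

result = (
    pd.concat([df1, df2], ignore_index=True)    # df2 rows come last
      .drop_duplicates(subset=keys, keep='last')  # so keep='last' prefers the override
      .sort_values('Code')
      .reset_index(drop=True)                   # fresh 0..n-1 index
)
```

Keep in mind that this behaves like an upsert: a row in df2 with no match in df1 is added to the result rather than dropped.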

You can merge the data first and then use numpy.where:

import numpy as np

updated = df1.merge(df2, how='left', on=['Code', 'Name'], suffixes=('', '_new'))
updated['Value'] = np.where(pd.notnull(updated['Value_new']), updated['Value_new'], updated['Value'])
updated.drop('Value_new', axis=1, inplace=True)

   Code      Name   Value
0     1  Company1   200.0
1     2  Company2  1000.0
2     3  Company3   400.0
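
If you would rather stay within pandas, a possible variant of the same merge uses fillna instead of numpy.where and casts Value back to its original dtype. This is only a sketch; the name merged is illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'Code': [1, 2, 3],
                    'Name': ['Company1', 'Company2', 'Company3'],
                    'Value': [200, 300, 400]})
df2 = pd.DataFrame({'Code': [2], 'Name': ['Company2'], 'Value': [1000]})

merged = df1.merge(df2, how='left', on=['Code', 'Name'], suffixes=('', '_new'))

# Value_new is NaN where df2 has no matching row, so fall back to the old Value there
merged['Value'] = merged['Value_new'].fillna(merged['Value']).astype(df1['Value'].dtype)
merged = merged.drop(columns='Value_new')
```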

You can align indices and then use combine_first:

res = df2.set_index(['Code', 'Name'])\
         .combine_first(df1.set_index(['Code', 'Name']))\
         .reset_index()

print(res)

#    Code      Name   Value
# 0     1  Company1   200.0
# 1     2  Company2  1000.0
# 2     3  Company3   400.0

There is an update function available.

Example:

# update() aligns on index labels, not on Code/Name, so both frames need a matching index (e.g. set_index('Code'))
df1.update(df2)

For more info:

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.update.html
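
One thing to watch for: update matches rows by index label, not by Code or Name. A quick sketch with the question's data, assuming both frames still have their default index (the name naive is just for illustration), shows what happens when it is called directly:

```python
import pandas as pd

df1 = pd.DataFrame({'Code': [1, 2, 3],
                    'Name': ['Company1', 'Company2', 'Company3'],
                    'Value': [200, 300, 400]})
df2 = pd.DataFrame({'Code': [2], 'Name': ['Company2'], 'Value': [1000]})

naive = df1.copy()
# df2's only row carries index label 0, so it overwrites row 0 of `naive`
# (Company1), not the Company2 row the question wants to change.
naive.update(df2)

print(naive)
```

Setting the key column(s) as the index on both frames before calling update, as in the set_index answer above, avoids this.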

You can use pd.Series.where on the result of left-joining df1 and df2:

merged = df1.merge(df2, on=['Code', 'Name'], how='left')
df1.Value = merged.Value_y.where(~merged.Value_y.isnull(), df1.Value)
>>> df1
   Code      Name   Value
0     1  Company1   200.0
1     2  Company2  1000.0
2     3  Company3   400.0

You can change the line to

df1.Value = merged.Value_y.where(~merged.Value_y.isnull(), df1.Value).astype(int)

in order to convert the values back to integers.

Assuming company and code are redundant identifiers, you can also do:

import pandas as pd
vdic = pd.Series(df2.Value.values, index=df2.Name).to_dict()

df1.loc[df1.Name.isin(vdic.keys()), 'Value'] = df1.loc[df1.Name.isin(vdic.keys()), 'Name'].map(vdic)

#   Code      Name  Value
#0     1  Company1    200
#1     2  Company2   1000
#2     3  Company3    400

There's something I often do.

I merge 'left' first:

df_merged = pd.merge(df1, df2, how = 'left', on = 'Code')

For column names that appear in both dataframes, pandas adds the suffix '_x' (for your left dataframe) and '_y' (for your right dataframe).

You want the ones that came from the right. So just remove any columns with '_x' and rename '_y'. Note that rows with no match in the right dataframe end up with NaN in those columns:

for col in df_merged.columns:
    if col.endswith('_x'):
        df_merged.drop(columns=col, inplace=True)
    elif col.endswith('_y'):
        # slice the suffix off; col.strip('_y') would also eat '_' and 'y' characters that belong to the name
        df_merged.rename(columns={col: col[:-2]}, inplace=True)
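
If the rows without a match should keep their original values instead of becoming NaN, one possible variation is to leave the left-hand names unsuffixed and fill the gaps column by column. This is a rough sketch rather than the approach described above; the names suffix, base and merged are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'Code': [1, 2, 3],
                    'Name': ['Company1', 'Company2', 'Company3'],
                    'Value': [200, 300, 400]})
df2 = pd.DataFrame({'Code': [2], 'Name': ['Company2'], 'Value': [1000]})

suffix = '_y'
# Left columns keep their names; overlapping right columns get the suffix
merged = pd.merge(df1, df2, how='left', on='Code', suffixes=('', suffix))

for col in [c for c in merged.columns if c.endswith(suffix)]:
    base = col[:-len(suffix)]                         # 'Value_y' -> 'Value'
    merged[base] = merged[col].fillna(merged[base])   # right value wins, left fills the gaps
    merged = merged.drop(columns=col)
```

The suffix is removed by slicing because str.strip works on a set of characters: 'Country_y'.strip('_y') gives 'Countr', not 'Country'.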

  1. Append the dataset
  2. Drop the duplicates by Code
  3. Sort the values

# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
combined_df = pd.concat([df1, df2]).drop_duplicates(['Code'], keep='last').sort_values('Code')

None of the above solutions worked for my particular example, which I think is rooted in the dtype of my columns, but I eventually came to this solution:

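indexes = df1.loc[df1.Code.isin(df2.Code.values)].index
# .at is meant for a single scalar; .loc handles several row labels.
# Assumes df2's rows are in the same order as the matching rows of df1.
df1.loc[indexes, 'Value'] = df2['Value'].values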
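
The positional assignment above depends on df2 being ordered like the matching rows of df1. A sketch that looks each value up by Code instead might look like this. The two-row df2 is hypothetical data, deliberately out of order, and the names overrides and mask are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'Code': [1, 2, 3],
                    'Name': ['Company1', 'Company2', 'Company3'],
                    'Value': [200, 300, 400]})
# Hypothetical overrides, listed in a different order than df1
df2 = pd.DataFrame({'Code': [3, 2],
                    'Name': ['Company3', 'Company2'],
                    'Value': [4000, 1000]})

overrides = df2.set_index('Code')['Value']   # Code -> new Value lookup (Codes must be unique)
mask = df1['Code'].isin(overrides.index)     # rows of df1 that have an override

# map() picks the override by Code, so the row order of df2 no longer matters
df1.loc[mask, 'Value'] = df1.loc[mask, 'Code'].map(overrides)
```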

