
How to merge two DataFrames of different sizes in Pandas to update one DataFrame where values in a column match those in the other DataFrame

I am trying to code a task for work, so I made a simple case that simulates it. I have two dataframes, data_1 and data_2, and I would like to replace some rows in data_1 with rows from data_2 wherever the values in a column named time match.

Here is an example:

import numpy as np
import pandas as pd
a = {
    'time':[1,2,3,4,5,6],
    'column_1':[2,2,2,2,2,2],
    'column_2':[3,3,3,3,3,3]   
}
b = {
    'time':[3,4,5],
    'column_1':[0,0,0],
    'column_2':[0,0,0]    
}
data_1 = pd.DataFrame(a)
data_2 = pd.DataFrame(b)

As a result, I would like to get a dataframe like this:

   time  column_1  column_2
0     1         2         3
1     2         2         3
2     3         0         0
3     4         0         0
4     5         0         0
5     6         2         3

I tried the merge and replace methods in Pandas, but without success. I did build a boolean mask array:

time_1 = list(data_1['time'])
time_2 = list(data_2['time'])
mask_array = np.zeros(len(time_1),dtype = bool)
for i, item in enumerate(time_1):
    if item in time_2:
        mask_array[i] = True

and I received:

array([False, False,  True,  True,  True, False])

But I could not replace the data_1 values with the data_2 values. What did I do wrong? It does not seem like a difficult task, but I could not find anything useful and just do not know what to do. I do not have a lot of experience with pandas, so maybe I am misunderstanding something.

You can use .update() after setting time as the index on both data_1 and data_2, as follows:

data_1a = data_1.set_index('time')
data_1a.update(data_2.set_index('time'))
data_out = data_1a.reset_index()

.update() modifies a DataFrame in place using non-NA values from another DataFrame, aligning on indices. Thus, when you set time as the index on both data_1 and data_2, .update() aligns on matching values of time and overwrites those rows of data_1 with the corresponding values from data_2. Rows of data_1 whose time does not occur in data_2 are left unchanged.
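To make the two rules above concrete (alignment on the index, and only non-NA values being used), here is a minimal sketch on a smaller, made-up pair of frames. The names target and patch and their values are illustrative assumptions, not part of the question's data.

```python
import numpy as np
import pandas as pd

# Hypothetical frames, indexed by 'time' so that .update() can align on it
target = pd.DataFrame({
    'time': [1, 2, 3, 4],
    'column_1': [2, 2, 2, 2],
}).set_index('time')

patch = pd.DataFrame({
    'time': [3, 4, 9],            # time 9 does not occur in target
    'column_1': [0, np.nan, 7],   # the value for time 4 is missing (NaN)
}).set_index('time')

# In-place update: only index labels present in target are considered,
# and NaN values in patch never overwrite existing values.
target.update(patch)
result = target.reset_index()

# Only the row with time == 3 changes; time 4 keeps its old value and
# time 9 is ignored. Values may be displayed as floats, depending on the
# pandas version.
print(result)
```

Note that .update() never adds rows: the extra time 9 in patch is simply dropped.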

Data Setup:

a = {
    'time':[1,2,3,4,5,6],
    'column_1':[2,2,2,2,2,2],
    'column_2':[3,3,3,3,3,3]   
}
b = {
    'time':[3,4,5],
    'column_1':[0,0,0],
    'column_2':[0,0,0]    
}
data_1 = pd.DataFrame(a)
data_2 = pd.DataFrame(b)

Result:

print(data_out)

   time  column_1  column_2
0     1       2.0       3.0
1     2       2.0       3.0
2     3       0.0       0.0
3     4       0.0       0.0
4     5       0.0       0.0
5     6       2.0       3.0
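The result above shows column_1 and column_2 as floats, whereas the expected output in the question has integers. As one possible alternative, the boolean mask from the question can be completed with .isin() and a .loc assignment, which should leave the integer columns as integers. This is only a sketch: it assumes the time values in data_2 are unique, and the names mask, replacement, cols, and matched_times are illustrative.

```python
import pandas as pd

data_1 = pd.DataFrame({
    'time': [1, 2, 3, 4, 5, 6],
    'column_1': [2, 2, 2, 2, 2, 2],
    'column_2': [3, 3, 3, 3, 3, 3],
})
data_2 = pd.DataFrame({
    'time': [3, 4, 5],
    'column_1': [0, 0, 0],
    'column_2': [0, 0, 0],
})

# Vectorised version of the question's loop: True where data_1's time
# also occurs in data_2
mask = data_1['time'].isin(data_2['time'])

# Index data_2 by time so replacement rows can be looked up by label
# (assumes each time occurs at most once in data_2)
replacement = data_2.set_index('time')
cols = ['column_1', 'column_2']
matched_times = data_1.loc[mask, 'time']

# Work on a copy and overwrite only the masked rows; .to_numpy() drops
# the index so values are assigned by position, in data_1's row order
data_out = data_1.copy()
data_out.loc[mask, cols] = replacement.loc[matched_times, cols].to_numpy()

print(data_out)
```

If keeping the .update() approach is preferred, casting the result back with .astype() is another option, provided no NaN values remain in the updated columns.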


