

Replace pandas dataframe columns with another dataframe based on specific column

I have two dataframes with many columns, df1 and df2, and I want to replace all df1 values (except the time column) with the data from the df2 columns where the time values are the same:

df1:

index time   x y   ... many other columns (the same as df2)
0       1    1 1
1       1.1  2 2
2       1.1  3 3
3       1.1  4 4
4       1.4  5 5
5       1.5  6 6
6       1.5  7 7


df2:

index time  x   y   ... many other columns (the same as df1)
0       1   10  10
1       1.1 11  11
2       1.2 12  12
3       1.3 13  13
4       1.4 14  14
5       1.5 15  15
6       1.6 16  16



The result for df1 should be:

index time  x   y   ... many other columns
0       1    10 10
1       1.1  11 11
2       1.1  11 11
3       1.1  11 11
4       1.4  14 14
5       1.5  15 15
6       1.5  15 15
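To experiment with the answers below, the sample frames can be rebuilt in pandas. This is a minimal sketch that keeps only the time, x and y columns shown above (the "many other columns" are left out); the name expected is a hypothetical label for the desired result.

```python
import pandas as pd

# df1: several rows can share the same time value
df1 = pd.DataFrame({
    "time": [1.0, 1.1, 1.1, 1.1, 1.4, 1.5, 1.5],
    "x": [1, 2, 3, 4, 5, 6, 7],
    "y": [1, 2, 3, 4, 5, 6, 7],
})

# df2: one row per time value, holding the replacement data
df2 = pd.DataFrame({
    "time": [1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6],
    "x": [10, 11, 12, 13, 14, 15, 16],
    "y": [10, 11, 12, 13, 14, 15, 16],
})

# Desired result: df1's time column, with x and y taken from the df2 row of the same time
expected = pd.DataFrame({
    "time": df1["time"],
    "x": [10, 11, 11, 11, 14, 15, 15],
    "y": [10, 11, 11, 11, 14, 15, 15],
})
print(expected)
```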


You need to merge:

df1 = df1.merge(df2, left_index=True, right_index=True)

Then drop the columns you do not need.
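Merging on the index pairs rows by their position labels (0..6), not by time, so it only gives the right answer when both frames list the same times in the same order. The sketch below, on the sample data with x and y only, shows the mismatch; the _x and _y suffixes are pandas' defaults for overlapping column names.

```python
import pandas as pd

df1 = pd.DataFrame({"time": [1.0, 1.1, 1.1, 1.1, 1.4, 1.5, 1.5],
                    "x": [1, 2, 3, 4, 5, 6, 7], "y": [1, 2, 3, 4, 5, 6, 7]})
df2 = pd.DataFrame({"time": [1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6],
                    "x": [10, 11, 12, 13, 14, 15, 16], "y": [10, 11, 12, 13, 14, 15, 16]})

# Join on row labels; overlapping columns get the default "_x"/"_y" suffixes
merged = df1.merge(df2, left_index=True, right_index=True)
print(merged.columns.tolist())
# → ['time_x', 'x_x', 'y_x', 'time_y', 'x_y', 'y_y']

# Row 2 has time 1.1 in df1 but is paired with df2's time 1.2 row
print(merged.loc[2, ["time_x", "time_y", "x_y"]])
```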

Edit: I misread the question the first time. This should help:

df1[['time']].merge(df2, on='time')
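Merging on the time column looks up each df1 time in df2, and because df2 has one row per time, every df1 row gets exactly one match. A sketch on the sample data follows. Note that the default how='inner' drops df1 rows whose time is missing from df2 and returns a fresh 0..n-1 index; the variant assumes you would rather keep every df1 row (unmatched times become NaN) and reuse df1's index labels, which is a judgment call for your data.

```python
import pandas as pd

df1 = pd.DataFrame({"time": [1.0, 1.1, 1.1, 1.1, 1.4, 1.5, 1.5],
                    "x": [1, 2, 3, 4, 5, 6, 7], "y": [1, 2, 3, 4, 5, 6, 7]})
df2 = pd.DataFrame({"time": [1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6],
                    "x": [10, 11, 12, 13, 14, 15, 16], "y": [10, 11, 12, 13, 14, 15, 16]})

# The answer's merge: inner join on 'time', following the order of df1's keys
inner = df1[["time"]].merge(df2, on="time")
print(inner)

# Variant: left join keeps all df1 rows, then df1's row labels are put back
left = df1[["time"]].merge(df2, on="time", how="left").set_axis(df1.index)
print(left)
```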

I think I was able to get my thinking in order and have hopefully reached a solution that will work for you.

Try this: you can get your answer by using combine_first with some tweaking:

  1. combine_first fills null values in the calling dataframe with the values at the same index and column in another dataframe, so first replace all values (except those in the 'time' column) with np.nan. Note that I use the 'time' column as the index.

  2. Because combine_first returns the union of the two dataframes (its index covers the time values of both), use isin to keep only the time values from df1 in the final output.
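The two properties these steps rely on (NaN cells are filled from the other frame, and the result covers the union of both indexes) can be seen on a tiny standalone example. The frames a and b are hypothetical and use unique time labels to keep things simple.

```python
import numpy as np
import pandas as pd

# Caller: data blanked out with NaN, indexed by time
a = pd.DataFrame({"x": [np.nan, np.nan]}, index=pd.Index([1.0, 1.1], name="time"))
# Other: replacement values, including one extra time (1.2)
b = pd.DataFrame({"x": [10, 11, 12]}, index=pd.Index([1.0, 1.1, 1.2], name="time"))

# 1.0 and 1.1 are filled from b; 1.2 appears only because the index is the union
out = a.combine_first(b)
print(out)

# Filtering back to a's time labels mirrors the isin step in the answer
kept = out[out.index.isin(a.index)]
print(kept)
```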

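import numpy as np
import pandas as pd

# Blank out every column except 'time' so combine_first has NaNs to fill
# (note: this overwrites the values in df1 itself)
df1[df1.columns.difference(['time'])] = np.nan
# Use 'time' as the index on both sides and fill df1's NaNs from the matching df2 rows;
# the index of the result is the union of the time values in df1 and df2
res = df1.set_index('time').combine_first(df2.set_index('time')).reset_index()

# Keep only the rows whose time value occurs in df1
final = res[res['time'].isin(df1['time'])]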

This gives you:

   time     x     y
0   1.0  10.0  10.0
1   1.1  11.0  11.0
2   1.1  11.0  11.0
3   1.1  11.0  11.0
6   1.4  14.0  14.0
7   1.5  15.0  15.0
8   1.5  15.0  15.0

Try it on your actual dataset and let me know if it works.
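Another option, not taken from the answers above, so treat it as a sketch, is to index df2 by time and reindex it with df1's time values, which repeats the matching df2 row for every duplicate time in df1. It assumes df2's time values are unique (reindex refuses a duplicated index) and, as a design choice, falls back to df1's original values where a time has no match in df2.

```python
import pandas as pd

df1 = pd.DataFrame({"time": [1.0, 1.1, 1.1, 1.1, 1.4, 1.5, 1.5],
                    "x": [1, 2, 3, 4, 5, 6, 7], "y": [1, 2, 3, 4, 5, 6, 7]})
df2 = pd.DataFrame({"time": [1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6],
                    "x": [10, 11, 12, 13, 14, 15, 16], "y": [10, 11, 12, 13, 14, 15, 16]})

# Every column except 'time' gets replaced
value_cols = df1.columns.difference(["time"])

# Look up each df1 time in df2; repeated times simply repeat the same df2 row
lookup = df2.set_index("time").reindex(df1["time"])
# Put df1's row labels back so the lookup lines up with df1
lookup.index = df1.index

result = df1.copy()
# Take df2's values where a match exists, otherwise keep df1's original values
result[value_cols] = lookup[value_cols].fillna(df1[value_cols])
print(result)
```

Unlike the combine_first version, this leaves df1 untouched and keeps its original row labels.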


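Disclaimer: the technical posts on this site are published under the CC BY-SA 4.0 license. If you republish, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.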
