Replace pandas dataframe columns with another dataframe based on specific column
I have two dataframes, df1 and df2, with many columns. I want to replace all df1 values (except the time column) with the data from the df2 columns where the time values are the same:
df1:
index time x y ... (many other columns, the same as df2)
0 1 1 1
1 1.1 2 2
2 1.1 3 3
3 1.1 4 4
4 1.4 5 5
5 1.5 6 6
6 1.5 7 7
df2:
index time x y ... (many other columns, the same as df1)
0 1 10 10
1 1.1 11 11
2 1.2 12 12
3 1.3 13 13
4 1.4 14 14
5 1.5 15 15
6 1.6 16 16
the result for df1 should be:
index time x y ... (many other columns)
0 1 10 10
1 1.1 11 11
2 1.1 11 11
3 1.1 11 11
4 1.4 14 14
5 1.5 15 15
6 1.5 15 15
You need to merge:
df1 = df1.merge(df2, left_index=True, right_index=True)
Then you need to remove the columns you do not need.
Edit: I misread the question the first time. This should help:
df1 = df1[['time']].merge(df2, on='time', how='left')
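A self-contained sketch of the merge approach on the sample data from the question. Only x and y are rebuilt here, standing in for the "many other columns". It passes how='left' (my assumption about the desired behaviour) so every df1 row is kept in its original order; with the default inner merge, a df1 time that has no match in df2 would silently disappear, whereas a left merge keeps the row and fills the other columns with NaN.

```python
import pandas as pd

# Sample data from the question (x and y stand in for the other columns)
df1 = pd.DataFrame({
    "time": [1, 1.1, 1.1, 1.1, 1.4, 1.5, 1.5],
    "x": [1, 2, 3, 4, 5, 6, 7],
    "y": [1, 2, 3, 4, 5, 6, 7],
})
df2 = pd.DataFrame({
    "time": [1, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6],
    "x": [10, 11, 12, 13, 14, 15, 16],
    "y": [10, 11, 12, 13, 14, 15, 16],
})

# Keep only df1's time column, then look up every other column in df2.
# how="left" keeps all df1 rows, including repeated times such as 1.1.
result = df1[["time"]].merge(df2, on="time", how="left")
print(result)
```

Because df1 is reduced to its time column before the merge, there are no overlapping x/y columns, so no _x/_y suffixes appear and nothing has to be dropped afterwards.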
I think I was able to get my thinking in order and hopefully have reached a solution that will work for you.
Try this: you can get your answer by using combine_first and doing some tweaking:
combine_first fills null values from another dataframe, so first you can replace all values (except in the 'time' column) with np.nan. Note that I use the 'time' column as the index.
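To see what combine_first does in isolation, here is a small sketch with two made-up frames (left and right are illustrative names, not from the question). Non-null cells of the calling frame are kept, null cells are filled from the other frame at the same index label, and the result's index is the union of both indexes.

```python
import numpy as np
import pandas as pd

# left has a hole at index 2; index 3 exists only in left
left = pd.DataFrame({"a": [1.0, np.nan, 3.0]}, index=[1, 2, 3])
# right can fill that hole; index 4 exists only in right
right = pd.DataFrame({"a": [10.0, 20.0, 40.0]}, index=[1, 2, 4])

# Non-null values in left win; NaNs are taken from right at the same label
combined = left.combine_first(right)
print(combined)
```

Index 1 keeps left's 1.0 even though right has 10.0 there, which is why the values to be replaced must first be set to NaN.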
As combine_first returns the union of the two dataframes, you can use isin to keep only the time values from df1 in your final output.
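The filtering step on its own, as a sketch with a made-up union result res and a stand-in series df1_times (both hypothetical names). isin keeps a row only if its time occurs in df1, so times that exist only in df2 (here 1.2 and 1.3) are dropped.

```python
import pandas as pd

# Hypothetical combine_first output: includes times 1.2 and 1.3 that df1 never had
res = pd.DataFrame({"time": [1.0, 1.1, 1.2, 1.3], "x": [10.0, 11.0, 12.0, 13.0]})
# Stand-in for df1['time']
df1_times = pd.Series([1.0, 1.1, 1.1])

# Boolean mask: True where res['time'] appears anywhere in df1_times
final = res[res["time"].isin(df1_times)]
print(final)
```

isin only filters rows; it does not duplicate them. Repeated df1 times (like 1.1) survive in the real pipeline because combine_first already carries df1's duplicate index entries into the union.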
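import numpy as np
import pandas as pd
# Blank out every column except 'time' (note: this overwrites df1 in place)
df1[df1.columns.difference(['time'])] = np.nan
# Align both frames on 'time'; the NaN cells in df1 are filled from df2
res = df1.set_index('time').combine_first(df2.set_index('time')).reset_index()
# combine_first returns the union of both time indexes; keep only df1's times
final = res[res['time'].isin(df1['time'].unique())]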
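An end-to-end, runnable sketch of the same combine_first steps on the sample data from the question, again with only x and y standing in for the other columns. One deliberate difference from the code above: it blanks a copy (the name blank is mine) instead of df1 itself, so the original frame is left untouched.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "time": [1, 1.1, 1.1, 1.1, 1.4, 1.5, 1.5],
    "x": [1, 2, 3, 4, 5, 6, 7],
    "y": [1, 2, 3, 4, 5, 6, 7],
})
df2 = pd.DataFrame({
    "time": [1, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6],
    "x": [10, 11, 12, 13, 14, 15, 16],
    "y": [10, 11, 12, 13, 14, 15, 16],
})

# Work on a copy so df1 keeps its original values
blank = df1.copy()
blank[blank.columns.difference(["time"])] = np.nan

# Fill the NaN cells from df2, matching rows by time
res = blank.set_index("time").combine_first(df2.set_index("time")).reset_index()
# Drop the times that only exist in df2
final = res[res["time"].isin(df1["time"].unique())]
print(final)
```

The filled columns come back as floats (10.0 rather than 10) because they passed through NaN; final.astype({"x": int, "y": int}) would restore integers if every row matched.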
Which will give you:
time x y
0 1.0 10.0 10.0
1 1.1 11.0 11.0
2 1.1 11.0 11.0
3 1.1 11.0 11.0
6 1.4 14.0 14.0
7 1.5 15.0 15.0
8 1.5 15.0 15.0
Try it on your actual dataset, and let me know if it works.