
How to compare columns from two different Data Frames and keep the values from the first Data Frame?

I have two dataframes of different sizes. They both have four columns: Words, x, y and z.

However, when joining these two dataframes, I want to keep the x, y, z values from df1 for the words that appear in both. Words that don't exist in df1 but exist in df2 are kept with their df2 values.

I tried pd.merge, but that keeps both sets of values and only the words common to both. And if I use pd.concat I have to drop the duplicates, but the rows that remain are not the ones from the first data frame.

Sample

df1 = pd.DataFrame({'Words': 
       ['aardvark', 'abalone', 'abandon'],
     'x': [0.999, 0.888, 0.777], 
     'y': [0.999, 0.888, 0.777],
     'z': [0.999, 0.888, 0.777]})

df2 = pd.DataFrame({'Words': 
       ['aaaaahh', 'aardvark', 'abalone', 'abandon', 'zoo', 'zoom', 'zucchini'], 
     'x': [0.199, 0.111, 0.222, 0.333, 0.232, 0.842, 0.945], 
     'y': [0.929, 0.111, 0.222, 0.333, 0.112, 0.62, 0.265],
     'z': [0.993, 0.111, 0.222, 0.333, 0.212, 0.344, 0.745]})

# Expected output
df_res = pd.DataFrame({'Words': 
          ['aaaaahh', 'aardvark', 'abalone', 'abandon', 'zoo', 'zoom', 'zucchini'], 
     'x': [0.199, 0.999, 0.888, 0.777, 0.232, 0.842, 0.945], 
     'y': [0.929, 0.999, 0.888, 0.777, 0.112, 0.62, 0.265],
     'z': [0.993, 0.999, 0.888, 0.777, 0.212, 0.344, 0.745]})

What I tried

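import pandas as pd

# Merge: keeps only the shared words, with both sets of values
df_res = pd.merge(df1, df2, on='Words', how='inner')

# Concat: keep=False drops every duplicated word, including df1's rows
df_concat = pd.concat(objs=[df1, df2], ignore_index=True)
df_concat = df_concat.drop_duplicates(subset=['Words'], keep=False, ignore_index=True)

# Compare: element-wise comparison of two Series with different indexes
df_res = df1[(df1['Words'] != df2['Words'])]
# ValueError: Can only compare identically-labeled Series objects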

You can use pd.concat to append df1 to df2, followed by drop_duplicates with keep='last', then sort_index and reset_index (DataFrame.append, the older spelling of the first step, was removed in pandas 2.0):

>>> (pd.concat([df2, df1])
        .drop_duplicates('Words', keep='last')
        .sort_index()
        .reset_index(drop=True))

      Words      x      y      z
0   aaaaahh  0.199  0.929  0.993
1  aardvark  0.999  0.999  0.999
2   abalone  0.888  0.888  0.888
3   abandon  0.777  0.777  0.777
4       zoo  0.232  0.112  0.212
5      zoom  0.842  0.620  0.344
6  zucchini  0.945  0.265  0.745
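Note that sort_index() gives the expected order here because df1's row labels (0, 1, 2) happen to slot in next to the df2 rows they replace. As an alternative that does not depend on row labels, here is a minimal sketch using a different technique, combine_first keyed on Words. It assumes Words is unique within each frame, and it also fills any NaN in df1 from df2, which may or may not be what you want.

```python
import pandas as pd

df1 = pd.DataFrame({'Words': ['aardvark', 'abalone', 'abandon'],
                    'x': [0.999, 0.888, 0.777],
                    'y': [0.999, 0.888, 0.777],
                    'z': [0.999, 0.888, 0.777]})

df2 = pd.DataFrame({'Words': ['aaaaahh', 'aardvark', 'abalone', 'abandon',
                              'zoo', 'zoom', 'zucchini'],
                    'x': [0.199, 0.111, 0.222, 0.333, 0.232, 0.842, 0.945],
                    'y': [0.929, 0.111, 0.222, 0.333, 0.112, 0.62, 0.265],
                    'z': [0.993, 0.111, 0.222, 0.333, 0.212, 0.344, 0.745]})

# Index both frames by the key column so rows are aligned by word,
# not by position. df1's values win wherever a word exists in both;
# words found only in df2 are taken from df2.
df_res = (
    df1.set_index('Words')
       .combine_first(df2.set_index('Words'))
       .reset_index()
)

print(df_res)
```

The result is aligned on the union of the two Words indexes, so the row order follows the words themselves rather than the original row labels.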

Maybe less performant than @Sayandip Dutta's answer, but you could try a right join (or a left join, depending on the order in which you pass the frames to pd.merge):

In [4]: res = pd.merge(df1, df2, how='right', on='Words', suffixes=("_1", "_2"))

In [5]: res
Out[5]:
      Words    x_1    y_1    z_1    x_2    y_2    z_2
0  aardvark  0.999  0.999  0.999  0.111  0.111  0.111
1   abalone  0.888  0.888  0.888  0.222  0.222  0.222
2   abandon  0.777  0.777  0.777  0.333  0.333  0.333
3   aaaaahh    NaN    NaN    NaN  0.199  0.929  0.993
4       zoo    NaN    NaN    NaN  0.232  0.112  0.212
5      zoom    NaN    NaN    NaN  0.842  0.620  0.344
6  zucchini    NaN    NaN    NaN  0.945  0.265  0.745

Then you can use fillna to fill the missing values of x_1, y_1 and z_1 with the values of x_2, y_2 and z_2.

In [7]: res["x_1"] = res["x_1"].fillna(res["x_2"])

In [8]: res["y_1"] = res["y_1"].fillna(res["y_2"])

In [9]: res["z_1"] = res["z_1"].fillna(res["z_2"])

In [10]: df_res = res[["Words", "x_1", "y_1", "z_1"]]

In [11]: df_res
Out[11]:
      Words    x_1    y_1    z_1
0  aardvark  0.999  0.999  0.999
1   abalone  0.888  0.888  0.888
2   abandon  0.777  0.777  0.777
3   aaaaahh  0.199  0.929  0.993
4       zoo  0.232  0.112  0.212
5      zoom  0.842  0.620  0.344
6  zucchini  0.945  0.265  0.745
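The same merge-then-fill idea can be written more compactly. The following is only a sketch, not the answerer's exact code: it swaps the right join for a left join from df2 (so the rows stay in df2's order), loops over the value columns, and writes the result back to the original column names. The suffixes "_1" and "_2" and the column list are choices made for this illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'Words': ['aardvark', 'abalone', 'abandon'],
                    'x': [0.999, 0.888, 0.777],
                    'y': [0.999, 0.888, 0.777],
                    'z': [0.999, 0.888, 0.777]})

df2 = pd.DataFrame({'Words': ['aaaaahh', 'aardvark', 'abalone', 'abandon',
                              'zoo', 'zoom', 'zucchini'],
                    'x': [0.199, 0.111, 0.222, 0.333, 0.232, 0.842, 0.945],
                    'y': [0.929, 0.111, 0.222, 0.333, 0.112, 0.62, 0.265],
                    'z': [0.993, 0.111, 0.222, 0.333, 0.212, 0.344, 0.745]})

value_cols = ['x', 'y', 'z']

# Left join from df2 keeps every df2 row in df2's order.
# df2's columns get the "_2" suffix, df1's get "_1".
res = df2.merge(df1, on='Words', how='left', suffixes=('_2', '_1'))

for col in value_cols:
    # Prefer df1's value; fall back to df2's where the word is not in df1.
    res[col] = res[f'{col}_1'].fillna(res[f'{col}_2'])

df_res = res[['Words'] + value_cols]
print(df_res)
```

Assigning the filled Series back to a column avoids relying on in-place modification through attribute access, which newer pandas versions warn about.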
