Merging pandas DataFrames to fill in gaps

Question

I've been struggling with this for a bit today. I've got a master dataframe (df2 below) that is missing some values, and a secondary one (df1) that has these values, which I would like to add in. The key to match on is column 1.

import pandas as pd

d1 = {1:['Test','Test1','Test2'], 2:['A','B','C']}
d2 = {1:['Something','Test','Test1','Test2','Test3','Test4'], 2:['z',None,None,None,'x','y'],3:['Blah','Blah','Blah','Blah','Blah','Blah']}

df1 = pd.DataFrame(data=d1)
df2 = pd.DataFrame(data=d2)

df1
       1  2
0   Test  A
1  Test1  B
2  Test2  C

df2
           1     2     3
0  Something     z  Blah
1       Test  None  Blah
2      Test1  None  Blah
3      Test2  None  Blah
4      Test3     x  Blah
5      Test4     y  Blah

The outcome I'm looking for is:

           1     2     3
0  Something     z  Blah
1       Test     A  Blah
2      Test1     B  Blah
3      Test2     C  Blah
4      Test3     x  Blah
5      Test4     y  Blah

Any ideas?

Answer 1

You can use map and fillna:

df2[2] = df2[2].fillna(df2[1].map(df1.set_index(1)[2]))

Output:

           1  2     3
0  Something  z  Blah
1       Test  A  Blah
2      Test1  B  Blah
3      Test2  C  Blah
4      Test3  x  Blah
5      Test4  y  Blah
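
To make the one-liner above easier to follow, here is a minimal sketch that splits it into its three steps, using the data from the question. The names lookup, candidates and filled are illustrative only and do not appear in the original answer.

```python
import pandas as pd

df1 = pd.DataFrame({1: ['Test', 'Test1', 'Test2'], 2: ['A', 'B', 'C']})
df2 = pd.DataFrame({
    1: ['Something', 'Test', 'Test1', 'Test2', 'Test3', 'Test4'],
    2: ['z', None, None, None, 'x', 'y'],
    3: ['Blah'] * 6,
})

# Step 1: build a lookup Series from df1, indexed by the key column.
lookup = df1.set_index(1)[2]

# Step 2: translate every key in df2 into its replacement value.
# Keys that do not occur in df1 come back as NaN.
candidates = df2[1].map(lookup)

# Step 3: fill only the missing entries; existing values are kept.
filled = df2.copy()
filled[2] = filled[2].fillna(candidates)
print(filled)
```

This sketch assumes the keys in column 1 of df1 are unique. With duplicate keys the lookup is ambiguous, so df1 would need to be deduplicated first.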

Answer 2

Use this code:

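import pandas as pd

# The column labels are integers, so the join key is 1, not '1'.
df = pd.merge(df2, df1, on=1, how='left')
# Copy the value from df1 wherever df2's value is missing.
# .loc avoids the chained assignment df['2_x'][i] = ..., which may not write back.
for i in df.index:
    if pd.isna(df.loc[i, '2_x']):
        df.loc[i, '2_x'] = df.loc[i, '2_y']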

Then you can remove the extra column from your dataframe.
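
The clean-up step mentioned above could look roughly like the sketch below. It assumes the integer column labels from the question, so the join key is 1 and the default merge suffixes produce the labels '2_x' and '2_y'. The names merged and result are illustrative, and the loop is replaced by a vectorized fillna, which is a swapped-in technique rather than this answer's own.

```python
import pandas as pd

df1 = pd.DataFrame({1: ['Test', 'Test1', 'Test2'], 2: ['A', 'B', 'C']})
df2 = pd.DataFrame({
    1: ['Something', 'Test', 'Test1', 'Test2', 'Test3', 'Test4'],
    2: ['z', None, None, None, 'x', 'y'],
    3: ['Blah'] * 6,
})

# Left merge keeps every row of df2; the overlapping column 2
# is split into '2_x' (from df2) and '2_y' (from df1).
merged = pd.merge(df2, df1, on=1, how='left')

# Fill the gaps in df2's values with the values from df1.
merged['2_x'] = merged['2_x'].fillna(merged['2_y'])

# Drop the helper column and restore the original label 2.
result = merged.drop(columns='2_y').rename(columns={'2_x': 2})
print(result)
```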

Answer 3

You can use pd.merge and np.where() to accomplish this:

import pandas as pd
import numpy as np

df_merge = pd.merge(df2, df1, how = 'left', left_on = 1, right_on = 1, suffixes=('', '_y'))
df_merge['2'] = np.where(df_merge['2'].isna(), df_merge['2_y'], df_merge['2'])
df_merge = df_merge[[1, '2', 3]]
df_merge
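
One detail that is easy to miss: the question's column labels are the integers 1, 2 and 3, yet the code above refers to '2' and '2_y' as strings. The sketch below inspects the merged labels to show where those strings come from. It assumes the question's data and that pandas builds suffixed labels by string formatting, so even the empty suffix turns the integer 2 into the string '2'.

```python
import pandas as pd

df1 = pd.DataFrame({1: ['Test', 'Test1', 'Test2'], 2: ['A', 'B', 'C']})
df2 = pd.DataFrame({
    1: ['Something', 'Test', 'Test1', 'Test2', 'Test3', 'Test4'],
    2: ['z', None, None, None, 'x', 'y'],
    3: ['Blah'] * 6,
})

df_merge = pd.merge(df2, df1, how='left', on=1, suffixes=('', '_y'))

# The overlapping label 2 is rewritten on both sides, so the result
# mixes integer labels (1, 3) with string labels ('2', '2_y').
labels = list(df_merge.columns)
print(labels)
```

If mixed label types are inconvenient downstream, renaming '2' back to the integer 2 after dropping '2_y' restores the original layout.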

Answer 4

Here is one way to do it:

df3=df2.merge(df1, on=1, how='left',  suffixes=("",'_y') )
df3['2'] = np.where(df3['2'].isna(), df3['2_y'], df3['2'])
df3.drop(columns='2_y')

Or:

df3=df2.merge(df1, on=1, how='left',  suffixes=("",'_y') )
# Boolean mask of the rows whose value is missing
mask = df3['2'].isnull()
df3.loc[mask, '2'] = df3.loc[mask, '2_y']
df3.drop(columns='2_y')

           1  2     3
0  Something  z  Blah
1       Test  A  Blah
2      Test1  B  Blah
3      Test2  C  Blah
4      Test3  x  Blah
5      Test4  y  Blah

Answer 5

Using pandas apply across multiple columns (ref: Pandas Tricks — Pass Multiple Columns To Lambda | Medium):

d1 = {1:['Test','Test1','Test2'], 2:['A','B','C']}
d2 = {1:['Something','Test','Test1','Test2','Test3','Test4'], 2:['z',None,None,None,'x','y'],3:['Blah','Blah','Blah','Blah','Blah','Blah']}

df1 = pd.DataFrame(data=d1)
df2 = pd.DataFrame(data=d2)

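# Map each key in df1 (column 1) to its value (column 2).
df1_dict = dict(zip(df1[1], df1[2]))
df2_new = df2.copy()
# Fill only missing entries (None or NaN); "not x[2]" would miss NaN and also treat '' as missing.
df2_new[2] = df2_new.apply(lambda x: df1_dict.get(x[1]) if pd.isna(x[2]) else x[2], axis=1)
df2_new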

           1  2     3
0  Something  z  Blah
1       Test  A  Blah
2      Test1  B  Blah
3      Test2  C  Blah
4      Test3  x  Blah
5      Test4  y  Blah
