Merging pandas dataframes to fill in the gaps

I've been struggling with this for a bit today. I have a master dataframe that is missing some values, and a secondary one that contains the values I would like to add in. The key to match on is column 1.

import pandas as pd

d1 = {1:['Test','Test1','Test2'], 2:['A','B','C']}
d2 = {1:['Something','Test','Test1','Test2','Test3','Test4'], 2:['z',None,None,None,'x','y'],3:['Blah','Blah','Blah','Blah','Blah','Blah']}

df1 = pd.DataFrame(data=d1)
df2 = pd.DataFrame(data=d2)

df1
       1  2
0   Test  A
1  Test1  B
2  Test2  C

df2
           1     2     3
0  Something     z  Blah
1       Test  None  Blah
2      Test1  None  Blah
3      Test2  None  Blah
4      Test3     x  Blah
5      Test4     y  Blah


The outcome I'm looking for is:

           1     2     3
0  Something     z  Blah
1       Test     A  Blah
2      Test1     B  Blah
3      Test2     C  Blah
4      Test3     x  Blah
5      Test4     y  Blah

Any ideas?

You can use map and fillna:

df2[2] = df2[2].fillna(df2[1].map(df1.set_index(1)[2]))

Output:

          1  2     3
0  Something  z  Blah
1       Test  A  Blah
2      Test1  B  Blah
3      Test2  C  Blah
4      Test3  x  Blah
5      Test4  y  Blah
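
To see why the one-liner works, it can help to split it into its steps. The sketch below rebuilds the frames from the question; the intermediate names lookup and candidates are only illustrative and not part of the answer above. It assumes the key values in df1 are unique, since map needs an unambiguous lookup.

```python
import pandas as pd

df1 = pd.DataFrame({1: ['Test', 'Test1', 'Test2'], 2: ['A', 'B', 'C']})
df2 = pd.DataFrame({
    1: ['Something', 'Test', 'Test1', 'Test2', 'Test3', 'Test4'],
    2: ['z', None, None, None, 'x', 'y'],
    3: ['Blah'] * 6,
})

# Step 1: a Series indexed by the key column, holding the replacement values.
lookup = df1.set_index(1)[2]

# Step 2: translate every key in df2 through that Series.
# Keys that do not occur in df1 (e.g. 'Something') come out as missing.
candidates = df2[1].map(lookup)

# Step 3: fill only the missing entries of column 2; existing values are kept.
df2[2] = df2[2].fillna(candidates)

print(df2[2].tolist())
```

Because fillna only touches missing entries, rows such as Something/z keep their original value even though they have no match in df1.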

Use this code:

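import pandas as pd

# The column labels are integers, so merge on 1 rather than '1'.
# The overlapping column 2 becomes '2_x' (from df2) and '2_y' (from df1).
df = pd.merge(df2, df1, on=1, how='left')
for i in df.index:
    # pd.isna catches both None and NaN; .loc avoids chained assignment.
    if pd.isna(df.loc[i, '2_x']):
        df.loc[i, '2_x'] = df.loc[i, '2_y']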

Then you can remove the extra column from your dataframe.
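
The same idea can be written without an explicit loop. This is a sketch, assuming the merged column names '2_x' and '2_y' that pandas generates with its default suffixes; the names merged and result are illustrative. It also does the column cleanup and puts the integer label 2 back.

```python
import pandas as pd

df1 = pd.DataFrame({1: ['Test', 'Test1', 'Test2'], 2: ['A', 'B', 'C']})
df2 = pd.DataFrame({
    1: ['Something', 'Test', 'Test1', 'Test2', 'Test3', 'Test4'],
    2: ['z', None, None, None, 'x', 'y'],
    3: ['Blah'] * 6,
})

# Left merge keeps every row of df2; columns are 1, '2_x', 3, '2_y'.
merged = pd.merge(df2, df1, on=1, how='left')

# Keep the value from df2 where present, otherwise take the one from df1.
merged['2_x'] = merged['2_x'].fillna(merged['2_y'])

# Drop the helper column and restore the original integer column label.
result = merged.drop(columns='2_y').rename(columns={'2_x': 2})
print(result)
```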

You can use pd.merge and np.where() to accomplish this:

import pandas as pd
import numpy as np

df_merge = pd.merge(df2, df1, how = 'left', left_on = 1, right_on = 1, suffixes=('', '_y'))
df_merge['2'] = np.where(df_merge['2'].isna(), df_merge['2_y'], df_merge['2'])
df_merge = df_merge[[1, '2', 3]]
df_merge
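
One detail that is easy to miss: the original frames use the integer label 2, but the merged frame is indexed with the string '2'. As far as I can tell, pandas builds the new label by formatting the old label together with the suffix, even when the suffix is empty, so the overlapping label comes back as a string. A quick check, with an optional rename back to the integer label (the names merged and cleaned are illustrative):

```python
import pandas as pd

df1 = pd.DataFrame({1: ['Test', 'Test1', 'Test2'], 2: ['A', 'B', 'C']})
df2 = pd.DataFrame({
    1: ['Something', 'Test', 'Test1', 'Test2', 'Test3', 'Test4'],
    2: ['z', None, None, None, 'x', 'y'],
    3: ['Blah'] * 6,
})

merged = pd.merge(df2, df1, how='left', on=1, suffixes=('', '_y'))

# Inspect the labels: the overlapping column from df2 is now the string '2'.
print(merged.columns.tolist())

# Fill the gaps from the df1 column, then restore the integer label.
merged['2'] = merged['2'].fillna(merged['2_y'])
cleaned = merged.drop(columns='2_y').rename(columns={'2': 2})
print(cleaned.columns.tolist())
```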

Here is one way to go about it:

import numpy as np

df3 = df2.merge(df1, on=1, how='left', suffixes=('', '_y'))
df3['2'] = np.where(df3['2'].isna(), df3['2_y'], df3['2'])
df3.drop(columns='2_y')

Or:

df3 = df2.merge(df1, on=1, how='left', suffixes=('', '_y'))
idx = df3[df3['2'].isnull()].index
df3.iloc[idx, 1] = df3.iloc[idx, 3]
df3.drop(columns='2_y')
         1      2   3
0   Something   z   Blah
1   Test        A   Blah
2   Test1       B   Blah
3   Test2       C   Blah
4   Test3       x   Blah
5   Test4       y   Blah

Using pandas apply on multiple columns (ref: Pandas Tricks — Pass Multiple Columns To Lambda | Medium)

d1 = {1:['Test','Test1','Test2'], 2:['A','B','C']}
d2 = {1:['Something','Test','Test1','Test2','Test3','Test4'], 2:['z',None,None,None,'x','y'],3:['Blah','Blah','Blah','Blah','Blah','Blah']}

df1 = pd.DataFrame(data=d1)
df2 = pd.DataFrame(data=d2)

df1_dict = {k:v for k,v in df1.values}
df2_new = df2.copy()
df2_new[2] = df2_new.apply(lambda x : df1_dict.get(x[1]) if not x[2] else x[2], axis=1)
df2_new 

    1   2   3
0   Something   z   Blah
1   Test        A   Blah
2   Test1       B   Blah
3   Test2       C   Blah
4   Test3       x   Blah
5   Test4       y   Blah
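
Note that `not x[2]` is a truthiness test: it treats an empty string the same as None, while a float NaN counts as truthy and would be left unfilled. If the gaps might be stored as NaN, a variant that tests with pd.isna could look like the sketch below. The helper name fill_gap is made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({1: ['Test', 'Test1', 'Test2'], 2: ['A', 'B', 'C']})
df2 = pd.DataFrame({
    1: ['Something', 'Test', 'Test1', 'Test2', 'Test3', 'Test4'],
    2: ['z', None, None, None, 'x', 'y'],
    3: ['Blah'] * 6,
})

# Plain dict from key (column 1) to replacement value (column 2).
df1_dict = dict(zip(df1[1], df1[2]))

def fill_gap(row):
    # Hypothetical helper: look the key up only when column 2 is missing.
    if pd.isna(row[2]):
        return df1_dict.get(row[1])
    return row[2]

df2_new = df2.copy()
df2_new[2] = df2_new.apply(fill_gap, axis=1)
print(df2_new[2].tolist())
```

For larger frames the row-wise apply is usually slower than the map and fillna approach, so this is mostly useful when the fill rule needs more logic than a simple lookup.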
