
Merging pandas dataframes to fill in the gaps

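I have been struggling with this all day. I have a master dataframe that is missing some values, and a secondary dataframe that contains the values I want to add. The key to match on is column 1.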

import pandas as pd

d1 = {1:['Test','Test1','Test2'], 2:['A','B','C']}
d2 = {1:['Something','Test','Test1','Test2','Test3','Test4'], 2:['z',None,None,None,'x','y'],3:['Blah','Blah','Blah','Blah','Blah','Blah']}

df1 = pd.DataFrame(data=d1)
df2 = pd.DataFrame(data=d2)

df1
       1  2
0   Test  A
1  Test1  B
2  Test2  C

df2
           1     2     3
0  Something     z  Blah
1       Test  None  Blah
2      Test1  None  Blah
3      Test2  None  Blah
4      Test3     x  Blah
5      Test4     y  Blah


The result I am looking for is:

           1     2     3
0  Something     z  Blah
1       Test     A  Blah
2      Test1     B  Blah
3      Test2     C  Blah
4      Test3     x  Blah
5      Test4     y  Blah

Any ideas?

You can use map and fillna:

df2[2] = df2[2].fillna(df2[1].map(df1.set_index(1)[2]))

Output:

          1  2     3
0  Something  z  Blah
1       Test  A  Blah
2      Test1  B  Blah
3      Test2  C  Blah
4      Test3  x  Blah
5      Test4  y  Blah
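
To see why this one-liner works, it can help to split it into its intermediate steps. The sketch below rebuilds the data from the question; the names `lookup`, `candidates` and `filled` are introduced here only for illustration and are not part of the original answer.

```python
import pandas as pd

d1 = {1: ['Test', 'Test1', 'Test2'], 2: ['A', 'B', 'C']}
d2 = {1: ['Something', 'Test', 'Test1', 'Test2', 'Test3', 'Test4'],
      2: ['z', None, None, None, 'x', 'y'],
      3: ['Blah'] * 6}
df1 = pd.DataFrame(data=d1)
df2 = pd.DataFrame(data=d2)

# Step 1: turn df1 into a lookup Series, keyed by column 1, holding column 2.
lookup = df1.set_index(1)[2]

# Step 2: translate every key in df2 through the lookup.
# Keys that do not appear in df1 come back as missing values.
candidates = df2[1].map(lookup)

# Step 3: fillna only touches the entries of df2[2] that are missing,
# so the existing 'z', 'x' and 'y' are left alone.
filled = df2[2].fillna(candidates)

print(filled.tolist())  # ['z', 'A', 'B', 'C', 'x', 'y']
```

Note that this relies on the values in column 1 of df1 being unique; with duplicate keys the lookup in step 2 is ambiguous and, as far as I know, map raises an error.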

You can use this code:

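import pandas as pd

# The column labels are integers, so the merge key is 1, not '1'
df = pd.merge(df2, df1, on=1, how='left')
for i in df.index:
    # pd.isna catches both None and NaN; .loc avoids chained assignment
    if pd.isna(df.loc[i, '2_x']):
        df.loc[i, '2_x'] = df.loc[i, '2_y']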

You can then drop the extra column from the dataframe.
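
A minimal sketch of that cleanup step, assuming the merge used the default suffixes so that the overlapping column 2 ends up as `'2_x'` (from df2) and `'2_y'` (from df1). For brevity it fills the gaps with fillna rather than the loop; the name `result` is illustrative.

```python
import pandas as pd

d1 = {1: ['Test', 'Test1', 'Test2'], 2: ['A', 'B', 'C']}
d2 = {1: ['Something', 'Test', 'Test1', 'Test2', 'Test3', 'Test4'],
      2: ['z', None, None, None, 'x', 'y'],
      3: ['Blah'] * 6}
df1 = pd.DataFrame(data=d1)
df2 = pd.DataFrame(data=d2)

# Left merge keeps every row of df2 and attaches df1's values where the key matches.
df = pd.merge(df2, df1, on=1, how='left')

# Fill the gaps in df2's column with the values that came from df1.
df['2_x'] = df['2_x'].fillna(df['2_y'])

# Drop the helper column and restore the original integer label 2.
result = df.drop(columns='2_y').rename(columns={'2_x': 2})
print(result)
```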

You can do this with pd.merge and np.where():

import pandas as pd
import numpy as np

df_merge = pd.merge(df2, df1, how = 'left', left_on = 1, right_on = 1, suffixes=('', '_y'))
df_merge['2'] = np.where(df_merge['2'].isna(), df_merge['2_y'], df_merge['2'])
df_merge = df_merge[[1, '2', 3]]
df_merge
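
One detail in the snippet above is easy to miss: the original column label is the integer `2`, but after the merge it is addressed as the string `'2'`. As far as I can tell, pandas builds the suffixed labels as strings for overlapping columns, even when the suffix is empty. A quick check with the question's data (the names `merged` and `labels` are illustrative):

```python
import pandas as pd

d1 = {1: ['Test', 'Test1', 'Test2'], 2: ['A', 'B', 'C']}
d2 = {1: ['Something', 'Test', 'Test1', 'Test2', 'Test3', 'Test4'],
      2: ['z', None, None, None, 'x', 'y'],
      3: ['Blah'] * 6}
df1 = pd.DataFrame(data=d1)
df2 = pd.DataFrame(data=d2)

merged = pd.merge(df2, df1, how='left', on=1, suffixes=('', '_y'))

# Inspect the labels and their Python types after the merge.
labels = merged.columns.tolist()
print(labels)
print([type(label).__name__ for label in labels])
```

If you want the integer label back at the end, something like `rename(columns={'2': 2})` on the final frame should do it.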

Here is one way to do it:

import numpy as np

df3 = df2.merge(df1, on=1, how='left', suffixes=('', '_y'))
df3['2'] = np.where(df3['2'].isna(), df3['2_y'], df3['2'])
df3 = df3.drop(columns='2_y')  # drop returns a new frame, so keep the result
df3

Or:

df3 = df2.merge(df1, on=1, how='left', suffixes=('', '_y'))
mask = df3['2'].isna()                     # rows where column '2' is missing
df3.loc[mask, '2'] = df3.loc[mask, '2_y']  # copy in the values from df1
df3.drop(columns='2_y')
         1      2   3
0   Something   z   Blah
1   Test        A   Blah
2   Test1       B   Blah
3   Test2       C   Blah
4   Test3       x   Blah
5   Test4       y   Blah

Using pandas apply across multiple columns (reference: Pandas Tricks — Pass Multiple Columns To Lambda | Medium):

d1 = {1:['Test','Test1','Test2'], 2:['A','B','C']}
d2 = {1:['Something','Test','Test1','Test2','Test3','Test4'], 2:['z',None,None,None,'x','y'],3:['Blah','Blah','Blah','Blah','Blah','Blah']}

df1 = pd.DataFrame(data=d1)
df2 = pd.DataFrame(data=d2)

df1_dict = {k:v for k,v in df1.values}
df2_new = df2.copy()
df2_new[2] = df2_new.apply(lambda x : df1_dict.get(x[1]) if not x[2] else x[2], axis=1)
df2_new 

    1   2   3
0   Something   z   Blah
1   Test        A   Blah
2   Test1       B   Blah
3   Test2       C   Blah
4   Test3       x   Blah
5   Test4       y   Blah
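
One caveat about the `not x[2]` test in the lambda: it checks truthiness, not missingness. That works for the data in the question, where the gaps are `None`, but it behaves differently if a gap is stored as NaN (which can depend on how the frame was built and on the pandas version) or if a cell holds an empty string. A small comparison against `pd.isna`, using hypothetical sample values:

```python
import numpy as np
import pandas as pd

# Hypothetical cell values: two kinds of "missing", an empty string, a real value.
values = [None, np.nan, '', 'z']

truthy_test = [not v for v in values]           # what the lambda checks
isna_test = [bool(pd.isna(v)) for v in values]  # explicit missing-value check

for v, t, m in zip(values, truthy_test, isna_test):
    print(f"{v!r:>6} | not v: {t!s:<5} | pd.isna(v): {m}")
```

NaN is truthy in Python, so `not x[2]` would leave a NaN gap unfilled, while an empty string would be overwritten. If that matters for your data, swapping the condition for `pd.isna(x[2])` is one option.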

