
Merging multiple dataframes with overlapping rows and different columns

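I have multiple pandas DataFrames with some common columns and some overlapping rows. I want to combine them so that I end up with a single final DataFrame containing all columns and all unique rows (overlapping/duplicate rows removed). Any remaining gaps should be NaN.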


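I came up with the function below. Essentially, it goes through all columns one by one, appends the values from every DataFrame, drops duplicates (the overlap), and builds the new output DataFrame column by column.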

import numpy as np
import pandas as pd

def combine_dfs(dataframes: list):
    
    ## Identifying all unique columns in all data frames
    columns = []
    for df in dataframes:
        columns.extend(df.columns)
    columns = np.unique(columns)
    
    ## Appending values from each data frame per column
    output_df = pd.DataFrame()
    for col in columns:
        # Series.append was removed in pandas 2.0; pd.concat is the replacement
        column = pd.concat([df[col] for df in dataframes if col in df.columns])
        
        ## Removing overlapping data (assuming consistent values)
        column = column[~column.index.duplicated()]
        
        ## Adding column to output data frame
        column = pd.DataFrame(column)
        output_df = pd.concat([output_df,column], axis=1)
    
    output_df.sort_index(inplace=True)
    return output_df

df_1 = pd.DataFrame([[10,20,30],[11,21,31],[12,22,32],[13,23,33]], columns=["A","B","C"])
df_2 = pd.DataFrame([[33,43,54],[34,44,54],[35,45,55],[36,46,56]], columns=["C","D","E"], index=[3,4,5,6])
df_3 = pd.DataFrame([[50,60],[51,61],[52,62],[53,63],[54,64]], columns=["E","F"])

print(combine_dfs([df_1,df_2,df_3]))

As expected, the output looks like this:

      A     B   C     D   E     F
0  10.0  20.0  30   NaN  50  60.0
1  11.0  21.0  31   NaN  51  61.0
2  12.0  22.0  32   NaN  52  62.0
3  13.0  23.0  33  43.0  54  63.0
4   NaN   NaN  34  44.0  54  64.0
5   NaN   NaN  35  45.0  55   NaN
6   NaN   NaN  36  46.0  56   NaN
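
To make the de-duplication step concrete, here is a minimal sketch of what the function does for a single column, using column E from df_2 and df_3 above. Where two frames disagree on an overlapping index label, the value from the frame listed first is kept, which is why row 3 shows 54 (from df_2) rather than the 53 in df_3.

```python
import pandas as pd

df_2 = pd.DataFrame([[33,43,54],[34,44,54],[35,45,55],[36,46,56]], columns=["C","D","E"], index=[3,4,5,6])
df_3 = pd.DataFrame([[50,60],[51,61],[52,62],[53,63],[54,64]], columns=["E","F"])

# Stack the two E columns; index labels 3 and 4 now appear twice
stacked = pd.concat([df_2["E"], df_3["E"]])

# Keep the first occurrence of each index label, then restore row order
deduped = stacked[~stacked.index.duplicated(keep="first")].sort_index()

print(deduped.tolist())  # [50, 51, 52, 54, 54, 55, 56]
```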

This approach works fine for small datasets. Is there a way to optimize it?

IIUC, you can chain combine_first:

print (df_1.combine_first(df_2).combine_first(df_3))

      A     B   C     D     E     F
0  10.0  20.0  30   NaN  50.0  60.0
1  11.0  21.0  31   NaN  51.0  61.0
2  12.0  22.0  32   NaN  52.0  62.0
3  13.0  23.0  33  43.0  54.0  63.0
4   NaN   NaN  34  44.0  54.0  64.0
5   NaN   NaN  35  45.0  55.0   NaN
6   NaN   NaN  36  46.0  56.0   NaN
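
For an arbitrary list of frames, the same chain could be folded with functools.reduce. This is only a sketch: combine_all is a hypothetical helper name, not part of pandas. As in the chain above, earlier frames take precedence wherever values overlap, and integer columns that pick up NaN along the way may come back as float.

```python
from functools import reduce

import pandas as pd

def combine_all(dataframes):
    """Fold a list of DataFrames with combine_first (hypothetical helper)."""
    # left.combine_first(right) keeps left's non-null values and fills the
    # gaps from right, aligning on the union of row and column labels
    return reduce(lambda left, right: left.combine_first(right), dataframes)

df_1 = pd.DataFrame([[10,20,30],[11,21,31],[12,22,32],[13,23,33]], columns=["A","B","C"])
df_2 = pd.DataFrame([[33,43,54],[34,44,54],[35,45,55],[36,46,56]], columns=["C","D","E"], index=[3,4,5,6])
df_3 = pd.DataFrame([[50,60],[51,61],[52,62],[53,63],[54,64]], columns=["E","F"])

result = combine_all([df_1, df_2, df_3])
print(result)
```

This avoids the Python-level loop over columns, since each combine_first call aligns whole frames at once.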

