Merging multiple DataFrames in pandas

Question

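This may be considered a duplicate of the thorough explanations of the various merge approaches, but because I have a larger number of data frames I can't seem to find a solution to my problem there.

I have multiple data frames (more than 10), each differing in one column, VARX. Here is a quick and oversimplified example: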

import pandas as pd

df1 = pd.DataFrame({'depth': [0.500000, 0.600000, 1.300000],
       'VAR1': [38.196202, 38.198002, 38.200001],
       'profile': ['profile_1', 'profile_1','profile_1']})

df2 = pd.DataFrame({'depth': [0.600000, 1.100000, 1.200000],
       'VAR2': [0.20440, 0.20442, 0.20446],
       'profile': ['profile_1', 'profile_1','profile_1']})

df3 = pd.DataFrame({'depth': [1.200000, 1.300000, 1.400000],
       'VAR3': [15.1880, 15.1820, 15.1820],
       'profile': ['profile_1', 'profile_1','profile_1']})

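Each df can have the same or different depths for the same profiles, so I need to create a new DataFrame that merges all the separate ones. The key columns for the operation are depth and profile, and every depth value that occurs for a profile should appear in the result.

Consequently, a VARX value should be NaN wherever that variable has no measurement at that depth for that profile.

The result should be a new, compact DataFrame with all the VARX as additional columns alongside depth and profile, like this: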

name_profile    depth   VAR1        VAR2        VAR3
profile_1   0.500000    38.196202   NaN         NaN
profile_1   0.600000    38.198002   0.20440     NaN
profile_1   1.100000    NaN         0.20442     NaN
profile_1   1.200000    NaN         0.20446     15.1880
profile_1   1.300000    38.200001   NaN         15.1820
profile_1   1.400000    NaN         NaN         15.1820

Note that the actual number of profiles is much larger.

Any ideas?

Answer 1

Consider setting the index on each data frame and then running a horizontal merge with pd.concat:

dfs = [df.set_index(['profile', 'depth']) for df in [df1, df2, df3]]

print(pd.concat(dfs, axis=1).reset_index())
#      profile  depth       VAR1     VAR2    VAR3
# 0  profile_1    0.5  38.196202      NaN     NaN
# 1  profile_1    0.6  38.198002  0.20440     NaN
# 2  profile_1    1.1        NaN  0.20442     NaN
# 3  profile_1    1.2        NaN  0.20446  15.188
# 4  profile_1    1.3  38.200001      NaN  15.182
# 5  profile_1    1.4        NaN      NaN  15.182
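
Since the question mentions many profiles, here is a minimal sketch of the same set_index/concat idea on two profiles. The frames var1 and var2 and their values are made up for illustration and are not from the question. One assumption worth stating: pd.concat(axis=1) aligns on the index, so this sketch checks that each (profile, depth) pair occurs at most once per frame before concatenating.

```python
import pandas as pd

# Hypothetical frames covering two profiles; values are invented for illustration.
var1 = pd.DataFrame({'profile': ['profile_1', 'profile_1', 'profile_2'],
                     'depth': [0.5, 0.6, 0.5],
                     'VAR1': [38.1, 38.2, 37.9]})
var2 = pd.DataFrame({'profile': ['profile_1', 'profile_2', 'profile_2'],
                     'depth': [0.6, 0.5, 0.7],
                     'VAR2': [0.20, 0.21, 0.22]})

frames = [var1, var2]  # in practice, a list of all 10+ frames

# Move the key columns into the index so concat can align on them.
indexed = [f.set_index(['profile', 'depth']) for f in frames]

# Alignment assumes every (profile, depth) pair is unique within a frame.
assert all(f.index.is_unique for f in indexed)

# Outer-align all frames side by side, then restore the keys as columns.
merged = pd.concat(indexed, axis=1).sort_index().reset_index()
print(merged)
```

Depths that a variable never measured for a given profile come out as NaN in that variable's column.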

Answer 2

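A simple approach is to combine functools.partial and functools.reduce.

First, partial lets you "freeze" some of a function's positional and/or keyword arguments, producing a new object with a simplified signature. Then, with reduce, we can apply that new partial object cumulatively to the items of an iterable (here, the list of data frames):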

from functools import partial, reduce

dfs = [df1, df2, df3]
merge = partial(pd.merge, on=['depth', 'profile'], how='outer')
reduce(merge, dfs)

   depth       VAR1    profile     VAR2    VAR3
0    0.5  38.196202  profile_1      NaN     NaN
1    0.6  38.198002  profile_1  0.20440     NaN
2    1.3  38.200001  profile_1      NaN  15.182
3    1.1        NaN  profile_1  0.20442     NaN
4    1.2        NaN  profile_1  0.20446  15.188
5    1.4        NaN  profile_1      NaN  15.182
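
To make the reduce step concrete, the sketch below spells out what the call expands to for three frames, namely merge(merge(df1, df2), df3), and checks that both forms agree. The frames are the ones from the question; the names step_by_step, reduced, and key are introduced here only for illustration. The rows are sorted before comparing or printing because the row order of an outer merge can differ between pandas versions.

```python
from functools import partial, reduce

import pandas as pd

df1 = pd.DataFrame({'depth': [0.5, 0.6, 1.3],
                    'VAR1': [38.196202, 38.198002, 38.200001],
                    'profile': ['profile_1'] * 3})
df2 = pd.DataFrame({'depth': [0.6, 1.1, 1.2],
                    'VAR2': [0.20440, 0.20442, 0.20446],
                    'profile': ['profile_1'] * 3})
df3 = pd.DataFrame({'depth': [1.2, 1.3, 1.4],
                    'VAR3': [15.1880, 15.1820, 15.1820],
                    'profile': ['profile_1'] * 3})

# pd.merge with the join keys and join type frozen in.
merge = partial(pd.merge, on=['depth', 'profile'], how='outer')

# reduce folds the list from the left: merge(merge(df1, df2), df3).
step_by_step = merge(merge(df1, df2), df3)
reduced = reduce(merge, [df1, df2, df3])

# Sort on the keys so the comparison does not depend on merge row order.
key = ['profile', 'depth']
step_by_step = step_by_step.sort_values(key).reset_index(drop=True)
reduced = reduced.sort_values(key).reset_index(drop=True)

print(step_by_step.equals(reduced))
print(reduced)
```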

Answer 3

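I would use append. (DataFrame.append was removed in pandas 2.0; on current versions, pd.concat([df1, df2, df3]) does the same stacking.)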

>>> df1.append(df2).append(df3).sort_values('depth')

        VAR1     VAR2    VAR3  depth    profile
0  38.196202      NaN     NaN    0.5  profile_1
1  38.198002      NaN     NaN    0.6  profile_1
0        NaN  0.20440     NaN    0.6  profile_1
1        NaN  0.20442     NaN    1.1  profile_1
2        NaN  0.20446     NaN    1.2  profile_1
0        NaN      NaN  15.188    1.2  profile_1
2  38.200001      NaN     NaN    1.3  profile_1
1        NaN      NaN  15.182    1.3  profile_1
2        NaN      NaN  15.182    1.4  profile_1

Obviously, if you have a lot of data frames, just create a list and loop over them.
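
The stacked result above still has one row per original measurement, so depths shared by several frames appear more than once. One possible way to get from there to the compact layout the question asks for, sketched here as a suggestion rather than as part of this answer, is to stack a list of frames with pd.concat and then take the first non-null value per column within each (profile, depth) group. The names frames, stacked, and compact are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'depth': [0.5, 0.6, 1.3],
                    'VAR1': [38.196202, 38.198002, 38.200001],
                    'profile': ['profile_1'] * 3})
df2 = pd.DataFrame({'depth': [0.6, 1.1, 1.2],
                    'VAR2': [0.20440, 0.20442, 0.20446],
                    'profile': ['profile_1'] * 3})
df3 = pd.DataFrame({'depth': [1.2, 1.3, 1.4],
                    'VAR3': [15.1880, 15.1820, 15.1820],
                    'profile': ['profile_1'] * 3})

frames = [df1, df2, df3]  # any number of frames can go in this list

# Stack all frames vertically; columns a frame lacks are filled with NaN.
stacked = pd.concat(frames, ignore_index=True)

# GroupBy.first() returns the first non-null entry of each column per group,
# which collapses the rows that share a (profile, depth) pair into one.
compact = stacked.groupby(['profile', 'depth']).first().reset_index()
print(compact)
```

This assumes each variable is measured at most once per (profile, depth); if a pair had two different VAR1 values, only the first would survive.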

Answer 4

Why not concatenate all the data frames, melt them, and then reshape the result using your ids? There may be a more efficient way to do this, but it works.

# Stack the frames, melt to long format, then pivot back with one column per variable
df = pd.melt(pd.concat([df1, df2, df3]), id_vars=['profile', 'depth'])
df_pivot = df.pivot_table(index=['profile', 'depth'], columns='variable', values='value')

where df_pivot will be

variable              VAR1     VAR2    VAR3
profile   depth                            
profile_1 0.5    38.196202      NaN     NaN
          0.6    38.198002  0.20440     NaN
          1.1          NaN  0.20442     NaN
          1.2          NaN  0.20446  15.188
          1.3    38.200001      NaN  15.182
          1.4          NaN      NaN  15.182
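
The pivoted frame keeps profile and depth in a MultiIndex and carries the label "variable" on its columns. If a flat table like the one in the question is wanted, a small follow-up step, sketched below with the illustrative names long_df and flat, is to reset the index and clear the column-axis name. Keep in mind that pivot_table aggregates with the mean by default, so repeated measurements of the same variable at the same (profile, depth) would be averaged.

```python
import pandas as pd

df1 = pd.DataFrame({'depth': [0.5, 0.6, 1.3],
                    'VAR1': [38.196202, 38.198002, 38.200001],
                    'profile': ['profile_1'] * 3})
df2 = pd.DataFrame({'depth': [0.6, 1.1, 1.2],
                    'VAR2': [0.20440, 0.20442, 0.20446],
                    'profile': ['profile_1'] * 3})
df3 = pd.DataFrame({'depth': [1.2, 1.3, 1.4],
                    'VAR3': [15.1880, 15.1820, 15.1820],
                    'profile': ['profile_1'] * 3})

# Long format: one row per (profile, depth, variable) with its value.
long_df = pd.melt(pd.concat([df1, df2, df3]), id_vars=['profile', 'depth'])

# Wide format again: one column per variable (default aggregation is the mean).
df_pivot = long_df.pivot_table(index=['profile', 'depth'],
                               columns='variable', values='value')

# Turn the MultiIndex back into ordinary columns and drop the axis label.
flat = df_pivot.reset_index()
flat.columns.name = None
print(flat)
```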

Answer 5

You can also use:

dfs = [df1, df2, df3]
# Outer-merge the first two frames on the shared keys, then fold in the rest
df = pd.merge(dfs[0], dfs[1], on=['depth', 'profile'], how='outer')
for d in dfs[2:]:
    df = pd.merge(df, d, on=['depth', 'profile'], how='outer')

   depth       VAR1    profile     VAR2    VAR3
0    0.5  38.196202  profile_1      NaN     NaN
1    0.6  38.198002  profile_1  0.20440     NaN
2    1.3  38.200001  profile_1      NaN  15.182
3    1.1        NaN  profile_1  0.20442     NaN
4    1.2        NaN  profile_1  0.20446  15.188
5    1.4        NaN  profile_1      NaN  15.182

5 answers

Answer 1
16 accepted 2019-04-12 13:45:36

Answer 2
14 2019-04-12 13:47:20

Answer 3
1 2019-04-12 13:52:53

Answer 4
1 2019-04-12 13:59:55

Answer 5
1 2019-04-12 14:23:43
