Combinations of all DataFrame columns in Python

Question

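I have three DataFrames that share the same index (countries). I need to find every combination of columns across the three DataFrames and create a new column for each combination. Each new column should hold the product of the values from the columns in that combination.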

import pandas as pd

Envelope = pd.read_excel("Envelope.xlsx", index_col=0)
Shading = pd.read_excel("Shading.xlsx", index_col=0)
ThermalMass = pd.read_excel("ThermalMass.xlsx", index_col=0)

#Envelope dataframe
Country         Group(A)  Group(B)  Group(C)                       
France          0.4       0.4       0.2
Brussels        0.8       0.1       0.1
Germany_A       0.3       0.6       0.1
Germany_B       0.2       0.5       0.3

#Shading dataframe            
Country     YeSH  NoSH        
France      0.5   0.5
Brussels    0.6   0.4
Germany_A   0.9   0.1
Germany_B   0.4   0.6

#ThermalMass dataframe             
Country     Heavy   Light         
France       0.4    0.6
Brussels     0.5    0.5
Germany_A    0.3    0.7
Germany_B    0.5    0.5
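
For readers without the Excel files, the three frames can be rebuilt inline. This is only a sketch: the values are copied from the tables above, and the helper name `countries` is an assumption introduced for illustration.

```python
import pandas as pd

# Shared index; the name "Country" mirrors the tables in the question.
countries = pd.Index(["France", "Brussels", "Germany_A", "Germany_B"], name="Country")

# Values copied from the tables above; the original Excel files are not available here.
Envelope = pd.DataFrame(
    {"Group(A)": [0.4, 0.8, 0.3, 0.2],
     "Group(B)": [0.4, 0.1, 0.6, 0.5],
     "Group(C)": [0.2, 0.1, 0.1, 0.3]},
    index=countries,
)
Shading = pd.DataFrame(
    {"YeSH": [0.5, 0.6, 0.9, 0.4],
     "NoSH": [0.5, 0.4, 0.1, 0.6]},
    index=countries,
)
ThermalMass = pd.DataFrame(
    {"Heavy": [0.4, 0.5, 0.3, 0.5],
     "Light": [0.6, 0.5, 0.7, 0.5]},
    index=countries,
)
```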

I tried using MultiIndex.from_product:

all = pd.MultiIndex.from_product([Envelope,Shading,ThermalMass])

But the result only covers the column headers:

print(all)
MultiIndex([('Group(A)', 'YeSH', 'Heavy'),
            ('Group(A)', 'YeSH', 'Light'),
            ('Group(A)', 'NoSH', 'Heavy'),
            ('Group(A)', 'NoSH', 'Light'),
            ('Group(B)', 'YeSH', 'Heavy'),
            ('Group(B)', 'YeSH', 'Light'),
            ('Group(B)', 'NoSH', 'Heavy'),
            ('Group(B)', 'NoSH', 'Light'),
            ('Group(C)', 'YeSH', 'Heavy'),
            ('Group(C)', 'YeSH', 'Light'),
            ('Group(C)', 'NoSH', 'Heavy'),
            ('Group(C)', 'NoSH', 'Light')],
           )
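
For context, a likely reason only the headers appear is that iterating over a DataFrame yields its column labels rather than its values, so from_product never sees the numbers. The sketch below illustrates this with two small hypothetical frames (`left` and `right` are names made up for the example) and passes `.columns` explicitly.

```python
import pandas as pd

# Two small hypothetical frames standing in for the question's data.
left = pd.DataFrame({"Group(A)": [0.4, 0.8], "Group(B)": [0.4, 0.1]},
                    index=["France", "Brussels"])
right = pd.DataFrame({"YeSH": [0.5, 0.6], "NoSH": [0.5, 0.4]},
                     index=["France", "Brussels"])

# Iterating over a DataFrame yields its column labels, not its cell values.
labels = list(left)

# The Cartesian product is therefore built from labels only.
combos = pd.MultiIndex.from_product([left.columns, right.columns])
```

The values still have to be multiplied separately, one product per label combination.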

I need the values for each country, so the result should look like this (3 x 2 x 2 = 12 combinations):

           Group(A)_YeSH_Heavy  Group(A)_YeSH_Light  Group(A)_NoSH_Heavy  Group(A)_NoSH_Light
Country
France     0.08                 0.12                 0.08                 0.12
Brussels   0.24                 0.24                 0.16                 0.16
Germany_A  0.081                0.189                0.009                0.021
Germany_B  0.04                 0.04                 0.06                 0.06

How can I create these new columns from the combinations of the three DataFrames?

Answer 1

You could do the following:

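from itertools import product

import pandas as pd

# Only needed if 'Country' is still a regular column rather than the index
Envelope.set_index('Country', drop=True, inplace=True)
Shading.set_index('Country', drop=True, inplace=True)
ThermalMass.set_index('Country', drop=True, inplace=True)

# Every (Envelope, Shading, ThermalMass) column triple: 3 x 2 x 2 = 12
columns = list(product(Envelope.columns, Shading.columns, ThermalMass.columns))
# Multiply the three matching columns row by row, one Series per triple
df = pd.concat([Envelope[col[0]] * Shading[col[1]] * ThermalMass[col[2]]
                for col in columns],
               axis='columns')
# Join each triple into a single label such as 'Group(A)_YeSH_Heavy'
df.columns = ['_'.join(col) for col in columns]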

Output:

           Group(A)_YeSH_Heavy  ...  Group(C)_NoSH_Light
Country                         ...                     
France                   0.080  ...                0.060
Brussels                 0.240  ...                0.020
Germany_A                0.081  ...                0.007
Germany_B                0.040  ...                0.090

[4 rows x 12 columns]
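
Because the printed output is truncated, it can help to check the result programmatically. The sketch below rebuilds the frames from the question's tables, repeats the product-based construction, and runs two checks: one cell against a hand calculation, and the row totals. The names `spot` and `row_totals` are assumptions introduced for illustration.

```python
from itertools import product

import numpy as np
import pandas as pd

countries = ["France", "Brussels", "Germany_A", "Germany_B"]
# Values copied from the question's tables.
Envelope = pd.DataFrame({"Group(A)": [0.4, 0.8, 0.3, 0.2],
                         "Group(B)": [0.4, 0.1, 0.6, 0.5],
                         "Group(C)": [0.2, 0.1, 0.1, 0.3]}, index=countries)
Shading = pd.DataFrame({"YeSH": [0.5, 0.6, 0.9, 0.4],
                        "NoSH": [0.5, 0.4, 0.1, 0.6]}, index=countries)
ThermalMass = pd.DataFrame({"Heavy": [0.4, 0.5, 0.3, 0.5],
                            "Light": [0.6, 0.5, 0.7, 0.5]}, index=countries)

# Same construction as above: one product column per label triple.
columns = list(product(Envelope.columns, Shading.columns, ThermalMass.columns))
df = pd.concat([Envelope[a] * Shading[b] * ThermalMass[c] for a, b, c in columns],
               axis="columns")
df.columns = ["_".join(col) for col in columns]

# Spot-check one cell against the hand calculation 0.3 * 0.1 * 0.7 = 0.021.
spot = df.loc["Germany_A", "Group(A)_NoSH_Light"]

# Every input row in this data sums to 1, so the 12 products per row should too.
row_totals = df.sum(axis="columns")

assert np.isclose(spot, 0.021)
assert np.allclose(row_totals, 1.0)
```

The row-total check relies on each input row summing to 1, which holds for the sample data shown in the question but may not hold for other inputs.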

Answer 2

Adapted from this answer, here is a vectorized approach that uses a MultiIndex.

import numpy as np
import pandas as pd

# One row of column positions per DataFrame, one entry per combination (same order as lcol)
pidx = np.indices((Envelope.shape[1], Shading.shape[1], ThermalMass.shape[1])).reshape(3, -1)
lcol = pd.MultiIndex.from_product([Envelope, Shading, ThermalMass])
pd.DataFrame(Envelope.values[:, pidx[0]] * Shading.values[:, pidx[1]] * ThermalMass.values[:, pidx[2]],
             columns=lcol, index=Envelope.index)

This gives:

          Group(A)                      Group(B)                       \
              YeSH          NoSH            YeSH          NoSH          
             Heavy  Light  Heavy  Light    Heavy  Light  Heavy  Light   
Country                                                                 
France       0.080  0.120  0.080  0.120    0.080  0.120  0.080  0.120   
Brussels     0.240  0.240  0.160  0.160    0.030  0.030  0.020  0.020   
Germany_A    0.081  0.189  0.009  0.021    0.162  0.378  0.018  0.042   
Germany_B    0.040  0.040  0.060  0.060    0.100  0.100  0.150  0.150   

          Group(C)                       
              YeSH          NoSH         
             Heavy  Light  Heavy  Light  
Country                                  
France       0.040  0.060  0.040  0.060  
Brussels     0.030  0.030  0.020  0.020  
Germany_A    0.027  0.063  0.003  0.007  
Germany_B    0.060  0.060  0.090  0.090
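
The question asked for single-level labels such as Group(A)_YeSH_Heavy, so the three header levels may need to be flattened afterwards. The sketch below is one way to do that, assuming all labels are strings. It rebuilds the frames from the question's tables, and the names `wide` and `flat` are assumptions introduced for illustration.

```python
import numpy as np
import pandas as pd

countries = ["France", "Brussels", "Germany_A", "Germany_B"]
# Values copied from the question's tables.
Envelope = pd.DataFrame({"Group(A)": [0.4, 0.8, 0.3, 0.2],
                         "Group(B)": [0.4, 0.1, 0.6, 0.5],
                         "Group(C)": [0.2, 0.1, 0.1, 0.3]}, index=countries)
Shading = pd.DataFrame({"YeSH": [0.5, 0.6, 0.9, 0.4],
                        "NoSH": [0.5, 0.4, 0.1, 0.6]}, index=countries)
ThermalMass = pd.DataFrame({"Heavy": [0.4, 0.5, 0.3, 0.5],
                            "Light": [0.6, 0.5, 0.7, 0.5]}, index=countries)

# pidx has shape (3, 12): row 0 holds Envelope column positions, row 1 Shading,
# row 2 ThermalMass, with the last frame's position varying fastest.
pidx = np.indices((Envelope.shape[1], Shading.shape[1], ThermalMass.shape[1])).reshape(3, -1)

lcol = pd.MultiIndex.from_product([Envelope.columns, Shading.columns, ThermalMass.columns])
wide = pd.DataFrame(
    Envelope.values[:, pidx[0]] * Shading.values[:, pidx[1]] * ThermalMass.values[:, pidx[2]],
    columns=lcol, index=Envelope.index)

# Collapse each (level0, level1, level2) tuple into one underscore-joined label.
flat = wide.copy()
flat.columns = ["_".join(levels) for levels in wide.columns]
```

Keeping the MultiIndex version (`wide`) around can still be useful, for example to select every column belonging to one group with `wide["Group(A)"]`.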

Answer 1: 2, accepted, 2020-12-16 12:43:17

Answer 2: 1, 2020-12-16 12:46:17
