Pandas dataframe: reshaping row values into new columns (matrix-type format)

Question

I'm new to pandas and am looking for advice on how to reshape my dataframe.

Currently, I have a dataframe like this:

panelist_id	type	type_count	refer_sm_count	refer_se_count	refer_non_n_count
1	HP	2	2	1	1
1	PB	1	0	1	0
1	TN	3	0	3	0
2	HP	1	1	0	0
2	PB	2	1	1	0

Ideally, I would like my dataframe to look like this:

panelist_id	type_HP_count	type_PB_count	type_TN_count	refer_sm_count_HP	refer_se_count_HP	refer_non_n_count_HP	refer_sm_count_PB	refer_se_count_PB	refer_non_n_count_PB	refer_sm_count_TN	refer_se_count_TN	refer_non_n_count_TN
1	2	1	3	2	1	0	0	1	0	0	0	0
2	1	2	0	1	0	0	1	1	0	0	0	0

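Basically, I need to turn the distinct row values in the "type" column into new columns that show the count for each type. The next three columns in the original df, the ones whose names start with "refer", need to be broken out for each distinct "type" as well, for example refer_sm_count_[from type X (e.g., HP)]. Any help would be much appreciated. Thanks.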

Answer 1

Try the pivot_table() and rename_axis() methods:

out=(df.pivot_table(index='panelist_id',columns='type',fill_value=0)
      .rename_axis(columns=[None,None],index=None))

Finally, flatten the column names with the map() method on the .columns attribute:

out.columns=out.columns.map('_'.join)

Now if you print out, you will get the output you want.
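For convenience, here is a minimal end-to-end sketch of the two steps above. The sample frame is an assumption borrowed from the working example in Answer 2, not something this answer provides. Note that pivot_table() averages by default, which only matters if a (panelist_id, type) pair occurs more than once.

```python
import pandas as pd

# Sample data, assumed to match the frame in the question.
df = pd.DataFrame({
    'panelist_id': [1, 1, 1, 2, 2],
    'type': ['HP', 'PB', 'TN', 'HP', 'PB'],
    'type_count': [2, 1, 3, 1, 2],
    'refer_sm_count': [2, 0, 0, 1, 1],
    'refer_se_count': [1, 1, 3, 0, 1],
    'refer_non_n_count': [1, 0, 0, 0, 0],
})

# Step 1: one row per panelist, one (measure, type) column pair per value.
out = (df.pivot_table(index='panelist_id', columns='type', fill_value=0)
         .rename_axis(columns=[None, None], index=None))

# Step 2: join the two column levels into names such as 'type_count_HP'.
out.columns = out.columns.map('_'.join)

print(out)
```

The combination (panelist 2, TN) does not exist in the input, so fill_value=0 is what turns that gap into 0 rather than NaN.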

Answer 2

Using the pivot_wider option from pyjanitor:

new_df = df.pivot_wider(index='panelist_id',
                        names_from='type',
                        names_from_position='last',
                        fill_value=0)

new_df:

panelist_id  type_count_HP  type_count_PB  type_count_TN  refer_sm_count_HP  refer_sm_count_PB  refer_sm_count_TN  refer_se_count_HP  refer_se_count_PB  refer_se_count_TN  refer_non_n_count_HP  refer_non_n_count_PB  refer_non_n_count_TN
          1              2              1              3                  2                  0                  0                  1                  1                  3                     1                     0                     0
          2              1              2              0                  1                  1                  0                  0                  1                  0                     0                     0                     0

Complete working example:

import janitor
import pandas as pd

df = pd.DataFrame({
    'panelist_id': [1, 1, 1, 2, 2],
    'type': ['HP', 'PB', 'TN', 'HP', 'PB'],
    'type_count': [2, 1, 3, 1, 2],
    'refer_sm_count': [2, 0, 0, 1, 1],
    'refer_se_count': [1, 1, 3, 0, 1],
    'refer_non_n_count': [1, 0, 0, 0, 0]
})

new_df = df.pivot_wider(index='panelist_id',
                        names_from='type',
                        names_from_position='last',
                        fill_value=0)

print(new_df.to_string(index=False))

Answer 3

Just to add another option:

df = df.set_index(['panelist_id', 'type']).unstack(-1, fill_value=0)
df.columns = df.columns.map('_'.join)
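A small sketch of this option on sample data (the frame is an assumption, copied from the working example in Answer 2). It also illustrates one difference from pivot_table(): unstack() does not aggregate, so it raises a ValueError when a (panelist_id, type) pair appears more than once.

```python
import pandas as pd

# Sample data, assumed to match the frame in the question.
df = pd.DataFrame({
    'panelist_id': [1, 1, 1, 2, 2],
    'type': ['HP', 'PB', 'TN', 'HP', 'PB'],
    'type_count': [2, 1, 3, 1, 2],
    'refer_sm_count': [2, 0, 0, 1, 1],
    'refer_se_count': [1, 1, 3, 0, 1],
    'refer_non_n_count': [1, 0, 0, 0, 0],
})

# Move 'type' from the row index into the columns, filling gaps with 0.
wide = df.set_index(['panelist_id', 'type']).unstack(-1, fill_value=0)
wide.columns = wide.columns.map('_'.join)
wide = wide.reset_index()
print(wide.to_string(index=False))

# With a repeated (panelist_id, type) pair there is nothing to reshape onto,
# so unstack() refuses instead of silently aggregating.
dup = pd.concat([df, df.iloc[[0]]], ignore_index=True)
try:
    dup.set_index(['panelist_id', 'type']).unstack(-1, fill_value=0)
    duplicate_error = None
except ValueError as exc:
    duplicate_error = str(exc)
print(duplicate_error)
```

If duplicates are possible in your data, the pivot_table() answers are probably the safer choice, since they let you pick an aggregation.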

Answer 4

Use pivot_table to create MultiIndex columns:

df_p = df.pivot_table(index='panelist_id', columns='type', aggfunc=sum)

            refer_non_n_count           refer_se_count            \
type                       HP   PB   TN             HP   PB   TN   
panelist_id                                                        
1                         1.0  0.0  0.0            1.0  1.0  3.0   
2                         0.0  0.0  NaN            0.0  1.0  NaN   

            refer_sm_count           type_count            
type                    HP   PB   TN         HP   PB   TN  
panelist_id                                                
1                      2.0  0.0  0.0        2.0  1.0  3.0  
2                      1.0  1.0  NaN        1.0  2.0  NaN

If you do want to flatten the columns, then:

df_p.columns = ['_'.join(col) for col in df_p.columns.values]
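The output above is sorted by measure name and contains NaN where a type is missing. If you would rather have zeros and have the columns grouped by type, closer to the layout in the question, one possible sketch is below. The sample frame and the hand-written types and measures lists are assumptions made for illustration; aggfunc is passed as the string 'sum'.

```python
import pandas as pd

# Sample data, assumed to match the frame in the question.
df = pd.DataFrame({
    'panelist_id': [1, 1, 1, 2, 2],
    'type': ['HP', 'PB', 'TN', 'HP', 'PB'],
    'type_count': [2, 1, 3, 1, 2],
    'refer_sm_count': [2, 0, 0, 1, 1],
    'refer_se_count': [1, 1, 3, 0, 1],
    'refer_non_n_count': [1, 0, 0, 0, 0],
})

# fill_value=0 replaces the NaN for the missing (panelist 2, TN) combination.
df_p = df.pivot_table(index='panelist_id', columns='type',
                      aggfunc='sum', fill_value=0)

# Desired order, written out by hand: all measures for HP, then PB, then TN.
types = ['HP', 'PB', 'TN']
measures = ['type_count', 'refer_sm_count', 'refer_se_count', 'refer_non_n_count']
ordered = [(measure, kind) for kind in types for measure in measures]

# Reorder the (measure, type) columns, then flatten the names.
flat = df_p.reindex(columns=pd.MultiIndex.from_tuples(ordered))
flat.columns = [f'{measure}_{kind}' for measure, kind in ordered]

print(flat)
```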

Answer 5

First, import the libraries:

import numpy as np
import pandas as pd

Then, read your data:

data = pd.read_excel('base.xlsx')

Reshape the data with pivot_table:

data_reshaped = pd.pivot_table(data, values=['type_count', 'refer_sm_count', 'refer_se_count', 'refer_non_n_count'],
                               index=['panelist_id'], columns=['type'], aggfunc=np.sum)

However, the resulting column index (a MultiIndex) is awkward to work with, so flatten the column names and reset the index:

columns = [data_reshaped.columns[i][0] + '_' + data_reshaped.columns[i][1]
           for i in range(len(data_reshaped.columns))] # build the new column names

data_reshaped.columns = columns # assign the new column names to the dataframe
data_reshaped.reset_index(inplace=True) # turn panelist_id back into a regular column
data_reshaped.fillna(0, inplace=True) # replace NaN with 0

Your data should then be in the shape you want.
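To try these steps without an Excel file, the same flow can be run on an in-memory frame. This is only a sketch: the sample data is an assumption taken from the working example in Answer 2, aggfunc is given as the string 'sum', and the in-place calls are replaced by reassignment.

```python
import pandas as pd

# Stand-in for pd.read_excel('base.xlsx'); values assumed from the question.
data = pd.DataFrame({
    'panelist_id': [1, 1, 1, 2, 2],
    'type': ['HP', 'PB', 'TN', 'HP', 'PB'],
    'type_count': [2, 1, 3, 1, 2],
    'refer_sm_count': [2, 0, 0, 1, 1],
    'refer_se_count': [1, 1, 3, 0, 1],
    'refer_non_n_count': [1, 0, 0, 0, 0],
})

value_cols = ['type_count', 'refer_sm_count', 'refer_se_count', 'refer_non_n_count']

# Pivot: one row per panelist, (measure, type) pairs as columns.
data_reshaped = pd.pivot_table(data, values=value_cols,
                               index='panelist_id', columns='type',
                               aggfunc='sum')

# Flatten the two column levels into names such as 'type_count_HP'.
data_reshaped.columns = [f'{measure}_{kind}'
                         for measure, kind in data_reshaped.columns]

# Restore panelist_id as a column and replace missing combinations with 0.
data_reshaped = data_reshaped.reset_index().fillna(0)

print(data_reshaped.to_string(index=False))
```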

5 answers

Answer 1: 3, accepted, 2021-06-01 16:59:42
Answer 2: 3, 2021-06-01 17:11:17
Answer 3: 3, 2021-06-01 17:44:59
Answer 4: 2, 2021-06-01 16:58:12
Answer 5: 2, 2021-06-01 17:30:57
