Python Pandas pivot_table missing column after pivot

Question

I have the following DataFrame, built by reading a CSV file. It is a large dataset, but for the purpose of this question I am using 15 rows from it as an example.

   user_id   contrib_count   total_min_length     group_space     expert_level
0     23720        108           1112696               0             l-2
1     23720         13            442059               1             l-2
2     23720         12             32180               2             l-2
3     23720          2             20177               3             l-2
4     23720          1              1608              10             l-2
5   1265184         71            260186               0             l-G
6   1265184         10              3466               2             l-G
7   1265184          1             12081               4             l-G
8    513380        112           1049311               0             l-4
9    513380          1                97               1             l-4
10   513380        113            361980               2             l-4
11   513380         19           1198323               3             l-4
12   513380          2             88301               4             l-4
13    20251        705          17372707               0             l-G
14    20251        103           2327178               1             l-G

Expected result: after pivoting, I want the following DataFrame:

group_space        0      1     2     3     4     5    6   7    8   9    10  expert_level
user_id
20251             705    103    68    24    18     2    6 NaN  NaN   5   22     l-G                                                                  
23720             108     13    12     2   NaN   NaN  NaN NaN  NaN NaN    1     l-2

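The reason I am doing this is that once I have it, I can use it for a prediction task, with expert_level as the label data.

So far I have managed to build the matrix above, but I cannot get the expert_level column shown in the expected result after pivoting.

This is what I did: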

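import pandas as pd


class GroupAnalysis():

    def __init__(self):
        self.df = None
        self.filelocation = '~/somelocation/x.csv'

    def pivot_dataframe(self):

        raw_df = pd.read_csv(self.filelocation)
        # Keep only rows with group_space 0 through 10.
        self.df = raw_df[(raw_df['group_space'] < 11)]
        # Assign the result instead of using inplace=True on a filtered slice.
        self.df = self.df.set_index(['user_id', 'group_space'])
        # Turn each group_space value into its own column of contrib_count.
        self.df = self.df['contrib_count'].unstack()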

By doing this I get:

group_space        0      1     2     3     4     5    6   7    8   9    10
user_id
20251             705    103    68    24    18     2    6 NaN  NaN   5   22                                                                
23720             108     13    12     2   NaN   NaN  NaN NaN  NaN NaN    1

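As you can see, the expert_level column at the end is missing. So the question is: how can I get the DataFrame with expert_level, as shown in the expected result?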

Answer 1

When you unstack, you are only unstacking the contrib_count Series. expert_level and total_min_length are already gone at that point, because self.df['contrib_count'] selects a single column before .unstack() is called.
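To make the point above concrete, here is a minimal sketch using a few rows from the question's sample data. The variable names (indexed, counts, wide) are only for illustration.

```python
import pandas as pd

# A few rows taken from the question's sample data.
df = pd.DataFrame({
    "user_id": [23720, 23720, 1265184, 1265184],
    "group_space": [0, 1, 0, 2],
    "contrib_count": [108, 13, 71, 10],
    "total_min_length": [1112696, 442059, 260186, 3466],
    "expert_level": ["l-2", "l-2", "l-G", "l-G"],
})

indexed = df.set_index(["user_id", "group_space"])

# Selecting one column returns a Series; the other columns are not carried along.
counts = indexed["contrib_count"]

# Unstacking that Series can only produce columns for the group_space values.
wide = counts.unstack()

print(type(counts).__name__)
print(list(wide.columns))
```

The unstacked frame has one column per group_space value and nothing else, which is why expert_level has to be brought back in separately.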

Instead of setting the index and unstacking, you can just use .pivot():

# Keyword arguments are required in pandas 2.0 and later.
pivoted = df.pivot(index='user_id', columns='group_space', values='contrib_count')
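As a quick check that .pivot() gives the same shape of result as the set_index/unstack approach from the question, the sketch below builds both from a small made-up subset of the sample rows and compares them. The names via_unstack and via_pivot are illustrative.

```python
import pandas as pd

# Small subset of the question's sample data.
df = pd.DataFrame({
    "user_id": [23720, 23720, 1265184, 1265184],
    "group_space": [0, 1, 0, 2],
    "contrib_count": [108, 13, 71, 10],
    "expert_level": ["l-2", "l-2", "l-G", "l-G"],
})

# The approach from the question: index on both keys, pick a column, unstack.
via_unstack = df.set_index(["user_id", "group_space"])["contrib_count"].unstack()

# The same reshape expressed as a single pivot call.
via_pivot = df.pivot(index="user_id", columns="group_space", values="contrib_count")

print(via_pivot)
print(via_pivot.equals(via_unstack))
```

Either way, only contrib_count is reshaped, so the pivot alone does not bring expert_level back.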

Then create a frame with user_id as the index and expert_level as its only column, dropping duplicates:

lookup = df.drop_duplicates('user_id')[['user_id', 'expert_level']]
lookup.set_index(['user_id'], inplace=True)

Then join your pivot and lookup:

result = pivoted.join(lookup)
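Put together, the steps above can be sketched end to end on the 15 sample rows from the question. The DataFrame is built inline here rather than read from the CSV, and the rows/columns variable names are just for this sketch.

```python
import pandas as pd

columns = ["user_id", "contrib_count", "total_min_length", "group_space", "expert_level"]
rows = [
    (23720, 108, 1112696, 0, "l-2"),
    (23720, 13, 442059, 1, "l-2"),
    (23720, 12, 32180, 2, "l-2"),
    (23720, 2, 20177, 3, "l-2"),
    (23720, 1, 1608, 10, "l-2"),
    (1265184, 71, 260186, 0, "l-G"),
    (1265184, 10, 3466, 2, "l-G"),
    (1265184, 1, 12081, 4, "l-G"),
    (513380, 112, 1049311, 0, "l-4"),
    (513380, 1, 97, 1, "l-4"),
    (513380, 113, 361980, 2, "l-4"),
    (513380, 19, 1198323, 3, "l-4"),
    (513380, 2, 88301, 4, "l-4"),
    (20251, 705, 17372707, 0, "l-G"),
    (20251, 103, 2327178, 1, "l-G"),
]
df = pd.DataFrame(rows, columns=columns)

# One row per user, one column per group_space, values from contrib_count.
pivoted = df.pivot(index="user_id", columns="group_space", values="contrib_count")

# One expert_level per user, indexed by user_id so it aligns with the pivot.
lookup = df.drop_duplicates("user_id")[["user_id", "expert_level"]].set_index("user_id")

# Index-on-index join adds expert_level as the last column.
result = pivoted.join(lookup)
print(result)
```

This relies on each user_id having a single expert_level, as in the sample data. If a user could have several, drop_duplicates would silently keep only the first one seen.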

Edit: if you also want to include total_min_length, you can do a second pivot:

# Keyword arguments are required in pandas 2.0 and later.
pivoted2 = df.pivot(index='user_id', columns='group_space', values='total_min_length')

And join all three instead of two:

result = pivoted.join(lookup).join(pivoted2, lsuffix="_contrib_count", rsuffix="_total_min_length")

Note that lsuffix and rsuffix are needed to disambiguate the columns, since both pivots have columns 0, 1, 2, 3, 4 and 10 from the sample data.
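A small sketch of how the suffixes show up in the column names, using a made-up four-row subset of the sample data. For simplicity it joins the two pivots first and the lookup last; that ordering and the names counts, lengths, and both are choices made for this sketch, not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    "user_id": [23720, 23720, 20251, 20251],
    "group_space": [0, 1, 0, 1],
    "contrib_count": [108, 13, 705, 103],
    "total_min_length": [1112696, 442059, 17372707, 2327178],
    "expert_level": ["l-2", "l-2", "l-G", "l-G"],
})

counts = df.pivot(index="user_id", columns="group_space", values="contrib_count")
lengths = df.pivot(index="user_id", columns="group_space", values="total_min_length")
lookup = df.drop_duplicates("user_id").set_index("user_id")[["expert_level"]]

# Both pivots have columns 0 and 1, so the suffixes are appended to tell them apart.
both = counts.join(lengths, lsuffix="_contrib_count", rsuffix="_total_min_length")

# expert_level does not clash with anything, so no suffix is needed here.
result = both.join(lookup)
print(list(result.columns))
```

Without the suffixes, the join would raise an error about overlapping columns instead of guessing which 0 or 1 is meant.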

Answer 1
2 accepted 2014-10-17 22:30:59