
Python Pandas pivot_table missing column after pivot

I have the following data frame, built by reading a CSV file. It is a large data set, but for this question I have used 15 rows from it as an example.

   user_id   contrib_count   total_min_length     group_space     expert_level
0     23720        108           1112696               0             l-2
1     23720         13            442059               1             l-2
2     23720         12             32180               2             l-2
3     23720          2             20177               3             l-2
4     23720          1              1608              10             l-2
5   1265184         71            260186               0             l-G
6   1265184         10              3466               2             l-G
7   1265184          1             12081               4             l-G
8    513380        112           1049311               0             l-4
9    513380          1                97               1             l-4
10   513380        113            361980               2             l-4
11   513380         19           1198323               3             l-4
12   513380          2             88301               4             l-4
13    20251        705          17372707               0             l-G
14    20251        103           2327178               1             l-G

Expected results: after the pivot, I want the following data frame:

group_space        0      1     2     3     4     5    6   7    8   9    10  expert_level
user_id
20251             705    103    68    24    18     2    6 NaN  NaN   5   22     l-G                                                                  
23720             108     13    12     2   NaN   NaN  NaN NaN  NaN NaN    1     l-2

The reason I am doing this is that I can then use the result for a prediction task, with expert_level as the label.

So far I have done the following to build the above matrix, but I am unable to get the expert_level column shown in the expected results after the pivot.

This is what I have done:

import pandas as pd

class GroupAnalysis():

    def __init__(self):
        self.df = None
        self.filelocation = '~/somelocation/x.csv'

    def pivot_dataframe(self):

        raw_df = pd.read_csv(self.filelocation)
        self.df = raw_df[(raw_df['group_space'] < 11)]
        self.df.set_index(['user_id', 'group_space'], inplace=True)
        self.df = self.df['contrib_count'].unstack()

By doing this I get:

group_space        0      1     2     3     4     5    6   7    8   9    10
user_id
20251             705    103    68    24    18     2    6 NaN  NaN   5   22                                                                
23720             108     13    12     2   NaN   NaN  NaN NaN  NaN NaN    1 

As you can see, the expert_level column at the end is missing. So the question is: how can I get the above data frame with expert_level, as shown in my "Expected results"?

When you unstacked, you were only unstacking a single Series, contrib_count; expert_level and total_min_length were already gone at that point.
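A minimal sketch of where the column goes missing, using a few rows taken from the question's sample (the reduced data and the names `indexed`, `selected`, and `wide` are assumptions made for illustration). Selecting a single column from the indexed frame returns a Series, so `unstack()` only ever sees `contrib_count`:

```python
import pandas as pd

# A few rows from the question's sample data (reduced for illustration).
df = pd.DataFrame({
    "user_id": [23720, 23720, 1265184, 1265184],
    "contrib_count": [108, 13, 71, 10],
    "total_min_length": [1112696, 442059, 260186, 3466],
    "group_space": [0, 1, 0, 2],
    "expert_level": ["l-2", "l-2", "l-G", "l-G"],
})

indexed = df.set_index(["user_id", "group_space"])

# Selecting one column yields a Series: the other columns are dropped here,
# before unstack() is ever called.
selected = indexed["contrib_count"]
wide = selected.unstack()

print(type(selected).__name__)
print(list(wide.columns))
```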

Instead of setting the index and unstacking, you can just use .pivot():

pivoted = df.pivot(index='user_id', columns='group_space', values='contrib_count')

Then create a frame with user_id as the index and expert_level as a column, dropping duplicates:

lookup = df.drop_duplicates('user_id')[['user_id', 'expert_level']]
lookup.set_index(['user_id'], inplace=True)

Then join your pivot and lookup:

result = pivoted.join(lookup)
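Put together, the three steps above can be sketched end to end as follows. The frame is rebuilt by hand from the 15 sample rows in the question rather than read from the CSV, and `total_min_length` is left out for brevity; both are simplifications for illustration. Keyword arguments are used for `pivot` because recent pandas versions no longer accept them positionally.

```python
import pandas as pd

# The 15 sample rows from the question (total_min_length omitted for brevity).
df = pd.DataFrame({
    "user_id": [23720] * 5 + [1265184] * 3 + [513380] * 5 + [20251] * 2,
    "contrib_count": [108, 13, 12, 2, 1, 71, 10, 1, 112, 1, 113, 19, 2, 705, 103],
    "group_space": [0, 1, 2, 3, 10, 0, 2, 4, 0, 1, 2, 3, 4, 0, 1],
    "expert_level": ["l-2"] * 5 + ["l-G"] * 3 + ["l-4"] * 5 + ["l-G"] * 2,
})

# Step 1: one row per user, one column per group_space value.
pivoted = df.pivot(index="user_id", columns="group_space", values="contrib_count")

# Step 2: one expert_level per user, indexed by user_id.
lookup = df.drop_duplicates("user_id")[["user_id", "expert_level"]].set_index("user_id")

# Step 3: attach the label column to the pivoted counts (joins on the index).
result = pivoted.join(lookup)
print(result)
```

This relies on each user having a single expert_level, as in the sample; `drop_duplicates` simply keeps the first one it sees per user.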

EDIT: If you also want to include total_min_length, you can do a second pivot:

pivoted2 = df.pivot(index='user_id', columns='group_space', values='total_min_length')

and join all three instead of two:

result = pivoted.join(lookup).join(pivoted2, lsuffix="_contrib_count", rsuffix="_total_min_length")

Note that lsuffix and rsuffix are required to disambiguate the columns, as both pivots have columns 0, 1, 2, 3, 4, and 10 in your example data.
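A small sketch of the suffix behaviour, assuming a reduced four-row version of the sample data (two users, two group_space values). The overlapping integer column labels should come out as strings with the suffix appended, while expert_level, which appears on one side only, keeps its name:

```python
import pandas as pd

# Reduced sample: two users, two group_space values each.
df = pd.DataFrame({
    "user_id": [23720, 23720, 20251, 20251],
    "contrib_count": [108, 13, 705, 103],
    "total_min_length": [1112696, 442059, 17372707, 2327178],
    "group_space": [0, 1, 0, 1],
    "expert_level": ["l-2", "l-2", "l-G", "l-G"],
})

pivoted = df.pivot(index="user_id", columns="group_space", values="contrib_count")
pivoted2 = df.pivot(index="user_id", columns="group_space", values="total_min_length")
lookup = df.drop_duplicates("user_id")[["user_id", "expert_level"]].set_index("user_id")

# Both pivots have columns 0 and 1, so the suffixes tell them apart.
result = pivoted.join(lookup).join(
    pivoted2, lsuffix="_contrib_count", rsuffix="_total_min_length"
)
print(list(result.columns))
```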
