
Problems using pivot_table on my dataframe

I am trying to pivot my dataframe so I can make a document matrix, but I run into errors when I try to pivot it. Here is my original dataframe before I change anything.

tidy_filter1 = pd.DataFrame(df_tweetText["text"].str.split(expand = True).stack().reset_index())
tidy_filter = pd.DataFrame(tidy_filter1,index = tidy_format1["id"])
tidy_filter = tidy_filter1.rename(index = tidy_filter["id"], columns = {"level_1": "num",0:"word"})
tidy_filter1["level_1"] = tidy_filter1.groupby("id").cumcount()
tidy_filter = tidy_filter.drop(columns = ["id"])
tidy_filter = tidy_filter.rename(index = tidy_format1["id"])

id                    num    word
1104159474368024599    0    repmiketurner
1104159474368024599    1    time
1104159474368024599    2    michael
1104159474368024599    3    cohen
1104159474368024599    4    told
1104159474368024599    5    truth
1104159474368024599    6    pled
1104159474368024599    7    guilty
1104159474368024599    8    also
1104159474368024599    9    said
1104159474368024599    10    collusion

When I run the code below, it breaks.

df_freq = tidy_filter.pivot_table(values='word', index=tidy_filter.index, columns='word', aggfunc=pd.Series.count)

The error is KeyError: 'word', which I don't understand. I tried replacing the values/columns arguments with tidy_filter['word'], but that did not work.

**Edit: I am looking for this output**

id                   repmiketurner  michael  cohen  told  truth  pled  guilty  also  said  collusion
1104159474368024599              1        1      1     1      1     1       1     1     1          1
1104155456019357703              0        0      0     1      1     0       0     1     0          0

**Edit2: When I pass tidy_filter['word'] instead, I get a different KeyError: 'repmiketurner'**

I think you are looking for pd.crosstab:

pd.crosstab(df.id,df.word)
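As a rough sketch of what that call does, here is a small self-contained version. It assumes a long-format frame with `id` and `word` as ordinary columns; the frame name `df` and the shortened word lists are only for illustration, with the second tweet's words taken from the desired output in the question.

```python
import pandas as pd

# Toy long-format data modelled on the question: one row per (tweet id, word).
# The word lists are shortened; the second id's words are assumed from the
# desired output shown in the question.
df = pd.DataFrame({
    "id": [1104159474368024599] * 3 + [1104155456019357703] * 2,
    "word": ["cohen", "told", "truth", "told", "truth"],
})

# crosstab counts how often each word occurs for each id.
# (id, word) pairs that never occur get a count of 0.
doc_matrix = pd.crosstab(df["id"], df["word"])
print(doc_matrix)
```

If `id` is the index rather than a column, as in the question, calling `reset_index()` first should make it available as `df["id"]`.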

I think your pivot_table call needs to change: you passed 'word' as both columns and values. You just need to use columns='word' and values='num'.

This is what worked for me:

df_freq = tidy_filter.pivot_table(columns='word',
                                    index=tidy_filter.index,
                                    values='num',
                                    aggfunc=pd.Series.count)

# Put pivot table columns in order of unique values of the 'word' column
word_unique = tidy_filter['word'].unique().tolist()
df_freq = df_freq[word_unique]

print(df_freq)
word                 repmiketurner  time  michael  cohen  told  truth  pled  guilty  also  said  collusion
id                                                                                                        
1104159474368024599              1     1        1      1     1      1     1       1     1     1          1
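
With more than one tweet, words that are missing from a tweet would show up as NaN rather than the 0 in the desired output. A possible way to handle that, sketched here on made-up toy data (the frame name `tidy` and the shortened word lists are assumptions, and `id` is turned into a regular column first), is to pass `fill_value=0`:

```python
import pandas as pd

# Toy frame shaped like tidy_filter in the question: id as the index,
# with 'num' (word position) and 'word' columns. Contents are illustrative.
tidy = pd.DataFrame(
    {"num": [0, 1, 2, 0, 1],
     "word": ["cohen", "told", "truth", "told", "truth"]},
    index=pd.Index([1104159474368024599] * 3 + [1104155456019357703] * 2,
                   name="id"),
)

# Make id a regular column so it can be named directly in pivot_table.
long_df = tidy.reset_index()

# Count occurrences of each word per id; fill_value=0 replaces the NaN
# that would otherwise appear for words absent from a tweet.
freq = long_df.pivot_table(index="id", columns="word", values="num",
                           aggfunc="count", fill_value=0).astype(int)
print(freq)
```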
