
How to find the frequency of a list of words in a DataFrame using Pandas

My df looks like this:

category       text_list
--------       ---------
soccer         [soccer, game, is, good, soccer, game]
basketball     [game, basketball, game]
volleyball     [sport, volleyball, sport]

What I want to do is group by category and then list the words with their frequency:

category       text_list          frequency
--------       ---------          ---------
soccer         soccer             2
               game               2 
               is                 1
               good               1
basketball     game               2
               basketball         1  
volleyball     sport              2
               volleyball         1

What have I tried?

  • I was able to find the frequency for each row, but I could not get the result into the DataFrame layout shown above.

Can someone help me? If possible, using NLTK.

Try explode, then groupby:

# explode gives one row per word; size() then counts each (category, word) pair
(df.explode('text_list')
   .groupby(['category','text_list']).size()
   .to_frame(name='frequency')
)

Output:

                       frequency
category   text_list            
basketball basketball          1
           game                2
soccer     game                2
           good                1
           is                  1
           soccer              2
volleyball sport               2
           volleyball          1
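
The output above is ordered alphabetically by word, while the question asks for words listed by frequency. A minimal sketch of one way to get that ordering follows. It rebuilds the sample df from the question so it runs on its own, and it assumes that ties should be broken alphabetically (the question does not say). The variable name counts is only for illustration. Categories still come out in alphabetical order, not in the row order of the original df.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "category": ["soccer", "basketball", "volleyball"],
    "text_list": [
        ["soccer", "game", "is", "good", "soccer", "game"],
        ["game", "basketball", "game"],
        ["sport", "volleyball", "sport"],
    ],
})

counts = (
    df.explode("text_list")                      # one row per word
      .groupby(["category", "text_list"])
      .size()                                    # count each (category, word) pair
      .reset_index(name="frequency")
      # Within each category: highest frequency first,
      # ties broken alphabetically (an assumption, not from the question).
      .sort_values(
          ["category", "frequency", "text_list"],
          ascending=[True, False, True],
      )
      .set_index(["category", "text_list"])
)

print(counts)
```

If a flat table is preferred over a MultiIndex, dropping the final set_index call leaves category, text_list and frequency as ordinary columns.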

