Wordcloud only shows letters, no words

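I am currently analyzing text data and extracting nouns from a corpus.

Yes, I am a newbie, and I am here to learn and improve through my mistakes.

When I create a word cloud from the column of extracted nouns, the word cloud shows only letters and symbols, not whole words.

The word cloud itself is not my main concern, but since I am going on to analyze the text further, do topic modeling, and aim to develop a prediction model, I want to make sure the column will not cause problems in later analysis.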

from collections import Counter

import matplotlib.pyplot as plt
from textblob import TextBlob
from wordcloud import WordCloud

def get_nouns(text):
    blob = TextBlob(text)
    return [word for (word, tag) in blob.tags if tag == "NN"]

df_unique['nouns'] = df_unique['tokenized'].apply(get_nouns)

#nouns wordcloud
all_words_xn = []
for line in df_unique['nouns']: 
    all_words_xn.extend(line)

# create a word frequency dictionary
wordfreq = Counter(all_words_xn)
# draw a Word Cloud with word frequencies
wordcloud = WordCloud(width=900,
                  height=500,
                  max_words=50,
                  max_font_size=100,
                  relative_scaling=0.5,
                  colormap='Blues',
                  normalize_plurals=True).generate_from_frequencies(wordfreq)
plt.figure(figsize=(17,14))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()

Current word cloud output

Column with nouns in the data frame

0                                                 ['lot']
1                           ['weapon', 'gun', 'instance']
2                               ['drive', 'drive', 'car']
3                                ['felt', 'guy', 'stage']
4       ['price', 'launch', 'ryse', 'son', 'ip', 'cryt...
5       ['drivatar', 'crash', 'guy', 'track', 'use', '...
6                                      ['spark', 'thing']
7       ['stream', 'player', 'linux', 'start', 'stream...
8                    ['kill', 'game', 'absolute', 'shit']
9                   ['breed', 'stealth', 'horse', 'duck']
10                                      ['beach', 'duty']
11                                                     []
12                                    ['europe', 'guess']
13                              ['power', 'cloud', 'god']
14                        ['gameplay', 'footage', 'zoom']
15                                                     []
16      ['stream', 'play', 'game', 'week', 'gdex', 'co...
17                                               ['edit']
19                     ['halo', 'clip', 'lot', 'journey']
21      ['thing', 'master', 'chief', 'shawl', 'help', ...
22      ['respect', 'respawn', 'trailer', 'gameplay', ...

Name: nouns, Length: 7523, dtype: object

Your code is fine. There must be an error in the preprocessing pipeline that you have not shown here.
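
One way a preprocessing step could produce exactly this symptom is sketched below. It assumes, and the question does not confirm this, that the cells of the `nouns` column are strings such as `"['weapon', 'gun']"` rather than real lists, for example after the data frame was saved to CSV and read back. The quoted items in the printed column hint at that, but it is only a guess. In that case `list.extend` walks each string character by character, so the counter ends up holding letters, brackets, and quotes. The rows and the `flatten` helper are made up for illustration:

```python
from collections import Counter

# Hypothetical rows: the first list holds real lists, the second holds
# their string representations (what a CSV round-trip would give back).
rows_as_lists = [['weapon', 'gun'], ['drive', 'car']]
rows_as_strings = [str(row) for row in rows_as_lists]

def flatten(rows):
    """Collect every item of every row, like the loop in the question."""
    items = []
    for row in rows:
        items.extend(row)  # a str is iterated one character at a time
    return items

word_counts = Counter(flatten(rows_as_lists))
char_counts = Counter(flatten(rows_as_strings))

print(word_counts.most_common(2))  # whole words
print(char_counts.most_common(3))  # single characters such as quotes
```

On the real data, `df_unique['nouns'].map(type).value_counts()` would show whether the cells are `str` or `list`.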

For a complete working example based on your code, see below:

from collections import Counter

import matplotlib.pyplot as plt
import pandas as pd
from textblob import TextBlob
from wordcloud import WordCloud

texts = ["This is some text about thing", "This is another text about gun", "This is a text about car"]
df_unique = pd.DataFrame({"tokenized":texts})

def get_nouns(text):
    blob = TextBlob(text)
    return [ word for (word,tag) in blob.tags if tag == "NN"]

df_unique['nouns'] = df_unique['tokenized'].apply(get_nouns)

#nouns wordcloud
all_words_xn = []
for line in df_unique['nouns']: 
    all_words_xn.extend(line)


# create a word frequency dictionary
wordfreq = Counter(all_words_xn)
# draw a Word Cloud with word frequencies
wordcloud = WordCloud(width=900,
                  height=500,
                  max_words=50,
                  max_font_size=100,
                  relative_scaling=0.5,
                  colormap='Blues',
                  normalize_plurals=True).generate_from_frequencies(wordfreq)
plt.figure(figsize=(17,14))
plt.imshow(wordcloud, cmap="gray_r")
plt.axis("off")
plt.show()
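
If the preprocessing problem turns out to be that the `nouns` cells hold strings instead of lists (an assumption, since the question does not show that step), one possible repair is to parse each cell back into a list with `ast.literal_eval` before counting. The helper name `parse_noun_cell` and the sample cells are hypothetical and only serve as a sketch:

```python
import ast
from collections import Counter

def parse_noun_cell(cell):
    """Return a list of nouns from a cell that is a list or its string form.

    Hypothetical helper, not part of the original code.
    """
    if isinstance(cell, str):
        # literal_eval only evaluates Python literals such as "['a', 'b']"
        return ast.literal_eval(cell)
    return list(cell)

# Made-up cells in the string form assumed above
cells = ["['weapon', 'gun', 'instance']", "['drive', 'drive', 'car']", "[]"]

all_words = []
for cell in cells:
    all_words.extend(parse_noun_cell(cell))

wordfreq = Counter(all_words)
print(wordfreq.most_common(1))  # [('drive', 2)]
```

Applied to the data frame, this would be `df_unique['nouns'] = df_unique['nouns'].apply(parse_noun_cell)` before building the frequency dictionary.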

