Wordcloud only shows letters, not words
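I am currently analyzing text data and extracting nouns from a corpus.
Yes, I'm a newbie, and I'm here to learn and improve through my mistakes.
When I create a wordcloud from the column of extracted nouns, the wordcloud shows only letters and symbols instead of individual words.
The wordcloud itself is not my main concern, but since I am going on to analyze the text further, do topic modeling, and aim to develop a predictive model, I want to make sure the column won't cause problems in further analysis.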
from textblob import TextBlob

def get_nouns(text):
    blob = TextBlob(text)
    return [word for (word, tag) in blob.tags if tag == "NN"]

df_unique['nouns'] = df_unique['tokenized'].apply(get_nouns)

# nouns wordcloud
all_words_xn = []
for line in df_unique['nouns']:
    all_words_xn.extend(line)

# create a word frequency dictionary
wordfreq = Counter(all_words_xn)

# draw a Word Cloud with word frequencies
wordcloud = WordCloud(width=900,
                      height=500,
                      max_words=50,
                      max_font_size=100,
                      relative_scaling=0.5,
                      colormap='Blues',
                      normalize_plurals=True).generate_from_frequencies(wordfreq)
plt.figure(figsize=(17, 14))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()
The column with the nouns in the DataFrame:
0 ['lot']
1 ['weapon', 'gun', 'instance']
2 ['drive', 'drive', 'car']
3 ['felt', 'guy', 'stage']
4 ['price', 'launch', 'ryse', 'son', 'ip', 'cryt...
5 ['drivatar', 'crash', 'guy', 'track', 'use', '...
6 ['spark', 'thing']
7 ['stream', 'player', 'linux', 'start', 'stream...
8 ['kill', 'game', 'absolute', 'shit']
9 ['breed', 'stealth', 'horse', 'duck']
10 ['beach', 'duty']
11 []
12 ['europe', 'guess']
13 ['power', 'cloud', 'god']
14 ['gameplay', 'footage', 'zoom']
15 []
16 ['stream', 'play', 'game', 'week', 'gdex', 'co...
17 ['edit']
19 ['halo', 'clip', 'lot', 'journey']
21 ['thing', 'master', 'chief', 'shawl', 'help', ...
22 ['respect', 'respawn', 'trailer', 'gameplay', ...
Name: nouns, Length: 7523, dtype: object
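Your code is fine. There must be an error in your preprocessing pipeline, which you have not shown here.
See below for a complete working example based on your code: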
from collections import Counter

import matplotlib.pyplot as plt
import pandas as pd
from textblob import TextBlob
from wordcloud import WordCloud

texts = ["This is some text about thing", "This is another text about gun", "This is a text about car"]
df_unique = pd.DataFrame({"tokenized": texts})

def get_nouns(text):
    blob = TextBlob(text)
    # keep only tokens tagged "NN" (singular common noun)
    return [word for (word, tag) in blob.tags if tag == "NN"]

df_unique['nouns'] = df_unique['tokenized'].apply(get_nouns)

# nouns wordcloud
all_words_xn = []
for line in df_unique['nouns']:
    all_words_xn.extend(line)

# create a word frequency dictionary
wordfreq = Counter(all_words_xn)

# draw a Word Cloud with word frequencies
wordcloud = WordCloud(width=900,
                      height=500,
                      max_words=50,
                      max_font_size=100,
                      relative_scaling=0.5,
                      colormap='Blues',
                      normalize_plurals=True).generate_from_frequencies(wordfreq)
plt.figure(figsize=(17, 14))
plt.imshow(wordcloud, cmap="gray_r")
plt.axis("off")
plt.show()
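One preprocessing problem that would produce exactly this symptom (only a guess, since the pipeline is not shown) is that the cells of the nouns column are strings that look like lists, for example after the DataFrame was saved to CSV and loaded again. In that case list.extend() adds one character at a time, so the frequency dictionary ends up counting letters, quotes, and brackets. The sketch below reproduces this with hypothetical sample values and plain Python lists instead of a DataFrame, and shows ast.literal_eval as one way to turn such strings back into real lists:

```python
import ast
from collections import Counter

# Hypothetical cell values (an assumption, not taken from the real data):
# each one is a *string* that merely looks like a list.
nouns_as_text = ["['weapon', 'gun', 'instance']", "['drive', 'drive', 'car']"]


def flatten(rows):
    """Collect every item of every row into one flat list."""
    flat = []
    for row in rows:
        flat.extend(row)  # extending with a str adds one character at a time
    return flat


# Counting the raw strings yields single characters and punctuation.
char_freq = Counter(flatten(nouns_as_text))

# Parsing each string back into a real list restores whole words.
nouns_as_lists = [ast.literal_eval(cell) for cell in nouns_as_text]
word_freq = Counter(flatten(nouns_as_lists))

print(char_freq.most_common(3))
print(word_freq.most_common(3))
```

To check whether this applies to the real column, something like df_unique['nouns'].map(type).value_counts() should show whether the cells are of type str or list.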