Can I create a histogram/bar graph from a list in Python? TypeError: expected str, found list
I am using matplotlib, pandas and gensim. I am trying to create a histogram of frequent words by extracting text directly from a website.
I am receiving a TypeError at these lines:
text = ','.join(map(str, description_list))
word_frequency = Counter(" ".join(description_list[0]).split()).most_common(10)
from this part of my code:
#start of problems
data = {
"description": [text_corpus]
}
df = pd.DataFrame(data)
description_list = df['description'].values.tolist()
text = ','.join(map(str, description_list))
word_frequency = Counter(" ".join(description_list[0]).split()).most_common(10)
# `most_common` returns a list of (word, count) tuples
words = [word for word, _ in word_frequency]
counts = [counts for _, counts in word_frequency]
plt.bar(words, counts)
plt.title("10 most frequent tokens in description")
plt.ylabel("Frequency")
plt.xlabel("Words")
plt.show()
print(text)
Here is the initial part of my code, which successfully extracts textual data from a website:
from urllib.request import urlopen
from bs4 import BeautifulSoup
import pprint
from re import X
import string
from tokenize import Token
from collections import Counter
import matplotlib.pyplot as plt
import pandas as pd
url = "https://www.bbc.com/news/world-us-canada-61294585"
html = urlopen(url).read()
soup = BeautifulSoup(html, features="html.parser")
# kill all script and style elements
for script in soup(["script", "style"]):
    script.extract()
# get text
text = soup.get_text()
document = text
text_corpus = [text]
# Create a set of frequent words
stoplist = set('for a of the and to in'.split(' '))
# Lowercase each document, split it by white space and filter out stopwords
texts = [[word for word in document.lower().split() if word not in stoplist]
         for document in text_corpus]
# Count word frequencies
from collections import defaultdict
frequency = defaultdict(int)
for text in texts:
    for token in text:
        frequency[token] += 1
# Only keep words that appear more than once
text_corpus = [[token for token in text if frequency[token] > 1] for text in texts]
pprint.pprint(text_corpus)
I am new to Python, so any advice will help. Please let me know if there is something fundamentally wrong with my code and whether I have to start over.
If not, I would much appreciate being pointed in the right direction for creating graphs from frequent words, or for converting this particular list into a string.
Additional question: would it be better to search for specific words on a website instead of extracting all the text?
Thank you very much.
As soon as you've got text_corpus, you may proceed as follows:
#url = "https://stackoverflow.com/questions/72091588/can-i-create-a-histogram-bar-graph-from-a-list-in-python-typeerror-expected-st"
counter = Counter(text_corpus[0]).most_common(10)
words, counts = list(zip(*counter))
plt.bar(words, counts)
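To see why the original lines fail and why counting text_corpus[0] directly works, here is a minimal sketch. The token list is a hypothetical stand-in for the scraped corpus, and top_words is an illustrative helper name, not something from the question's code. The point is that text_corpus is a list of token lists, so wrapping it in a DataFrame and calling tolist() hands str.join a list of lists rather than a list of strings.

```python
from collections import Counter

# Hypothetical stand-in for the filtered corpus: one document, already tokenized.
text_corpus = [["court", "ruling", "court", "law", "court", "ruling"]]

# The DataFrame round trip in the question effectively adds one more level of
# nesting, so the first element is still a list of lists, not a list of strings.
description_list = [text_corpus]
try:
    " ".join(description_list[0])
except TypeError as exc:
    error_message = str(exc)  # join() rejects the inner list


def top_words(tokens, n=10):
    """Return two parallel tuples: the n most common tokens and their counts."""
    most_common = Counter(tokens).most_common(n)
    if not most_common:
        return (), ()
    words, counts = zip(*most_common)
    return words, counts


# Count the tokens of the first (and only) document directly.
words, counts = top_words(text_corpus[0], n=2)
print(error_message)
print(words, counts)
```

The two tuples can then be passed to plt.bar(words, counts) exactly as in the answer's snippet. No DataFrame or string conversion is needed for the plot.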