I have a single, very large HTML file from which I'd like to build a word cloud. I have BeautifulSoup, Wordcloud, numpy and matplotlib installed, but most of the guides out there deal with URLs. I just need to parse a local file and work from there.
Any advice on how to get started?
According to the documentation, you can simply pass BeautifulSoup an open file handle:
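from bs4 import BeautifulSoup

# Parse a local file by passing the open file handle
with open("index.html") as fp:
    soup = BeautifulSoup(fp, "html.parser")

# Alternatively, pass the markup itself as a string
soup = BeautifulSoup("<html>data</html>", "html.parser")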
After that you should be able to follow any tutorial, since the resulting soup object is the same whether the markup came from a URL or from a local file.
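To make the remaining steps concrete, here is a minimal sketch of going from the local file to a saved word cloud image. The function names `extract_text` and `save_wordcloud`, the UTF-8 encoding, the image size and the output file name are all assumptions chosen for illustration, not anything the question specifies. For a very large file, the `lxml` parser is usually faster than `html.parser` if you have it installed.

```python
from bs4 import BeautifulSoup


def extract_text(html_path):
    """Return the visible text of a local HTML file as one string."""
    # The encoding is an assumption; adjust it to match your file.
    with open(html_path, encoding="utf-8") as fp:
        soup = BeautifulSoup(fp, "html.parser")
    # Drop script and style elements so their contents stay out of the cloud.
    for tag in soup(["script", "style"]):
        tag.decompose()
    return soup.get_text(separator=" ", strip=True)


def save_wordcloud(text, out_path="wordcloud.png"):
    """Render text as a word cloud image and write it to out_path."""
    # Imported here so extract_text can be used without wordcloud installed.
    from wordcloud import WordCloud

    # Size and background colour are illustrative choices.
    cloud = WordCloud(width=800, height=400, background_color="white")
    cloud.generate(text)
    cloud.to_file(out_path)


# Usage, with index.html being the file name from the example above:
# save_wordcloud(extract_text("index.html"))
```

Removing the `script` and `style` elements first is a design choice: otherwise `get_text()` would include JavaScript and CSS source, and those tokens would show up as words in the cloud.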