How to create a Word Cloud from a large html file using Python?

I have a single, very large HTML file from which I'd like to build a word cloud. I have BeautifulSoup, wordcloud, numpy and matplotlib installed, but most of the guides out there deal with URLs. I just need to parse a local file and work from there.

Any advice on how to get started?

According to the Beautiful Soup documentation, you can pass BeautifulSoup an open file handle directly:

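from bs4 import BeautifulSoup

# Parse a local file by passing the open file handle.
# Naming the parser explicitly avoids the "no parser was explicitly specified" warning.
with open("index.html") as fp:
    soup = BeautifulSoup(fp, "html.parser")

# Alternatively, parse markup that is already in a string.
soup = BeautifulSoup("<html>data</html>", "html.parser")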

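From there, soup.get_text() gives you the page text as a plain string, which is the input most word cloud tutorials expect, so you should be able to follow any of them from that point on.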
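
As a rough sketch of the remaining steps, something like the following might work. It assumes the wordcloud package's WordCloud class with its generate and to_file methods; the helper names html_to_text and render_cloud, the output file name cloud.png, and the image size are made up for illustration and are not part of either library.

```python
from bs4 import BeautifulSoup


def html_to_text(html):
    """Return the visible text of an HTML document (a string or an open file)."""
    soup = BeautifulSoup(html, "html.parser")
    # Remove <script> and <style> elements so code and CSS do not end up in the cloud.
    for tag in soup(["script", "style"]):
        tag.decompose()
    # Join the remaining text nodes with spaces so adjacent words do not run together.
    return soup.get_text(separator=" ")


def render_cloud(text, out_path="cloud.png"):
    """Build a word cloud from plain text and save it as an image (hypothetical helper)."""
    # Imported here so the text extraction above can be used without wordcloud installed.
    from wordcloud import WordCloud

    # Width, height and background colour are arbitrary choices for this sketch.
    cloud = WordCloud(width=800, height=400, background_color="white").generate(text)
    cloud.to_file(out_path)
    return cloud


# Small in-memory demonstration of the extraction step.
sample = (
    "<html><head><style>p {color: red}</style></head>"
    "<body><p>hello</p><p>world</p></body></html>"
)
print(html_to_text(sample).split())
```

For the local file in the question, you would pass the open file handle fp to html_to_text inside the with block, then hand the returned string to render_cloud. Since BeautifulSoup builds the whole tree in memory, a very large file may take a while to parse.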
