How to get a specific word from html page using beautiful soup in python

Question

I have to extract specific words from an HTML page and count how many times each word occurs. How do I do this using Beautiful Soup in Python? How do I pass the URL to the soup and then count the words?

This is my code so far. I have no idea what to do next.

import bs4 as bs
import urllib.request

source = urllib.request.urlopen('https://pythonprogramming.net/parsememcparseface/').read()

soup = bs.BeautifulSoup(source,'lxml')

for paragraph in soup.find_all('p'):
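    # .string is None when the tag has more than one child node
    print(paragraph.string)
    # .text joins all nested text and is already a str
    print(paragraph.text)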

Answer 1

You could get all the text in the page using

soup.get_text()

After assigning the result to a variable, you can use the string .count() method to count how many times a given string appears in the page text, e.g.

text = soup.get_text()
print(text.count('word'))

Note that .count() matches substrings, so 'house' is also counted inside 'houses'. To count whole words only, split the text on whitespace and count the list elements that are exactly equal to the word.
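
A minimal sketch of whole-word counting, as one possible way to do it. It uses a regular expression instead of a plain split on spaces, so that punctuation attached to a word ('house,') does not prevent a match; that is a substitution, not what the answer above literally describes. The name count_words and the sample string are made up for illustration. In the real script the text would come from soup.get_text().

```python
import re
from collections import Counter

def count_words(text):
    """Return a Counter mapping each lowercased word to its number of occurrences."""
    # \w+ matches runs of letters, digits and underscores, so punctuation is dropped
    words = re.findall(r"\w+", text.lower())
    return Counter(words)

# Hypothetical sample; in the real script this would be: text = soup.get_text()
sample_text = "A house, two houses. The House is old."
counts = count_words(sample_text)

print(counts["house"])                     # whole-word matches only
print(sample_text.lower().count("house"))  # substring matches, also hits "houses"
```

Lowercasing makes the count case-insensitive; drop the .lower() call if 'House' and 'house' should be counted separately.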
