How to find and count a specific word across multiple web pages or URLs using Python

Question

Below is my code. Kindly check and correct me.

import requests
from bs4 import BeautifulSoup

url = ["https://www.tensorflow.org/","https://www.tomordonez.com/"]
the_word = input()

r = requests.get(url, allow_redirects=False)
soup = BeautifulSoup(r.content, 'lxml')
words = soup.find(text=lambda text: text and the_word in text)
print(words)

count = len(words)
print('\nUrl: {}\ncontains {} of word: {}'.format(url, count, the_word))

How can I change my code to parse multiple URLs and count how many times a specific word occurs?

Answer 1

import requests
from bs4 import BeautifulSoup

url_list = ["https://www.tensorflow.org/","https://www.tomordonez.com/"]

#the_word = input()
the_word = 'Python'

total_words = []
for url in url_list:
    r = requests.get(url, allow_redirects=False)
    # Lowercase the whole page so the match is case-insensitive.
    soup = BeautifulSoup(r.content.lower(), 'lxml')
    # Collect every text node that contains the word. Note that this counts
    # matching text nodes, not individual occurrences of the word.
    words = soup.find_all(string=lambda text: text and the_word.lower() in text)
    count = len(words)
    words_list = [ele.strip() for ele in words]
    total_words.extend(words_list)

    print('\nUrl: {}\ncontains {} of word: {}'.format(url, count, the_word))
    print(words_list)


#print(total_words)
# Number of matching text nodes across all URLs.
total_count = len(total_words)

Output:

Url: https://www.tensorflow.org/
contains 0 of word: Python
[]

Url: https://www.tomordonez.com/
contains 8 of word: Python
['web scraping with python', 'this is a tutorial on web scraping with python. learn to scrape websites with python and beautifulsoup.', 'python unit testing tutorial', 'this is a tutorial about unit testing in python.', 'pip install ssl module in python is not available', 'troubleshooting ssl module in python is not available', 'python context manager', 'a short tutorial about python context manager: "with" statement.']
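
The count above is the number of text nodes that contain the word, so a node that mentions it twice still counts once. If the goal is the number of occurrences, one possible approach is to count substrings in each page's text and sum the results. The following is only a sketch: the helper names (count_substring, count_across_pages, fetch_page_text) and the page_texts dictionary are hypothetical and do not come from the answer, and fetch_page_text follows redirects and uses a 10-second timeout, both of which are assumptions.

```python
def count_substring(text, word):
    """Count non-overlapping, case-insensitive occurrences of word in text."""
    return text.lower().count(word.lower())


def count_across_pages(page_texts, word):
    """Return ({url: count}, grand_total) for a dict of {url: page text}.

    page_texts is a hypothetical mapping built by the caller.
    """
    per_url = {url: count_substring(text, word)
               for url, text in page_texts.items()}
    return per_url, sum(per_url.values())


def fetch_page_text(url):
    """Download a page and return the text that BeautifulSoup extracts."""
    # Imported here so the counting helpers work without these packages.
    import requests
    from bs4 import BeautifulSoup

    response = requests.get(url, timeout=10)  # timeout value is an assumption
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    return soup.get_text(separator=" ")
```

To use it with the list from the answer, build the mapping first, for example page_texts = {url: fetch_page_text(url) for url in url_list}, and then call count_across_pages(page_texts, the_word). Because this is plain substring matching, "Python" would also be counted inside a longer word such as "pythonic".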

Answer 2

You can use the re module to find particular text.

import requests
import re
from bs4 import BeautifulSoup

urls = ["https://www.tensorflow.org/","https://www.tomordonez.com/"]

the_word = 'Tableau'

for url in urls:
    print(url)
    r = requests.get(url, allow_redirects=False)
    soup = BeautifulSoup(r.text, 'html.parser')
    # re.escape keeps the word literal even if it contains regex metacharacters.
    # The match is case-sensitive and counts matching text nodes.
    words = soup.find_all(string=re.compile(re.escape(the_word)))
    print(len(words))
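
The pattern in this answer is case-sensitive and also matches inside longer words. If whole-word, case-insensitive matching is wanted, a pattern along the following lines might help. This is a sketch rather than part of the answer: make_word_pattern and count_whole_word are hypothetical names, and the \b boundaries assume the word starts and ends with a letter, digit, or underscore (a word such as "C++" would not match).

```python
import re


def make_word_pattern(word):
    """Build a case-insensitive pattern that matches word as a whole word."""
    return re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)


def count_whole_word(text, word):
    """Count whole-word occurrences of word in a plain-text string."""
    return len(make_word_pattern(word).findall(text))
```

The compiled pattern can be passed to BeautifulSoup in the same way as above, for example soup.find_all(string=make_word_pattern(the_word)), while count_whole_word can be applied to the text returned by soup.get_text() to count occurrences instead of nodes.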

2 answers

Answer 1
1 2019-03-15 09:31:56

Answer 2
0 2019-03-15 10:11:17
