
How to find a specific word in multiple webpages or URLs and count it, using Python

Below is my code. Please check it and correct me.

import requests
from bs4 import BeautifulSoup

url = ["https://www.tensorflow.org/","https://www.tomordonez.com/"]
the_word = input()

r = requests.get(url, allow_redirects=False)
soup = BeautifulSoup(r.content, 'lxml')
words = soup.find(text=lambda text: text and the_word in text)
print(words)
count = len(words)
print('\nUrl: {}\ncontains {} of word: {}'.format(url, count, the_word))

How can I change my code to parse multiple URLs and count how many times a specific word occurs?

import requests
from bs4 import BeautifulSoup

url_list = ["https://www.tensorflow.org/", "https://www.tomordonez.com/"]

# the_word = input()
the_word = 'Python'
needle = the_word.lower()  # compare in lowercase for a case-insensitive match

total_words = []
for url in url_list:
    r = requests.get(url, allow_redirects=False)
    # Lowercase the whole page so the search ignores case
    soup = BeautifulSoup(r.content.lower(), 'lxml')
    # Collect every text node that contains the word
    # (string= replaces the older text= argument in bs4 4.4.0+)
    words = soup.find_all(string=lambda text: text and needle in text)
    words_list = [ele.strip() for ele in words]
    count = len(words_list)  # number of matching text nodes, not occurrences
    total_words.extend(words_list)

    print('\nUrl: {}\ncontains {} of word: {}'.format(url, count, the_word))
    print(words_list)

# print(total_words)
total_count = len(total_words)  # matching text nodes across all URLs

Output:

Url: https://www.tensorflow.org/
contains 0 of word: Python
[]

Url: https://www.tomordonez.com/
contains 8 of word: Python
['web scraping with python', 'this is a tutorial on web scraping with python. learn to scrape websites with python and beautifulsoup.', 'python unit testing tutorial', 'this is a tutorial about unit testing in python.', 'pip install ssl module in python is not available', 'troubleshooting ssl module in python is not available', 'python context manager', 'a short tutorial about python context manager: "with" statement.']
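Note that `len(words)` counts the text nodes that contain the word, not the individual occurrences: the second string in the output above mentions "python" twice but is counted once. If a per-occurrence count is wanted, one possible approach is sketched below. The helper names `count_occurrences` and `count_across_pages` and the sample `pages` dictionary are hypothetical; in practice the text would presumably come from something like `soup.get_text()` inside the loop.

```python
def count_occurrences(text: str, word: str) -> int:
    """Count case-insensitive, non-overlapping substring matches of word in text."""
    if not word:
        return 0  # str.count("") would otherwise return len(text) + 1
    return text.lower().count(word.lower())


def count_across_pages(pages: dict[str, str], word: str) -> dict[str, int]:
    """Map each URL to the number of times word occurs in that page's text."""
    return {url: count_occurrences(text, word) for url, text in pages.items()}


# Hypothetical stand-in for page text extracted from each URL
pages = {
    "https://example.com/a": "Python unit testing. Learn Python with python examples.",
    "https://example.com/b": "No match here.",
}
per_page = count_across_pages(pages, "Python")
total = sum(per_page.values())
print(per_page)
print("total:", total)
```

Like the answer's code, this matches substrings, so "Python" would also be counted inside a longer word such as "pythonic".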

You can use the re module to find specific text.

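import re

import requests
from bs4 import BeautifulSoup

urls = ["https://www.tensorflow.org/", "https://www.tomordonez.com/"]

the_word = 'Tableau'
# Escape the word so any regex metacharacters in it are matched literally
pattern = re.compile(re.escape(the_word))

for url in urls:
    print(url)
    r = requests.get(url, allow_redirects=False)
    soup = BeautifulSoup(r.text, 'html.parser')
    # Find every text node that contains the word (case-sensitive)
    words = soup.find_all(string=pattern)
    print(len(words))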
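A plain `re.compile(the_word)` is case-sensitive and also matches inside longer words. The sketch below shows one way the pattern might be tightened, assuming a case-insensitive, whole-word match is what is wanted. The helper `build_word_pattern` and the `sample` string are hypothetical and only illustrate the regex; they are not part of the original answer.

```python
import re


def build_word_pattern(word: str, whole_word: bool = True) -> re.Pattern:
    """Compile a case-insensitive pattern that matches word literally."""
    escaped = re.escape(word)  # treat characters like '+' or '.' as plain text
    if whole_word:
        # Lookarounds instead of \b so words such as 'C++' still match
        escaped = r"(?<!\w)" + escaped + r"(?!\w)"
    return re.compile(escaped, re.IGNORECASE)


pattern = build_word_pattern("Tableau")
sample = "Tableau dashboards; tableau tips; Tableaux is a different word."
print(len(pattern.findall(sample)))
```

A compiled pattern like this can be passed to `soup.find_all(string=pattern)` in place of the one above, although `find_all` would still count matching text nodes rather than individual occurrences.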


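Notice: The technical posts on this site follow the CC BY-SA 4.0 license. If you republish them, please credit this site's URL or the original source address. For any questions, contact: yoyou2525@163.com.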

 