How to find a specific word in multiple webpages or URLs and count it, using Python
Below is my code. Please check it and correct me.
import requests
from bs4 import BeautifulSoup
url = ["https://www.tensorflow.org/","https://www.tomordonez.com/"]
the_word = input()
r = requests.get(url, allow_redirects=False)
soup = BeautifulSoup(r.content, 'lxml')
words = soup.find(text=lambda text: text and the_word in text)
print(words)
count = len(words)
print('\nUrl: {}\ncontains {} of word: {}'.format(url, count, the_word))
How can I change my code to parse multiple URLs and count how many times a specific word appears?
import requests
from bs4 import BeautifulSoup

url_list = ["https://www.tensorflow.org/","https://www.tomordonez.com/"]
#the_word = input()
the_word = 'Python'
total_words = []

# requests.get() takes a single URL, so loop over the list
for url in url_list:
    r = requests.get(url, allow_redirects=False)
    # Lowercase the page so the match is case-insensitive
    soup = BeautifulSoup(r.content.lower(), 'lxml')
    # find_all() returns every matching text node; find() returns only the first
    words = soup.find_all(text=lambda text: text and the_word.lower() in text)
    # Number of text nodes that contain the word (not total occurrences)
    count = len(words)
    words_list = [ ele.strip() for ele in words ]
    for word in words:
        total_words.append(word.strip())
    print('\nUrl: {}\ncontains {} of word: {}'.format(url, count, the_word))
    print(words_list)

#print(total_words)
total_count = len(total_words)
Output:
Url: https://www.tensorflow.org/
contains 0 of word: Python
[]
Url: https://www.tomordonez.com/
contains 8 of word: Python
['web scraping with python', 'this is a tutorial on web scraping with python. learn to scrape websites with python and beautifulsoup.', 'python unit testing tutorial', 'this is a tutorial about unit testing in python.', 'pip install ssl module in python is not available', 'troubleshooting ssl module in python is not available', 'python context manager', 'a short tutorial about python context manager: "with" statement.']
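Note that the count above is the number of text nodes containing the word, so a node that mentions it twice still counts once. If you want the total number of occurrences instead, here is a minimal sketch. The helper names `count_occurrences` and `count_in_urls` are my own and not part of any library, the whole-word and case-insensitive matching is an assumption about what you want, and the timeout value is arbitrary.

```python
import re


def count_occurrences(text, word):
    """Count case-insensitive, whole-word occurrences of `word` in `text`."""
    # re.escape() keeps characters such as "." or "+" in the word literal
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return len(pattern.findall(text))


def count_in_urls(urls, word):
    """Return {url: occurrence count} for the text of each page."""
    # Third-party imports live here so count_occurrences works without them
    import requests
    from bs4 import BeautifulSoup

    counts = {}
    for url in urls:
        r = requests.get(url, timeout=10)  # timeout is an arbitrary choice
        soup = BeautifulSoup(r.text, "html.parser")
        # Join text nodes with a space so words from adjacent tags do not merge
        counts[url] = count_occurrences(soup.get_text(" "), word)
    return counts
```

Calling `count_in_urls(url_list, 'Python')` should give one count per URL. The numbers will likely differ from the output above, because this counts every occurrence rather than every matching text node.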
You can use the re module to find specific text.
import requests
import re
from bs4 import BeautifulSoup

urls = ["https://www.tensorflow.org/","https://www.tomordonez.com/"]
the_word ='Tableau'

for url in urls:
    print(url)
    r = requests.get(url, allow_redirects=False)
    soup = BeautifulSoup(r.text, 'html.parser')
    words = soup.find_all(text=re.compile(the_word))
    print(len(words))
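One caveat with `re.compile(the_word)`: it is case-sensitive, and any regex metacharacters in the word (for example the `+` in `C++`) are interpreted as pattern syntax. A hedged sketch of a variant that escapes the word and ignores case follows. The helper names `make_word_pattern` and `count_matching_nodes` are hypothetical, and `string=` is the newer name Beautiful Soup uses for the `text=` argument.

```python
import re


def make_word_pattern(word):
    """Compile a case-insensitive pattern that matches `word` literally."""
    return re.compile(re.escape(word), re.IGNORECASE)


def count_matching_nodes(html, word):
    """Count the text nodes in `html` that contain `word`."""
    # Imported here so make_word_pattern needs only the standard library
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, "html.parser")
    # find_all() keeps each text node where the pattern is found
    return len(soup.find_all(string=make_word_pattern(word)))
```

Inside the loop above you could then write `print(count_matching_nodes(r.text, the_word))` in place of the `find_all` and `len` lines.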