Python 2: search for a phrase on multiple web sites taken from a list file
So I have the following list of links in a file called "output":
https://web.archive.org/web/20180101003616/http://onet.pl
https://web.archive.org/web/20180102000139/http://onet.pl
[...]
If you open the first link from the list and press Ctrl+F in Firefox, you can find the phrase "Katastrofa".
All I want is a script that can find a phrase ("Katastrofa" is only an example; I want to use an argv argument, but that's not important here), print a success message, and move on to the next link.
I got stuck and can't figure out how to do it. The script I have for testing does not "see" the word "Katastrofa", which is definitely on the first page.
Please help :)
Here is what I've done so far:
import requests
from bs4 import BeautifulSoup

f = open('output', 'r')
f2 = f.readlines()
for i in f2:
    r = requests.get(i)
    first_page = r.text
    soup = BeautifulSoup(first_page, 'html.parser')
    page_soup = soup
    fraza = "Katastrofa"
    boxes = page_soup.body.find_all(fraza)
    print(i)
    print(boxes)
Output:
https://web.archive.org/web/20180101003616/http://onet.pl
[]
https://web.archive.org/web/20180102000139/http://onet.pl
[]
https://web.archive.org/web/20180103002217/http://onet.pl
find_all(fraza) looks for tags named "Katastrofa", not for text, so it returns an empty list. If you only want to check whether the HTML string contains the text:
import re

for i in f2:
    r = requests.get(i)
    fraza = "Katastrofa"
    # re.search scans the whole page; re.match would only test its start
    if re.search(fraza, r.text, re.I):  # ignore case
        print(i)
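Putting the string-search approach together with the argv phrase from the question, a minimal sketch might look like the following. It is written for Python 3 and, as an assumption made only to keep it dependency-free, swaps requests for urllib.request from the standard library. The names read_urls, fetch, contains_phrase, scan and main, and the script name find_phrase.py, are hypothetical. It uses re.search rather than re.match, because re.match only tests the beginning of the string while the phrase sits somewhere in the middle of the page, and it strips the newline that every line of the list file carries.

```python
import re
import sys
from urllib.request import urlopen


def read_urls(path):
    """Return the non-empty lines of the list file, without trailing newlines."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


def fetch(url):
    """Download a page and decode it as text (UTF-8 is an assumption here)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def contains_phrase(html, phrase):
    """True if the phrase occurs anywhere in the text, ignoring case."""
    # re.escape makes the phrase literal; re.search scans the whole string.
    return re.search(re.escape(phrase), html, re.I) is not None


def scan(urls, phrase, fetch_page=fetch):
    """Return the URLs whose page contains the phrase."""
    return [url for url in urls if contains_phrase(fetch_page(url), phrase)]


def main(argv):
    # Hypothetical usage: python find_phrase.py output Katastrofa
    if len(argv) != 3:
        print("usage: find_phrase.py LIST_FILE PHRASE")
        return 1
    for url in scan(read_urls(argv[1]), argv[2]):
        print("Found %r on %s" % (argv[2], url))
    return 0
```

Calling main(sys.argv) at the bottom of the file would turn this into a command-line script. The fetch_page parameter is there so the matching logic can be tried on canned strings without any network access.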
If you want to find the HTML elements that contain the text:
for i in f2:
    r = requests.get(i)
    soup = BeautifulSoup(r.text, 'html.parser')
    fraza = "Katastrofa"
    boxes = soup.find_all(True, text=re.compile(fraza, re.I))
    if boxes:
        print(i)
        print(boxes)
The result is a list of the innermost elements that contain the text:
https://web.archive.org/web/20180101003616/http://onet.pl
[<span class="title"> Kostaryka: Katastrofa lotnicza. Media: są ofiary </span>,
<span class="title"> Australia: katastrofa samolotu, są ofiary śmiertelne </span>]
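To see why find_all(fraza) in the question came back empty while the text filter finds the headline, here is a small offline sketch. The HTML fragment is made up for illustration and is not the real onet.pl markup. It uses string=, which is the name newer BeautifulSoup releases (4.4.0 and later) use for the text= argument.

```python
import re

from bs4 import BeautifulSoup

# Made-up fragment standing in for a downloaded page (an assumption).
html = """<html><body>
<span class="title"> Kostaryka: Katastrofa lotnicza </span>
<p>Pogoda na jutro</p>
</body></html>"""

soup = BeautifulSoup(html, "html.parser")

# A plain string as the first argument is a tag name, so this looks for
# <Katastrofa> elements and finds none.
by_tag_name = soup.body.find_all("Katastrofa")

# True matches every tag; the string filter keeps those whose text matches.
by_text = soup.find_all(True, string=re.compile("katastrofa", re.I))

print(by_tag_name)  # []
print(by_text)
```

The second call keeps only the span, because the p element's text does not match and body has several children and therefore no single string of its own.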