Python 2: searching for a phrase on multiple websites taken from a list file

Question

I have the following list of links in a file named "output":

https://web.archive.org/web/20180101003616/http://onet.pl
https://web.archive.org/web/20180102000139/http://onet.pl
[...]

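If you open the first link from the list and press Ctrl+F in Firefox, you can find the phrase "Katastrofa".

I just want a script that finds a phrase ("Katastrofa" is only an example; I would like to pass it as an argv parameter, but that is not important here), prints some success message, and moves on to the next link...

I'm stuck and don't know how to do it. The script I tested doesn't "see" the word ("Katastrofa"), which is definitely on the first page.

Please help :)

This is what I have so far: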

f = open('output', 'r')
f2 = f.readlines()
for i in f2:
     r=requests.get(i)
     first_page = r.text
     soup = BeautifulSoup(first_page, 'html.parser')
     page_soup = soup
     fraza = "Katastrofa"
     boxes = page_soup.body.find_all(fraza)
     print(i)
     print(boxes)

Output:

https://web.archive.org/web/20180101003616/http://onet.pl

[]
https://web.archive.org/web/20180102000139/http://onet.pl

[]
https://web.archive.org/web/20180103002217/http://onet.pl

Answer 1

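If you want to check whether the HTML string contains the text:

import re
import requests

for i in f2:
    r = requests.get(i.strip())  # strip the trailing newline left by readlines()
    fraza = "Katastrofa"
    # re.search scans the whole page; re.match would only test the start
    if re.search(fraza, r.text, re.I):  # ignore case
        print(i)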
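
The check above can be pulled out into small helpers that are easy to try without any network access. This is only a minimal sketch: the names `load_urls` and `contains_phrase` are hypothetical and not part of the original answer. It assumes the phrase should be matched literally (hence `re.escape`) and uses `re.search`, because `re.match` only matches at the very beginning of the string and an HTML page never starts with the phrase.

```python
import re


def load_urls(lines):
    """Strip whitespace and newlines; skip blank lines from the list file."""
    return [line.strip() for line in lines if line.strip()]


def contains_phrase(html, phrase):
    """Return True if phrase occurs anywhere in html, ignoring case."""
    # re.escape makes the phrase literal; re.search scans the whole string,
    # whereas re.match would only test the very beginning.
    return re.search(re.escape(phrase), html, re.IGNORECASE) is not None
```

In the loop, this would be used roughly as `if contains_phrase(r.text, fraza): print(i)`, with `f2 = load_urls(f.readlines())` so that each URL is passed to `requests.get` without its trailing newline.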

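If you want to find the HTML elements that contain the text:

import re
import requests
from bs4 import BeautifulSoup

for i in f2:
    r = requests.get(i.strip())  # strip the trailing newline left by readlines()
    soup = BeautifulSoup(r.text, 'html.parser')
    fraza = "Katastrofa"
    # True matches every tag; text= keeps only tags whose own string matches
    boxes = soup.find_all(True, text=re.compile(fraza, re.I))
    if boxes:
        print(i)
        print(boxes)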

The result is a list of the innermost (last child) elements that contain the text:

https://web.archive.org/web/20180101003616/http://onet.pl
[<span class="title"> Kostaryka: Katastrofa lotnicza. Media: są ofiary  </span>, 
<span class="title"> Australia: katastrofa samolotu, są ofiary śmiertelne  </span>]
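
To see why only the innermost elements are reported, the same idea can be sketched without BeautifulSoup. The example below swaps in `html.parser` from the Python 3 standard library (an assumption; the question targets Python 2, where the module is named `HTMLParser`). The names `PhraseFinder`, `find_phrase`, and `SAMPLE` are hypothetical, and the sample markup is a shortened stand-in for the archived page, with one made-up non-matching title.

```python
import re
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not go on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class PhraseFinder(HTMLParser):
    """Collect (parent tag, text) pairs for text nodes containing a phrase."""

    def __init__(self, phrase):
        super().__init__()
        self.pattern = re.compile(re.escape(phrase), re.IGNORECASE)
        self.open_tags = []  # stack of currently open tag names
        self.hits = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray closing tags.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        # The top of the stack is the innermost element holding this text.
        if self.pattern.search(data):
            parent = self.open_tags[-1] if self.open_tags else None
            self.hits.append((parent, data.strip()))


def find_phrase(html, phrase):
    finder = PhraseFinder(phrase)
    finder.feed(html)
    finder.close()
    return finder.hits


# Hypothetical sample; the middle title is invented for contrast.
SAMPLE = (
    '<div>'
    '<span class="title"> Kostaryka: Katastrofa lotnicza. Media: są ofiary  </span>'
    '<span class="title"> Pogoda na weekend </span>'
    '<span class="title"> Australia: katastrofa samolotu, są ofiary śmiertelne  </span>'
    '</div>'
)

for tag, text in find_phrase(SAMPLE, "Katastrofa"):
    print(tag, "|", text)
```

Each hit is attributed to the `span` that directly holds the text rather than to the enclosing `div`, which mirrors the "last child" behavior described above.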

Answer 1
0 2018-11-23 01:17:52
