Python: search for all lines that contain a "word"

Question

I'm having trouble parsing some text.

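The setup: I fetch an HTML page with the Grab lib, convert it to text with the NLTK lib, and put that text into a variable. After that, I want to search for all lines that contain a "word" and print those lines.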

For example, say we have the following text:

test1: olololo
test2: print something
faq: it is a frequently asked question

I want to search for test1 and print the result as: test1: olololo

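import logging

import nltk  # NLTK 2.x: nltk.clean_html() was removed in NLTK 3
from grab import Grab

logging.basicConfig(level=logging.DEBUG)

parsing_url = raw_input("Enter URL:")
# Drop the scheme so the credentials can be inserted after "http://"
# (note: the page is then always fetched over plain http)
if parsing_url.startswith(('http://', 'https://')):
    parsing_url = parsing_url.split('://', 1)[1]
print parsing_url

g = Grab()
# log_file asks Grab to save the downloaded page to out.html
g.go('http://user:pass@' + parsing_url, log_file='out.html')

# The saved page is a local file, so read it directly instead of via urlopen
with open('out.html') as f:
    html = f.read()

# Strip the HTML markup, leaving plain text in `raw`
raw = nltk.clean_html(html)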

In bash, I did it like this:

root@srv:~$ cat 123 | grep "test1"

As a result, I get:

test1: olololo

But in Python, I don't want to run a bash command :)
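For the last step, a plain Python equivalent of `cat 123 | grep "test1"` needs no shell at all. The sketch below assumes the cleaned text is already in a string (such as `raw` above); the helper names `grep_lines` and `grep_file` and the sample text are hypothetical, and matching is a plain substring test rather than grep's regular expressions.

```python
def grep_lines(text, word):
    """Return every line of `text` that contains `word` (like `grep word`)."""
    return [line for line in text.splitlines() if word in line]


def grep_file(path, word):
    """Yield matching lines from a file one at a time, without the newline."""
    with open(path) as f:
        for line in f:
            if word in line:
                yield line.rstrip("\n")


# Hypothetical sample text, matching the example in the question
sample = "test1: olololo\ntest2: print something\nfaq: it is a frequently asked question"
for line in grep_lines(sample, "test1"):
    print(line)
```

To search a saved file directly, `for line in grep_file("123", "test1"): print(line)` mirrors the bash command. If you need a pattern rather than a fixed string, replacing `word in line` with `re.search(pattern, line)` gets closer to what grep does.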

Answer 1

Try this:

# splitlines() yields whole lines; search the cleaned text, not the raw HTML
for line in raw.splitlines():
    if "test1" in line:
        print line
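One thing to watch with this kind of loop: `str.split()` with no arguments splits on any run of whitespace, not on newlines, so iterating over its result visits words instead of lines and a match prints only the matching word. `str.splitlines()` splits at line boundaries. A small illustration with a hypothetical two-line string:

```python
text = "test1: olololo\ntest2: print something"

words = text.split()        # splits on any whitespace
lines = text.splitlines()   # splits only at line boundaries

print(words)  # ['test1:', 'olololo', 'test2:', 'print', 'something']
print(lines)  # ['test1: olololo', 'test2: print something']

# Filtering the words finds only "test1:", not the whole line
print([word for word in words if "test1" in word])  # ['test1:']
print([line for line in lines if "test1" in line])  # ['test1: olololo']
```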

Answer 2

Assuming `raw` is a list of strings (i.e. a list of lines):

good_lines = [l for l in raw if 'test1' in l]
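This only works if `raw` really is a list. If it is a single string, such as the text produced by `nltk.clean_html` in the question, iterating over it yields single characters, so the comprehension silently finds nothing. A minimal sketch, with a made-up `raw` value, of splitting it into lines first:

```python
raw = "test1: olololo\ntest2: print something\n"

# Iterating over a str yields single characters, so a multi-character
# substring like 'test1' is never found in any element
print([c for c in raw if 'test1' in c])  # []

# Split the string into a list of lines first, then filter
good_lines = [line for line in raw.splitlines() if 'test1' in line]
print(good_lines)  # ['test1: olololo']
```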

Answer 3

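Maybe someone will find this useful. I solved the problem like this:
1. Decode the HTML to text with the NLTK library.
2. Write the text to a file.
3. Parse the file with bash commands.
For example: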

import commands  # Python 2 only; Python 3 has subprocess.getstatusoutput

status, host = commands.getstatusoutput("cat raw.log | sed 's/^[ \t]*//' | grep -A 2 \"On Host\" | sed -n 2p")

I am also still trying to parse this text with Python tools.
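The shell pipeline above can be reproduced in Python without calling bash. The sketch below is one reading of it: `sed 's/^[ \t]*//'` strips leading spaces and tabs, `grep -A 2 "On Host"` prints each matching line plus up to two following lines, and `sed -n 2p` keeps only the second printed line, which is the line right after the first match. The function name `line_after_marker` and the sample text are hypothetical; it returns `None` where the shell pipeline would print nothing.

```python
def line_after_marker(text, marker="On Host"):
    """Approximate `sed 's/^[ \\t]*//' | grep -A 2 MARKER | sed -n 2p`.

    Returns the line right after the first line containing `marker`,
    with leading spaces/tabs removed, or None if there is no such line.
    """
    # sed 's/^[ \t]*//': strip leading spaces and tabs from every line
    lines = [line.lstrip(" \t") for line in text.splitlines()]
    for i, line in enumerate(lines):
        if marker in line:
            # grep -A 2 prints the match plus up to 2 following lines;
            # sed -n 2p keeps only the 2nd printed line, i.e. the next line
            return lines[i + 1] if i + 1 < len(lines) else None
    return None


# Hypothetical sample input
sample = "Status report\n  On Host\n\tweb01.example.com\n  uptime 3 days\n"
print(line_after_marker(sample))  # web01.example.com
```

Working on the string in memory also makes step 2 (writing the text to a file first) unnecessary.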


3 個解決方案

解決方案1
1 已采納 2014-01-30 12:03:23

解決方案2
0 2014-01-30 12:04:58

解決方案3
0 2014-02-05 13:03:43
