How do I access a specific part of a string in a txt file using Python?

Question

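I have a large text file containing many lines of HTML, produced by a web scraper. It is full of lines that look like the code below. How can I get a new text file that contains only the "Desired Text" instead of the whole line of HTML code?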

b'<b><a href="example.html" target="_blank">Desired Text 1</a></b>'
b'<b><a href="example.html" target="_blank">Desired Text 2</a></b>'
b'<b><a href="example.html" target="_blank">Desired Text 3</a></b>'
b'<b><a href="example.html" target="_blank">Desired Text 4</a></b>'
b'<b><a href="example.html" target="_blank">Desired Text 5</a></b>'
b'<b><a href="example.html" target="_blank">Desired Text 6</a></b>'

Answer 1

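Take a look at BeautifulSoup; its examples include a demonstration of this kind of problem:

Beautiful Soup: a quick introduction

[Edit] Here is a detailed solution for your case: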

from bs4 import BeautifulSoup

text = """
b'<b><a href="example.html" target="_blank">Desired Text 1</a></b>'
b'<b><a href="example.html" target="_blank">Desired Text 2</a></b>'
b'<b><a href="example.html" target="_blank">Desired Text 3</a></b>'
b'<b><a href="example.html" target="_blank">Desired Text 4</a></b>'
b'<b><a href="example.html" target="_blank">Desired Text 5</a></b>'
b'<b><a href="example.html" target="_blank">Desired Text 6</a></b>'
"""

soup = BeautifulSoup(text, 'html.parser')
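# The b'...' wrappers are plain text to the parser, so they remain in the output.
print(soup.get_text())  # Python 3 print(); get_text() is the current name for getText()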
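
The snippet above parses a hard-coded string, while the question asks for a new text file. A minimal sketch of that step follows. It is not the answerer's code: it swaps in the standard library's `html.parser` (the parser that `BeautifulSoup(text, 'html.parser')` builds on) so that it runs without extra packages, and it assumes every line in the input file is the repr of a bytes object, as in the sample. The names `TextCollector`, `extract_text`, and `convert_file` are made up for illustration.

```python
import ast
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects only the text between tags and ignores the tags themselves."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def extract_text(line):
    """Return the visible text of one scraped line such as b'<b>...</b>'."""
    line = line.strip()
    # Assumption: the scraper wrote the repr of a bytes object (b'...').
    # literal_eval turns that repr back into bytes, which we decode to str.
    if line.startswith(("b'", 'b"')):
        line = ast.literal_eval(line).decode("utf-8")
    parser = TextCollector()
    parser.feed(line)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts).strip()


def convert_file(src_path, dst_path):
    """Write the text of every non-empty line in src_path to dst_path."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            if line.strip():
                dst.write(extract_text(line) + "\n")
```

With hypothetical file names, `convert_file("scraped.txt", "desired_text.txt")` would write one extracted text per line. If BeautifulSoup is installed, the body of `extract_text` could instead return `BeautifulSoup(line, "html.parser").get_text().strip()` after the same decoding step.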

Answer 1: accepted, 2016-04-23 21:05:04
