Parsing HTML with Beautiful Soup in Python

Question

I have the following HTML:

<html lang="en-US" xml:lang="en-US" xmlns="http://www.w3.org/1999/xhtml">
<body>
<title>CATe - hj1612</title>
</td></tr></table>
</td></tr></table></td><td><img src="icons/arrowredright.gif"/></td><td align="center">
<input name="keyt" type="hidden" value="a3dvl"/>
<input type="submit" value="View"/><br/>or<br/>
<input type="reset" value="Reset"/>
</td>
</tr>
</body>
</html>

I am trying to get the value of the keyt input. Since it is HTML, I am using BeautifulSoup.

soup = BeautifulSoup(html)

I know you can use soup.find with an id, like soup.find(id="randomid")

But soup.find(name="keyt") won't work, because keyt is not a tag name... so I thought I could fall back on the ordinary if substring in string: approach:

for line in soup.find_all('input'):
    if "keyt" in line:
        print(line)

But this approach doesn't seem to work either. I'm new to Python, so I'd appreciate any help or a pointer in the right direction.

Answer 1

from bs4 import BeautifulSoup

html = """
<html lang="en-US" xml:lang="en-US" xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>CATe - hj1612</title>
</td></tr></table>
</td></tr></table></td><td><img src="icons/arrowredright.gif"/></td><td align="center">
<input name="keyt" type="hidden" value="a3dvl"/>
<input type="submit" value="View"/><br/>or<br/>
<input type="reset" value="Reset"/>
</td>
</tr>
</html>
"""

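# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(html, "html.parser")

# Match on the tag name ("input") and on its name attribute ("keyt")
print(soup.find(name="input", attrs={'name': 'keyt'}))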

Output:

<input name="keyt" type="hidden" value="a3dvl"/>

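If you want to find multiple matches, use find_all instead of find. As for how the two functions work: name is the name of the tag you are looking for, and the attrs dict is what you actually use to find elements with particular attributes, which in your case is the name attribute.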
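
To go from the matched tag to the value itself, here is a minimal sketch. It assumes Python 3 and the built-in "html.parser" backend, and it uses a trimmed-down copy of the HTML from the question; the variable names keyt_input, keyt_value and input_types are just illustrative.

```python
from bs4 import BeautifulSoup

# Trimmed-down copy of the form markup from the question
html = """
<td align="center">
<input name="keyt" type="hidden" value="a3dvl"/>
<input type="submit" value="View"/><br/>or<br/>
<input type="reset" value="Reset"/>
</td>
"""

soup = BeautifulSoup(html, "html.parser")

# find returns the first matching Tag, or None when nothing matches
keyt_input = soup.find("input", attrs={"name": "keyt"})

# Attributes are read like dictionary keys on the Tag
keyt_value = keyt_input["value"] if keyt_input is not None else None
print(keyt_value)

# find_all returns every match; .get avoids a KeyError for a missing attribute
input_types = [tag.get("type") for tag in soup.find_all("input")]
print(input_types)
```

This is also why the loop in the question never printed anything: "keyt" in line checks the tag's children, not its attributes, and an input tag has no children. Comparing line.get("name") with "keyt" would be the attribute check.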

Answer 2

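Your HTML is rather strange. The HEAD tag is never closed, and the td and table tags are closed without ever being opened. I can't even imagine how Beautiful Soup manages to parse it.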
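
For what it's worth, here is a rough way to see what the parser makes of markup like this. It is a sketch assuming Python 3 and the built-in "html.parser" backend, which as far as I can tell simply drops closing tags that have no matching open tag; lxml or html5lib may repair the same input differently. The names broken_html and tag_names are mine.

```python
from bs4 import BeautifulSoup

# Same kind of damage as in the question: unclosed head, stray closing tags
broken_html = """
<html>
<head>
<title>CATe - hj1612</title>
</td></tr></table>
</td></tr></table></td><td><img src="icons/arrowredright.gif"/></td><td align="center">
<input name="keyt" type="hidden" value="a3dvl"/>
<input type="submit" value="View"/><br/>or<br/>
<input type="reset" value="Reset"/>
</td>
</tr>
</html>
"""

soup = BeautifulSoup(broken_html, "html.parser")

# find_all(True) matches every tag, so this lists what survived parsing
tag_names = [tag.name for tag in soup.find_all(True)]
print(tag_names)
```

With this backend the stray closing tags do not produce table or tr elements, and all three input tags are still there, so the lookup from the other answer keeps working.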
