Counting certain words in an HTML file with Python

Question

I'm new to Python. I'm trying to count certain words or expressions in HTML files. For example, I have a piece of HTML whose source code is as follows:

<div style="line-height:120%;text-align:justify;text-indent:24px;font-size:10.5pt;">
<font style="font-family:inherit;font-size:10.5pt;font-style:italic;font-weight:bold;">2013 vs. 2012&#160;&#160;</font>
<font style="font-family:inherit;font-size:10.5pt;">During 2013, the Company recognized a decommissioning charge of $117 million and a restoration liability of $50 million, partially offset by the 2013 reversal of the $56&#160;million tax indemnification liability associated with the 2006 sale of the Company&#8217;s Canadian subsidiary.</font></div>

I want to count how many times "liability" appears in the text. Below is my code, which doesn't work:

import os
from bs4 import BeautifulSoup

lst=os.listdir("C:/html/")
for x in lst:
    print (x)
    html = open ("C:/html/"+x,'rb')
    bsobj = BeautifulSoup(html,"html.parser")
    metricslist = bsobj.findAll(div.string ='liability')
    print(len(metricslist))

I know that bsobj.findAll(div.string='liability') is very wrong, but I don't know what the code should be. Any help would be greatly appreciated!

Answer 1

When using find() or find_all(), you can apply a partial string match to an element's text:

soup.find(text=lambda text: text and "liability" in text)
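Note that find() returns only the first match, so it cannot count anything by itself. Below is a minimal sketch, assuming Beautiful Soup 4.4.0 or later (where `string=` is the newer name for the `text=` argument), that applies the same predicate with find_all() to the sample HTML from the question. It also shows a subtlety: find_all() returns matching text nodes, not occurrences. The second <font> element is a single text node that contains "liability" twice, so the per-node counts have to be summed.

```python
from bs4 import BeautifulSoup

# Sample markup: the HTML snippet from the question
html = """
<div style="line-height:120%;text-align:justify;text-indent:24px;font-size:10.5pt;">
<font style="font-family:inherit;font-size:10.5pt;font-style:italic;font-weight:bold;">2013 vs. 2012&#160;&#160;</font>
<font style="font-family:inherit;font-size:10.5pt;">During 2013, the Company recognized a decommissioning charge of $117 million and a restoration liability of $50 million, partially offset by the 2013 reversal of the $56&#160;million tax indemnification liability associated with the 2006 sale of the Company&#8217;s Canadian subsidiary.</font></div>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all() returns every text node the predicate accepts, not just the first
nodes = soup.find_all(string=lambda text: text and "liability" in text)
print(len(nodes))  # 1: only one text node mentions the word

# Each node is a str subclass, so str.count() gives occurrences per node
occurrences = sum(node.count("liability") for node in nodes)
print(occurrences)  # 2
```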

Alternatively, you can use a regular expression pattern instead of a function:

import re

soup.find(text=re.compile(r"\bliability\b"))
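To get an actual count per file, as the question asks, the regex can be applied to the document's text and combined with the directory loop from the question. This is a sketch under stated assumptions, not a definitive solution: the helper names `count_word_in_html` and `count_in_directory` are hypothetical, matching is case-sensitive, only the text returned by `get_text()` is searched (tag attributes are ignored), and `C:/html/` is simply the directory used in the question.

```python
import os
import re

from bs4 import BeautifulSoup

# \b word boundaries: a longer word such as "nonliability" is not counted.
# Matching is case-sensitive; add flags=re.IGNORECASE to also count "Liability".
PATTERN = re.compile(r"\bliability\b")


def count_word_in_html(markup, pattern=PATTERN):
    """Return how many times `pattern` matches in the text content of `markup`."""
    soup = BeautifulSoup(markup, "html.parser")
    # separator=" " stops text from adjacent tags fusing into one word
    text = soup.get_text(separator=" ")
    return len(pattern.findall(text))


def count_in_directory(directory, pattern=PATTERN):
    """Return a dict mapping each file name in `directory` to its match count."""
    counts = {}
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue  # skip subdirectories
        # Read bytes so BeautifulSoup can detect the file's encoding itself
        with open(path, "rb") as f:
            counts[name] = count_word_in_html(f.read(), pattern)
    return counts


sample = ('<font>a restoration liability of $50 million, partially offset by '
          'the reversal of the $56&#160;million tax indemnification liability</font>')
print(count_word_in_html(sample))  # 2

# Usage with the directory from the question:
# for name, n in count_in_directory("C:/html/").items():
#     print(name, n)
```

Passing a separator to `get_text()` matters because markup like `<b>a</b>liability` would otherwise become the single word "aliability", which the `\b` boundaries would then refuse to match.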

Answer 1: 0 votes, accepted, 2016-10-06 19:39:44