Python beautiful soup select text
The following is an example of the HTML code I want to parse:
<html>
<body>
<td style="PADDING-LEFT: 5px"bgcolor="ffffff" class="style8"> Example BLAB BLAB BLAB </td>
<td style="PADDING-LEFT: 5px"bgcolor="ffffff" class="style8"> BLAB BLAB BLAB </td>
<td style="PADDING-LEFT: 5px"bgcolor="ffffff" class="style8"> BLAB BLAB BLAB </td>
<td style="PADDING-LEFT: 5px"bgcolor="ffffff" class="style8"> BLAB BLAB BLAB </td>
</body>
</html>
I am using Beautiful Soup to parse the HTML code by selecting style8 as follows (where html holds the result of my HTTP request):
html = result.read()
soup = BeautifulSoup(html)
content = soup.select('.style8')
In this example, the content variable holds a list of 4 Tags. For each item in the list, I want to check whether its .text (the text of that style8 cell) contains Example and, if so, append Example to a variable. If the loop gets through the entire list and Example does not occur anywhere in it, Not present should be appended to the variable instead.
I have got the following so far:
foo = []
for i, tag in enumerate(content):
    if content[i].text == 'Example':
        foo.append('Example')
        break
    else:
        continue
This appends Example to foo only if it occurs; however, it does not append Not Present when Example does not occur anywhere in the list.
Any method of doing so is appreciated, and a better way of searching the entire result set to check whether a string is present would be great.
You can use find_all() to find all td elements with class='style8' and use a list comprehension to construct the foo list:
from bs4 import BeautifulSoup
html = """<html>
<body>
<td style="PADDING-LEFT: 5px"bgcolor="ffffff" class="style8"> Example BLAB BLAB BLAB </td>
<td style="PADDING-LEFT: 5px"bgcolor="ffffff" class="style8"> BLAB BLAB BLAB </td>
<td style="PADDING-LEFT: 5px"bgcolor="ffffff" class="style8"> BLAB BLAB BLAB </td>
<td style="PADDING-LEFT: 5px"bgcolor="ffffff" class="style8"> BLAB BLAB BLAB </td>
</body>
</html>"""
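# Name the parser explicitly so the result does not depend on which parsers are installed.
soup = BeautifulSoup(html, "html.parser")
# One entry per style8 cell: "Example" if the cell text contains it, else "Not Present".
foo = ["Example" if "Example" in node.text else "Not Present"
       for node in soup.find_all('td', {'class': 'style8'})]
print(foo)
prints: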
['Example', 'Not Present', 'Not Present', 'Not Present']
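Note that the list comprehension yields one entry per cell, whereas the question asks for a single entry covering the whole result set. A minimal sketch of that variant, assuming a shortened version of the question's markup and a hypothetical helper name presence_label (not part of Beautiful Soup), could use any():

```python
from bs4 import BeautifulSoup

# Shortened sample modelled on the question's markup (an assumption for illustration).
html = """<html><body>
<td class="style8"> Example BLAB BLAB BLAB </td>
<td class="style8"> BLAB BLAB BLAB </td>
</body></html>"""


def presence_label(markup, needle="Example", css=".style8"):
    """Return needle if any element matching css contains it, else 'Not Present'."""
    soup = BeautifulSoup(markup, "html.parser")
    # any() stops at the first matching tag, so the whole list is scanned
    # only when the needle is absent.
    found = any(needle in tag.get_text() for tag in soup.select(css))
    return needle if found else "Not Present"


foo = [presence_label(html)]
print(foo)
```

The substring test (in) is used rather than ==, because the cell text carries surrounding whitespace and other words.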
If you just want to check whether it was found or not, you could use a simple boolean flag as follows:
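foo = []
found = False
for tag in content:
    # Use a substring test: tag.text is ' Example BLAB BLAB BLAB ',
    # so comparing with == 'Example' would never match.
    if 'Example' in tag.text:
        found = True
        foo.append('Example')
        break
# Reached without a match: record the absence, as the question asks.
if not found:
    foo.append('Not Present')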
If I understand what you want, this may be a simple approach, though alecxe's solution looks amazing.
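As an aside, Python's for ... else construct can express the same flag logic without a separate boolean. The sketch below is only an illustration: the function name label_from_texts is hypothetical, and the plain strings stand in for the tag.text values of the selected cells.

```python
def label_from_texts(texts, needle="Example"):
    """Return [needle] if any text contains it, otherwise ['Not Present']."""
    result = []
    for text in texts:
        if needle in text:
            result.append(needle)
            break
    else:
        # The else branch of a for loop runs only when the loop
        # finished without hitting break, i.e. nothing matched.
        result.append("Not Present")
    return result


# Stand-ins for [tag.text for tag in content] (assumed sample values).
texts = [" Example BLAB BLAB BLAB ", " BLAB BLAB BLAB "]
print(label_from_texts(texts))
print(label_from_texts([" BLAB BLAB BLAB "]))
```

The first call prints ['Example'] and the second prints ['Not Present'].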