Python Beautiful Soup select text

The following is an example of the HTML code I want to parse:

<html>
<body>
<td style="PADDING-LEFT:  5px"bgcolor="ffffff" class="style8"> Example BLAB BLAB BLAB </td>
<td style="PADDING-LEFT:  5px"bgcolor="ffffff" class="style8"> BLAB BLAB BLAB </td>
<td style="PADDING-LEFT:  5px"bgcolor="ffffff" class="style8"> BLAB BLAB BLAB </td>
<td style="PADDING-LEFT:  5px"bgcolor="ffffff" class="style8"> BLAB BLAB BLAB </td>
</body>
</html>

I am using Beautiful Soup to parse the HTML code by selecting style8 as follows (where html holds the result of my HTTP request):

html = result.read()
soup = BeautifulSoup(html)

content = soup.select('.style8')

In this example, content is a list of 4 Tags. For each item in the list, I want to check whether its text (the text of each style8 cell) contains Example and, if so, append Example to a variable. If the loop goes through the entire list and Example does not occur anywhere, it should append Not Present to the variable instead.

I have got the following so far:

foo = []

for i, tag in enumerate(content):
    if content[i].text == 'Example':
        foo.append('Example')
        break
    else:
        continue

This only appends Example to foo if it occurs; it does not append Not Present when Example is missing from the entire list.

Any method of doing so is appreciated; a better way of searching the entire result set to check whether a string is present would be great.

You can use find_all() to find all td elements with class='style8' and use a list comprehension to construct the foo list:

from bs4 import BeautifulSoup


html = """<html>
<body>
<td style="PADDING-LEFT:  5px"bgcolor="ffffff" class="style8"> Example BLAB BLAB BLAB </td>
<td style="PADDING-LEFT:  5px"bgcolor="ffffff" class="style8"> BLAB BLAB BLAB </td>
<td style="PADDING-LEFT:  5px"bgcolor="ffffff" class="style8"> BLAB BLAB BLAB </td>
<td style="PADDING-LEFT:  5px"bgcolor="ffffff" class="style8"> BLAB BLAB BLAB </td>
</body>
</html>"""

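# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(html, "html.parser")

# One entry per td.style8 cell: "Example" if the cell text contains it, else "Not Present"
foo = ["Example" if "Example" in node.text else "Not Present"
       for node in soup.find_all('td', {'class': 'style8'})]
print(foo)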

prints:

['Example', 'Not Present', 'Not Present', 'Not Present']
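
If only a single result is needed rather than one entry per cell, the built-in any() can check the whole list in one pass. The following is a minimal sketch, not part of the original answer: the helper name summarize and the sample texts list are assumptions for illustration, with texts standing in for [tag.get_text() for tag in soup.select('.style8')].

```python
def summarize(texts, needle="Example", missing="Not Present"):
    """Return [needle] if any text contains needle, else [missing]."""
    # any() stops at the first match, much like a loop with break.
    found = any(needle in text for text in texts)
    return [needle] if found else [missing]


# Hypothetical stand-in for [tag.get_text() for tag in soup.select('.style8')]
texts = [" Example BLAB BLAB BLAB ", " BLAB BLAB BLAB ", " BLAB BLAB BLAB "]

print(summarize(texts))      # ['Example']
print(summarize(texts[1:]))  # ['Not Present']
```

Note the substring test (in) rather than ==: the cell text carries surrounding whitespace and extra words, so an equality check against 'Example' would never match.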

If you just want to check whether it was found or not, you could use a simple boolean flag as follows:

foo = []
found = False
for tag in content:
    # Substring test: the cell text is ' Example BLAB BLAB BLAB ', not exactly 'Example'
    if 'Example' in tag.text:
        found = True
        foo.append('Example')
        break
# Reached the end of the list without a match
if not found:
    foo.append('Not Present')

If I understand what you want, this may be a simple approach, though alecxe's solution looks amazing.
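
The flag can also be dropped in favor of Python's for ... else clause, whose else branch runs only when the loop finishes without hitting break. This is a minimal sketch of that variant, not something either answer proposed: the function name first_match_or_default and the sample cell_texts are hypothetical, and the strings stand in for each tag's .text.

```python
def first_match_or_default(texts, needle="Example", default="Not Present"):
    """Append needle on the first match, or default if nothing matches."""
    foo = []
    for text in texts:
        if needle in text:
            foo.append(needle)
            break
    else:
        # Runs only when the loop ended without break, i.e. no text matched.
        foo.append(default)
    return foo


# Hypothetical stand-in for the .text of each tag in content
cell_texts = [" BLAB BLAB BLAB ", " Example BLAB BLAB BLAB "]

print(first_match_or_default(cell_texts))      # ['Example']
print(first_match_or_default(cell_texts[:1]))  # ['Not Present']
```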
