Trying to print only "state failed", but Python is printing everything

Question

I am looping through the HTML content of a web page and trying to print only the strings that contain the substring "state failed". However, Python is printing every single string, even the ones that don't contain "state failed".

Here is my code:

soup = bs(html_page, 'lxml')
outF = open('C:/Users/ryans/OneDrive/Desktop/test.csv', 'w')
for link in soup.findAll('rect'):
    if "state failed" in link:
        if link.isoweekday() in range(1, 6):
            outF.write(str(link))
            outF.write('\n')   
outF.close()

Here is one that I would expect to be printed, and it is.

<rect class="state failed" data-original-title="Task_id: failure_cleanup&lt;br&gt;Run: 2018-12-22T04:00:00&lt;br&gt;Operator: CruxCleanupOperator&lt;br&gt;Started: 2018-12-24T18:34:39.149434&lt;br&gt;Ended: 2018-12-24T18:34:45.935977&lt;br&gt;Duration: 6.78654&lt;br&gt;State: failed&lt;br&gt;" data-toggle="tooltip" height="10" rx="0" ry="0" style="shape-rendering: crispedges; stroke-width: 1; stroke-opacity: 1;" title="" width="10" x="984" y="-5"></rect>

Here is one that I would expect NOT to be printed, but for some odd reason it is being printed.

<rect class="state success" data-original-title="Task_id: join_cleanup&lt;br&gt;Run: 2018-12-22T04:00:00&lt;br&gt;Operator: CompletionBranchOperator&lt;br&gt;Started: 2018-12-24T18:33:30.834983&lt;br&gt;Ended: 2018-12-24T18:33:33.037330&lt;br&gt;Duration: 2.20235&lt;br&gt;State: success&lt;br&gt;" data-toggle="tooltip" height="10" rx="0" ry="0" style="shape-rendering: crispedges; stroke-width: 1; stroke-opacity: 1;" title="" width="10" x="984" y="-5"></rect>

I tried all combinations of single, double, and even triple quotes. It doesn't matter. It prints everything, even the strings that don't contain "state failed". Any idea what's wrong here? Thanks.

Answer 1

Maybe you can try converting link to a string, so that the in test runs against the tag's markup text:

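soup = bs(html_page, 'lxml')
outF = open('C:/Users/ryans/OneDrive/Desktop/test.csv', 'w')
for link in soup.findAll('rect'):
    # str(link) is the tag's full markup, including class="state failed"
    if "state failed" in str(link):
        # caution: link is a Tag, which has no isoweekday() method; the
        # weekday check needs a datetime parsed from the tag's data
        if link.isoweekday() in range(1, 6):
            outF.write(str(link))
            outF.write('\n')
outF.close()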

The substring test should then match only the failed tags.
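One caveat about link.isoweekday(): link is a Beautiful Soup Tag, and isoweekday() is a method of datetime objects, not of tags. Below is a minimal sketch, assuming the intent is to filter on the weekday of the "Run:" timestamp inside data-original-title. The helper name run_is_weekday and the regular expression are hypothetical, not part of the original code, and the sample string is the tooltip from the question after its &lt;br&gt; entities are decoded.

```python
import re
from datetime import datetime

# Hypothetical helper: pull the "Run:" timestamp out of a tooltip string
# and report whether it falls on Monday through Friday.
RUN_PATTERN = re.compile(r"Run: (\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})")


def run_is_weekday(title):
    match = RUN_PATTERN.search(title)
    if match is None:
        return False  # no Run timestamp found; treat as not a weekday
    run_time = datetime.fromisoformat(match.group(1))
    # isoweekday(): 1 = Monday ... 7 = Sunday, so range(1, 6) is Mon-Fri
    return run_time.isoweekday() in range(1, 6)


title = ("Task_id: failure_cleanup<br>Run: 2018-12-22T04:00:00<br>"
         "Operator: CruxCleanupOperator<br>")
print(run_is_weekday(title))  # False: 2018-12-22 was a Saturday
```

Inside the loop this would be called as run_is_weekday(link.get('data-original-title', '')) in place of link.isoweekday() in range(1, 6).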

Answer 2

Instead of if "state failed" in link: , test the tag's class attribute. Note that with an HTML parser such as lxml, Beautiful Soup treats class as a multi-valued attribute, so link.get('class') returns a list such as ['state', 'failed'] rather than the string "state failed", and it returns None when the class attribute is not there. Comparing that value to the string with == or is therefore never matches; compare against the list instead, for example if link.get('class') == ['state', 'failed']: .
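To see what link.get('class') actually holds, here is a small self-contained sketch. The sample HTML, the is_failed helper, and the use of the built-in html.parser (swapped in for lxml to avoid an extra dependency) are assumptions for illustration, not part of the original code. With Beautiful Soup's HTML parsers, class comes back as a list of class names, or None when the attribute is missing.

```python
from bs4 import BeautifulSoup

# Sample markup standing in for the real page (assumption for illustration).
html_page = (
    '<svg>'
    '<rect class="state failed" x="984"></rect>'
    '<rect class="state success" x="984"></rect>'
    '<rect x="984"></rect>'
    '</svg>'
)
soup = BeautifulSoup(html_page, 'html.parser')


def is_failed(tag):
    # tag.get('class') is a list of class names, or None if there is no class
    classes = tag.get('class') or []
    return 'failed' in classes


print(soup.find('rect').get('class'))
results = [is_failed(rect) for rect in soup.find_all('rect')]
print(results)
```

Checking membership in the list avoids depending on the order of the class names and handles tags without a class attribute.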

You can also do it this way:

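soup = bs(html_page, 'lxml')
outF = open('C:/Users/ryans/OneDrive/Desktop/test.csv', 'w')
# attrs={'class': 'state failed'} keeps only tags whose class attribute is exactly "state failed"
for link in soup.findAll('rect', attrs={'class': 'state failed'}):
    # caution: link is a Tag, which has no isoweekday() method; the
    # weekday check needs a datetime parsed from the tag's data
    if link.isoweekday() in range(1, 6):
        outF.write(str(link))
        outF.write('\n')
outF.close()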
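As a rough illustration of how the class filter behaves, the sketch below compares the exact-string form with a single-class match and writes the results using a with block so the file is closed automatically. The sample HTML, the temporary output path, and the html.parser choice are assumptions made so the example runs anywhere; they are not from the original code.

```python
import os
import tempfile
from bs4 import BeautifulSoup

# Sample markup standing in for the real page (assumption for illustration).
html_page = (
    '<rect class="state failed" x="1"></rect>'
    '<rect class="state success" x="2"></rect>'
    '<rect class="failed state" x="3"></rect>'
)
soup = BeautifulSoup(html_page, 'html.parser')

# Exact attribute string: matches only class="state failed", in that order.
exact = soup.find_all('rect', attrs={'class': 'state failed'})
# Single class name: matches any rect whose class list contains "failed".
any_failed = soup.find_all('rect', class_='failed')

# Placeholder output path in a temporary directory.
out_path = os.path.join(tempfile.mkdtemp(), 'test.csv')
with open(out_path, 'w') as out_file:  # closed automatically, even on error
    for rect in any_failed:
        out_file.write(str(rect) + '\n')

print([rect['x'] for rect in exact])
print([rect['x'] for rect in any_failed])
```

The exact-string form is order-sensitive, so class_='failed' may be the more robust choice if the page ever emits the class names in a different order.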

Answer 1
1 Accepted 2018-12-25 05:13:32

Answer 2
1 2018-12-25 05:20:21
