Trying to print out only “state failed” but Python is printing everything

I am looping through the HTML content of a web page and trying to print only the strings that contain the substring "state failed". However, Python prints every single string, even the ones that don't contain "state failed".

Here is my code:

soup = bs(html_page, 'lxml')
outF = open('C:/Users/ryans/OneDrive/Desktop/test.csv', 'w')
for link in soup.findAll('rect'):
    if "state failed" in link:
        if link.isoweekday() in range(1, 6):
            outF.write(str(link))
            outF.write('\n')   
outF.close()

Here is one that I would expect to be printed, and it is:

<rect class="state failed" data-original-title="Task_id: failure_cleanup&lt;br&gt;Run: 2018-12-22T04:00:00&lt;br&gt;Operator: CruxCleanupOperator&lt;br&gt;Started: 2018-12-24T18:34:39.149434&lt;br&gt;Ended: 2018-12-24T18:34:45.935977&lt;br&gt;Duration: 6.78654&lt;br&gt;State: failed&lt;br&gt;" data-toggle="tooltip" height="10" rx="0" ry="0" style="shape-rendering: crispedges; stroke-width: 1; stroke-opacity: 1;" title="" width="10" x="984" y="-5"></rect>

Here is one that I would expect NOT to be printed, but for some odd reason it is being printed:

<rect class="state success" data-original-title="Task_id: join_cleanup&lt;br&gt;Run: 2018-12-22T04:00:00&lt;br&gt;Operator: CompletionBranchOperator&lt;br&gt;Started: 2018-12-24T18:33:30.834983&lt;br&gt;Ended: 2018-12-24T18:33:33.037330&lt;br&gt;Duration: 2.20235&lt;br&gt;State: success&lt;br&gt;" data-toggle="tooltip" height="10" rx="0" ry="0" style="shape-rendering: crispedges; stroke-width: 1; stroke-opacity: 1;" title="" width="10" x="984" y="-5"></rect>

I tried all combinations of single, double, and even triple quotes. It doesn't matter. It prints everything, even the strings that don't contain "state failed". Any idea what's wrong here? Thanks.

Maybe you can try converting link to a string before testing it:

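from bs4 import BeautifulSoup as bs

soup = bs(html_page, 'lxml')
# The with-block closes the file even if an error occurs inside the loop.
with open('C:/Users/ryans/OneDrive/Desktop/test.csv', 'w') as outF:
    for link in soup.find_all('rect'):
        # str(link) is the tag's full markup, so this is a plain substring test.
        if "state failed" in str(link):
            # NOTE: kept from the question, but a Tag has no isoweekday() method;
            # this check needs a date parsed from the tag's tooltip text.
            if link.isoweekday() in range(1, 6):
                outF.write(str(link))
                outF.write('\n')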

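Then the substring test should work: the "in" operator applied to a Tag checks the tag's child nodes, while str(link) is the tag's full markup, including its class attribute.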
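The difference between the two tests can be seen in a small sketch. The cut-down markup and the html.parser backend are assumptions made for illustration, not the asker's actual page.

```python
from bs4 import BeautifulSoup

# Hypothetical, cut-down markup standing in for the real page.
html = ('<rect class="state failed" x="1"></rect>'
        '<rect class="state success" x="2"></rect>')
soup = BeautifulSoup(html, 'html.parser')
failed, success = soup.find_all('rect')

# "in" on a Tag looks through the tag's children; these rects have none.
tag_test = ["state failed" in t for t in (failed, success)]
# str(tag) is the rendered markup, so this is an ordinary substring search.
str_test = ["state failed" in str(t) for t in (failed, success)]
print(tag_test, str_test)
```

Only the string version separates the failed rect from the successful one here.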

Instead of if "state failed" in link: , test the class attribute directly. Note that link.get('class') does not return the string "state failed": class is a multi-valued attribute, so Beautiful Soup returns a list such as ['state', 'failed'], or None if the tag has no class attribute. Use if link.get('class') == ['state', 'failed']: or, more loosely, if 'failed' in (link.get('class') or []): . Do not use is for this comparison: it tests object identity rather than equality, so it would never match.
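A quick way to check what link.get('class') actually returns is to print it for a few tags. This is a sketch on hypothetical cut-down markup, using the html.parser backend as an assumption; the third rect has no class attribute on purpose.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: failed, success, and a rect with no class at all.
html = ('<rect class="state failed"></rect>'
        '<rect class="state success"></rect>'
        '<rect></rect>')
rects = BeautifulSoup(html, 'html.parser').find_all('rect')

# class is multi-valued: each value is a list of tokens, or None when absent.
classes = [r.get('class') for r in rects]
print(classes)

# "or []" guards against the tag whose class attribute is missing.
failed = [r for r in rects if 'failed' in (r.get('class') or [])]
print(len(failed))
```

Comparing against a list, or testing membership in it, is therefore the comparison that matches; a plain string never equals the returned value.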

You can also let Beautiful Soup do the filtering by matching the class attribute in the search itself:

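from bs4 import BeautifulSoup as bs

soup = bs(html_page, 'lxml')
with open('C:/Users/ryans/OneDrive/Desktop/test.csv', 'w') as outF:
    # Matches rects whose class attribute is exactly the string "state failed".
    for link in soup.find_all('rect', attrs={'class': 'state failed'}):
        # NOTE: kept from the question, but a Tag has no isoweekday() method.
        if link.isoweekday() in range(1, 6):
            outF.write(str(link))
            outF.write('\n')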
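One caveat applies to the weekday check in the code above: a rect Tag has no isoweekday() method, so that filter needs a real date. Below is a minimal sketch, assuming the date of interest is the Run timestamp inside the data-original-title tooltip (the question does not say which timestamp is meant) and that tooltip fields are separated by <br>. The helper name run_weekday is hypothetical.

```python
import re
from datetime import datetime

def run_weekday(tooltip):
    """Return the ISO weekday (1=Mon .. 7=Sun) of the Run field, or None."""
    match = re.search(r'Run: ([^<]+)', tooltip)
    if match is None:
        return None
    return datetime.fromisoformat(match.group(1).strip()).isoweekday()

# Tooltip text as link['data-original-title'] would return it
# (Beautiful Soup decodes &lt;br&gt; to <br>); shortened for the example.
tooltip = 'Task_id: failure_cleanup<br>Run: 2018-12-22T04:00:00<br>State: failed<br>'
day = run_weekday(tooltip)
print(day, day in range(1, 6))
```

In the loop, the check would then read if run_weekday(link['data-original-title']) in range(1, 6): . For the sample tag in the question the Run date is a Saturday, so a Monday-to-Friday filter on Run would skip it; if the Started timestamp is the one that matters, change the field name in the pattern.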
