Trying to print out only “state failed” but Python is printing everything
I am looping through the HTML content of a web page and trying to print only strings with the substring "state failed". However, Python is printing every single string, even the ones that don't have the substring "state failed".
Here is my code:
soup = bs(html_page, 'lxml')
outF = open('C:/Users/ryans/OneDrive/Desktop/test.csv', 'w')
for link in soup.findAll('rect'):
    if "state failed" in link:
        if link.isoweekday() in range(1, 6):
            outF.write(str(link))
            outF.write('\n')
outF.close()
Here is one that I would expect to be printed, and it is.
<rect class="state failed" data-original-title="Task_id: failure_cleanup<br>Run: 2018-12-22T04:00:00<br>Operator: CruxCleanupOperator<br>Started: 2018-12-24T18:34:39.149434<br>Ended: 2018-12-24T18:34:45.935977<br>Duration: 6.78654<br>State: failed<br>" data-toggle="tooltip" height="10" rx="0" ry="0" style="shape-rendering: crispedges; stroke-width: 1; stroke-opacity: 1;" title="" width="10" x="984" y="-5"></rect>
Here is one that I would expect NOT to be printed, but for some odd reason it is being printed.
<rect class="state success" data-original-title="Task_id: join_cleanup<br>Run: 2018-12-22T04:00:00<br>Operator: CompletionBranchOperator<br>Started: 2018-12-24T18:33:30.834983<br>Ended: 2018-12-24T18:33:33.037330<br>Duration: 2.20235<br>State: success<br>" data-toggle="tooltip" height="10" rx="0" ry="0" style="shape-rendering: crispedges; stroke-width: 1; stroke-opacity: 1;" title="" width="10" x="984" y="-5"></rect>
I tried all combinations of single, double, and even triple quotes. It doesn't matter. It prints everything, even the strings that don't contain "state failed". Any idea what's wrong here? Thanks.
Maybe you can try making the link into a string:
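from bs4 import BeautifulSoup as bs  # assumed import; the original snippet omits it

soup = bs(html_page, 'lxml')
outF = open('C:/Users/ryans/OneDrive/Desktop/test.csv', 'w')
for link in soup.findAll('rect'):
    # Test the serialized markup of the tag, not the Tag object itself
    if "state failed" in str(link):
        # Kept from the question; note that isoweekday() is a datetime method, not a Tag method
        if link.isoweekday() in range(1, 6):
            outF.write(str(link))
            outF.write('\n')
outF.close()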
Then it should work.
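To see why the str() conversion changes the result, here is a small sketch using two trimmed-down <rect> elements modelled on the ones in the question. It relies on the fact that the in operator on a Beautiful Soup Tag looks through the tag's children rather than its markup. The html string, the variable names, and the use of html.parser instead of lxml are assumptions made for illustration only.

```python
from bs4 import BeautifulSoup

# Two trimmed-down <rect> elements modelled on the ones in the question.
html = (
    '<svg>'
    '<rect class="state failed" data-original-title="State: failed"></rect>'
    '<rect class="state success" data-original-title="State: success"></rect>'
    '</svg>'
)

# html.parser is used here only so the sketch does not depend on lxml.
soup = BeautifulSoup(html, "html.parser")
rects = soup.find_all("rect")

# `in` on a Tag searches the tag's children, not its markup.
# Both elements are empty, so the test is False for each of them.
in_tag = ["state failed" in rect for rect in rects]

# Converting to str first tests the serialized markup instead.
in_markup = ["state failed" in str(rect) for rect in rects]

print(in_tag)
print(in_markup)
```

The string test is case-sensitive and matches anywhere in the markup, so it would also pick up "state failed" if it appeared in another attribute or in text.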
Instead of if "state failed" in link:, test the tag's class attribute, which you can read with link.get('class'). Keep in mind that Beautiful Soup treats class as a multi-valued attribute, so with an HTML parser link.get('class') returns a list such as ['state', 'failed'] rather than the string "state failed", and it returns None if the class attribute is not there. A condition such as if link.get('class') == ['state', 'failed'] handles both cases. Avoid is for this check: it compares object identity, not equality, so it will not match even when the values are equal.
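As a quick check of what link.get('class') actually gives back, the sketch below prints it for three sample elements. With an HTML parser, Beautiful Soup treats class as a multi-valued attribute, so the value is a list of class names (or None when the attribute is missing), not a single string. The sample markup and the helper name is_failed are hypothetical, and html.parser stands in for lxml.

```python
from bs4 import BeautifulSoup

# Sample markup: a failed task, a successful task, and a rect with no class.
html = (
    '<rect class="state failed"></rect>'
    '<rect class="state success"></rect>'
    '<rect></rect>'
)
soup = BeautifulSoup(html, "html.parser")
rects = soup.find_all("rect")

# Each value is a list of class names, or None if the attribute is absent.
classes = [rect.get("class") for rect in rects]
print(classes)


def is_failed(tag):
    """Hypothetical helper: True if the tag carries the 'failed' class."""
    # Fall back to an empty list so a missing class attribute is handled.
    return "failed" in (tag.get("class") or [])


failed_flags = [is_failed(rect) for rect in rects]
print(failed_flags)
```

Checking for membership of "failed" in the list does not depend on the order of the class names, which an equality test against the full list would.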
You can also do it this way:
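from bs4 import BeautifulSoup as bs  # assumed import; the original snippet omits it

soup = bs(html_page, 'lxml')
outF = open('C:/Users/ryans/OneDrive/Desktop/test.csv', 'w')
# Let Beautiful Soup select only the rects whose class attribute is exactly "state failed"
for link in soup.findAll('rect', attrs={'class': 'state failed'}):
    # Kept from the question; note that isoweekday() is a datetime method, not a Tag method
    if link.isoweekday() in range(1, 6):
        outF.write(str(link))
        outF.write('\n')
outF.close()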
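One thing none of the snippets address is the link.isoweekday() call: isoweekday() belongs to datetime objects, and a Beautiful Soup Tag does not provide it. Assuming the intent is to keep only tasks whose "Run:" timestamp falls on Monday to Friday, a minimal sketch could parse that timestamp out of the data-original-title text. The helper name run_is_weekday and the choice of the Run field are guesses about the intent, not something the question confirms.

```python
import re
from datetime import datetime

# Matches the "Run: 2018-12-22T04:00:00" part of the tooltip text.
RUN_PATTERN = re.compile(r"Run: (\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})")


def run_is_weekday(tooltip):
    """Hypothetical helper: True if the tooltip's Run timestamp is Mon-Fri."""
    match = RUN_PATTERN.search(tooltip)
    if match is None:
        # No Run timestamp found, so treat the element as not matching.
        return False
    run = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S")
    # isoweekday() returns 1 for Monday through 7 for Sunday.
    return run.isoweekday() in range(1, 6)


# Tooltip text shortened from the failed <rect> shown in the question.
tooltip = (
    "Task_id: failure_cleanup<br>Run: 2018-12-22T04:00:00<br>"
    "Operator: CruxCleanupOperator<br>State: failed<br>"
)
print(run_is_weekday(tooltip))
```

Inside the loop this would be called as run_is_weekday(link.get('data-original-title', '')) in place of link.isoweekday() in range(1, 6). Note that 2018-12-22 was a Saturday, so the sample element from the question would be filtered out by this check.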