Inconsistent results from Beautiful Soup?

So, I'm scraping a website, and while I am able to retrieve the webpage's HTML, Beautiful Soup's "find" results are inconsistent. When I retrieve the same page over and over, BS4 sometimes finds a certain tag and sometimes does not.

I tested the length of the page: when Beautiful Soup was able to find the tags I wanted, the length was 9220189 (the correct size), and when it couldn't, it was 103557968. I printed the page in both cases and the content is consistent. In fact, the string I am looking for can be found in both printouts.
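To narrow this down, it can help to record, for each fetched copy, both whether the target text is in the raw string and whether Beautiful Soup finds it after parsing. If "in_raw" is True but "in_parsed" is False, the text arrived but was not found as a standalone text node in the parsed tree. This is a minimal diagnostic sketch; the function name diagnose and the default html.parser are assumptions for illustration, not part of the original question.

```python
from bs4 import BeautifulSoup


def diagnose(html: str, target: str, parser: str = "html.parser") -> dict:
    """Compare one fetched copy of the page with the tree Beautiful Soup builds from it."""
    soup = BeautifulSoup(html, parser)
    return {
        "length": len(html),                                # size of this copy of the markup
        "in_raw": target in html,                           # plain substring test, no parsing involved
        "in_parsed": soup.find(string=target) is not None,  # exact text-node match in the parsed tree
        "parsed_text_length": len(soup.get_text()),         # how much text ended up in the tree
    }


sample = "<table><tr><td>Molecular Formula</td><td>H2O</td></tr></table>"
print(diagnose(sample, "Molecular Formula"))
```

Calling it on several responses (for example requests.get(url).text, assuming the requests library is used) and comparing the results would show whether the 103557968-character copies contain the text but lose it during parsing.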

Could this be a size limitation of Beautiful Soup? I'm not sure what's going on.
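Beautiful Soup does not parse HTML itself; it hands the markup to an underlying parser (html.parser, lxml or html5lib), and its documentation notes that different parsers can build different trees from the same document. One way to test the size-limitation idea is to run the same search under each installed parser on a copy where it failed. This is a diagnostic sketch only; that the parser is responsible is an assumption the question does not confirm, and find_with_each_parser is a hypothetical helper name.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def find_with_each_parser(html, target, parsers=("html.parser", "lxml", "html5lib")):
    """Report whether soup.find(string=target) succeeds under each parser.

    True/False is the search result; None means that parser is not installed.
    """
    results = {}
    for parser in parsers:
        try:
            soup = BeautifulSoup(html, parser)
        except FeatureNotFound:
            results[parser] = None  # parser library missing in this environment
            continue
        results[parser] = soup.find(string=target) is not None
    return results


print(find_with_each_parser("<p>Molecular Formula</p>", "Molecular Formula"))
```

If the large copy succeeds under one parser and fails under another, the problem is in how that parser handles the document rather than in the fetched page.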

EDIT: Link: https://www.brenda-enzymes.org/ligand.php?brenda_ligand_id=1

What I am looking for: soup.find(string='Molecular Formula')
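Note that find(string='Molecular Formula') only matches a text node whose entire content equals that string, so surrounding whitespace or extra text in the same node makes it return None. That probably does not explain the size difference on its own, but it rules out one source of flaky matches. The markup below is a made-up stand-in, not the real BRENDA page; the regex and lambda are two tolerant alternatives.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup: the label has whitespace around it inside the cell.
html = "<table><tr><td> Molecular Formula </td><td>H2O</td></tr></table>"
soup = BeautifulSoup(html, "html.parser")

# Exact match: the whole text node must equal the argument, so this finds nothing.
exact = soup.find(string="Molecular Formula")

# A compiled regex is searched within each text node, so the whitespace is tolerated.
by_regex = soup.find(string=re.compile(r"Molecular Formula"))

# A function receives each text node; stripping first keeps the match exact otherwise.
by_strip = soup.find(string=lambda s: s.strip() == "Molecular Formula")

print(repr(exact), repr(by_regex), repr(by_strip))
```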

The string is sometimes found and sometimes not. The text is always present in the page and is not loaded by JavaScript.

I've solved this by reducing the size of the HTML. Until I find a better solution, this will have to do.
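The exact way the HTML was reduced is not shown. One possible approach, sketched here under that assumption, is to cut a window of the raw string around the first occurrence of the label and parse only that slice. The helper name parse_window and the window sizes are arbitrary choices for illustration.

```python
from bs4 import BeautifulSoup


def parse_window(html, anchor, before=2_000, after=20_000, parser="html.parser"):
    """Parse only a slice of the raw HTML around the first occurrence of `anchor`.

    Returns None if the anchor text does not appear in the raw markup at all.
    """
    idx = html.find(anchor)
    if idx == -1:
        return None
    start = max(0, idx - before)
    fragment = html[start: idx + len(anchor) + after]
    # The slice may begin or end mid-tag; content at those edges can be garbled,
    # so keep the window wide enough that the target element lies well inside it.
    return BeautifulSoup(fragment, parser)


page = ("<html><body><p>" + "filler " * 1000 + "</p><table><tr><td>Molecular Formula</td>"
        "<td>H2O</td></tr></table><p>" + "more " * 1000 + "</p></body></html>")
soup = parse_window(page, "Molecular Formula", before=100, after=100)
print(soup.find(string="Molecular Formula"))
```

Beautiful Soup's own parse_only argument, which takes a SoupStrainer, is another way to keep the tree small by restricting which elements are added to it (it is not supported by the html5lib parser).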
