HTML selector using Python's bs4

I'm fairly new at this, and I'm working through Automate the Boring Stuff and writing some of my own programs along the way. I'm trying to use Beautiful Soup's select method to pull the value '33' out of this code:

<span class="wu-value wu-value-to" _ngcontent-c19="">33</span>

I know that the span element is inside a div, and I've tried a few selectors, including:

high_temp = w_u_soup.select('div > span .wu-value wu-value-to')

But I haven't been able to get 33 out. Any help would be appreciated. I've tried to look up what _ngcontent-c19 is, but I'm having trouble understanding what I've found so far (I'm trying to learn Python, and it seems I'll be learning a bit of HTML as a consequence).

I think you have a couple of different issues here.

First, your selector is wrong. In a CSS selector a space means "descendant of", so the selector you have is trying to select an element called wu-value-to (which isn't a valid HTML element) inside something with class wu-value, inside a span which is a direct child of a div. To select an element that has particular classes, attach the class selectors to the element name with no spaces in between. The class attribute in your HTML holds two space-separated classes, wu-value and wu-value-to, so each one gets its own dot: span.wu-value.wu-value-to.
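A small sketch of the difference, assuming the span sits directly inside a div as the question says (the rest of the real page isn't shown, so the wrapper div here is an assumption). The original selector matches nothing, while the chained-class version finds the span:

```python
from bs4 import BeautifulSoup

# Assumed markup: the question's <span> wrapped in a parent <div>.
html = '''
<div>
  <span class="wu-value wu-value-to" _ngcontent-c19="">33</span>
</div>
'''
soup = BeautifulSoup(html, "html.parser")

# Spaces are descendant combinators: this asks for a <wu-value-to> element
# inside something with class "wu-value", inside a <span> that is a direct
# child of a <div>. Nothing in the markup has that shape.
original = soup.select("div > span .wu-value wu-value-to")

# No spaces: a single <span> that carries both classes.
chained = soup.select("span.wu-value.wu-value-to")

print(len(original))  # 0
print(len(chained))   # 1
```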

So your selector should probably be div > span.wu-value.wu-value-to. If the HTML you showed is your entire document, just 'span' would be enough, but I'm guessing you are specifying the parent and those classes for a reason.
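To illustrate why the more specific selector can matter, here is a sketch on a made-up fragment with several spans (the structure is hypothetical, not taken from the real page). Plain 'span' picks up every span, while the child combinator plus chained classes narrows it to the one value:

```python
from bs4 import BeautifulSoup

# Hypothetical page fragment; the real page's structure is not known.
html = '''
<div>
  <span class="wu-label">High</span>
  <span class="wu-value wu-value-to">33</span>
</div>
<p><span class="wu-value wu-value-to">not a div child</span></p>
'''
soup = BeautifulSoup(html, "html.parser")

# 'span' alone matches every <span> in the document, in document order.
all_spans = [s.text for s in soup.select("span")]

# '>' requires the <span> to be a direct child of a <div>, and the chained
# classes require both class names on that same <span>.
specific = [s.text for s in soup.select("div > span.wu-value.wu-value-to")]

print(all_spans)  # ['High', '33', 'not a div child']
print(specific)   # ['33']
```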

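Second, you are selecting the element, not its text content. You need your_node.text to get the text content. Also note that select() returns a list of every matching element, even when there is only one match, so you have to take an element out of it (for example with [0]) or use select_one(), which returns the first match or None.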
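A minimal sketch of the element-versus-text distinction, again assuming a div wrapper around the question's span. select() gives back a list of Tag objects; .text on one of those Tags gives the string inside it:

```python
from bs4 import BeautifulSoup

# Assumed markup: the question's <span> inside a <div>.
html = '<div><span class="wu-value wu-value-to" _ngcontent-c19="">33</span></div>'
soup = BeautifulSoup(html, "html.parser")

matches = soup.select("div > span.wu-value.wu-value-to")  # list of Tag objects
node = matches[0]   # the <span> Tag itself, attributes and all
print(node.text)    # 33  (a string, not a number)
```

Note that .text is a string; if you want to do arithmetic with the temperature you would still need to convert it, e.g. with int().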

Putting it together, you should be able to get what you want with this:

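w_u_soup.select_one('div > span.wu-value.wu-value-to').text

select_one() returns the first matching element (or None if nothing matches), so .text can be used on it directly. Calling .text on the list that select() returns raises an AttributeError.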
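If the selector might not match (for example if the page layout changes), select_one() returns None and .text would then fail. A slightly more defensive sketch, using a hypothetical helper name get_high_temp and a stand-in HTML string in place of the downloaded page:

```python
from bs4 import BeautifulSoup


def get_high_temp(w_u_soup):
    """Hypothetical helper: return the text of the first
    <span class="wu-value wu-value-to"> that is a direct child of a <div>,
    or None if no element matches."""
    # select_one() returns the first matching Tag, or None if there is none.
    node = w_u_soup.select_one("div > span.wu-value.wu-value-to")
    if node is None:
        return None
    return node.text


# Hypothetical stand-in for the page you downloaded.
html = '<div><span class="wu-value wu-value-to" _ngcontent-c19="">33</span></div>'
w_u_soup = BeautifulSoup(html, "html.parser")
print(get_high_temp(w_u_soup))  # 33
```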
