Beautiful Soup: findall and quoted classes

I am a new Python user banging my head against a wall on a Beautiful Soup issue. My target page contains the snippets below:

<div class=rbHeader>
<span role="heading" aria-level="3" class="ws_bold">
Experience Level</span>
</div>

<div class="  row  result" id="p_bc0437dce636c6f4" data-jk="bc0437dce636c6f4" itemscope itemtype="http://schema.org/JobPosting" data-tn-component="organicJob">

...

</div>

I have parsed the page as follows:

   target = Soup(urllib.urlopen(url), "lxml") 

If I run

targetElements = target.findAll('div', attrs={'class':'rbheader'})
print targetElements

I get

 [<div class="rbHeader">\n<span aria-level="3" class="ws_bold" role="heading">\nExperience Level</span>\n</div>]

but if I run

targetElements = target.findAll('div', attrs={'class':'  row  result'})
print targetElements

I get

[]

This happens no matter which class I try to select, as long as the class attribute is in quotes. I can only seem to find classes that are not quoted.

Any help would be greatly appreciated.

Best, Ryan

Whitespace around class names is always stripped: Beautiful Soup splits the class attribute on whitespace and treats each name as a separate class.

You can match on just one class:

targetElements = target.findAll('div', attrs={'class':'row'})

...or:

targetElements = target.findAll('div', attrs={'class':'result'})

If you suspect that either of these may return too many results, you can require both classes with a CSS selector:

soup.select('div.row.result')

...where soup is your BeautifulSoup instance.
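To make the difference concrete, here is a minimal sketch comparing a single-class search with the CSS selector. The markup is a trimmed, hypothetical version of the question's snippet; the `id` values `job1` and `other` and the variable names are made up for illustration, and `html.parser` is used only so the example needs no extra parser.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modeled on the question's snippet (ids are invented).
html = (
    '<div class=rbHeader><span class="ws_bold">Experience Level</span></div>'
    '<div class="  row  result" id="job1"></div>'
    '<div class="row" id="other"></div>'
)
soup = BeautifulSoup(html, "html.parser")

# Searching for one class name finds every div that carries that class.
row_ids = [div["id"] for div in soup.find_all("div", attrs={"class": "row"})]

# The CSS selector only matches divs that carry both classes.
both_ids = [div["id"] for div in soup.select("div.row.result")]

print(row_ids)
print(both_ids)
```

Here the single-class search picks up both the `job1` and `other` divs, while the selector narrows the result to `job1` only.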

Here is an example based on your div:

import bs4

div_test = '<div class=rbHeader><span role="heading" aria-level="3" class="ws_bold">Experience Level</span></div><div class="  row  result" id="p_bc0437dce636c6f4" data-jk="bc0437dce636c6f4" itemscope itemtype="http://schema.org/JobPosting" data-tn-component="organicJob"></div>'
target = bs4.BeautifulSoup(div_test, 'html.parser')

1. Class names are case sensitive, so your code

targetElements = target.findAll('div', attrs={'class':'rbheader'})
print targetElements

will return nothing ([]), whereas

targetElements = target.findAll('div', attrs={'class':'rbHeader'})
print targetElements

will give you:

[<div class="rbHeader"><span aria-level="3" class="ws_bold" role="heading">Experience Level</span></div>]
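If the exact capitalization of a class is not known in advance, one possible workaround is to pass a compiled regular expression instead of a string. The sketch below assumes the `html.parser` backend and uses made-up variable names; it is an illustration of the case-sensitivity point, not the only way to handle it.

```python
import re
from bs4 import BeautifulSoup

html = '<div class=rbHeader><span class="ws_bold">Experience Level</span></div>'
soup = BeautifulSoup(html, "html.parser")

# A plain string is compared case-sensitively, so the lowercase name finds nothing.
wrong_case = soup.find_all("div", attrs={"class": "rbheader"})

# The exact spelling used in the markup matches.
exact_case = soup.find_all("div", attrs={"class": "rbHeader"})

# A compiled pattern with re.IGNORECASE matches regardless of capitalization.
any_case = soup.find_all(
    "div", attrs={"class": re.compile(r"^rbheader$", re.IGNORECASE)}
)

print(len(wrong_case), len(exact_case), len(any_case))
```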

2. For the code:

targetElements = target.findAll('div', attrs={'class':'  row  result'})
print targetElements

you will get the matching element rather than an empty list:

[<div class=" row result" data-jk="bc0437dce636c6f4" data-tn-component="organicJob" id="p_bc0437dce636c6f4" itemscope="" itemtype="http://schema.org/JobPosting"></div>]
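Matching on the full class string depends on the exact whitespace, and how that string is handled may differ between Beautiful Soup versions and parsers. A more robust option, sketched below, is to pass a function that checks the parsed class list directly. The helper name `has_row_and_result` and the second div with `id="other"` are hypothetical and exist only for this illustration.

```python
from bs4 import BeautifulSoup

# Trimmed version of the question's div, plus one invented div for contrast.
html = (
    '<div class="  row  result" id="p_bc0437dce636c6f4"></div>'
    '<div class="result" id="other"></div>'
)
soup = BeautifulSoup(html, "html.parser")


def has_row_and_result(tag):
    """Return True for div tags whose class list contains both names."""
    classes = tag.get("class") or []
    return tag.name == "div" and {"row", "result"} <= set(classes)


# find_all accepts a function and keeps the tags for which it returns True.
matches = soup.find_all(has_row_and_result)
matched_ids = [tag["id"] for tag in matches]

# The class attribute is stored as a list of names, not as one string.
print(soup.div["class"])
print(matched_ids)
```

Because the check works on a set of class names, it does not depend on the order of the classes or on the spacing between them.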
