bs4 not getting all list items

Using the following URL as an example, the code only gets 35 items instead of the 85 listed on the page. Is this a case where I have to use Selenium to load the page source? How could bs4 miss the rest of the li items?

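from bs4 import BeautifulSoup as bs
import requests

r = requests.get(url=url)  # `url` is the page linked in the question
soup = bs(r.text, 'html.parser')
jobkeys = []
# Each job on the page is an <li class="cmp-JobListItem"> element
jobs = soup.findAll("li", {"class": "cmp-JobListItem"})
for job in jobs:
    s = job.attrs.get('data-tn-entityid')
    # Keep the part between the first and the last comma of the entity id
    jobkey = s[s.find(',')+1:s.rfind(',')]
    jobkeys.append(jobkey)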
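The slicing in the loop assumes every `<li>` carries a `data-tn-entityid` attribute containing at least two commas. If the attribute is missing, `job.attrs.get` returns `None` and `s.find` raises an `AttributeError`; if the value has no comma at all, `find` and `rfind` both return -1 and the slice silently returns the whole string minus its last character. A more defensive helper might look like the sketch below. The name `extract_jobkey` is hypothetical, and the comma-separated layout of the attribute is only inferred from the original slice.

```python
def extract_jobkey(entity_id):
    """Return the text between the first and last comma of a
    data-tn-entityid value, or None if there is no such middle part.

    Equivalent to s[s.find(',')+1:s.rfind(',')] whenever the value has at
    least two commas, but returns None (instead of raising or returning a
    meaningless slice) when the attribute is missing or malformed.
    """
    if not entity_id:
        # Attribute absent (None) or empty string
        return None
    parts = entity_id.split(",")
    if len(parts) < 3:
        # Fewer than two commas: there is no "middle" field to extract
        return None
    # Re-join in case the middle part itself contains commas
    return ",".join(parts[1:-1])
```

In the loop you would then write `jobkey = extract_jobkey(job.attrs.get('data-tn-entityid'))` and append it only when it is not `None`. Note that skipping malformed items changes the count, which is useful to know when you are comparing totals as in this question.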

Edit:

Using Selenium, I was able to "see" what was going on when the page loaded: the URL automatically redirected to a prefiltered mobile site.

On this new site I could remove the filter, get the new URL, and obtain the correct number of items.
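Selenium is not strictly required to notice a server-side redirect like this. `requests` follows redirects by default and records them: `Response.history` holds the intermediate redirect responses (oldest first) and `Response.url` is the URL that was finally served. The sketch below prints that chain so you can compare it with the URL you asked for. The function names are hypothetical, and the idea that sending a desktop browser `User-Agent` header avoids the mobile redirect is an assumption that may not hold for this site. If the history is empty but the content still differs from what the browser shows, the redirect may happen client-side (JavaScript or a meta refresh), which `requests` does not execute.

```python
def redirect_chain(response):
    """Return every URL the request went through, ending with the final one.

    `response.history` lists the intermediate redirect responses (oldest
    first) and `response.url` is the URL that was finally served.
    """
    return [hop.url for hop in response.history] + [response.url]


def was_redirected(response):
    """True if at least one HTTP redirect was followed."""
    return bool(response.history)


def fetch_and_report(url, headers=None):
    """Fetch `url` with requests and print the redirect chain, if any."""
    import requests  # imported here so the helpers above have no dependency

    r = requests.get(url, headers=headers, timeout=30)
    if was_redirected(r):
        print("Redirected:", " -> ".join(redirect_chain(r)))
    else:
        print("No HTTP redirect; final URL:", r.url)
    return r


# Example usage (assumption: a desktop User-Agent may avoid a mobile redirect):
# r = fetch_and_report(url, headers={"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"})
```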

Thanks!

I have to be honest: I ran your code exactly as you wrote it several times and got a list of 85 items, no more, no less. So I don't know exactly how to answer your second question, but I can answer the first one: no, you don't need any other package to achieve what you want; the problem is elsewhere. Just to be sure, here is the full code I just ran:

from bs4 import BeautifulSoup as bs
import requests

r = requests.get(url='https://ca.indeed.com/cmp/Abb/jobs')
soup = bs(r.text, 'html.parser')
jobkeys = []
jobs = soup.findAll("li", {"class": "cmp-JobListItem"})
for job in jobs:
    s = job.attrs.get('data-tn-entityid')
    jobkey = s[s.find(',')+1:s.rfind(',')]
    jobkeys.append(jobkey)

print(len(jobkeys))

Output:

85
