bs4 doesn't get all list items

Question

Using the following URL as an example, the code below gets only 35 items instead of the 85 listed on the page. Is this a case where I have to use selenium to load the page source? How could bs4 miss the rest of the li items?

r = requests.get(url=url)
soup = bs(r.text, 'html.parser')
jobkeys = []
jobs = soup.findAll("li", {"class": "cmp-JobListItem"})
for job in jobs:
    s = job.attrs.get('data-tn-entityid')
    jobkey = s[s.find(',')+1:s.rfind(',')]
    jobkeys.append(jobkey)

Edit:

Using selenium I was able to "see" what was going on when the page loaded. The URL automatically redirected to a prefiltered mobile site.

On this new site I could remove the filter, get the new URL, and obtain the correct number of items.

Thanks!
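
A redirect like the one described in the edit can often be spotted without selenium, because a requests response records every HTTP redirect it followed in `history` and the final address in `url`. The helpers below are a minimal sketch, not part of the original post: the names `redirect_chain` and `was_redirected` are hypothetical, and they only rely on those two attributes. A redirect performed by JavaScript in the browser would not appear here.

```python
def redirect_chain(response):
    """Return every URL visited, in order, ending with the final URL.

    `response` is expected to look like a requests.Response:
    `response.history` holds the intermediate redirect responses and
    `response.url` is the address that actually served the body.
    """
    return [hop.url for hop in response.history] + [response.url]


def was_redirected(response):
    """True if at least one HTTP redirect was followed."""
    return len(response.history) > 0
```

With the code from the question this would be called as `redirect_chain(r)` right after `r = requests.get(url=url)`. If the last entry differs from the URL that was requested, the HTML being parsed is not the page that was expected, which would explain a different item count.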

Answer 1

I have to be honest: I ran your code exactly as you wrote it several times and got a list of 85 items every time, no more, no less. So I don't know exactly how to answer the second question, but I can answer the first one: no, you don't have to use other packages to achieve what you want; the problem is elsewhere. Just to be sure, here is the full code I just ran:

from bs4 import BeautifulSoup as bs
import requests

r = requests.get(url='https://ca.indeed.com/cmp/Abb/jobs')
soup = bs(r.text, 'html.parser')
jobkeys = []
jobs = soup.findAll("li", {"class": "cmp-JobListItem"})
for job in jobs:
    s = job.attrs.get('data-tn-entityid')
    jobkey = s[s.find(',')+1:s.rfind(',')]
    jobkeys.append(jobkey)

print(len(jobkeys))

Output:

85
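
A side note on the loop above: `job.attrs.get('data-tn-entityid')` returns None when an li element lacks that attribute, and calling `.find` on None raises an AttributeError. The sketch below shows a more defensive version of the same slice (the text between the first and the last comma). The function name `extract_jobkey` and the sample value `"1,abc123,0"` are hypothetical; the real attribute format is not shown in the thread.

```python
def extract_jobkey(entity_id):
    """Return the text between the first and last comma, or None.

    Mirrors s[s.find(',')+1:s.rfind(',')] from the loop above, but
    returns None instead of failing when the attribute is missing
    or contains fewer than two commas.
    """
    if not entity_id:
        return None
    first = entity_id.find(',')
    last = entity_id.rfind(',')
    if first == -1 or first == last:
        return None
    return entity_id[first + 1:last]
```

In the loop this would replace the two slicing lines with `jobkey = extract_jobkey(job.attrs.get('data-tn-entityid'))`, appending only when the result is not None.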

Solution 1
2 Accepted 2020-05-03 02:29:44
