

Web Scraping find not moving on to next item

from bs4 import BeautifulSoup
import requests


def kijiji():
    source = requests.get('https://www.kijiji.ca/b-mens-shoes/markham-york-region/c15117001l1700274').text
    soup = BeautifulSoup(source, 'lxml')
    b = soup.find('div', class_='price')
    for link in soup.find_all('a', class_='title'):
        a = link.get('href')
        fulllink = 'http://kijiji.ca' + a
        print(fulllink)
        # This always matches the first price on the page again,
        # so every listing is printed with the same price.
        b = soup.find('div', class_='price')
        print(b.prettify())
kijiji()

The goal is to collect all of the different items for sale on Kijiji and pair each one with its price. But I can't find any way to advance what Beautiful Soup matches for the price class, so I'm stuck on the first price. find_all doesn't work either, since it just prints out the whole blob of prices instead of grouping each one with its item.
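For reference, here is a minimal sketch (not part of the original question) of what is going on: soup.find() always returns the first match in the whole document, which is why the same price keeps printing. One quick, if fragile, workaround is to zip the two parallel find_all() lists; the class names title and price are the ones used in the code above, and the zip approach assumes every listing actually has a div.price element:

from bs4 import BeautifulSoup
import requests

source = requests.get('https://www.kijiji.ca/b-mens-shoes/markham-york-region/c15117001l1700274').text
soup = BeautifulSoup(source, 'lxml')

# Zipping the two parallel lists pairs titles with prices, but it will
# mis-pair as soon as any listing is missing a price element.
titles = soup.find_all('a', class_='title')
prices = soup.find_all('div', class_='price')
for link, price in zip(titles, prices):
    print('http://kijiji.ca' + link.get('href'))
    print(price.get_text(strip=True))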

If you have Beautiful Soup 4.7.1 or above, you can use the CSS selector method select(), which is much faster.

Code:

import requests
from bs4 import BeautifulSoup

res = requests.get("https://www.kijiji.ca/b-mens-shoes/markham-york-region/c15117001l1700274").text
soup = BeautifulSoup(res, 'html.parser')
# Each listing lives in its own .info-container, so searching from it
# keeps the title and the price of the same listing together.
for item in soup.select('.info-container'):
    fulllink = 'http://kijiji.ca' + item.find_next('a', class_='title')['href']
    print(fulllink)
    price = item.select_one('.price').text.strip()
    print(price)
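The key point is the scoping: because each lookup starts from one listing's .info-container rather than from the whole document, the link and price printed together always come from the same listing.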

Or, to use find_all(), use the code block below:

import requests
from bs4 import BeautifulSoup

res = requests.get("https://www.kijiji.ca/b-mens-shoes/markham-york-region/c15117001l1700274").text
soup = BeautifulSoup(res, 'html.parser')
for item in soup.find_all('div', class_='info-container'):
    # find_next() starts from this listing's container, so it picks up
    # that listing's own title link and price.
    fulllink = 'http://kijiji.ca' + item.find_next('a', class_='title')['href']
    print(fulllink)
    price = item.find_next(class_='price').text.strip()
    print(price)

Congratulations on finding the answer. I'll give you another solution, for reference only.

import requests
from simplified_scrapy.simplified_doc import SimplifiedDoc
def kijiji():
  url = 'https://www.kijiji.ca/b-mens-shoes/markham-york-region/c15117001l1700274'
  source = requests.get(url).text
  doc = SimplifiedDoc(source)
  infos = doc.getElements('div',attr='class',value='info-container')
  for info in infos:
    price = info.select('div.price>text()')
    a = info.select('a.title')
    link = doc.absoluteUrl(url,a.href)
    title = a.text
    print (price)
    print (link)
    print (title)
kijiji()

Result:

$310.00
https://www.kijiji.ca/v-mens-shoes/markham-york-region/jordan-4-oreo-2015/1485391828
Jordan 4 Oreo (2015)
$560.00
https://www.kijiji.ca/v-mens-shoes/markham-york-region/yeezy-boost-350-yecheil-reflectives/1486296645
Yeezy Boost 350 Yecheil Reflectives
...

Here are more examples: https://github.com/yiyedata/simplified-scrapy-demo/tree/master/doc_examples

from bs4 import BeautifulSoup
import requests


def kijiji():
    source = requests.get('https://www.kijiji.ca/b-mens-shoes/markham-york-region/c15117001l1700274').text
    soup = BeautifulSoup(source, 'lxml')
    # Start from the first price on the page...
    b = soup.find('div', class_='price')
    for link in soup.find_all('a', class_='title'):
        a = link.get('href')
        fulllink = 'http://kijiji.ca' + a
        print(fulllink)
        print(b.prettify())
        # ...then step forward to the next price element for the next listing.
        b = b.find_next('div', class_='price')
kijiji()

I was stuck on this for an hour, but as soon as I posted it on Stack Overflow I immediately came up with an idea. Messy code, but it works!
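Note that this approach relies on titles and prices alternating one-to-one in document order: b.find_next('div', class_='price') simply walks forward to the next price element, so if any listing has no price the pairing will drift by one. Scoping each lookup to the listing's container, as in the answers above, avoids that.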
