Not getting all the a elements from a div class using XPath and Scrapy

I have been trying to scrape all the properties from this website. On the main search page I can retrieve all the information for every property. However, when I need information from the individual property pages, the spider only seems to follow one property link.

The main issue is in the link part, i.e. when I try to follow each property's link: I only get the link and information for the first property, not for any of the others.

import scrapy


class PropDataSpider(scrapy.Spider):
    name = "remax"
    start_urls = ['https://www.remax.co.za/property_search/for-sale/?minprice=100000&maxprice=1000000000&displayorder=date&cities=432']

    def parse(self, response):
        # Select the listing container on the search results page
        properties = response.xpath("//div[@class='w-container main-content remodal-bg']")
        for prop in properties:
            link = 'http://www.remax.co.za/' + prop.xpath("./a/@href").extract_first()
            agency = self.name
            title = prop.xpath(
                ".//div[@class='property-item']/div[@class='w-clearfix']/p[@class='property-type']/text()").extract_first().strip()
            price = prop.xpath(
                ".//div[@class='property-item']/div[@class='w-clearfix']/div/strong/text()").extract_first().strip()

            # ...

            # Follow the property link, passing the fields scraped so far via meta
            yield scrapy.Request(
                link,
                callback=self.parse_property,
                meta={
                    'agency': agency,
                    'title': title,
                    'price': price,
                    'description': description,
                    'bedrooms': bedrooms,
                    'bathrooms': bathrooms,
                    'garages': garages,
                }
            )

    def parse_property(self, response):
        # Read back the fields passed from parse()
        agency = response.meta["agency"]
        title = response.meta["title"]
        price = response.meta["price"]
        description = response.meta["description"]
        bedrooms = response.meta["bedrooms"]
        bathrooms = response.meta["bathrooms"]
        garages = response.meta["garages"]

        yield {'agency': agency, 'title': title, 'price': price, 'description': description,
               'bedrooms': bedrooms, 'bathrooms': bathrooms, 'garages': garages}

What I would like is to get the links to all the other properties as well. I am not sure what I am doing wrong or how to fix it.

Thank you very much for your help!

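You need a couple of changes. Your XPath selects the container div, which appears only once on the page, so the loop body runs a single time and extract_first() returns only the first link inside it. Select the a elements inside the container instead, and read @href relative to each one: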

properties = response.xpath("//div[@class='w-container main-content remodal-bg']/a")
for prop in properties:
    link = 'http://www.remax.co.za/' + prop.xpath("./@href").extract_first()
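To see why the original loop yields only one link, here is a small stdlib-only sketch using the limited XPath support in xml.etree.ElementTree. The markup is hypothetical and simplified (the real remax.co.za HTML is more complex and is not valid XML); only the nesting of a elements inside one container div is assumed. The helper names first_href_per_container and href_per_link are illustrative, mimicking the question's selection and the answer's selection respectively.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: one container div holding several property links.
PAGE = """
<html><body>
  <div class="w-container main-content remodal-bg">
    <a href="/property/1"><div class="property-item">House A</div></a>
    <a href="/property/2"><div class="property-item">House B</div></a>
    <a href="/property/3"><div class="property-item">House C</div></a>
  </div>
</body></html>
"""

CONTAINER = ".//div[@class='w-container main-content remodal-bg']"


def first_href_per_container(root):
    """Mimics the question: loop over containers, take the first a/@href of each."""
    hrefs = []
    for container in root.findall(CONTAINER):  # matches the single container div
        first_a = container.find("./a")  # like extract_first(): only the first match
        if first_a is not None:
            hrefs.append(first_a.get("href"))
    return hrefs


def href_per_link(root):
    """Mimics the answer: loop over the a elements themselves, one href each."""
    return [a.get("href") for a in root.findall(CONTAINER + "/a")]


root = ET.fromstring(PAGE.strip())

if __name__ == "__main__":
    print(first_href_per_container(root))  # ['/property/1']
    print(href_per_link(root))             # ['/property/1', '/property/2', '/property/3']
```

The same idea carries over to Scrapy selectors. If the href values are site-relative paths starting with "/", building the absolute URL with response.urljoin(href) instead of string concatenation also avoids a doubled slash in the request URL.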
