
Why is xpath's extract() returning an empty list for the href attribute of an anchor element?

Why do I get an empty list when trying to extract the href attribute of the anchor tag located at the following URL using Scrapy: https://www.udemy.com/courses/search/?src=ukw&q=accounting ?

This is my code to extract the <a></a> element located inside the list-view-course-card--course-card-wrapper--TJ6ET class:

response.xpath("//div[@class='list-view-course-card--course-card-wrapper--TJ6ET']/a/@href").extract()

This site makes API calls to retrieve all the data. You can use the scrapy shell to see the response that the site is actually returning: run scrapy shell 'https://www.udemy.com/courses/search/?src=ukw&q=accounting' and then view(response).
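For example, here is what that shell session looks like; the empty list at the end is the same result the question describes, because the course cards are filled in client-side rather than being present in the raw HTML:

scrapy shell 'https://www.udemy.com/courses/search/?src=ukw&q=accounting'
>>> view(response)   # opens the HTML the server actually returned in your browser
>>> response.xpath("//div[@class='list-view-course-card--course-card-wrapper--TJ6ET']/a/@href").extract()
[]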

The data you are looking for is available at the following API call: https://www.udemy.com/api-2.0/search-courses/?fields[locale]=simple_english_title&src=ukw&q=accounting . However, if you try to access this link directly, you will get a JSON object saying that you do not have permission to perform this action. How did I find this link? Load the URL in your browser, go to the network tab of your developer tools, and look for XHR requests.
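As a quick check (a sketch, not part of the original answer), you can also try fetching the API URL directly from the scrapy shell; without the session a browser carries, the response should be the permission-denied JSON mentioned above:

fetch('https://www.udemy.com/api-2.0/search-courses/?fields[locale]=simple_english_title&src=ukw&q=accounting')
print(response.status)  # expected to be an error status when the endpoint is hit directly (assumption)
print(response.text)    # the "you do not have permission to perform this action" JSON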

The following spider will first make a request to the primary link and then make a request to the API call. You will have to parse the JSON object that is returned to obtain your data. If you want to scale this spider to more products, you might want to look for a pattern in the structure of the API call.

import scrapy


class UdemySpider(scrapy.Spider):

    name = 'udemy'
    # the API endpoint that actually returns the course data
    newurl = 'https://www.udemy.com/api-2.0/search-courses/?fields[locale]=simple_english_title&src=ukw&q=accounting'

    def start_requests(self):
        urls = [
            'https://www.udemy.com/courses/search/?src=ukw&q=accounting',
        ]
        for url in urls:
            # hit the search page first, then chain into the API call
            yield scrapy.Request(url=url, callback=self.api_call)

    def api_call(self, response):
        print("Working on second page")
        yield scrapy.Request(url=self.newurl, callback=self.parse)

    def parse(self, response):
        # code to parse the returned json object
        pass
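A minimal sketch of what parse could look like, assuming the endpoint returns a JSON body with a results list whose entries carry title and url fields (these field names are assumptions, not confirmed by the answer; inspect the actual response in the network tab to see the real structure):

import json  # add alongside the existing import at the top of the file

def parse(self, response):
    data = json.loads(response.text)
    # 'results', 'title' and 'url' are assumed field names
    for course in data.get('results', []):
        yield {
            'title': course.get('title'),
            'url': response.urljoin(course.get('url', '')),
        }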
