
How can I make two requests simultaneously with Scrapy?

I am scraping a job site where the first page has links to all the jobs. Right now I am storing the title, job, and company from the first page.

But I also want to store the description, which is only available after clicking on the job title. I want to store it together with the other fields of the current item.

This is my current code:

def parse(self, response):
    hxs = HtmlXPathSelector(response)
    sites = hxs.select("//div[@class='jobenteries']")
    items = []
    for site in sites[:3]:
        print "Hello"
        item = DmozItem()
        item['title'] = site.select('a/text()').extract()
        item['desc'] = ''
        items.append(item)
    return items

But the description is on the page that the job link points to. How can I fetch it and add it to the item?

From the first page, return a Request for each second page and pass the data for each item in the request.meta dict. In the callback method for the second page, you can read the data you passed and return the fully populated item.

See "Passing additional data to callback functions" in the Scrapy docs for more details and an example.
