Unable to scrape from a dropdown list using Scrapy

Question

I am trying to scrape a list of markets from a JS dropdown list embedded on a website: https://e27.co/startups

Using scrapy shell, I tried to scrape the list of markets from the 'Markets' dropdown menu, but was unable to do so.

After running scrapy shell 'https://e27.co/startups', I tried both response.css() and response.xpath().

For the CSS selector:

response.css('#startups-page > div > div.search-block.box-view > div.row.mbt-s > div > div > ul > li:nth-child(3)')

For XPath, I tried:

response.xpath('//*[@id="startups-page"]/div/div[1]/div[2]/div/div/ul/li[3]/a')

Both were obtained by inspecting the dropdown element.

However, both return an empty list.

How can I scrape all the different markets from the dropdown list? Thanks.

Answer 1

This data is served by a separate small request to https://e27.co/startups?json.

From scrapy shell "https://e27.co/startups?json" I could get the whole list with this expression:

In [1]: response.css('select#market option::text').extract()
Out[1]: 
[u'Advertising',
 u'Aerospace',
 u'Agency & Consulting',
 u'Agritech',
 u'Architecture & Construction',
...

Answer 1: accepted, 2019-06-13 06:54:27