I have searched extensively for a solution, but I may not be using the correct keywords. I know that I can use the Scrapy shell to work with CSS and XPath selectors interactively, but I would like to know whether the same is possible in an IDE environment outside of the spider class, namely in another cell.
Example code:
class ExampleSpider(scrapy.Spider):
    name = "exampleSpider"
    start_urls = ["https://www.example.com"]

    def parse(self, response):
        URL = "www.example.com/1/"
        yield response
I then want to be able to work with this response and selectors in another cell:
table_rows = response.xpath("//div[@class='example']/table/tr")  # produces error
print(table_rows.xpath("td[4]//text()")[0].get())
This produces the error: NameError: name 'response' is not defined
Any assistance/guidance would be highly appreciated.
If I understand correctly, you want the spider to return the response so that you can parse it in the main script?
main.py:
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings
from scrapy.signalmanager import dispatcher
from scrapy import signals


def spider_output(spider):
    output = []

    # Called once for every item the spider yields.
    def get_output(item):
        output.append(item)

    dispatcher.connect(get_output, signal=signals.item_scraped)

    settings = get_project_settings()
    settings['USER_AGENT'] = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36'
    process = CrawlerProcess(settings)
    process.crawl(spider)
    process.start()  # blocks until the crawl has finished
    return output


if __name__ == "__main__":
    spider = "exampleSpider"
    response = spider_output(spider)
    # The spider yields a single item, so take it and pull out the response.
    response = response[0]['response']
    title = response.xpath('//h3//text()').get()
    price = response.xpath('//div[@class="card-body"]/h4/text()').get()
    print(f"Title: {title}")
    print(f"Price: {price}")
We start the spider and append the yielded items to output. Since output has only one value, we don't have to loop over it and can just take the first value, response[0]. Then we want the value stored under the key response, so response = response[0]['response'].
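The collecting step can be illustrated without Scrapy at all. The following is only a minimal sketch of the pattern: TinyDispatcher and fake_spider are hypothetical stand-ins invented for this illustration (they are not part of Scrapy), and the "response" is just a placeholder string.

```python
from typing import Callable, Iterator, List


class TinyDispatcher:
    """Hypothetical stand-in for a signal dispatcher (not Scrapy's)."""

    def __init__(self) -> None:
        self._handlers: List[Callable[[dict], None]] = []

    def connect(self, handler: Callable[[dict], None]) -> None:
        # Remember the handler so it can be called for every item.
        self._handlers.append(handler)

    def send(self, item: dict) -> None:
        # Notify every connected handler about the new item.
        for handler in self._handlers:
            handler(item)


def fake_spider() -> Iterator[dict]:
    # Hypothetical spider: yields one item that wraps a placeholder response.
    yield {"response": "<html>placeholder</html>"}


def collect_items(spider: Callable[[], Iterator[dict]],
                  dispatcher: TinyDispatcher) -> List[dict]:
    output: List[dict] = []

    def get_output(item: dict) -> None:
        # The closure keeps a reference to output, so appended items
        # remain available after the "crawl" has finished.
        output.append(item)

    dispatcher.connect(get_output)
    for item in spider():
        dispatcher.send(item)
    return output


items = collect_items(fake_spider, TinyDispatcher())
first_response = items[0]["response"]
print(first_response)
```

The point of the closure is that the handler has no return value the caller could use, so the list it appends to is the only channel through which items reach the calling code.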
spider.py:
import scrapy
class ExampleSpider(scrapy.Spider):
    name = "exampleSpider"
    start_urls = ['https://scrapingclub.com/exercise/detail_basic/']

    def parse(self, response):
        yield {'response': response}
Here we return an item with the response.
The steps are: main -> spider_output -> spider -> return the response item to spider_output -> append the item to the output list -> return output to main -> get the response from output -> parse the response.
Output:
Title: Long-sleeved Jersey Top
Price: $12.99