
Using a loop to set values in "start_urls" from a CSV

I have a list of titles, stored in a CSV, that I want to search for on a website.

I'm extracting those values and then trying to append each one to the search link assigned to "start_urls".

However, when I run the script, it only uses the last value in the list. Is there any particular reason why this happens?

class MySpider(CrawlSpider):
    name = "test"
    allowed_domains = ["example.com"]

    import pandas as pd
    df = pd.read_csv('test.csv')
    saved_column = df.ProductName
    for a in saved_column:
        start_urls = ["http://www.example.com/search?noOfResults=20&keyword=" + str(a)]

    def parse(self, response):
        pass

There is a conceptual error in your code: the loop reassigns "start_urls" on every iteration instead of accumulating the URLs, so by the time the spider starts, only the last value of the loop remains and the parse function is called with that one URL.
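A minimal fix is to build the whole list once instead of reassigning it inside the loop. A sketch, using an in-memory DataFrame as a stand-in for the 'test.csv' with a ProductName column from the question:

```python
import pandas as pd

# stand-in for pd.read_csv('test.csv'); assumes a ProductName column
df = pd.DataFrame({"ProductName": ["widget", "gadget"]})

# build the full list in one step instead of reassigning start_urls per iteration
start_urls = [
    "http://www.example.com/search?noOfResults=20&keyword=" + str(a)
    for a in df.ProductName
]
print(start_urls)
```

With this, "start_urls" holds one search URL per product name rather than just the last one.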

A possible other approach is to override the spider's 'start_requests' method:

import pandas as pd
from scrapy import Request

def start_requests(self):
    df = pd.read_csv('test.csv')
    saved_column = df.ProductName
    for a in saved_column:
        # build the search URL from each product name and yield a request for it
        url = "http://www.example.com/search?noOfResults=20&keyword=" + str(a)
        yield Request(url, self.parse)

The idea comes from here: How to generate the start_urls dynamically in crawling?
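One extra caveat (my addition, not part of the original answer): product names may contain spaces or special characters, so it's safer to URL-encode the keyword with urllib.parse.quote_plus before building the search URL:

```python
from urllib.parse import quote_plus

# hypothetical product name containing characters that need escaping
name = "red shirt & tie"
url = "http://www.example.com/search?noOfResults=20&keyword=" + quote_plus(name)
print(url)
# spaces become '+' and '&' becomes '%26', keeping the query string intact
```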
