
How to append data to an existing Excel file based on the input?

I am currently writing a Scrapy web crawler that is meant to extract data from a site's pages and append that data to an existing Excel file ( ".tmp.xlsx" ). The file comes with prepopulated column headers: "name", "country", "state", "zip code", "address", and "phone number". Most of the sites I will be scraping won't have data for every column; some may only have data for "country", "state", "zip code", and "phone number". I need help setting up my pipelines.py so that it appends to the file based on whichever fields the crawled site actually provides.

One option (which may not be exactly what you are looking for) is to append the data to a CSV file using Scrapy's built-in CsvItemExporter, and then convert it to an Excel file when the spider closes (using, e.g., pandas).

This code may help you. Put this in settings.py:

FEED_FORMAT = 'csv'   # export format
FEED_URI = "tmp.csv"  # path of the output file

Then add this method at the end of your spider (it requires pandas, imported as pd):

    import pandas as pd

    def close(self, reason):
        df = pd.read_csv("tmp.csv")
        df.to_excel("tmp.xlsx", index=False)  # index=False omits the row index column
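Since the sites won't always supply every column, the scraped CSV may lack some of the prepopulated headers. One way to handle that, sketched below, is to reindex the DataFrame against the full header list before writing the Excel file, so missing columns come out empty and the column order always matches the workbook (the header names are taken from the question; the helper names are mine):

```python
import pandas as pd

# Column headers from the prepopulated workbook (as listed in the question).
HEADERS = ["name", "country", "state", "zip code", "address", "phone number"]

def align_to_headers(df: pd.DataFrame) -> pd.DataFrame:
    """Force the DataFrame into the expected column set and order.

    Columns the crawl never produced are created empty (NaN),
    so the output always matches the prepopulated header row.
    """
    return df.reindex(columns=HEADERS)

def csv_to_excel(csv_path: str, xlsx_path: str) -> None:
    """Convert the scraped CSV into an Excel file with the full header set."""
    df = align_to_headers(pd.read_csv(csv_path))
    df.to_excel(xlsx_path, index=False)
```

With this, calling csv_to_excel("tmp.csv", "tmp.xlsx") inside the spider's close method produces a workbook whose columns line up with the template even when the site only yielded a few fields.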

If you need any help, do not hesitate to ask.

