
How to append data to an existing Excel file based on the input?

I am currently writing a Scrapy web crawler that is meant to extract data from a site's pages and append that data to an existing Excel file ( ".tmp.xlsx" ). The file comes with prepopulated column headers: "name", "country", "state", "zip code", "address", and "phone number". Most of the sites I will be scraping won't have data for every column; some may only have data for "country", "state", "zip code", and "phone number". I need help setting up my pipelines.py so that it appends to the file based on whichever fields the crawled site actually provides.

One option (which may not be exactly what you are looking for) is to append the data to a CSV file using Scrapy's built-in CsvItemExporter, and then convert it to an Excel file when the spider closes (using, e.g., pandas).

This code may help you. Put this in settings.py:

FEED_FORMAT = 'csv'   # export format
FEED_URI = "tmp.csv"  # path of the output file

Then add this method at the end of your spider (it requires pandas, imported as pd):

    import pandas as pd

    def close(self, reason):
        df = pd.read_csv("tmp.csv")
        df.to_excel("tmp.xlsx", index=False)  # index=False omits the row index column
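Since the sites won't always supply every column, the scraped CSV may lack some of the prepopulated headers. One way to handle that, sketched below, is to reindex the DataFrame against the full header list before writing the Excel file, so missing columns come out empty and the column order always matches the workbook (the header names are taken from the question; the helper names are mine):

```python
import pandas as pd

# Column headers from the prepopulated workbook (as listed in the question).
HEADERS = ["name", "country", "state", "zip code", "address", "phone number"]

def align_to_headers(df: pd.DataFrame) -> pd.DataFrame:
    """Force the DataFrame into the expected column set and order.

    Columns the crawl never produced are created empty (NaN),
    so the output always matches the prepopulated header row.
    """
    return df.reindex(columns=HEADERS)

def csv_to_excel(csv_path: str, xlsx_path: str) -> None:
    """Convert the scraped CSV into an Excel file with the full header set."""
    df = align_to_headers(pd.read_csv(csv_path))
    df.to_excel(xlsx_path, index=False)
```

With this, calling csv_to_excel("tmp.csv", "tmp.xlsx") inside the spider's close method produces a workbook whose columns line up with the template even when the site only yielded a few fields.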

If you need any help, do not hesitate to ask.

