How to append data to an existing Excel file based on the input?
I am currently writing a Scrapy web crawler that is meant to extract data from a site's pages and append that data to an existing Excel file ( ".tmp.xlsx" ). The file comes with prepopulated column headers such as "name", "country", "state", "zip code", "address", "phone number" . Most of the sites I will be scraping won't have data for every column; some may only have data for "country", "state", "zip code" and "phone number" . I need help setting up my pipelines.py so that it appends to the file based on whichever fields the site I'm crawling actually provides.
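One way to do exactly what the question asks (append directly into the existing workbook, filling only the columns a site provides) is an item pipeline built on openpyxl. This is a minimal sketch, not from the original post: the class name `ExcelAppendPipeline` is hypothetical, it assumes the workbook ".tmp.xlsx" already exists with the headers in row 1, and it would still need to be registered in `ITEM_PIPELINES` in settings.py.

```python
from openpyxl import load_workbook


class ExcelAppendPipeline:
    """Appends each scraped item as a row in an existing .xlsx file,
    matching the item's fields to the prepopulated header row."""

    path = ".tmp.xlsx"  # assumption: workbook exists with headers in row 1

    def open_spider(self, spider):
        self.wb = load_workbook(self.path)
        self.ws = self.wb.active
        # Read the prepopulated headers, e.g. ["name", "country", "state", ...]
        self.headers = [cell.value for cell in self.ws[1]]

    def process_item(self, item, spider):
        row = dict(item)
        # Build the row in header order; fields the site didn't have
        # become None, which openpyxl writes as a blank cell.
        self.ws.append([row.get(header) for header in self.headers])
        return item

    def close_spider(self, spider):
        self.wb.save(self.path)
```

Because `row.get(header)` returns `None` for any missing key, a site that only yields "country", "state", "zip code" and "phone number" simply leaves the other columns blank in that row.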
One option (which may not be what you are looking for) is to just append the data to a CSV (using Scrapy's built-in CsvItemExporter ). Then, in the close_spider method, convert it to an Excel file (using, e.g., pandas ).
This code may help you. Put this in settings.py:
FEED_FORMAT = 'csv'   # output format
FEED_URI = "tmp.csv"  # path of the output file

Then put this at the end of your spider class (Scrapy calls the closed method when the spider finishes; pandas is imported as pd ):

import pandas as pd

def closed(self, reason):
    df = pd.read_csv("tmp.csv")
    df.to_excel("tmp.xlsx", index=False)  # index=False avoids writing the row index
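Since the question asks about pipelines.py specifically, the same CSV-to-Excel conversion can live in a pipeline instead of the spider. A sketch under the same assumptions as above (the feed settings write "tmp.csv"; the class name `CsvToExcelPipeline` is hypothetical and would need to be registered in `ITEM_PIPELINES`):

```python
import pandas as pd


class CsvToExcelPipeline:
    """Converts the CSV feed written during the crawl into an
    Excel file once the spider closes."""

    def close_spider(self, spider):
        df = pd.read_csv("tmp.csv")
        df.to_excel("tmp.xlsx", index=False)
```

Columns with no data for a given site are simply empty in the CSV, so they stay blank in the resulting spreadsheet.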
If you need any help, do not hesitate to ask.