How to append data to an existing Excel file based on the input?
I am currently writing a Scrapy web crawler that is meant to extract data from a site's pages and append that data to an existing Excel file ( ".tmp.xlsx" ). The file comes with prepopulated column headers such as "name", "country", "state", "zip code", "address", "phone number" . Most of the sites I will be scraping won't have data for every column; some may only have data for "country", "state", "zip code" and "phone number" . I need help setting up my pipelines.py so that it appends to the file based on whichever fields the site I'm crawling actually provides.
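One way to do exactly what the question asks (append directly into the existing workbook, filling only the columns a site provides) is an item pipeline built on openpyxl. This is a minimal sketch, not from the original post: the class name `ExcelAppendPipeline` is hypothetical, it assumes the workbook ".tmp.xlsx" already exists with the headers in row 1, and it would still need to be registered in `ITEM_PIPELINES` in settings.py.

```python
from openpyxl import load_workbook


class ExcelAppendPipeline:
    """Appends each scraped item as a row in an existing .xlsx file,
    matching the item's fields to the prepopulated header row."""

    path = ".tmp.xlsx"  # assumption: workbook exists with headers in row 1

    def open_spider(self, spider):
        self.wb = load_workbook(self.path)
        self.ws = self.wb.active
        # Read the prepopulated headers, e.g. ["name", "country", "state", ...]
        self.headers = [cell.value for cell in self.ws[1]]

    def process_item(self, item, spider):
        row = dict(item)
        # Build the row in header order; fields the site didn't have
        # become None, which openpyxl writes as a blank cell.
        self.ws.append([row.get(header) for header in self.headers])
        return item

    def close_spider(self, spider):
        self.wb.save(self.path)
```

Because `row.get(header)` returns `None` for any missing key, a site that only yields "country", "state", "zip code" and "phone number" simply leaves the other columns blank in that row.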
One option (which may not be what you are looking for) is to just append the data to a CSV (using Scrapy's built-in CsvItemExporter ). Then, in the close_spider method, convert it to an Excel file (using, e.g., pandas ).
This code may help you. Put this in settings.py:
FEED_FORMAT = 'csv'   # output format
FEED_URI = "tmp.csv"  # path of the output file

Then put this at the end of your spider class (Scrapy calls the closed method when the spider finishes; pandas is imported as pd ):

import pandas as pd

def closed(self, reason):
    df = pd.read_csv("tmp.csv")
    df.to_excel("tmp.xlsx", index=False)  # index=False avoids writing the row index
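Since the question asks about pipelines.py specifically, the same CSV-to-Excel conversion can live in a pipeline instead of the spider. A sketch under the same assumptions as above (the feed settings write "tmp.csv"; the class name `CsvToExcelPipeline` is hypothetical and would need to be registered in `ITEM_PIPELINES`):

```python
import pandas as pd


class CsvToExcelPipeline:
    """Converts the CSV feed written during the crawl into an
    Excel file once the spider closes."""

    def close_spider(self, spider):
        df = pd.read_csv("tmp.csv")
        df.to_excel("tmp.xlsx", index=False)
```

Columns with no data for a given site are simply empty in the CSV, so they stay blank in the resulting spreadsheet.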
If you need any help, do not hesitate to ask.