How do I update a CSV file with pandas without adding duplicates
I'm trying to get some data off the web and it's taking a while. In case anything happens, I've been periodically saving the data in a CSV file.
However, each save just appends a new copy of the whole dataframe to the CSV file, which means there are loads of duplicates in the file.
df.to_csv('data.csv', mode='a', header=False)
is the command I'm using to save my progress.
Thanks for reading.
IIUC, you have a single dataframe that you append to over time and that you want to back up periodically.
There are multiple approaches you could try:
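# Either overwrite the whole file on every save (the default mode='w' replaces it):
df.to_csv('data.csv', header=False) # or header=True
# Or append only the rows added since the last save:
# (i) First time write the complete dataframe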
df.to_csv('data.csv', header=False) # or header=True
# (ii) store the length of the dataframe at that point
lines_written = len(df.index)
# More data is being added to the dataframe from the web
# (iii) append new lines to CSV file
df.iloc[lines_written:].to_csv('data.csv', mode='a', header=False)
# (iv) update the line counter
lines_written = len(df.index)
# repeat steps (iii) and (iv)
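Steps (i) to (iv) could be wrapped in a small helper along these lines. This is only a sketch: `CsvCheckpointer` and its `save` method are hypothetical names, not part of pandas, and it assumes rows are only ever added to the end of the dataframe (never edited, dropped, or reordered) and that the counter starts at zero for a fresh file.

```python
import pandas as pd


class CsvCheckpointer:
    """Hypothetical helper: write only the rows not yet saved to the CSV."""

    def __init__(self, path):
        self.path = path
        self.lines_written = 0  # number of dataframe rows already in the file

    def save(self, df):
        # Rows added since the last save (assumes append-only growth).
        new_rows = df.iloc[self.lines_written:]
        if new_rows.empty:
            return 0
        first_write = self.lines_written == 0
        new_rows.to_csv(
            self.path,
            mode="w" if first_write else "a",  # create the file once, then append
            header=first_write,                # write the header only once
        )
        self.lines_written = len(df.index)
        return len(new_rows)
```

Calling `save(df)` after each batch of scraped rows then only writes what is new, so repeated saves should not duplicate rows. If the script restarts, the counter is lost, so you would need to reload the existing file first or start a new one.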