
How do I update a CSV file with pandas without adding duplicates

I'm trying to get some data off the web and it's taking a while. In case anything happens, I've been periodically saving the data to a CSV file.

However, each save just appends a new copy of the whole DataFrame to the CSV file, which means there are loads of duplicates in the file.

df.to_csv('data.csv', mode='a', header=False)

is the command I'm using to save my progress.

Thanks for reading.
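A minimal reproduction of the problem may help. The column names and the temporary file path below are hypothetical stand-ins for the scraped data. With mode='a', each call to to_csv writes every row of the DataFrame again, including the rows that were already saved, so rows 0 and 1 end up in the file twice.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for the scraped data collected so far.
df = pd.DataFrame({"url": ["a", "b"], "value": [1, 2]})
path = os.path.join(tempfile.mkdtemp(), "data.csv")

# First periodic save: rows 0 and 1 are written.
df.to_csv(path, mode="a", header=False)

# One more row arrives, then the next save appends the WHOLE frame again.
df = pd.concat([df, pd.DataFrame({"url": ["c"], "value": [3]})], ignore_index=True)
df.to_csv(path, mode="a", header=False)

# Column 0 of the file holds the DataFrame index that to_csv wrote.
saved = pd.read_csv(path, header=None)
print(len(df), len(saved))  # 3 5
```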

IIUC (if I understand correctly), you have a single DataFrame that you append to over time and that you want to back up periodically.

There are multiple approaches you could try:

  1. If writing the file is fast, don't append at all: overwrite the file with the complete DataFrame every time (to_csv uses mode='w' by default). Writing the header could be useful in this case:
df.to_csv('data.csv', header=False)  # or header=True
  2. Keep track of how many rows you have already written and append only the new ones:
# (i) First time write the complete dataframe
df.to_csv('data.csv', header=False)  # or header=True

# (ii) store the length of the dataframe at that point
lines_written = len(df.index)

# More data is being added to the dataframe from the web

# (iii) append new lines to CSV file
df.iloc[lines_written:].to_csv('data.csv', mode='a', header=False)

# (iv) update the line counter
lines_written = len(df.index)

# repeat steps (iii) and (iv)
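The two approaches above can be wrapped in small helper functions. This is a hedged sketch, not the answerer's code: the function names save_full and append_new_rows, the sample columns and the temporary paths are hypothetical. Two choices differ from the answer. The header is written only when the file does not exist yet, so it appears exactly once, and index=False keeps the index out of the file (drop it if you want the index saved). Like the answer's row counter, append_new_rows assumes rows are only ever added at the end of the DataFrame.

```python
import os
import tempfile

import pandas as pd


def save_full(df: pd.DataFrame, path: str) -> None:
    """Approach 1: overwrite the file with the complete DataFrame."""
    # mode="w" (the default) truncates the file, so nothing is duplicated.
    df.to_csv(path, index=False)


def append_new_rows(df: pd.DataFrame, path: str, rows_written: int) -> int:
    """Approach 2: append only the rows added since the last save.

    Assumes rows are only ever added at the end of df (never inserted,
    dropped or reordered); otherwise the positional slice is wrong.
    Returns the new value of the row counter.
    """
    new_rows = df.iloc[rows_written:]
    # Write the header only when the file is created, so it appears once.
    new_rows.to_csv(path, mode="a", header=not os.path.exists(path), index=False)
    return len(df.index)


# Usage with hypothetical scraped data.
path = os.path.join(tempfile.mkdtemp(), "data.csv")
df = pd.DataFrame({"url": ["a", "b"], "value": [1, 2]})
rows_written = append_new_rows(df, path, 0)  # creates file: header + 2 rows

# A new row arrives from the web...
df = pd.concat([df, pd.DataFrame({"url": ["c"], "value": [3]})], ignore_index=True)
rows_written = append_new_rows(df, path, rows_written)  # appends only row "c"

saved = pd.read_csv(path)
print(len(saved))  # 3

full_path = os.path.join(tempfile.mkdtemp(), "full.csv")
save_full(df, full_path)
save_full(df, full_path)  # the second call overwrites instead of duplicating
print(len(pd.read_csv(full_path)))  # 3
```

If the script restarts, the counter starts from 0 again while the file already holds earlier rows, so it would need to be re-initialised (for example from the number of rows already in the file) to avoid appending them a second time.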


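Statement: The technical posts on this site are published under the CC BY-SA 4.0 license. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.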

Related questions:
- How to update a pandas Panel without duplicates
- I need help concatenating 1 CSV file and 1 pandas DataFrame together without duplicates
- How do I convert an Excel file into a CSV file using pandas without the first row being modified?
- How do I change the header on a 2-column CSV file in Python using the pandas library without creating a new file?
- How do I send an email with a CSV attachment using Python pandas without exporting to a CSV file with to_csv?
- How to do operations on a CSV file without using pandas?
- Without using pandas, how do I analyze CSV data and only extract certain values from certain columns and rows of my CSV file?
- How to delete BOTH duplicates from a CSV file using pandas?
- How do I read from a CSV file using pandas in Python?
- How do I download a pandas DataFrame to a CSV file in Streamlit?
 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM