
Merging/Updating data sets in Python or Pandas

I just started learning Python so I can work more efficiently with data files at work instead of relying only on Excel. One issue I run into constantly: I download CSV files from our database and start cleaning, coding, and formatting the data. However, data is often missing or gets updated in the database afterwards, so I then need to update the data I have already started cleaning in Excel. Usually I copy and paste the new data into my Excel file, which, as you can guess, is not very efficient. Is there a more time-efficient way to update my original data with the new data using Python or pandas? Thanks for the help!

If you are using pandas, you can create a new pandas.DataFrame from the new data, process it the same way as the previous data, and then combine the two: use pandas.concat() to stack them one below the other, or pandas.merge() if you want to join them on a specific column, so that rows that were present previously can be matched with their updated versions.
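Processing both downloads "the same way" is easiest to keep consistent if the cleaning steps live in one function that is applied to every export. A minimal sketch, assuming hypothetical columns id, name and amount, with made-up CSV text read through io.StringIO in place of the real database downloads:

```python
import io

import pandas as pd


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical cleaning steps; replace with whatever you currently do in Excel."""
    out = df.copy()
    # Normalise column names: strip surrounding whitespace and lower-case them
    out.columns = out.columns.str.strip().str.lower()
    # Trim stray whitespace in the text column
    out["name"] = out["name"].str.strip()
    # Coerce amounts to numbers; unparseable values become NaN
    out["amount"] = pd.to_numeric(out["amount"], errors="coerce")
    return out


# Stand-ins for the original export and a later re-download (made-up data)
old_csv = io.StringIO(" ID , Name ,Amount\n1, Alice ,10\n2,Bob,x\n")
new_csv = io.StringIO("ID,Name,Amount\n2,Bob,25\n3, Carol,30\n")

old = clean(pd.read_csv(old_csv))
new = clean(pd.read_csv(new_csv))
print(old)
print(new)
```

With real files you would pass the file paths to pd.read_csv instead. Because both frames go through the same function, they end up with the same column names and types, so concat and merge line the columns up as expected.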

pandas.concat documentation. Example:

import pandas as pd
pd.concat([df1, df2])
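Note that pd.concat only stacks the frames, so a row that appears in both exports shows up twice. A minimal sketch of one common follow-up, assuming a hypothetical key column id and that the newer frame is passed last: drop duplicates on the key and keep the last occurrence.

```python
import pandas as pd

# Made-up example: "old" is the data you already cleaned, "new" is a fresh
# export in which id 2 was updated and id 3 was added.
old = pd.DataFrame({"id": [1, 2], "amount": [10.0, None]})
new = pd.DataFrame({"id": [2, 3], "amount": [25.0, 30.0]})

# Stack the new rows below the old ones, then keep only the last (newest)
# occurrence of each id so updated rows replace their earlier versions.
combined = (
    pd.concat([old, new], ignore_index=True)
    .drop_duplicates(subset="id", keep="last")
    .sort_values("id")
    .reset_index(drop=True)
)
print(combined)
```

keep="last" is what makes the newer copy win, which is why the order of the frames passed to concat matters; sort_values and reset_index only tidy the result.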

pandas.merge documentation. Example:

df1.merge(df2, how='outer', on='<identifier>')
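On its own, an outer merge does not overwrite anything: every column that exists in both frames (apart from the key) appears twice, with suffixes (_x and _y by default). A minimal sketch of one way to finish the update, assuming a hypothetical key column id and a single value column amount; the combine_first step is an addition to the answer above, not something merge does by itself.

```python
import pandas as pd

# Made-up example data: id 2 was updated and id 3 is new.
old = pd.DataFrame({"id": [1, 2], "amount": [10.0, None]})
new = pd.DataFrame({"id": [2, 3], "amount": [25.0, 30.0]})

# Outer merge keeps every id from both frames; the shared "amount" column
# is split into amount_old and amount_new using the suffixes given here.
merged = old.merge(new, how="outer", on="id", suffixes=("_old", "_new"))

# Prefer the new value where one exists, otherwise fall back to the old one.
merged["amount"] = merged["amount_new"].combine_first(merged["amount_old"])
merged = merged.drop(columns=["amount_old", "amount_new"])
print(merged)
```

With several value columns you would repeat the combine_first step per column; passing indicator=True to merge additionally adds a _merge column showing whether each row came from the old data, the new data, or both.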


 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM