
Merging/Updating data sets in Python or Pandas

I just started learning Python to help me work more efficiently with data files at work instead of just using Excel. One issue I constantly have is that I often need to download CSV files from our database and start cleaning, coding, and formatting the data. However, data is often missing or gets updated in the database, so I then need to update the data that I already started cleaning in Excel. I usually have to copy and paste the new data into my Excel file, and as you can guess, that is not very efficient. Is there a more time-effective way to update my original data with the new data using Python or pandas? Thanks for the help!

If you are using pandas, you can create a new pandas.DataFrame with the new data, process it the same way as the previous data, and then combine the two: use pandas.concat() to stack them one below the other, or pandas.merge() to join them on a specific column so that rows that were already present can be updated.
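One way to make sure the new data is processed the same way as the old data is to put the cleaning steps in a function and run both exports through it. The sketch below is only an illustration: the clean() function, the "ID" and "Name" columns, and the cleaning steps themselves are hypothetical, and io.StringIO stands in for the CSV files you would download.

```python
import io

import pandas as pd


def clean(df):
    """Hypothetical cleaning steps; replace them with your own."""
    df = df.copy()
    # Normalize header names: strip stray spaces and lowercase them.
    df.columns = df.columns.str.strip().str.lower()
    # Tidy a text column: strip whitespace and use title case.
    df["name"] = df["name"].str.strip().str.title()
    return df


# StringIO objects stand in for the downloaded CSV files (assumed content).
old_csv = io.StringIO("ID, Name\n1, alice \n2,BOB\n")
new_csv = io.StringIO("ID, Name\n3, carol\n")

# Both exports go through exactly the same cleaning code.
old_clean = clean(pd.read_csv(old_csv))
new_clean = clean(pd.read_csv(new_csv))
```

With real files you would pass the file paths to pd.read_csv() instead of the StringIO objects.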

See the pandas.concat documentation. Example:

pd.concat([df1, df2])
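If the new export repeats rows that are already in the old data, stacking alone leaves duplicates. A minimal sketch of one way to handle that, assuming a hypothetical identifier column named "id" and that the newer row should win:

```python
import pandas as pd

# Hypothetical data; "id" is an assumed identifier column.
df1 = pd.DataFrame({"id": [1, 2, 3], "value": [10, 20, 30]})  # already cleaned
df2 = pd.DataFrame({"id": [3, 4], "value": [35, 40]})         # new export

# Stack the new rows below the old ones.
combined = pd.concat([df1, df2], ignore_index=True)

# For each id keep only the last occurrence, i.e. the row from df2
# whenever an id appears in both frames.
combined = combined.drop_duplicates(subset="id", keep="last").reset_index(drop=True)
```

This relies on df2 being listed after df1 in the concat call, since keep="last" keeps whichever row comes later.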

See the pandas.merge documentation. Example:

df1.merge(df2, how='outer', on='<identifier>')
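Note that an outer merge does not overwrite anything by itself: columns that exist in both frames come back twice, with suffixes. A rough sketch of how the updated values could then be picked, assuming a hypothetical "id" identifier and a "value" column, and preferring the new value wherever one exists:

```python
import pandas as pd

# Hypothetical data; "id" and "value" are assumed column names.
old = pd.DataFrame({"id": [1, 2, 3], "value": [10.0, None, 30.0]})
new = pd.DataFrame({"id": [2, 3, 4], "value": [20.0, 33.0, 40.0]})

# The outer merge keeps every id from both frames. The shared "value"
# column is returned as "value_old" and "value_new".
merged = old.merge(new, how="outer", on="id", suffixes=("_old", "_new"))
merged = merged.sort_values("id").reset_index(drop=True)

# Take the new value where there is one, otherwise fall back to the old one.
merged["value"] = merged["value_new"].fillna(merged["value_old"])
updated = merged[["id", "value"]]
```

Passing indicator=True to merge() adds a "_merge" column that shows whether each row came from the old data, the new data, or both, which can help when checking what changed.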
