Python DataFrame: prevent duplicates while concatenating

I have two dataframes that I concatenate into one. The problem is that, while troubleshooting the code, I run the same concat code multiple times. Each run appends the same rows again, so the dataframe ends up with those rows repeated as many times as I ran the concat. I want to prevent this.

My code:

import pandas as pd

rdf = pd.DataFrame({'A': [10, 20]}, index=pd.date_range(start='2020-05-04 08:00:00', freq='1h', periods=2))
df2 = pd.DataFrame({'A': [30, 40]}, index=pd.date_range(start='2020-05-04 10:00:00', freq='1h', periods=2))

# Run it first time
rdf = pd.concat([rdf, df2])
# First time result
rdf
                      A
2020-05-04 08:00:00  10
2020-05-04 09:00:00  20
2020-05-04 10:00:00  30
2020-05-04 11:00:00  40

# Run it second time
rdf = pd.concat([rdf, df2])
# second time result produces duplicates
rdf
                      A
2020-05-04 08:00:00  10
2020-05-04 09:00:00  20
2020-05-04 10:00:00  30
2020-05-04 11:00:00  40
2020-05-04 10:00:00  30
2020-05-04 11:00:00  40

My solution: My approach is to write an extra line of code that drops the duplicates, keeping the first occurrence.

rdf = pd.concat([rdf, df2])
rdf.drop_duplicates(keep='first', inplace=True)
rdf
                      A
2020-05-04 08:00:00  10
2020-05-04 09:00:00  20
2020-05-04 10:00:00  30
2020-05-04 11:00:00  40
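
One caveat with this approach is that drop_duplicates compares the column values and ignores the index, so two different timestamps that happen to carry the same value in A would also be collapsed into one row. A minimal sketch of an index-based alternative follows; the helper name concat_unique_index and the same_values frame are hypothetical and only serve the illustration.

```python
import pandas as pd

rdf = pd.DataFrame({'A': [10, 20]}, index=pd.date_range(start='2020-05-04 08:00:00', freq='1h', periods=2))
df2 = pd.DataFrame({'A': [30, 40]}, index=pd.date_range(start='2020-05-04 10:00:00', freq='1h', periods=2))


def concat_unique_index(left, right):
    """Hypothetical helper: concat, then keep only the first row per index label."""
    out = pd.concat([left, right])
    # Index.duplicated marks every repeat of an index label after the first one.
    return out[~out.index.duplicated(keep='first')]


once = concat_unique_index(rdf, df2)
twice = concat_unique_index(once, df2)  # re-running adds no new rows
print(twice)

# Illustration of the caveat: equal values at different timestamps.
same_values = pd.DataFrame({'A': [10, 10]}, index=pd.date_range(start='2020-05-04 08:00:00', freq='1h', periods=2))
print(len(same_values.drop_duplicates()))  # value-based: the two rows collapse into one
```

This still takes a second step after the concat, but it deduplicates on the timestamp rather than on the values.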

Is there a better approach? I mean, can we prevent the duplicates during the concat itself, so that there is no need to write an extra line of code to drop them?

Then let us try combine_first. Running it twice still leaves each row in the result only once:

rdf = rdf.combine_first(df2)
rdf = rdf.combine_first(df2)
rdf
Out[115]: 
                        A
2020-05-04 08:00:00  10.0
2020-05-04 09:00:00  20.0
2020-05-04 10:00:00  30.0
2020-05-04 11:00:00  40.0
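
The snippet above can be turned into a self-contained check, sketched here on the assumption that the two frames are built as in the question. combine_first aligns both frames on the union of their index labels, keeps the caller's values where they are non-null, and fills the remaining positions from the argument, so repeating the call does not add rows. The output above shows the column coming back as float; whether that happens seems to depend on the pandas version, so the sketch prints the dtype rather than assuming it.

```python
import pandas as pd

rdf = pd.DataFrame({'A': [10, 20]}, index=pd.date_range(start='2020-05-04 08:00:00', freq='1h', periods=2))
df2 = pd.DataFrame({'A': [30, 40]}, index=pd.date_range(start='2020-05-04 10:00:00', freq='1h', periods=2))

# Re-run the same statement several times, as happens while troubleshooting.
for _ in range(3):
    rdf = rdf.combine_first(df2)

print(rdf)             # four rows, one per timestamp
print(rdf['A'].dtype)  # may be float64 or int64 depending on the pandas version

# If an integer column is needed, cast back once no values are missing.
rdf['A'] = rdf['A'].astype('int64')
```

Note that this is not a drop-in replacement for concat: if both frames hold different values for the same timestamp, combine_first keeps the value from rdf and silently ignores the one from df2.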
