I have two dataframes that I concatenate into one. The problem is that, while troubleshooting the code, I run the same concat code multiple times. Each run appends the rows of the second dataframe again, so the result contains repeated rows, once for every time I run the concat. I want to prevent this.
My code:
import pandas as pd
rdf = pd.DataFrame({'A':[10,20]},index=pd.date_range(start='2020-05-04 08:00:00', freq='1h', periods=2))
df2 = pd.DataFrame({'A':[30,40]},index=pd.date_range(start='2020-05-04 10:00:00', freq='1h', periods=2))
# Run it first time
rdf= pd.concat([rdf,df2])
# First time result
rdf
A
2020-05-04 08:00:00 10
2020-05-04 09:00:00 20
2020-05-04 10:00:00 30
2020-05-04 11:00:00 40
# Run it second time
rdf= pd.concat([rdf,df2])
# second time result produces duplicates
rdf
A
2020-05-04 08:00:00 10
2020-05-04 09:00:00 20
2020-05-04 10:00:00 30
2020-05-04 11:00:00 40
2020-05-04 10:00:00 30
2020-05-04 11:00:00 40
My solution: I write an extra line of code that drops the duplicates, keeping the first occurrence.
rdf= pd.concat([rdf,df2])
rdf.drop_duplicates(keep='first',inplace=True)
rdf
A
2020-05-04 08:00:00 10
2020-05-04 09:00:00 20
2020-05-04 10:00:00 30
2020-05-04 11:00:00 40
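One caveat with this approach: drop_duplicates compares the column values, not the index, so two rows with different timestamps but the same value in A would also be collapsed into one. Below is a minimal sketch of deduplicating on the index instead, assuming the timestamp is what identifies a row. The loop only simulates re-running the same cell several times.

```python
import pandas as pd

rdf = pd.DataFrame({'A': [10, 20]},
                   index=pd.date_range(start='2020-05-04 08:00:00', freq='1h', periods=2))
df2 = pd.DataFrame({'A': [30, 40]},
                   index=pd.date_range(start='2020-05-04 10:00:00', freq='1h', periods=2))

# Simulate running the concat cell three times in a row
for _ in range(3):
    rdf = pd.concat([rdf, df2])
    # Keep only the first row seen for each timestamp in the index
    rdf = rdf[~rdf.index.duplicated(keep='first')]

print(rdf)
```

This still needs the extra line, but it removes rows only when their timestamps repeat, regardless of the values they hold.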
Is there a better approach? I mean, can we prevent the duplicates while concatenating, so that there is no need to write an extra line of code to drop them?
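Then let us try combine_first. It keeps the values already in rdf and only adds rows from df2 whose index labels rdf does not have yet, so running it twice, as below, gives the same result as running it once: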
rdf = rdf.combine_first(df2)
rdf = rdf.combine_first(df2)
rdf
Out[115]:
A
2020-05-04 08:00:00 10.0
2020-05-04 09:00:00 20.0
2020-05-04 10:00:00 30.0
2020-05-04 11:00:00 40.0
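As a self-contained sketch of the same idea, the snippet below repeats combine_first several times and checks that the row count stays at four. Note that the output above shows A as floats (10.0 instead of 10); depending on the pandas version, combine_first may upcast integer columns during alignment. The final astype call is an optional step added here for illustration, and it is safe only because no missing values remain in the column.

```python
import pandas as pd

rdf = pd.DataFrame({'A': [10, 20]},
                   index=pd.date_range(start='2020-05-04 08:00:00', freq='1h', periods=2))
df2 = pd.DataFrame({'A': [30, 40]},
                   index=pd.date_range(start='2020-05-04 10:00:00', freq='1h', periods=2))

# Simulate running the same cell three times; the result does not grow
for _ in range(3):
    rdf = rdf.combine_first(df2)

# Optional: restore the integer dtype if the column was upcast to float
rdf = rdf.astype({'A': 'int64'})

print(rdf)
```

Where both frames share a timestamp, combine_first keeps the value from rdf, which matches the keep='first' behaviour of the drop_duplicates approach in the question.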