
pandas dataframe time series drop duplicates

I am trying to update temperature time series by combining 2 CSV files that may have duplicate rows at times.

I have tried to implement drop_duplicates but it's not working for me.

Here is an example of what I'm trying to do:

import pandas as pd
import numpy as np

from pandas import DataFrame, Series


dfA = DataFrame({'date' : Series(['1/1/10','1/2/10','1/3/10','1/4/10'], index=[0,1,2,3]),
    'a' : Series([60,57,56,50], index=[0,1,2,3]),
    'b' : Series([80,73,76,56], index=[0,1,2,3])})

print("dfA")     
print(dfA)

dfB = DataFrame({'date' : Series(['1/3/10','1/4/10','1/5/10','1/6/10'], index=[0,1,2,3]),
    'a' : Series([56,50,59,75], index=[0,1,2,3]),
    'b' : Series([76,56,73,89], index=[0,1,2,3])})

print("dfB")
print(dfB)

dfC = pd.concat([dfA, dfB])  # DataFrame.append was removed in pandas 2.0; pd.concat is the replacement

print(dfC.duplicated())

dfC.drop_duplicates()
print("dfC")
print(dfC)

And this is the output:

dfA
    a   b    date
0  60  80  1/1/10
1  57  73  1/2/10
2  56  76  1/3/10
3  50  56  1/4/10
dfB
    a   b    date
0  56  76  1/3/10
1  50  56  1/4/10
2  59  73  1/5/10
3  75  89  1/6/10
0    False
1    False
2    False
3    False
0     True
1     True
2    False
3    False
dtype: bool
dfC
    a   b    date
0  60  80  1/1/10
1  57  73  1/2/10
2  56  76  1/3/10
3  50  56  1/4/10
0  56  76  1/3/10
1  50  56  1/4/10
2  59  73  1/5/10
3  75  89  1/6/10

How do I update a time series with overlapping data and not have duplicates?

The line dfC.drop_duplicates() does not actually change the DataFrame that dfC is bound to (it just returns a copy of it with no duplicate rows).
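To see this concretely, here is a small sketch with a made-up three-row frame (the names df and result are only for illustration, not from the question). It compares the original object with what drop_duplicates() hands back:

```python
import pandas as pd

# Hypothetical toy data: the first two rows are identical.
df = pd.DataFrame({"date": ["1/3/10", "1/3/10", "1/4/10"],
                   "a": [56, 56, 50]})

# drop_duplicates() builds and returns a new DataFrame by default.
result = df.drop_duplicates()

print(len(df))      # the original still contains the duplicate row
print(len(result))  # the returned frame has it removed
```

If the return value is not captured, as in the question, the de-duplicated frame is simply discarded.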

You can either modify dfC in place by passing the inplace keyword argument,

dfC.drop_duplicates(inplace=True)

or rebind the name dfC to the de-duplicated DataFrame that drop_duplicates() returns, like this:

dfC = dfC.drop_duplicates()
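Putting it together, a minimal end-to-end sketch using the question's data might look like the following. It assumes a recent pandas, so pd.concat stands in for the older append, and ignore_index=True plus reset_index(drop=True) are optional additions of mine to get a clean 0..n-1 index. The by_date variant is likewise an assumption about what you might want when the two files can disagree on the values for the same date; it keeps the row from the later file.

```python
import pandas as pd

dfA = pd.DataFrame({
    "date": ["1/1/10", "1/2/10", "1/3/10", "1/4/10"],
    "a": [60, 57, 56, 50],
    "b": [80, 73, 76, 56],
})
dfB = pd.DataFrame({
    "date": ["1/3/10", "1/4/10", "1/5/10", "1/6/10"],
    "a": [56, 50, 59, 75],
    "b": [76, 56, 73, 89],
})

# Stack the two frames; ignore_index avoids repeated index labels 0..3, 0..3.
combined = pd.concat([dfA, dfB], ignore_index=True)

# Rebind the name to the de-duplicated result (rows equal in every column).
dfC = combined.drop_duplicates().reset_index(drop=True)
print(dfC)

# Optional variant: treat rows as duplicates when only the date matches,
# keeping the last occurrence (here, the row that came from dfB).
by_date = combined.drop_duplicates(subset="date", keep="last").reset_index(drop=True)
print(by_date)
```

With this particular data both variants leave six rows, one per date, because the overlapping rows are identical in every column.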
