
pandas DataFrame time series: drop duplicates

I am trying to update a temperature time series by combining two CSV files that may sometimes contain duplicate rows.

I have tried to use drop_duplicates, but it is not working for me.

Here is an example of what I'm trying to do:

import pandas as pd
import numpy as np

from pandas import DataFrame, Series


dfA = DataFrame({'date' : Series(['1/1/10','1/2/10','1/3/10','1/4/10'], index=[0,1,2,3]),
    'a' : Series([60,57,56,50], index=[0,1,2,3]),
    'b' : Series([80,73,76,56], index=[0,1,2,3])})

print("dfA")     
print(dfA)

dfB = DataFrame({'date' : Series(['1/3/10','1/4/10','1/5/10','1/6/10'], index=[0,1,2,3]),
    'a' : Series([56,50,59,75], index=[0,1,2,3]),
    'b' : Series([76,56,73,89], index=[0,1,2,3])})

print("dfB")
print(dfB)

# DataFrame.append was removed in pandas 2.0; pd.concat is the equivalent call
dfC = pd.concat([dfA, dfB])

print(dfC.duplicated())

dfC.drop_duplicates()
print("dfC")
print(dfC)

And this is the output:

dfA
    a   b    date
0  60  80  1/1/10
1  57  73  1/2/10
2  56  76  1/3/10
3  50  56  1/4/10
dfB
    a   b    date
0  56  76  1/3/10
1  50  56  1/4/10
2  59  73  1/5/10
3  75  89  1/6/10
0    False
1    False
2    False
3    False
0     True
1     True
2    False
3    False
dtype: bool
dfC
    a   b    date
0  60  80  1/1/10
1  57  73  1/2/10
2  56  76  1/3/10
3  50  56  1/4/10
0  56  76  1/3/10
1  50  56  1/4/10
2  59  73  1/5/10
3  75  89  1/6/10

How do I update a time series with overlapping data without ending up with duplicates?

The line dfC.drop_duplicates() does not actually change the DataFrame that dfC is bound to; it just returns a copy of it with the duplicate rows removed.
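A minimal sketch of that behaviour, using a small hypothetical frame `df` that is not part of the question: the call hands back a new object, and the original keeps its duplicate row.

```python
import pandas as pd

# Hypothetical frame with one repeated row (illustration only).
df = pd.DataFrame({"date": ["1/3/10", "1/4/10", "1/3/10"],
                   "a": [56, 50, 56]})

deduped = df.drop_duplicates()  # returns a new DataFrame

print(len(df))       # the original still has all 3 rows
print(len(deduped))  # the returned frame has 2 rows
```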

You can either have dfC modified in place by passing the inplace keyword argument,

dfC.drop_duplicates(inplace=True)

or rebind the name dfC to the de-duplicated DataFrame that the call returns, like this:

dfC = dfC.drop_duplicates()
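Putting this together for the frames in the question, here is one possible sketch rather than a definitive recipe. It assumes a recent pandas, where `pd.concat` takes the place of the removed `DataFrame.append`. Parsing `date` as month/day/two-digit-year and de-duplicating on the `date` column alone are assumptions about the data, not something the question states.

```python
import pandas as pd

dfA = pd.DataFrame({"date": ["1/1/10", "1/2/10", "1/3/10", "1/4/10"],
                    "a": [60, 57, 56, 50],
                    "b": [80, 73, 76, 56]})
dfB = pd.DataFrame({"date": ["1/3/10", "1/4/10", "1/5/10", "1/6/10"],
                    "a": [56, 50, 59, 75],
                    "b": [76, 56, 73, 89]})

# Stack the two frames; ignore_index gives a fresh 0..n-1 index.
dfC = pd.concat([dfA, dfB], ignore_index=True)

# Assumption: dates are month/day/two-digit-year strings.
dfC["date"] = pd.to_datetime(dfC["date"], format="%m/%d/%y")

# Keep one row per timestamp (the last one seen, so the newer file wins),
# then sort chronologically and renumber the rows.
dfC = (dfC.drop_duplicates(subset="date", keep="last")
          .sort_values("date")
          .reset_index(drop=True))

print(dfC)
```

With `keep="last"`, a row from the second file overrides an earlier reading for the same date; pass `keep="first"` to prefer the original file instead. Dropping the `subset` argument goes back to comparing whole rows, as in the answer above.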

