Trying to upsample Pandas to have data for every minute

I have a database with a mix of 1-minute, 5-minute, and hourly data points. My goal is to get 10-minute averages, but when I plot the result there are missing data points, and the written CSV file goes from having a value for every data point to a value only once per hour.

The output looks like:

    2005-03-01 17:00:00,3.25
    2005-03-01 17:10:00,-5.75
    2005-03-01 17:20:00,-6.0
    2005-03-01 17:30:00,
    2005-03-01 17:40:00,
    2005-03-01 17:50:00,
    2005-03-01 18:00:00,2.3
    2005-03-01 18:10:00,
    2005-03-01 18:20:00,
    2005-03-01 18:30:00,
    2005-03-01 18:40:00,
    2005-03-01 18:50:00,
    2005-03-01 19:00:00,2.8

The original input looks like:

  01-mar-05 17:10,   1.6,  7.9, 0.0214, 1.3536, 0.0214, 1.6726, 1.00,30.567
  01-mar-05 17:15, -13.1,  7.9, 0.0214, 1.3540, 0.0214, 1.6729, 1.00,30.550
  01-mar-05 17:20,   3.2,  7.9, 0.0214, 1.3542, 0.0214, 1.6731, 1.00,30.554
  01-mar-05 17:25, -15.2,  7.9, 0.0214, 1.3544, 0.0214, 1.6731, 1.00,30.534
  01-mar-05 18:00,   2.3,  8.0, 0.0214, 1.8276, 0.0214, 1.6932, 1.00, 0.034
  01-mar-05 19:00,   2.8,  8.0, 0.0214, 1.8312, 0.0214, 1.6973, 1.00, 0.081
  01-mar-05 20:00,   6.8,  8.0, 0.0214, 1.8313, 0.0214, 1.6993, 1.00,  .192

The code that I used was:

    import pandas as pd

    names = ['Date', 'Conc', 'Flow', 'SZ', 'SB', 'RZ', 'RB', 'Fraction', 'Attenuation']
    df = pd.read_csv('Output13.csv', index_col=0, names=names, parse_dates=True)
    # Average the concentration column over 10-minute bins.
    df1 = df[['Conc']].resample('10min').mean()

And I tried:

    df = df.resample('1min', fill_method='bfill')

thinking that it would fill in all the data points in the original file, but it didn't work.

Any suggestions? Thanks!

Is that what you want?

In [57]: df.resample('T').ffill()
Out[57]:
                     Conc  Flow      SZ      SB      RZ      RB  Fraction  Attenuation
Date
2005-03-01 17:10:00   1.6   7.9  0.0214  1.3536  0.0214  1.6726       1.0       30.567
2005-03-01 17:11:00   1.6   7.9  0.0214  1.3536  0.0214  1.6726       1.0       30.567
2005-03-01 17:12:00   1.6   7.9  0.0214  1.3536  0.0214  1.6726       1.0       30.567
2005-03-01 17:13:00   1.6   7.9  0.0214  1.3536  0.0214  1.6726       1.0       30.567
2005-03-01 17:14:00   1.6   7.9  0.0214  1.3536  0.0214  1.6726       1.0       30.567
2005-03-01 17:15:00 -13.1   7.9  0.0214  1.3540  0.0214  1.6729       1.0       30.550
2005-03-01 17:16:00 -13.1   7.9  0.0214  1.3540  0.0214  1.6729       1.0       30.550
2005-03-01 17:17:00 -13.1   7.9  0.0214  1.3540  0.0214  1.6729       1.0       30.550
2005-03-01 17:18:00 -13.1   7.9  0.0214  1.3540  0.0214  1.6729       1.0       30.550
2005-03-01 17:19:00 -13.1   7.9  0.0214  1.3540  0.0214  1.6729       1.0       30.550
2005-03-01 17:20:00   3.2   7.9  0.0214  1.3542  0.0214  1.6731       1.0       30.554
2005-03-01 17:21:00   3.2   7.9  0.0214  1.3542  0.0214  1.6731       1.0       30.554
2005-03-01 17:22:00   3.2   7.9  0.0214  1.3542  0.0214  1.6731       1.0       30.554
2005-03-01 17:23:00   3.2   7.9  0.0214  1.3542  0.0214  1.6731       1.0       30.554
2005-03-01 17:24:00   3.2   7.9  0.0214  1.3542  0.0214  1.6731       1.0       30.554
2005-03-01 17:25:00 -15.2   7.9  0.0214  1.3544  0.0214  1.6731       1.0       30.534
2005-03-01 17:26:00 -15.2   7.9  0.0214  1.3544  0.0214  1.6731       1.0       30.534
2005-03-01 17:27:00 -15.2   7.9  0.0214  1.3544  0.0214  1.6731       1.0       30.534
2005-03-01 17:28:00 -15.2   7.9  0.0214  1.3544  0.0214  1.6731       1.0       30.534
2005-03-01 17:29:00 -15.2   7.9  0.0214  1.3544  0.0214  1.6731       1.0       30.534
2005-03-01 17:30:00 -15.2   7.9  0.0214  1.3544  0.0214  1.6731       1.0       30.534
2005-03-01 17:31:00 -15.2   7.9  0.0214  1.3544  0.0214  1.6731       1.0       30.534
2005-03-01 17:32:00 -15.2   7.9  0.0214  1.3544  0.0214  1.6731       1.0       30.534
2005-03-01 17:33:00 -15.2   7.9  0.0214  1.3544  0.0214  1.6731       1.0       30.534
2005-03-01 17:34:00 -15.2   7.9  0.0214  1.3544  0.0214  1.6731       1.0       30.534
2005-03-01 17:35:00 -15.2   7.9  0.0214  1.3544  0.0214  1.6731       1.0       30.534
2005-03-01 17:36:00 -15.2   7.9  0.0214  1.3544  0.0214  1.6731       1.0       30.534
2005-03-01 17:37:00 -15.2   7.9  0.0214  1.3544  0.0214  1.6731       1.0       30.534
2005-03-01 17:38:00 -15.2   7.9  0.0214  1.3544  0.0214  1.6731       1.0       30.534
2005-03-01 17:39:00 -15.2   7.9  0.0214  1.3544  0.0214  1.6731       1.0       30.534
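
To get from the per-minute frame above to the 10-minute averages asked for in the question, one option is to average the forward-filled series in 10-minute bins. The following is a minimal sketch, not the answerer's code: it assumes only the Conc readings matter and rebuilds the sample rows from the question by hand, and the names conc, per_minute and ten_min are made up for illustration.

```python
import pandas as pd

# Conc readings copied from the sample input in the question.
idx = pd.to_datetime([
    "2005-03-01 17:10", "2005-03-01 17:15", "2005-03-01 17:20",
    "2005-03-01 17:25", "2005-03-01 18:00", "2005-03-01 19:00",
    "2005-03-01 20:00",
])
conc = pd.Series([1.6, -13.1, 3.2, -15.2, 2.3, 2.8, 6.8], index=idx, name="Conc")

# Upsample to one row per minute, carrying the last reading forward.
per_minute = conc.resample("1min").ffill()

# Downsample the filled series to 10-minute averages; every bin now has data.
ten_min = per_minute.resample("10min").mean()
print(ten_min.head())
```

Because each reading is repeated until the next one arrives, the hourly stretch yields flat 10-minute averages rather than gaps. Whether that is acceptable depends on how the measurements are meant to be interpreted.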

Your data is too sparse to give a point every 10 minutes when, toward the end, you only get one reading per hour. Since there are missing points, you either have to reuse the data you have (using ffill or bfill) or interpolate the missing data.

import matplotlib.pyplot as plt

# Compare the raw readings with several ways of filling the 10-minute grid.
df['Conc'].plot(label='original')
df['Conc'].resample('10min').ffill().plot(label='ffill')
df['Conc'].resample('10min').bfill().plot(label='bfill')
df['Conc'].resample('10min').mean().interpolate(method='linear').plot(label='linear interpolation')
# method='cubic' requires SciPy to be installed.
df['Conc'].resample('10min').mean().interpolate(method='cubic').plot(label='cubic interpolation')
plt.legend(loc=4)
plt.show()
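
For anyone without the original CSV, the same comparison can be checked numerically instead of on a plot. This is only a sketch under a few assumptions: the Conc readings are typed in from the sample input in the question, the fills are applied to the 10-minute means rather than to the raw readings (a slight variation on the plotting code), and the cubic case is left out so that SciPy is not needed. The names conc, binned and filled are hypothetical.

```python
import pandas as pd

# Conc readings copied from the sample input in the question.
idx = pd.to_datetime([
    "2005-03-01 17:10", "2005-03-01 17:15", "2005-03-01 17:20",
    "2005-03-01 17:25", "2005-03-01 18:00", "2005-03-01 19:00",
    "2005-03-01 20:00",
])
conc = pd.Series([1.6, -13.1, 3.2, -15.2, 2.3, 2.8, 6.8], index=idx, name="Conc")

# 10-minute means; bins that contain no readings come out as NaN.
binned = conc.resample("10min").mean()

# Put the different gap-filling strategies side by side.
filled = pd.DataFrame({
    "mean": binned,
    "ffill": binned.ffill(),                        # repeat the previous bin
    "bfill": binned.bfill(),                        # copy the next bin backwards
    "linear": binned.interpolate(method="linear"),  # straight line between bins
})
print(filled.loc["2005-03-01 17:20":"2005-03-01 18:00"])
```

The mean column reproduces the gaps from the question, while the other three columns show how each strategy fills them.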
