
Resample/reindex sensor data

I want to do some data processing on sensor data (about 300 different sensors). This is an example of the raw data from a temperature sensor:

 "2018-06-30T13:17:05.986Z" 30.5
 "2018-06-30T13:12:05.984Z" 30.3
 "2018-06-30T13:07:05.934Z" 29.5
 "2018-06-30T13:02:05.873Z" 30.3
 "2018-06-30T12:57:05.904Z" 30

I want to resample the data to regular, evenly spaced timestamps:

13:00:00
13:05:00
13:10:00
...

I have written some code that works, but it is incredibly slow on bigger files. It upsamples all the data to 1 second via linear interpolation and then downsamples to the requested frequency.

Is there a faster method to achieve this?

EDIT: The sensor data is written to a database, and my code loads data for an arbitrary time interval from the database.

EDIT2: My working code:

upsampled = dataframe.resample('1S').asfreq()
upsampled = upsampled.interpolate(method=method, limit=limitT) # ffill or bfill for some sensors 
resampled = upsampled.astype(float).resample(str(sampling_time) + 'S').mean() # for temperature 
resampled = upsampled.astype(float).resample(str(sampling_time) + 'S').asfreq() # for everything else

You can first set the timestamp column as the index of the DataFrame and then use the resample() method to bring the data to a regular interval, such as every second or every 5 minutes.

For example:

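import pandas as pd

temp_df = pd.read_csv('temp.csv', header=None)
temp_df.columns = ['Timestamps', 'TEMP']
temp_df['Timestamps'] = pd.to_datetime(temp_df['Timestamps'])  # resample() needs real datetimes, not strings
temp_df = temp_df.set_index('Timestamps').sort_index()  # set the timestamp column as index
temp_re_df = temp_df.TEMP.resample('5min').mean()  # 5-minute bins ('5T' in older pandas)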

You can pass the period as an argument to resample(), e.g. T for minutes, S for seconds, H for hours, M for months (recent pandas versions prefer the spellings 'min', 's', 'h' and 'ME'), and then apply a function such as mean(), max() or min() to choose how the values in each interval are aggregated when down-sampling.
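As a rough illustration of the point about aggregation functions, the sketch below applies several of them at once to the five readings from the question. The 10-minute bin width and the variable names are only assumptions chosen for the example.

```python
import pandas as pd

# Readings copied from the question's temperature sensor sample.
readings = pd.Series(
    [30.0, 30.3, 29.5, 30.3, 30.5],
    index=pd.to_datetime([
        "2018-06-30T12:57:05.904Z",
        "2018-06-30T13:02:05.873Z",
        "2018-06-30T13:07:05.934Z",
        "2018-06-30T13:12:05.984Z",
        "2018-06-30T13:17:05.986Z",
    ]),
    name="TEMP",
)

# One row per 10-minute bin, with several aggregates side by side.
summary = readings.resample("10min").agg(["mean", "min", "max"])
print(summary)
```

Each row is labelled with the start of its bin, and every reading that falls inside the bin contributes to the aggregates.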

PS: resample() requires the timestamps to be in pandas datetime format. If they are stored as strings, convert them with pd.to_datetime(temp_df['Timestamps']); if they are stored as epoch seconds, pass unit='s' as well.
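Note that resample().mean() averages the readings inside each bin; it does not interpolate to the exact grid times as the code in the question does. If interpolated values are wanted without the intermediate 1-second series, one possible approach is to add only the target grid points to the index, interpolate by time, and keep those points. The following is a minimal sketch of that idea, not a benchmarked solution: the helper name interpolate_to_grid is hypothetical, and it assumes a Series with a unique DatetimeIndex.

```python
import pandas as pd

def interpolate_to_grid(series, freq="5min"):
    """Linearly interpolate an irregular time series onto a regular grid.

    Hypothetical helper; assumes a Series with a unique DatetimeIndex.
    """
    series = series.sort_index()
    # Grid points that lie inside the observed time range.
    grid = pd.date_range(
        series.index[0].ceil(freq), series.index[-1].floor(freq), freq=freq
    )
    # Add the grid points as NaN rows, fill them by time-weighted
    # linear interpolation, then keep only the grid points.
    combined = series.reindex(series.index.union(grid))
    return combined.interpolate(method="time").loc[grid]

# Readings copied from the question's temperature sensor sample.
readings = pd.Series(
    [30.5, 30.3, 29.5, 30.3, 30.0],
    index=pd.to_datetime([
        "2018-06-30T13:17:05.986Z",
        "2018-06-30T13:12:05.984Z",
        "2018-06-30T13:07:05.934Z",
        "2018-06-30T13:02:05.873Z",
        "2018-06-30T12:57:05.904Z",
    ]),
)

resampled = interpolate_to_grid(readings, "5min")
print(resampled)
```

The intermediate series then has one extra row per grid point instead of one row per second. For sensors that should hold their last value rather than interpolate, replacing the interpolate call with combined.ffill() would be the analogous step.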
