
Subsample multi-sensor time series data using Python's pandas.DataFrame

I have a file coming from multiple sensor readings. Each line is of the following format:

timestamp sensor_name sensor_value

e.g.:

191.12 temperature -5.19
191.17 pressure 20.05
191.18 pressure 20.04
191.23 pressure 20.07
191.23 temperature -5.17
191.31 temperature -5.09
...

The frequency of the readings is irregular, approximately 10-20 Hz. I need to downsample these readings to 1 Hz and output the result in the following format:

timestamp sensor_1_value sensor_2_value ... sensor_n_value

reflecting the (running?) mean of each sensor's readings within each successive second, e.g.:

timestamp temperature pressure
191.00 -5.02 21.93
192.00 -5.01 21.92
193.00 -5.01 21.91
...
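To make the "(running?)" question concrete: a running mean yields a value at every reading, averaged over a trailing time window, whereas the table above has exactly one value per whole second. Below is a minimal sketch of a trailing 1-second running mean per sensor, assuming the timestamps are seconds and the raw file has no header row (the `RAW` string is just the sample readings from above).

```python
import io

import pandas as pd

# Sample readings from the question (no header row in the raw data)
RAW = """\
191.12 temperature -5.19
191.17 pressure 20.05
191.18 pressure 20.04
191.23 pressure 20.07
191.23 temperature -5.17
191.31 temperature -5.09
"""

df = pd.read_csv(io.StringIO(RAW), sep=r"\s+", header=None,
                 names=["timestamp", "sensor_name", "sensor_value"])

# Time-based rolling windows need a datetime-like index,
# so interpret the float timestamps as seconds since the epoch
df["time"] = pd.to_datetime(df["timestamp"], unit="s")

# Trailing 1-second running mean per sensor, one value per reading;
# the result is indexed by (sensor_name, time)
running = (df.set_index("time")
             .groupby("sensor_name")["sensor_value"]
             .rolling("1s")
             .mean())
print(running)
```

Sampling `running` at whole seconds would give a 1 Hz running-mean series. A per-second bucket mean is simpler if each output row should summarize only the readings inside that second.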

I loaded each line of the input file into a dictionary as follows:

   def add(self, timestamp, sensor_name, sensor_value):
     self.timeseries[sensor_name].append([timestamp, sensor_value]) 

... and created a DataFrame from the dictionary:

df = pd.DataFrame(self.timeseries)

... but I need some guidance on how to move forward from here, i.e., what's an elegant way to perform the downsampling.
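One caveat about `pd.DataFrame(self.timeseries)`: it builds one column per sensor whose cells are `[timestamp, value]` lists, and it raises `ValueError` when the sensors have different numbers of readings. A long format with one row per reading is much easier to group and pivot. The sketch below assumes `self.timeseries` is a `defaultdict(list)`; the class name `SensorLog` and the method `to_long_frame` are hypothetical.

```python
from collections import defaultdict

import pandas as pd


class SensorLog:
    """Hypothetical container around the add() method from the question."""

    def __init__(self):
        # Assumption: sensor_name -> list of [timestamp, sensor_value] pairs
        self.timeseries = defaultdict(list)

    def add(self, timestamp, sensor_name, sensor_value):
        self.timeseries[sensor_name].append([timestamp, sensor_value])

    def to_long_frame(self):
        # Flatten into one row per reading: timestamp, sensor_name, sensor_value
        rows = [
            (ts, name, value)
            for name, pairs in self.timeseries.items()
            for ts, value in pairs
        ]
        frame = pd.DataFrame(rows, columns=["timestamp", "sensor_name", "sensor_value"])
        # Restore chronological order across sensors
        return frame.sort_values("timestamp", kind="stable").reset_index(drop=True)


log = SensorLog()
log.add(191.12, "temperature", -5.19)
log.add(191.17, "pressure", 20.05)
log.add(191.23, "temperature", -5.17)
frame = log.to_long_frame()
print(frame)
```

The resulting long frame has the same columns as the space-separated input file, so it can be aggregated directly.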

I'm not 100% sure what you're doing, but this is what I'd do to solve the problem. It assumes your data file is whitespace-separated with a header row (timestamp sensor_name sensor_value).

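import pandas as pd
import numpy as np

# Load the data (whitespace-separated, header row: timestamp sensor_name sensor_value)
data = pd.read_csv(file_name, sep=r"\s+")

# Truncate each timestamp to the whole second it falls in, then
# take the mean of each sensor's values within that second
data["timestamp"] = np.floor(data["timestamp"])
data = data.groupby(["timestamp", "sensor_name"])["sensor_value"].mean()
data = data.reset_index()

# Pivot: one row per second, one column per sensor
data = data.pivot(index="timestamp", columns="sensor_name", values="sensor_value")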
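For a quick check, here is the same floor/groupby/pivot approach run on the sample readings from the question. Since those raw lines have no header row, this sketch passes `header=None` and supplies the column names explicitly; the `RAW` string stands in for the real file.

```python
import io

import numpy as np
import pandas as pd

# Sample readings from the question; the raw lines have no header row
RAW = """\
191.12 temperature -5.19
191.17 pressure 20.05
191.18 pressure 20.04
191.23 pressure 20.07
191.23 temperature -5.17
191.31 temperature -5.09
"""

data = pd.read_csv(io.StringIO(RAW), sep=r"\s+", header=None,
                   names=["timestamp", "sensor_name", "sensor_value"])

# Bucket by whole second, average per sensor, then spread sensors into columns
data["timestamp"] = np.floor(data["timestamp"])
data = data.groupby(["timestamp", "sensor_name"])["sensor_value"].mean().reset_index()
data = data.pivot(index="timestamp", columns="sensor_name", values="sensor_value")
print(data)
```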

If you have some other notion of "downsampling" in mind for this context, use that aggregation instead of the mean.
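An alternative worth considering is pandas' own `resample`, which also emits NaN rows for seconds in which a sensor reported nothing, so gaps stay visible. A minimal sketch, assuming the timestamps are seconds: they are converted to a `DatetimeIndex` because the default resample origin then aligns the 1-second bins to whole seconds (bins on a `TimedeltaIndex` start at the first timestamp instead), and the index is converted back to float seconds at the end.

```python
import io

import pandas as pd

# Sample readings from the question (no header row in the raw data)
RAW = """\
191.12 temperature -5.19
191.17 pressure 20.05
191.18 pressure 20.04
191.23 pressure 20.07
191.23 temperature -5.17
191.31 temperature -5.09
"""

data = pd.read_csv(io.StringIO(RAW), sep=r"\s+", header=None,
                   names=["timestamp", "sensor_name", "sensor_value"])
data["time"] = pd.to_datetime(data["timestamp"], unit="s")

# Resample each sensor's series into 1-second bins, then one column per sensor
wide = (data.set_index("time")
            .groupby("sensor_name")["sensor_value"]
            .resample("1s")
            .mean()
            .unstack("sensor_name"))

# Convert the bin labels back to seconds since the epoch
wide.index = (wide.index - pd.Timestamp(0)).total_seconds()
print(wide)
```

Swapping `.mean()` for another aggregation such as `.median()` or `.last()` gives a different notion of downsampling with the same structure.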
