
How to handle collection and analysis of arbitrary timeseries data (data stream mining)

At our hackerspace, we have several environmental sensors and event trackers (such as the number of connected devices, heating, bar transactions, etc.) that output timeseries data at regular intervals. The output of our current platform consists of a unix timestamp plus a value/event. The intervals at which these are polled differ for each probe.

The goal is to collect this data in one dataset for

  1. efficient storage
  2. online analysis (using scikit)
  3. streaming visualization (using bokeh)
  4. handling both real-valued and discrete numeric data in an integrated manner
  5. (preferably using Python but this is not a requirement.)

What is a good practical approach to achieve the above goals? Are there existing libraries that provide this functionality?

The current (imperfect) plan:

  • Create a timeseries object per probe and combine them into a numpy array or a pandas timeseries DataFrame.
  • Resample the time axis to the smallest available interval, setting missing datapoints to NaN for sensors with a larger interval.
  • NaN values can later be interpolated/convolved.

However, this would result in a dataset where the majority of values are NaN, which brings its own statistical and possibly storage problems. Another option is to predetermine a median interval and resample to it, losing some data.
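In pandas, the plan above would look roughly like this. This is only a minimal sketch: the sensor names, intervals and values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Two probes polled at different intervals (10 s and 60 s) -- made-up data.
idx_temp = pd.date_range("2015-01-01", periods=360, freq="10s")
idx_devices = pd.date_range("2015-01-01", periods=60, freq="60s")

temp = pd.Series(20 + np.random.randn(len(idx_temp)), index=idx_temp, name="temperature")
devices = pd.Series(np.random.randint(0, 5, len(idx_devices)), index=idx_devices, name="devices")

# Outer join aligns both series on one time axis; the coarser sensor
# gets NaN at the timestamps it did not report.
df = pd.concat([temp, devices], axis=1)

# Resample to the smallest interval, then interpolate the gaps later.
df = df.resample("10s").mean()
df_filled = df.interpolate(method="time")

print(df.head())
print(df_filled.head())
```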

Time-series databases have turned out to be the right answer after some further searching. I plan on using OpenTSDB, as it seems the most mature of the available timeseries databases.
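For the collection side, each probe can push its readings to OpenTSDB's HTTP `/api/put` endpoint. A rough sketch of what that looks like (the host, metric name and tags are assumptions for illustration; OpenTSDB listens on port 4242 by default):

```python
import time
import requests

OPENTSDB_URL = "http://localhost:4242/api/put"  # adjust to your installation

def put_datapoint(metric, value, tags):
    """Send a single datapoint to OpenTSDB as a JSON object."""
    payload = {
        "metric": metric,
        "timestamp": int(time.time()),  # unix timestamp, as produced by the probes
        "value": value,
        "tags": tags,  # OpenTSDB requires at least one tag
    }
    resp = requests.post(OPENTSDB_URL, json=payload, timeout=5)
    resp.raise_for_status()  # /api/put returns 204 on success

put_datapoint("space.temperature", 21.4, {"sensor": "bar"})
```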

This solves the storage and interval-querying issues, as both are handled by the database management system itself. Then it is just a matter of visualization with Bokeh.
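For the visualization side, a Bokeh server app can periodically query OpenTSDB and stream new points into a live plot. Again a rough sketch only, with the metric, tags and host made up; run it with `bokeh serve --show stream_plot.py`:

```python
from datetime import datetime

import requests
from bokeh.plotting import figure, curdoc
from bokeh.models import ColumnDataSource

QUERY_URL = "http://localhost:4242/api/query"

source = ColumnDataSource(data=dict(time=[], value=[]))
fig = figure(x_axis_type="datetime", title="space.temperature")
fig.line(x="time", y="value", source=source)

def poll():
    # Ask OpenTSDB for the last minute of the metric, aggregated with `avg`.
    params = {"start": "1m-ago", "m": "avg:space.temperature{sensor=bar}"}
    results = requests.get(QUERY_URL, params=params, timeout=5).json()
    if not results:
        return
    dps = results[0].get("dps", {})  # {unix_timestamp: value, ...}
    timestamps = sorted(dps, key=int)
    new = {
        "time": [datetime.fromtimestamp(int(ts)) for ts in timestamps],
        "value": [dps[ts] for ts in timestamps],
    }
    # stream() appends the new points; rollover keeps the plot bounded.
    source.stream(new, rollover=1000)

curdoc().add_root(fig)
curdoc().add_periodic_callback(poll, 5000)  # poll every 5 seconds
```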
