
Cassandra: Range Queries on timestamp of time series data

I am trying to evaluate Cassandra's performance for storing and retrieving time series data from different channels.

The data is recorded to files at a maximum rate of 8 samples/sec, with a millisecond timestamp for each sample. The number of channels recording at a given time may vary.

Inspired by the article Getting Started with Time Series Data Modeling, I created the following table:

CREATE TABLE uhhdata (
    ch_idx int,
    date timestamp,
    dt timestamp,
    val float,
    PRIMARY KEY ((ch_idx, date), dt)
);

Here the partition key is composed of the channel number (ch_idx int) and date, a timestamp that stores only the date with no time-of-day detail; dt is the timestamp of each record, with sub-second resolution.
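As a minimal sketch of how the two timestamp columns relate, the helper below truncates a sample's millisecond timestamp to the start of its day. It assumes the date column holds midnight UTC for the day of the sample, which the post does not state explicitly; the name day_bucket_ms is hypothetical.

```cpp
#include <cstdint>

// Number of milliseconds in one day.
const std::int64_t kMsPerDay = 24LL * 60 * 60 * 1000;

// Truncate a millisecond epoch timestamp to midnight UTC of the same day.
// Assumption: this is the value written to the `date` partition-key column,
// while the untruncated timestamp is written to the `dt` clustering column.
std::int64_t day_bucket_ms(std::int64_t ts_ms) {
    std::int64_t rem = ts_ms % kMsPerDay;
    if (rem < 0) {
        rem += kMsPerDay;  // keep pre-1970 timestamps in the correct day
    }
    return ts_ms - rem;
}
```

With this layout, every sample of one channel taken on the same day lands in the same partition, ordered by dt.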

I have two problems:

1. After writing 2,500,000 records into this table and running the query select * from UHHdata limit 10,000,000; I got the following timeout error:

Request did not complete within rpc_timeout.

The C++ driver simply returns NULL for this number of records:

boost::shared_ptr result = future.get().result;
if (!result) std::cout << "No result record\n";

If I do this with a limit of 100,000, it returns after 22 seconds. How can I retrieve all the records for big queries like this? I have seen the post "cassandra get all records in time range"; however, I do not see how it applies to my case, as I need to get all the records, not just some of them.
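One common way to avoid a single unbounded scan is to read one partition at a time, so that each request touches a single (ch_idx, date) pair. At 8 samples/sec a day of one channel holds at most 8 × 86,400 = 691,200 rows. The sketch below only builds the CQL strings; it does not call the driver, and the function name partition_queries is hypothetical. It assumes date holds midnight-UTC values and relies on CQL accepting an integer literal as milliseconds since the epoch for timestamp columns.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Build one SELECT per (channel, day) partition instead of one table-wide scan.
// Assumption: `date` stores midnight-UTC timestamps, given here in epoch milliseconds.
std::vector<std::string> partition_queries(int first_ch, int last_ch,
                                           std::int64_t first_day_ms,
                                           std::int64_t last_day_ms) {
    assert(first_ch <= last_ch && first_day_ms <= last_day_ms);
    const std::int64_t ms_per_day = 24LL * 60 * 60 * 1000;
    std::vector<std::string> queries;
    for (int ch = first_ch; ch <= last_ch; ++ch) {
        for (std::int64_t day = first_day_ms; day <= last_day_ms; day += ms_per_day) {
            // Both partition-key columns are fixed, so each query reads one partition.
            queries.push_back("SELECT * FROM uhhdata WHERE ch_idx = " +
                              std::to_string(ch) + " AND date = " +
                              std::to_string(day) + ";");
        }
    }
    return queries;
}
```

Each returned string can then be executed separately, which keeps individual requests well below the size that triggered the timeout above.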

2. If I do a range query on the dt timestamp as follows, the returned rows do not respect the specified interval, irrespective of the lower and upper time limits.

As can be observed, the query returns records greater than the upper time limit '2014-04-04 01:00:10':

cqlsh:uhhkeyspace2> select * from UHHData where ch_idx=1 AND date = '2012-04-04 01:00:00' AND dt < '2014-04-04 01:00:10' LIMIT 20;

 ch_idx | date                                 | dt                                   | val
--------+--------------------------------------+--------------------------------------+-----

  1 | 2012-04-04 01:00:00GMT Daylight Time | 2012-04-04 01:00:00GMT Daylight Time |  -5
  1 | 2012-04-04 01:00:00GMT Daylight Time | 2012-04-04 01:00:01GMT Daylight Time |  44
  1 | 2012-04-04 01:00:00GMT Daylight Time | 2012-04-04 01:00:02GMT Daylight Time |  83
  1 | 2012-04-04 01:00:00GMT Daylight Time | 2012-04-04 01:00:03GMT Daylight Time |  99
  1 | 2012-04-04 01:00:00GMT Daylight Time | 2012-04-04 01:00:04GMT Daylight Time |  89
  1 | 2012-04-04 01:00:00GMT Daylight Time | 2012-04-04 01:00:05GMT Daylight Time |  55
  1 | 2012-04-04 01:00:00GMT Daylight Time | 2012-04-04 01:00:06GMT Daylight Time |   5
  1 | 2012-04-04 01:00:00GMT Daylight Time | 2012-04-04 01:00:07GMT Daylight Time | -44
  1 | 2012-04-04 01:00:00GMT Daylight Time | 2012-04-04 01:00:08GMT Daylight Time | -83
  1 | 2012-04-04 01:00:00GMT Daylight Time | 2012-04-04 01:00:09GMT Daylight Time | -99
  1 | 2012-04-04 01:00:00GMT Daylight Time | 2012-04-04 01:00:10GMT Daylight Time | -89
  1 | 2012-04-04 01:00:00GMT Daylight Time | 2012-04-04 01:00:11GMT Daylight Time | -55
  1 | 2012-04-04 01:00:00GMT Daylight Time | 2012-04-04 01:00:12GMT Daylight Time |  -5
  1 | 2012-04-04 01:00:00GMT Daylight Time | 2012-04-04 01:00:13GMT Daylight Time |  44
  1 | 2012-04-04 01:00:00GMT Daylight Time | 2012-04-04 01:00:14GMT Daylight Time |  83
  1 | 2012-04-04 01:00:00GMT Daylight Time | 2012-04-04 01:00:15GMT Daylight Time |  99
  1 | 2012-04-04 01:00:00GMT Daylight Time | 2012-04-04 01:00:16GMT Daylight Time |  89
  1 | 2012-04-04 01:00:00GMT Daylight Time | 2012-04-04 01:00:17GMT Daylight Time |  55
  1 | 2012-04-04 01:00:00GMT Daylight Time | 2012-04-04 01:00:18GMT Daylight Time |   5
  1 | 2012-04-04 01:00:00GMT Daylight Time | 2012-04-04 01:00:19GMT Daylight Time | -44

(20 rows)

Why are the timestamp limit conditions not applied? How can I fix this?

Thanks, Amin

I don't see any problem. All the timestamps in your dt column are from 2012-04-04, and your condition is dt < '2014-04-04 01:00:10'. 2012 is before 2014, so everything is correct.
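To make the comparison concrete, here is a small sketch that converts the dates to epoch milliseconds and counts how many of the 20 dt values from the output (01:00:00 through 01:00:19 on 2012-04-04) fall below a given upper bound. The function names are hypothetical, and the time zone is ignored because it shifts both sides of the comparison equally.

```cpp
#include <cstdint>

// Days since 1970-01-01 for a Gregorian calendar date (standard civil-date arithmetic).
std::int64_t days_from_civil(int y, int m, int d) {
    y -= (m <= 2) ? 1 : 0;                      // treat Jan/Feb as part of the previous year
    const int era = (y >= 0 ? y : y - 399) / 400;
    const int yoe = y - era * 400;              // year of era, [0, 399]
    const int mp = (m > 2) ? m - 3 : m + 9;     // March = 0 ... February = 11
    const int doy = (153 * mp + 2) / 5 + d - 1; // day of year, counted from March 1
    const int doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    return static_cast<std::int64_t>(era) * 146097 + doe - 719468;
}

// Epoch milliseconds for a date and time, with no time zone offset applied.
std::int64_t utc_ms(int y, int mo, int d, int h, int mi, int s) {
    const std::int64_t secs = ((days_from_civil(y, mo, d) * 24 + h) * 60 + mi) * 60 + s;
    return secs * 1000;
}

// Count how many of the 20 sample rows satisfy dt < upper_bound_ms.
int rows_matching(std::int64_t upper_bound_ms) {
    int matches = 0;
    for (int s = 0; s < 20; ++s) {
        if (utc_ms(2012, 4, 4, 1, 0, s) < upper_bound_ms) {
            ++matches;
        }
    }
    return matches;
}
```

Calling rows_matching(utc_ms(2014, 4, 4, 1, 0, 10)) counts all 20 rows, which is what the query returned. With the year changed to 2012, rows_matching(utc_ms(2012, 4, 4, 1, 0, 10)) counts only the 10 rows from 01:00:00 to 01:00:09, which is presumably the bound that was intended.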
