Composite columns and "IN" relation in Cassandra

I have the following column family in Cassandra for storing time series data in a small number of very "wide" rows:

CREATE TABLE data_bucket (
  day_of_year int,
  minute_of_day int,
  event_id int,
  data ascii,
  PRIMARY KEY (day_of_year, minute_of_day, event_id)
)

On the CQL shell, I am able to run a query such as this:

select * from data_bucket where day_of_year = 266 and minute_of_day = 244 
  and event_id in (4, 7, 11, 1990, 3433)

Essentially, I fix the value of the first component of the composite column name (minute_of_day) and want to select a non-contiguous set of columns based on distinct values of the second component (event_id). Since the "IN" relation is interpreted as an equality relation, this works fine.

Now my question is: how would I accomplish the same type of composite column slicing programmatically, without CQL? So far I have tried the Python client pycassa and the Java client Astyanax, but without any success.

Any thoughts would be welcome.

EDIT:

I'm adding the describe output of the column family as seen through cassandra-cli. Since I am looking for a Thrift-based solution, maybe this will help.

ColumnFamily: data_bucket
  Key Validation Class: org.apache.cassandra.db.marshal.Int32Type
  Default column value validator: org.apache.cassandra.db.marshal.AsciiType
  Cells sorted by: org.apache.cassandra.db.marshal.CompositeType(org.apache.cassandra.db.marshal.Int32Type,org.apache.cassandra.db.marshal.Int32Type)
  GC grace seconds: 864000
  Compaction min/max thresholds: 4/32
  Read repair chance: 0.1
  DC Local Read repair chance: 0.0
  Populate IO Cache on flush: false
  Replicate on write: true
  Caching: KEYS_ONLY
  Bloom Filter FP chance: default
  Built indexes: []
  Compaction Strategy: org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy
  Compression Options:
    sstable_compression: org.apache.cassandra.io.compress.SnappyCompressor

There is no "IN"-type query in the Thrift API. You could instead perform a series of get queries, one for each composite column value (day_of_year, minute_of_day, event_id).
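The one-get-per-column idea could look roughly like the sketch below. It is not pycassa or Astyanax code: `fetch_column` is a hypothetical callable standing in for whatever single-column get your client exposes, and `fake_row` is made-up in-memory data that only exists so the example runs without a cluster.

```python
def composite_names(minute_of_day, event_ids):
    """Build one (minute_of_day, event_id) composite column name per event id."""
    return [(minute_of_day, event_id) for event_id in event_ids]


def get_each(fetch_column, row_key, names):
    """Issue one lookup per composite name and skip names that do not exist.

    fetch_column is a hypothetical callable (row_key, name) -> value or None,
    standing in for a single-column Thrift get.
    """
    found = {}
    for name in names:
        value = fetch_column(row_key, name)
        if value is not None:
            found[name] = value
    return found


# Made-up stand-in for the wide row whose key is day_of_year = 266.
fake_row = {(244, 4): "a", (244, 5): "b", (244, 7): "c", (245, 4): "d"}


def fake_fetch(row_key, name):
    """Pretend single-column get against the in-memory row above."""
    return fake_row.get(name) if row_key == 266 else None


names = composite_names(244, [4, 7, 11, 1990, 3433])
result = get_each(fake_fetch, 266, names)
```

The cost is one round trip per event id, so this is only reasonable when the list of ids is short.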

If your event_ids were sequential (and your question says they are not), you could perform a single get_slice query, passing in the range (e.g., day_of_year, minute_of_day, and a range of event_ids). You could also grab batches of them this way and filter the response programmatically yourself (e.g., grab all data on the date with event ids between 4 and 3433). That means more data transfer and more processing on the client side, so it is not a great option unless you really are looking for a range.
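As a rough illustration of the slice-then-filter approach, the sketch below emulates it on plain Python data. The dict of composite names is an assumption standing in for the columns of one wide row; in a real client the first step would be a single get_slice call bounded by the smallest and largest wanted event_id.

```python
def slice_and_filter(columns, minute_of_day, wanted_event_ids):
    """Emulate IN by taking one contiguous slice and filtering client-side.

    columns: dict mapping (minute_of_day, event_id) composite names to values,
    standing in for the columns of a single wide row (illustrative only).
    Returns the matching columns and the number of columns in the slice.
    """
    wanted = set(wanted_event_ids)
    if not wanted:
        return {}, 0
    start = (minute_of_day, min(wanted))
    finish = (minute_of_day, max(wanted))
    # Step 1: the contiguous range a single slice query would return.
    sliced = [(name, value) for name, value in sorted(columns.items())
              if start <= name <= finish]
    # Step 2: keep only the event ids that were actually requested.
    kept = {name: value for name, value in sliced if name[1] in wanted}
    return kept, len(sliced)


# Made-up columns for the row with day_of_year = 266.
row = {
    (244, 4): "a", (244, 5): "b", (244, 7): "c",
    (244, 3433): "e", (244, 5000): "f", (245, 4): "d",
}
kept, transferred = slice_and_filter(row, 244, [4, 7, 11, 1990, 3433])
```

Comparing `transferred` with `len(kept)` gives a feel for how much unwanted data the slice pulls across the wire.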

So, if you want to use "IN" with Cassandra, you will need to switch to a CQL-based solution. If you are considering using CQL in Python, another option is cassandra-dbapi2. This worked for me:

import cql

# Replace settings as appropriate
host = 'localhost'
port = 9160
keyspace = 'keyspace_name'

# Connect
connection = cql.connect(host, port, keyspace, cql_version='3.0.1')
cursor = connection.cursor()
print "connected!"

# Execute CQL
cursor.execute("select * from data_bucket where day_of_year = 266 and minute_of_day = 244 and event_id in (4, 7, 11, 1990, 3433)")
for row in cursor:
  print str(row) # Do something with your data

# Shut the connection
cursor.close()
connection.close()

(Tested with Cassandra 2.0.1.)
