String ordering in Cassandra CQL

Question

When querying on a text primary key in Cassandra CQL, string comparison seems to work the opposite way from what I would expect, i.e.:

cqlsh:test> select * from sl;

 name                     | data
--------------------------+------
 000000020000000000000003 | null
 000000010000000000000005 | null
 000000010000000000000003 | null
 000000010000000000000002 | null
 000000010000000000000001 | null

cqlsh:test> select name from sl where token(name) < token('000000010000000000000005');
name
--------------------------
 000000020000000000000003

(1 rows)

cqlsh:test> select name from sl where token(name) > token('000000010000000000000005');
 name
--------------------------
 000000010000000000000003
 000000010000000000000002
 000000010000000000000001

(3 rows)

By contrast, this is what I get from string comparison in Python (and, I believe, in most other languages):

>>> '000000020000000000000003' < '000000010000000000000005'
False

If I run the query without the token() function, I get the following error:

cqlsh:test> select name from sl where name < '000000010000000000000005';
Bad Request: Only EQ and IN relation are supported on the partition key (unless you use the token() function)

The table description is:

CREATE TABLE sl (
  name text,
  data blob,
  PRIMARY KEY (name)
) WITH
  bloom_filter_fp_chance=0.010000 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.000000 AND
  gc_grace_seconds=864000 AND
  index_interval=128 AND
  read_repair_chance=0.100000 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  default_time_to_live=0 AND
  speculative_retry='99.0PERCENTILE' AND
  memtable_flush_period_in_ms=0 AND
  compaction={'class': 'SizeTieredCompactionStrategy'} AND
  compression={'sstable_compression': 'LZ4Compressor'};

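Is there an explanation, in the documentation or somewhere else that I missed, of why such a strange string comparison order was chosen? Or do the string comparison operators not do what I expect (i.e. they return rows in some unrelated order, such as the order in which they were written to the database)? I am using the Murmur3Partitioner partitioner, in case that matters.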

Answer 1

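Here are some links to documentation on the token function and related paging. Sorry for the breadth of the topic; I don't know which of these might help: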

http://www.datastax.com/documentation/cql/3.1/cql/cql_using/paging_c.html - "Paging through unordered partitioner results" implies that the use of Murmur3Partitioner does matter.
http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/select_r.html?scroll=reference_ds_d35_v2q_xj__paging-through-unordered-results - This section says that paging with the RandomPartitioner will not give you meaningful results. In this context, RandomPartitioner is synonymous with Murmur3Partitioner; the documentation should mention both.
http://www.datastax.com/dev/blog/client-side-improvements-in-cassandra-2-0 - See "Automatic paging".
http://datastax.github.io/python-driver/query_paging.html
http://www.datastax.com/drivers/java/2.0/index.html - See ResultSet.
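
As a rough illustration of what "paging through unordered results" means, here is a small pure-Python simulation. It is only a sketch under stated assumptions: stand_in_token, fetch_page and scan_all are hypothetical helpers, and MD5 is used as a stand-in hash rather than Cassandra's real Murmur3 token calculation. The idea it models is the one from the linked pages: ask for rows whose token is greater than the token of the last key already seen.

```python
import hashlib


def stand_in_token(key):
    """Hypothetical stand-in for a partitioner: hash a key to a signed integer."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big", signed=True)


def fetch_page(rows, last_key=None, limit=2):
    """Roughly models:
    SELECT name FROM sl WHERE token(name) > token(<last_key>) LIMIT <limit>;
    """
    ordered = sorted(rows, key=stand_in_token)  # rows come back in token order
    if last_key is not None:
        floor = stand_in_token(last_key)
        ordered = [r for r in ordered if stand_in_token(r) > floor]
    return ordered[:limit]


def scan_all(rows, limit=2):
    """Walk the whole table page by page, resuming after the last key seen."""
    seen, last_key = [], None
    while True:
        page = fetch_page(rows, last_key, limit)
        if not page:
            return seen
        seen.extend(page)
        last_key = page[-1]


names = [
    "000000010000000000000001",
    "000000010000000000000002",
    "000000010000000000000003",
    "000000010000000000000005",
    "000000020000000000000003",
]

if __name__ == "__main__":
    for name in scan_all(names):
        print(name)
```

Each row is visited exactly once, but the order of the pages follows the hash, not the string value of name.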

Answer 2

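In Cassandra, rows are ordered by the hash of their key. With the Random and Murmur3 partitioners, the hash is effectively pseudo-random with respect to the key's value, so the ordering is A) not meaningful and B) designed to distribute rows evenly around the ring.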

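So a query for rows less than token('000000010000000000000005') does not compare against the string value "000000010000000000000005". It compares hashed token values. Judging by the results you are seeing, the token of the string "000000020000000000000003" is smaller than the token of "000000010000000000000005".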
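
To make the difference concrete, here is a minimal Python sketch. It uses MD5 as a stand-in hash (the helper name stand_in_token is hypothetical, and this is not Cassandra's actual Murmur3 token calculation), so the particular order it prints will not match the cqlsh output in the question. It only shows that filtering by hash and filtering by string value are different operations.

```python
import hashlib


def stand_in_token(key):
    """Hypothetical stand-in for token(): hash a key to a signed integer.

    Assumption: MD5 is used purely for illustration; Murmur3Partitioner
    computes tokens differently.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big", signed=True)


names = [
    "000000010000000000000001",
    "000000010000000000000002",
    "000000010000000000000003",
    "000000010000000000000005",
    "000000020000000000000003",
]
pivot = "000000010000000000000005"

# What Python's `<` does: lexicographic comparison of the strings themselves.
below_by_string = [n for n in names if n < pivot]

# What `token(name) < token(pivot)` does: comparison of the hashes.
below_by_token = [n for n in names if stand_in_token(n) < stand_in_token(pivot)]

if __name__ == "__main__":
    print("by string:", below_by_string)
    print("by token: ", below_by_token)
    print("token order:", sorted(names, key=stand_in_token))
```

The string-based filter always returns the three keys that sort before the pivot. The token-based filter returns whichever keys happen to hash lower, which has no relationship to their string order.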

For more information, see this document from DataStax: Paging through unordered partitioner results.

Assuming you want to be able to query your data by the value of name, you could build a table like this:

CREATE TABLE sl (
  type text,
  name text,
  data blob,
  PRIMARY KEY (type, name)
)

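I have made type the partition key. I am not sure whether partitioning your data by "type" (or by anything else, for that matter) makes sense for you, so this is more for the sake of example than anything else. In any case, with name as the clustering key (which determines the on-disk sort order within a partition), this query will work: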

select * from sl where type='sometype' AND name < '000000010000000000000005';
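
A toy Python model may help show why the range query works once name is a clustering key. This is only a sketch: PartitionedTable and its methods are hypothetical names, and it models nothing beyond the one idea that rows inside a partition are kept sorted by the clustering key, so a range on name becomes a slice of an ordered list.

```python
from bisect import bisect_left
from collections import defaultdict


class PartitionedTable:
    """Toy model: partition key -> clustering keys kept in sorted order."""

    def __init__(self):
        self._partitions = defaultdict(list)

    def insert(self, type_, name):
        rows = self._partitions[type_]
        i = bisect_left(rows, name)
        # Keep names unique within a partition, like a primary key.
        if i == len(rows) or rows[i] != name:
            rows.insert(i, name)

    def names_less_than(self, type_, bound):
        """Roughly models: SELECT name FROM sl WHERE type=? AND name < ?"""
        rows = self._partitions.get(type_, [])
        return rows[:bisect_left(rows, bound)]


table = PartitionedTable()
# Insert in scrambled order; the partition still ends up sorted by name.
for name in [
    "000000020000000000000003",
    "000000010000000000000005",
    "000000010000000000000001",
    "000000010000000000000003",
    "000000010000000000000002",
]:
    table.insert("sometype", name)

if __name__ == "__main__":
    for name in table.names_less_than("sometype", "000000010000000000000005"):
        print(name)
```

Here the comparison is on the string values themselves, so the result agrees with Python's own string ordering.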

Again, this is just an example, but I hope it helps point you in the right direction.

2 answers

Answer 1
3 2014-09-30 14:44:14

Answer 2
3 Accepted 2014-09-30 14:51:54