
Optimize Cassandra query performance

I am using Cassandra to store 100M entries of data, and I am trying to optimize the read and write queries. Currently, the schema looks like this:

DROP KEYSPACE IF EXISTS reviews_db;

CREATE KEYSPACE reviews_db WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': 1};

USE reviews_db;

CREATE TABLE reviews(
id INT,
houseId INT, 
name TEXT,
picture TEXT,
reviewText TEXT,
reviewDate TEXT,
accuracyRating INT,
locationRating INT,
communicationRating INT,
checkinRating INT,
cleanlinessRating INT,
valueRating INT,
overallRating DECIMAL,
PRIMARY KEY(id, houseId)
);

CREATE INDEX ON reviews (houseId);

COPY reviews (id, houseId, name, picture, reviewText, reviewDate, accuracyRating, locationRating, communicationRating, checkinRating, cleanlinessRating, valueRating, overallRating) FROM './database/data/reviews1.csv' WITH DELIMITER=',' AND HEADER=FALSE;

When I run the query select id,houseid from reviews where houseid = 9999954;

the trace looks like this:

Tracing session: 36fc1b20-a011-11e8-ac04-9109b2e8334a

activity                                                                                                                               | timestamp                  | source    | source_elapsed | client
---------------------------------------------------------------------------------------------------------------------------------------+----------------------------+-----------+----------------+-----------
                                                                                                                Execute CQL3 query | 2018-08-14 15:27:23.218000 | 127.0.0.1 |              0 | 127.0.0.1
                                     Parsing select id,houseid from reviews where houseid = 9999954; [Native-Transport-Requests-1] | 2018-08-14 15:27:23.219000 | 127.0.0.1 |            253 | 127.0.0.1
                                                                                 Preparing statement [Native-Transport-Requests-1] | 2018-08-14 15:27:23.219000 | 127.0.0.1 |            448 | 127.0.0.1
              Index mean cardinalities are reviews_houseid_idx:1. Scanning with reviews_houseid_idx. [Native-Transport-Requests-1] | 2018-08-14 15:27:23.219000 | 127.0.0.1 |            968 | 127.0.0.1
                                                                           Computing ranges to query [Native-Transport-Requests-1] | 2018-08-14 15:27:23.219000 | 127.0.0.1 |           1073 | 127.0.0.1       
Submitting range requests on 257 ranges with a concurrency of 257 (0.003515625 rows per range expected) [Native-Transport-Requests-1] | 2018-08-14 15:27:23.220000 | 127.0.0.1 |           1668 | 127.0.0.1                                       
                                                               Submitted 1 concurrent range requests [Native-Transport-Requests-1] | 2018-08-14 15:27:23.221000 | 127.0.0.1 |           2260 | 127.0.0.1
                                                Executing read on reviews_db.reviews using index reviews_houseid_idx [ReadStage-2] | 2018-08-14 15:27:23.221000 | 127.0.0.1 |           2341 | 127.0.0.1
                                                     Executing single-partition query on reviews.reviews_houseid_idx [ReadStage-2] | 2018-08-14 15:27:23.221000 | 127.0.0.1 |           2400 | 127.0.0.1
                                                                                        Acquiring sstable references [ReadStage-2] | 2018-08-14 15:27:23.221000 | 127.0.0.1 |           2445 | 127.0.0.1
                                           Skipped 0/5 non-slice-intersecting sstables, included 0 due to tombstones [ReadStage-2] | 2018-08-14 15:27:23.221000 | 127.0.0.1 |           2546 | 127.0.0.1
                                                               Partition index with 0 entries found for sstable 1029 [ReadStage-2] | 2018-08-14 15:27:23.227000 | 127.0.0.1 |           8775 | 127.0.0.1
                                                                            Bloom filter allows skipping sstable 819 [ReadStage-2] | 2018-08-14 15:27:23.228000 | 127.0.0.1 |           9481 | 127.0.0.1
                                                                           Bloom filter allows skipping sstable 1176 [ReadStage-2] | 2018-08-14 15:27:23.229000 | 127.0.0.1 |          10102 | 127.0.0.1
                                                                Partition index with 0 entries found for sstable 517 [ReadStage-2] | 2018-08-14 15:27:23.234000 | 127.0.0.1 |          15699 | 127.0.0.1
                                                               Partition index with 0 entries found for sstable 1259 [ReadStage-2] | 2018-08-14 15:27:23.241000 | 127.0.0.1 |          22535 | 127.0.0.1
                                                                         Executing single-partition query on reviews [ReadStage-2] | 2018-08-14 15:27:23.241000 | 127.0.0.1 |          22724 | 127.0.0.1
                                                                                        Acquiring sstable references [ReadStage-2] | 2018-08-14 15:27:23.241000 | 127.0.0.1 |          22751 | 127.0.0.1
                                                                                           Merging memtable contents [ReadStage-2] | 2018-08-14 15:27:23.241000 | 127.0.0.1 |          22779 | 127.0.0.1
                                                               Partition index with 0 entries found for sstable 1215 [ReadStage-2] | 2018-08-14 15:27:23.251000 | 127.0.0.1 |          32604 | 127.0.0.1
                                                                         Executing single-partition query on reviews [ReadStage-2] | 2018-08-14 15:27:23.258000 | 127.0.0.1 |          39903 | 127.0.0.1
                                                                                        Acquiring sstable references [ReadStage-2] | 2018-08-14 15:27:23.258000 | 127.0.0.1 |          39959 | 127.0.0.1
                                                                                           Merging memtable contents [ReadStage-2] | 2018-08-14 15:27:23.258000 | 127.0.0.1 |          39987 | 127.0.0.1
                                                                           Bloom filter allows skipping sstable 1215 [ReadStage-2] | 2018-08-14 15:27:23.260000 | 127.0.0.1 |          41753 | 127.0.0.1
                                                               Partition index with 0 entries found for sstable 1009 [ReadStage-2] | 2018-08-14 15:27:23.269000 | 127.0.0.1 |          50605 | 127.0.0.1
                                                                           Bloom filter allows skipping sstable 1214 [ReadStage-2] | 2018-08-14 15:27:23.275000 | 127.0.0.1 |          57061 | 127.0.0.1
                                                                         Executing single-partition query on reviews [ReadStage-2] | 2018-08-14 15:27:23.276000 | 127.0.0.1 |          57325 | 127.0.0.1
                                                                                        Acquiring sstable references [ReadStage-2] | 2018-08-14 15:27:23.276000 | 127.0.0.1 |          57412 | 127.0.0.1
                                                                                           Merging memtable contents [ReadStage-2] | 2018-08-14 15:27:23.276000 | 127.0.0.1 |          57462 | 127.0.0.1
                                                                           Bloom filter allows skipping sstable 1215 [ReadStage-2] | 2018-08-14 15:27:23.278000 | 127.0.0.1 |          59387 | 127.0.0.1
                                                               Partition index with 0 entries found for sstable 1009 [ReadStage-2] | 2018-08-14 15:27:23.287000 | 127.0.0.1 |          68588 | 127.0.0.1
                                                                           Bloom filter allows skipping sstable 1214 [ReadStage-2] | 2018-08-14 15:27:23.294000 | 127.0.0.1 |          75900 | 127.0.0.1
                                                                         Executing single-partition query on reviews [ReadStage-2] | 2018-08-14 15:27:23.295000 | 127.0.0.1 |          76188 | 127.0.0.1
                                                                                        Acquiring sstable references [ReadStage-2] | 2018-08-14 15:27:23.295000 | 127.0.0.1 |          76267 | 127.0.0.1
                                                                                           Merging memtable contents [ReadStage-2] | 2018-08-14 15:27:23.295000 | 127.0.0.1 |          76321 | 127.0.0.1
                                                                           Bloom filter allows skipping sstable 1215 [ReadStage-2] | 2018-08-14 15:27:23.302000 | 127.0.0.1 |          83846 | 127.0.0.1
                                                               Partition index with 0 entries found for sstable 1009 [ReadStage-2] | 2018-08-14 15:27:23.313000 | 127.0.0.1 |          94648 | 127.0.0.1
                                                                           Bloom filter allows skipping sstable 1214 [ReadStage-2] | 2018-08-14 15:27:23.322000 | 127.0.0.1 |         103400 | 127.0.0.1
                                                                         Executing single-partition query on reviews [ReadStage-2] | 2018-08-14 15:27:23.322000 | 127.0.0.1 |         103745 | 127.0.0.1
                                                                                        Acquiring sstable references [ReadStage-2] | 2018-08-14 15:27:23.322000 | 127.0.0.1 |         103833 | 127.0.0.1
                                                                                           Merging memtable contents [ReadStage-2] | 2018-08-14 15:27:23.322001 | 127.0.0.1 |         103901 | 127.0.0.1
                                                               Partition index with 0 entries found for sstable 1215 [ReadStage-2] | 2018-08-14 15:27:23.336000 | 127.0.0.1 |         117832 | 127.0.0.1
                                                                         Executing single-partition query on reviews [ReadStage-2] | 2018-08-14 15:27:23.344000 | 127.0.0.1 |         125175 | 127.0.0.1
                                                                                        Acquiring sstable references [ReadStage-2] | 2018-08-14 15:27:23.344000 | 127.0.0.1 |         125275 | 127.0.0.1
                                                                                           Merging memtable contents [ReadStage-2] | 2018-08-14 15:27:23.344000 | 127.0.0.1 |         125346 | 127.0.0.1
                                                                           Bloom filter allows skipping sstable 1215 [ReadStage-2] | 2018-08-14 15:27:23.347000 | 127.0.0.1 |         128201 | 127.0.0.1
                                                               Partition index with 0 entries found for sstable 1009 [ReadStage-2] | 2018-08-14 15:27:23.358000 | 127.0.0.1 |         139767 | 127.0.0.1
                                                                           Bloom filter allows skipping sstable 1214 [ReadStage-2] | 2018-08-14 15:27:23.367000 | 127.0.0.1 |         148635 | 127.0.0.1
                                                                         Executing single-partition query on reviews [ReadStage-2] | 2018-08-14 15:27:23.368000 | 127.0.0.1 |         149174 | 127.0.0.1
                                                                                        Acquiring sstable references [ReadStage-2] | 2018-08-14 15:27:23.368000 | 127.0.0.1 |         149290 | 127.0.0.1
                                                                                           Merging memtable contents [ReadStage-2] | 2018-08-14 15:27:23.368000 | 127.0.0.1 |         149357 | 127.0.0.1
                                                                           Bloom filter allows skipping sstable 1215 [ReadStage-2] | 2018-08-14 15:27:23.371000 | 127.0.0.1 |         152815 | 127.0.0.1
                                                               Partition index with 0 entries found for sstable 1009 [ReadStage-2] | 2018-08-14 15:27:23.379000 | 127.0.0.1 |         160651 | 127.0.0.1
                                                                           Bloom filter allows skipping sstable 1214 [ReadStage-2] | 2018-08-14 15:27:23.388000 | 127.0.0.1 |         169148 | 127.0.0.1
                                                                         Executing single-partition query on reviews [ReadStage-2] | 2018-08-14 15:27:23.388000 | 127.0.0.1 |         169607 | 127.0.0.1
                                                                                        Acquiring sstable references [ReadStage-2] | 2018-08-14 15:27:23.388000 | 127.0.0.1 |         169690 | 127.0.0.1
                                                                                           Merging memtable contents [ReadStage-2] | 2018-08-14 15:27:23.388000 | 127.0.0.1 |         169759 | 127.0.0.1
                                                                           Bloom filter allows skipping sstable 1215 [ReadStage-2] | 2018-08-14 15:27:23.389000 | 127.0.0.1 |         170955 | 127.0.0.1
                                                               Partition index with 0 entries found for sstable 1009 [ReadStage-2] | 2018-08-14 15:27:23.399000 | 127.0.0.1 |         180652 | 127.0.0.1
                                                                         Executing single-partition query on reviews [ReadStage-2] | 2018-08-14 15:27:23.406000 | 127.0.0.1 |         188039 | 127.0.0.1
                                                                                        Acquiring sstable references [ReadStage-2] | 2018-08-14 15:27:23.407000 | 127.0.0.1 |         188130 | 127.0.0.1
                                                                                           Merging memtable contents [ReadStage-2] | 2018-08-14 15:27:23.407000 | 127.0.0.1 |         188180 | 127.0.0.1
                                                                           Bloom filter allows skipping sstable 1215 [ReadStage-2] | 2018-08-14 15:27:23.412000 | 127.0.0.1 |         193070 | 127.0.0.1
                                                               Partition index with 0 entries found for sstable 1009 [ReadStage-2] | 2018-08-14 15:27:23.420000 | 127.0.0.1 |         201613 | 127.0.0.1
                                                                           Bloom filter allows skipping sstable 1214 [ReadStage-2] | 2018-08-14 15:27:23.427000 | 127.0.0.1 |         208842 | 127.0.0.1
                                                                              Read 9 live rows and 0 tombstone cells [ReadStage-2] | 2018-08-14 15:27:23.427000 | 127.0.0.1 |         209064 | 127.0.0.1
                                                                           Merged data from memtables and 3 sstables [ReadStage-2] | 2018-08-14 15:27:23.428000 | 127.0.0.1 |         209165 | 127.0.0.1
                                                                                                                  Request complete | 2018-08-14 15:27:23.427622 | 127.0.0.1 |         209622 | 127.0.0.1

The query takes 209 ms, and I want to cut it down to less than 50 ms. Is there a way to achieve that?

Sure. Create a query table designed around houseId:

CREATE TABLE reviews_by_house_id(
  id INT,
  houseId INT, 
  name TEXT,
  picture TEXT,
  reviewText TEXT,
  reviewDate TEXT,
  accuracyRating INT,
  locationRating INT,
  communicationRating INT,
  checkinRating INT,
  cleanlinessRating INT,
  valueRating INT,
  overallRating DECIMAL,
  PRIMARY KEY(houseId,id));

Secondary index queries (even on a single-node instance) will never achieve that level of performance. If you really need the original table, keep the two tables in sync with BATCHed writes. I'd be willing to bet that a query by houseId on this table would meet your performance requirements.
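
As a rough sketch of what that could look like, assuming the reviews_db keyspace is in use and both tables exist as defined above (the inserted values are made up for illustration, and only a few columns are filled in):

```cql
-- Hypothetical row; only the table and column names come from the schemas above.
-- A logged batch ensures that both inserts are eventually applied together.
BEGIN BATCH
  INSERT INTO reviews (id, houseId, name, reviewText, overallRating)
  VALUES (1, 9999954, 'Alice', 'Great stay', 4.5);
  INSERT INTO reviews_by_house_id (id, houseId, name, reviewText, overallRating)
  VALUES (1, 9999954, 'Alice', 'Great stay', 4.5);
APPLY BATCH;

-- houseId is the partition key of the new table, so this read
-- touches a single partition and does not need the secondary index.
SELECT id, houseId FROM reviews_by_house_id WHERE houseId = 9999954;
```

Running the SELECT with tracing enabled should show one single-partition read on reviews_by_house_id rather than the index scan followed by many per-row reads seen in the trace above.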

You can't query efficiently on a non-partition-key column such as houseId, because that requires scanning all existing partitions and extracting data from them to match your field. You can put a condition on houseId only if you also have a condition on id.
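
For illustration, with the original table's PRIMARY KEY(id, houseId), id is the partition key and houseId is a clustering column, so a query that restricts both can be served from one partition. A minimal sketch, with a made-up id value:

```cql
-- Hypothetical id value; restricting the partition key (id) first
-- lets Cassandra go straight to one partition and then filter on
-- the clustering column (houseId) inside it.
SELECT id, houseId FROM reviews WHERE id = 42 AND houseId = 9999954;
```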

In Cassandra you build the data model around the queries you need to execute, so you have the following possibilities:

  • Create a second table with houseId as the partition key, and populate it yourself (maybe with less data);
  • Use materialized views (although they are still considered an experimental feature);
  • Use secondary indexes, but check this carefully, as they are suitable only in specific cases. You can read more about them in this blog post.
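
As a sketch of the materialized view option, assuming Cassandra 3.0 or later and the original reviews table (the view name reviews_by_house is hypothetical; in newer Cassandra versions materialized views may also need to be enabled in the server configuration first):

```cql
-- Hypothetical view name. Every primary key column of the base
-- table must appear in the view's primary key and be restricted
-- with IS NOT NULL.
CREATE MATERIALIZED VIEW reviews_by_house AS
  SELECT * FROM reviews
  WHERE houseId IS NOT NULL AND id IS NOT NULL
  PRIMARY KEY (houseId, id);

-- Cassandra maintains the view on writes to reviews;
-- reads by houseId then hit a single partition of the view.
SELECT id, houseId FROM reviews_by_house WHERE houseId = 9999954;
```

Compared with a hand-maintained second table, the view removes the need to write to two tables from the application, at the cost of relying on a feature still marked experimental.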

If you have a chance to use DataStax Enterprise, you have another possibility: DSE Search. Just create a search index on your table, and the query will be fulfilled by Solr, which sits underneath DSE Search (although latencies will be higher than with "plain Cassandra").
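
A minimal sketch of that approach, assuming a DSE version that supports managing search indexes from CQL (5.1 or later) and a node running the search workload; the exact options and field naming should be checked against the DSE documentation:

```cql
-- Assumption: DSE Search is enabled on the node.
-- Creates a search index over the table's columns.
CREATE SEARCH INDEX IF NOT EXISTS ON reviews_db.reviews;

-- Queries go through the solr_query pseudo-column.
SELECT id, houseId FROM reviews_db.reviews WHERE solr_query = 'houseid:9999954';
```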
