
Cassandra: high-concurrency read-write app problems

I'm implementing an app that generates hundreds of thousands of rows in 4 threads. Each thread opens a separate connection to Cassandra.

Every item in the table has a unique hash identifier (a String), but the primary key is a UUID.

The process of persisting an item is the following:

1) The item is created and its hash is computed.
2) The hash is then looked up in a second table, which maps hashes to the items' UUIDs.
3) If a hash-UUID pair is found, the item is looked up by its UUID (in the first table again). Since the item has to exist (because a hash-UUID pair was found), it is loaded from Cassandra into JPA and updated afterwards. If no hash-UUID pair is found, a new item is created in the item table and a new hash-UUID pair is saved as well.
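To make the three steps concrete, here is a minimal sketch of the flow in plain Java. It is not my real code: the two `Map`s are in-memory stand-ins for the two Cassandra tables, and `ItemStore`, `persist`, `hashOf` and the use of SHA-256 are names and choices made up for illustration. Note that the lookup followed by the write is not atomic, so this only shows the logic as seen by a single thread.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory model of the two tables; no Cassandra or JPA involved.
final class ItemStore {
    // Stand-in for the item table: primary key (UUID) -> JSON text.
    private final Map<UUID, String> itemsByUuid = new ConcurrentHashMap<>();
    // Stand-in for the second table: hash -> UUID of the item.
    private final Map<String, UUID> uuidByHash = new ConcurrentHashMap<>();

    // Step 1: compute the item's hash (SHA-256 is an assumption, rendered as hex).
    static String hashOf(String key) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(key.getBytes(StandardCharsets.UTF_8))) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    // Persists an item and returns the UUID it is stored under.
    UUID persist(String key, String json) {
        String hash = hashOf(key);
        UUID existing = uuidByHash.get(hash);   // step 2: look up the hash
        if (existing != null) {
            itemsByUuid.put(existing, json);    // step 3a: pair found, update the item
            return existing;
        }
        UUID fresh = UUID.randomUUID();         // step 3b: no pair, create both rows
        itemsByUuid.put(fresh, json);
        uuidByHash.put(hash, fresh);
        return fresh;
    }

    // Reads an item back by its primary key; null if it does not exist.
    String load(UUID id) {
        return itemsByUuid.get(id);
    }
}
```

Persisting the same key twice goes through the update branch the second time and returns the same UUID, which is the path that only gets exercised in the second generation step.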

The data generation has two steps. The first step runs on empty tables and generates the first datasets. No errors happen there, because in step 3 a hash-UUID pair is never found, so no updates occur.

In the second step, the whole algorithm runs again, but this time on already populated tables. In this step, random errors occur while reading the data items by their corresponding UUIDs (primary keys): sometimes the server doesn't return the complete text data (proper JSON strings are stored in the table, but incomplete JSON strings are retrieved into the application).

I'm completely sure that my algorithm is correct, because the same algorithm worked with Hibernate and MySQL, and even with PostgreSQL (but since I need faster writes, I'm playing around with Cassandra).

I am using a MacBook Pro with 16 GB RAM. To work with Cassandra I use the Kundera library (which supports JPA). As for Cassandra, I have tried the DataStax 2.0.4 version, and also the 2.0.7 version downloaded directly from the Apache site. There is no cluster; only one instance is running locally on my machine, on an external SSD drive. Kundera is using CQL v3.

Does anybody have an idea how this behaviour could occur? Is there a bug in the DataStax Cassandra driver or in Kundera? Or am I using Cassandra wrong, and the database shouldn't be used this way? Or are there any configuration tweaks which I might have forgotten?

The only things I have changed in the Cassandra configuration file are the timeouts, because I was getting too many TimeoutExceptions with the default values (the timeouts occurred during primary key lookups).

I suspect your code is not using the Cassandra connections in a thread-safe manner: care must be taken to allow only one thread to access a connection at a time. I do not know how Kundera approaches this: JPA will generate incredibly inefficient queries for Cassandra, so I do not recommend it. See the Cassandra data modeling resources, and use the native CQL Java driver.
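As a rough illustration of "one thread per connection", here is a sketch that confines each connection to the thread that created it by using a `ThreadLocal`. It uses only the JDK: `Connection` is a made-up stand-in for whatever non-thread-safe connection object your client hands out, and `PerThreadConnections`, `current` and `run` are hypothetical names, not Kundera or driver API.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class PerThreadConnections {
    // Hypothetical stand-in for a connection that must not be shared between threads.
    static final class Connection {
        private static final AtomicInteger COUNTER = new AtomicInteger();
        final int id = COUNTER.incrementAndGet();
    }

    // Each thread lazily creates its own Connection and never sees another thread's.
    private static final ThreadLocal<Connection> CONNECTION =
            ThreadLocal.withInitial(Connection::new);

    static Connection current() {
        return CONNECTION.get();
    }

    // Runs `tasks` jobs on `threads` workers and returns the ids of the connections used.
    static Set<Integer> run(int threads, int tasks) {
        Set<Integer> used = ConcurrentHashMap.newKeySet();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < tasks; i++) {
            // Every job asks for the connection of the thread it happens to run on.
            pool.execute(() -> used.add(current().id));
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
        return used;
    }
}
```

With 4 worker threads, `run(4, 100)` reports at most 4 distinct connection ids however many jobs are submitted, because no job ever touches a connection owned by another thread. A real version would also need to close each connection when its worker shuts down, which this sketch leaves out.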
