
Cassandra insert of 1,020,000 records takes a long time, then crashes with a write timeout during write query at consistency ONE (0 replica(s))

I'm trying to insert 1,020,000 records with multiple columns (the physical size of the columns varies; several of them are blobs).

My cluster setup is 2 nodes, with the keyspace created as:

create keyspace myks with replication = {'class':'SimpleStrategy','replication_factor':2};

While trying to insert this data using the C# client, I get a Cassandra.WriteTimeoutException:

Cassandra timeout during write query at consistency ONE (0 replica(s) acknowledged the write over 1 required)

While trying to retrieve the data from the console, I get an error:

errors={}, last_host=192.168.180.93

Any suggestions?

My schema is:

create table my_table(
id bigint,
seqid int,
activeeventtime int,
eventtime int,
eventtype text,
width int,
height int,
x int,
y int,
buttonstatetype text,
statetype text,
eid int,
directiontype text,
gdistance int,
griddeclaration boolean,
pathdeclaration boolean,
child blob,
mpath blob,
primary key(id, seqid))

A snippet of the code I'm using to insert the data:

for (long i = 1; i <= 10000; i++)
{
    for (int j = 0; j < genericEvents.Count; j++)
    {
        GenericSessionEvent currEvent = genericEvents[j];
        ser1.Serialize(stream1, currEvent.Child);
        ser2.Serialize(stream2, currEvent.Element);
        BoundStatement boundStatement = preparedStatement.Bind(
            i, j, ...., stream1.GetBuffer(), stream2.GetBuffer());
        await session.ExecuteAsync(boundStatement);
    }
}

What I see in the logs that seems strange is:

WARN [CompactionExecutor:13]... BigTableWriter.java:184 - Writing large partition ...tableNAme (107865330 bytes).

I also want to mention that it crashes when the i variable reaches about 30, and it takes several minutes to get even that far.

1. Do you use ExecuteAsync?

session.ExecuteAsync(statement);

Asynchronous execution is good because Cassandra handles parallel queries pretty well. But 10,000 queries may be too many to have in flight at once.

If you do, try using Execute instead. It would help Cassandra a lot.
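A middle ground between synchronous Execute and unbounded ExecuteAsync is to cap the number of in-flight writes. The following is a sketch, not the asker's code: `session` and `preparedStatement` are assumed to exist as in the question, and `maxInFlight` is a client-side tuning knob, not a driver setting.

```csharp
// Cap concurrent writes with a SemaphoreSlim so thousands of
// ExecuteAsync calls are not queued against the cluster at once.
const int maxInFlight = 32;
using var throttle = new SemaphoreSlim(maxInFlight, maxInFlight);
var pending = new List<Task>();

for (long i = 1; i <= 10000; i++)
{
    await throttle.WaitAsync();                  // wait for a free slot
    var bound = preparedStatement.Bind(i /* , other values */);
    var task = session.ExecuteAsync(bound)
        .ContinueWith(_ => throttle.Release()); // free the slot when done
    pending.Add(task);
}
await Task.WhenAll(pending);                     // drain the remaining writes
```

This keeps the throughput benefit of asynchronous writes while bounding the pressure on the coordinator; tune `maxInFlight` down if write timeouts persist. (Running this requires a live Cassandra cluster, so no standalone test is included.)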

2. Do you use PreparedStatement?

var preparedStatement = session.Prepare("INSERT INTO table (key, column_name1, column_name2) VALUES (?, ?, ?);");
var boundStatement = preparedStatement.Bind(key, value1, value2);
session.Execute(boundStatement);

Once again, this is a good idea. But be careful: preparing a statement has a cost, so you must reuse an already-prepared statement as much as possible. That is where you gain time.

If you do not use them, you should try, but use them properly.

3. Do you have some very big blobs to insert?

Log the size of your data on the client side. You may also share the nodetool cfstats output for your table.

If some values are bigger than 1 MB, it could cause latency issues. Your network could be the problem, but I would rather suspect the Cassandra heap. When the heap fills with big values, GCs become more frequent and longer. Are there any logs of long GCs (longer than 200 ms) in Cassandra's system.log files?
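A minimal sketch of the client-side size check, assuming `stream1`/`stream2` are the serialization streams from the question's loop and using the 1 MB rule of thumb above:

```csharp
// Sketch: log oversized serialized blobs before binding them to the statement.
const long oneMb = 1024 * 1024;
if (stream1.Length > oneMb || stream2.Length > oneMb)
{
    Console.WriteLine(
        $"Large blob: child={stream1.Length} bytes, mpath={stream2.Length} bytes");
}
```

This gives you a per-row view of which values are pushing the partition and the heap. (The fragment depends on the question's variables, so no standalone test is included.)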

You are inserting 10,000 × genericEvents.Count rows into 10,000 partitions.

How big is genericEvents.Count?

For a single i, all genericEvents.Count rows go to the same partition. A 107,865,330-byte partition (> 100 MB) is too big.

In the primary key, the first column is the partition key and the second is the clustering column. One solution is to not use a clustering column at all, by making both columns part of the partition key. You can do it this way:

primary key((id, seqid))
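Applied to the schema above, the table definition would become something like this (columns abridged):

```sql
CREATE TABLE my_table (
    id bigint,
    seqid int,
    -- ... other columns as in the original schema ...
    child blob,
    mpath blob,
    PRIMARY KEY ((id, seqid))  -- composite partition key, no clustering column
);
```

With the double parentheses, each (id, seqid) pair hashes to its own partition, so rows sharing an id are spread across the cluster instead of accumulating in one oversized partition.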

And you should use Execute() instead of ExecuteAsync().
