C#: How to insert a huge amount of data into a Cassandra table

Dears,

I'm trying to insert about 100,000 rows into a Cassandra database from a C# application.

To achieve this I'm using the following NuGet package:

https://www.nuget.org/packages/CassandraCSharpDriver/

I have installed Cassandra locally on my laptop (i5, 32 GB RAM, Windows 10).

My Cassandra settings are the defaults. The cluster is configured as follows:

            var cluster = Cluster.Builder()
                .AddContactPoints(CassandraContactPoint)
                .WithPort(CassandraPort)
                .WithLoadBalancingPolicy(new DCAwareRoundRobinPolicy("datacenter1"))
                .WithAuthProvider(new PlainTextAuthProvider(UserName, Password))
                .Build();

The keyspace and table are created as follows:

            session.Execute("DROP KEYSPACE IF EXISTS eventstore");
            session.Execute("CREATE KEYSPACE eventstore WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'datacenter1' : 1 };");

            session.Execute(@"
                                CREATE TABLE IF NOT EXISTS eventstore.Event(
                                Id uuid, 
                                Data text, 
                                Version int,
                                AgregateId uuid,
                                EventIdentity uuid,
                                Date timestamp,
                                  PRIMARY KEY (AgregateId,Version)
                                ) WITH CLUSTERING ORDER BY (Version ASC)");

To insert events I'm using the following code. The events variable contains 2,000 events, which are inserted in about 3 seconds:

            var tasks = events.Select(async @event =>
            {
                await mapper.InsertAsync(@event);
            });

            await Task.WhenAll(tasks);

At the moment this solution takes about 3 seconds to insert 2,000 events. Is it possible to insert the data faster?

There are several techniques you can use to send a steady flow of executions while also limiting the concurrency level.

There's an example in the driver repository: https://github.com/datastax/csharp-driver/blob/master/examples/ConcurrentExecutions/ExecuteInLoop/Program.cs

There's also a topic in the developer guide of the DataStax drivers: https://docs.datastax.com/en/devapp/doc/devapp/driverManagingConcurrency.html

When submitting several requests in parallel, the requests are queued at one of three levels: on the driver side, on the network stack, or on the server side. Excessive queueing on any of these levels affects the total time it takes each operation to complete. Adjust the concurrency level, or number of simultaneous requests, to reduce the amount of queuing and get high throughput and low latency.
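As a rough illustration of limiting the concurrency level, here is a minimal sketch that caps the number of in-flight requests with a SemaphoreSlim. It is one possible approach and not necessarily the one used in the linked example. The ConcurrencyLimiter class, its ForEachAsync method, and the maxInFlight parameter are hypothetical names introduced for this sketch, and the right limit has to be found by measuring against your own cluster.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper (not part of the DataStax driver).
public static class ConcurrencyLimiter
{
    // Runs `action` for every item while keeping at most
    // `maxInFlight` calls pending at the same time.
    public static async Task ForEachAsync<T>(
        IEnumerable<T> items, Func<T, Task> action, int maxInFlight)
    {
        using (var gate = new SemaphoreSlim(maxInFlight))
        {
            var tasks = new List<Task>();
            foreach (var item in items)
            {
                // Wait for a free slot before starting the next request.
                await gate.WaitAsync();
                tasks.Add(RunOneAsync(item));
            }

            // Wait for the requests that are still in flight.
            await Task.WhenAll(tasks);

            async Task RunOneAsync(T item)
            {
                try
                {
                    await action(item);
                }
                finally
                {
                    // Free the slot whether the request succeeded or failed.
                    gate.Release();
                }
            }
        }
    }
}
```

With the mapper from the question, a call could look like `await ConcurrencyLimiter.ForEachAsync(events, e => mapper.InsertAsync(e), 256);`. The value 256 is only an assumed starting point. Try a few values and keep the one that gives the best throughput without timeouts.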
