
Cassandra with a large number of columns per row

I set up Cassandra with the default configuration on a clean AWS instance and tried to insert 10,000 columns into a single row, each column holding 1 MB of data. I use this Ruby (version 1.9.3) script:

10000.times do
    key = rand(36**8).to_s(36)
    value = rand(36**1024).to_s(36) * 1024
    Cas_client.insert(TestColumnFamily,TestRow,{key=>value})
end

Every time I run this script, it crashes:

/usr/local/lib/ruby/gems/1.9.1/gems/thrift-0.8.0/lib/thrift/transport/socket.rb:109:in `read': CassandraThrift::Cassandra::Client::TransportException
    from /usr/local/lib/ruby/gems/1.9.1/gems/thrift-0.8.0/lib/thrift/transport/base_transport.rb:87:in `read_all'
    from /usr/local/lib/ruby/gems/1.9.1/gems/thrift-0.8.0/lib/thrift/transport/framed_transport.rb:104:in `read_frame'
    from /usr/local/lib/ruby/gems/1.9.1/gems/thrift-0.8.0/lib/thrift/transport/framed_transport.rb:69:in `read_into_buffer'
    from /usr/local/lib/ruby/gems/1.9.1/gems/thrift-0.8.0/lib/thrift/client.rb:45:in `read_message_begin'
    from /usr/local/lib/ruby/gems/1.9.1/gems/thrift-0.8.0/lib/thrift/client.rb:45:in `receive_message'
    from /usr/local/lib/ruby/gems/1.9.1/gems/cassandra-0.15.0/vendor/0.8/gen-rb/cassandra.rb:251:in `recv_batch_mutate'
    from /usr/local/lib/ruby/gems/1.9.1/gems/cassandra-0.15.0/vendor/0.8/gen-rb/cassandra.rb:243:in `batch_mutate'
    from /usr/local/lib/ruby/gems/1.9.1/gems/thrift_client-0.8.1/lib/thrift_client/abstract_thrift_client.rb:150:in `handled_proxy'
    from /usr/local/lib/ruby/gems/1.9.1/gems/thrift_client-0.8.1/lib/thrift_client/abstract_thrift_client.rb:60:in `batch_mutate'
    from /usr/local/lib/ruby/gems/1.9.1/gems/cassandra-0.15.0/lib/cassandra/protocol.rb:7:in `_mutate'
    from /usr/local/lib/ruby/gems/1.9.1/gems/cassandra-0.15.0/lib/cassandra/cassandra.rb:463:in `insert'
    from a.rb:6:in `block in <main>'
    from a.rb:3:in `times'
    from a.rb:3:in `<main>'

Cassandra itself keeps running normally after the crash. I then run another Ruby script to count how many columns were inserted:

p cas_client.count_columns(TestColumnFamily,TestRow)

This script crashes too, with the same error message, and the Cassandra process stays at 100% CPU usage.

AWS m1.xlarge instance (15 GB memory, 800 GB hard disk, 4 CPU cores)
cassandra-1.1.2
ruby-1.9.3-p194
jdk-7u6-linux-x64
ruby-gems:
    cassandra (0.15.0)
    thrift (0.8.0)
    thrift_client (0.8.1)

What is the problem?

10,000 columns at 1 MB each is 10 GB of data.
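That figure can be checked against the script in the question. The sketch below rebuilds one value the same way the script does and multiplies it out; the helper names `sample_value` and `row_bytes` are made up for this illustration.

```ruby
# Build one value the way the question's script does:
# a random base-36 string of at most 1024 digits, repeated 1024 times.
def sample_value
  rand(36**1024).to_s(36) * 1024
end

# Total payload of a row, ignoring column names and storage overhead.
def row_bytes(columns, value_bytes)
  columns * value_bytes
end

value_bytes = sample_value.bytesize
total = row_bytes(10_000, value_bytes)

puts "one value: #{value_bytes} bytes"
puts "whole row: #{(total / 1024.0**3).round(2)} GiB"
```

Each value is at most 1 MiB, so the whole row comes to just under 10 GiB before any per-column overhead is counted.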

Cassandra's RPC layer uses Thrift, which requires the entire return value of an RPC call to fit in memory. Reading all the columns in one call would therefore mean loading a 10 GB Thrift object into memory, which is not practical, especially in Ruby.
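One way around this is to read the row in small slices rather than in a single call. The following is only a sketch of that idea, not code from the cassandra gem: `each_column_paged`, `DEMO_ROW` and `DEMO_SLICE` are hypothetical names, and `fetch_slice` stands in for whatever slice query the client offers. With the cassandra gem this would presumably be a `get` call using its `:start` and `:count` options, but check that against the installed version. An in-memory hash plays the part of the row so the example runs without a server.

```ruby
# Walk a wide row one page at a time instead of requesting it whole.
# fetch_slice is a hypothetical callable: given a start column name (nil for
# the beginning of the row) and a count, it returns up to `count`
# [name, value] pairs in column order, starting at `start` inclusive.
def each_column_paged(fetch_slice, page_size = 100)
  raise ArgumentError, "page_size must be at least 2" if page_size < 2

  start = nil
  loop do
    page = fetch_slice.call(start, page_size)
    fetched = page.length
    # The start of a slice is inclusive, so every page after the first
    # repeats the last column already seen; drop that duplicate.
    page = page.drop(1) if start && !page.empty? && page.first[0] == start
    page.each { |name, value| yield name, value }
    break if fetched < page_size   # a short page means the row is exhausted
    start = page.last[0]
  end
end

# In-memory stand-in for a 25-column row (an assumption for the demo).
DEMO_ROW = Hash[(1..25).map { |i| [format("col%03d", i), "v#{i}"] }]

# Stand-in for a real slice query against the row above.
DEMO_SLICE = lambda do |start, count|
  names = DEMO_ROW.keys.sort
  names = names.select { |n| n >= start } if start
  names.first(count).map { |n| [n, DEMO_ROW[n]] }
end

seen = 0
each_column_paged(DEMO_SLICE, 10) { |_name, _value| seen += 1 }
puts "columns visited: #{seen}"
```

With 1 MB columns the page size decides how much data a single response carries, so a page of 10 keeps each reply at roughly 10 MB.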


 