Version info of my Cassandra:
[cqlsh 5.0.1 | Cassandra 2.2.5 | CQL spec 3.3.1 | Native protocol v4]
I am trying to move some huge tables (several million rows) to another keyspace. Besides "COPY to csv, and COPY from csv", is there any better solution?
Ok, I managed to get this to work on a single-node cluster running 2.2.8.
I experimented by moving my holidays table from my presentation keyspace over to my stackoverflow keyspace.
Here are the steps I took:
Create the table inside the new keyspace.
This step is important, because each table has a UUID as a unique identifier, stored in the system.schema_columnfamilies table in the cf_id column. This id is appended to the names of the directories that hold the table's data. By copy/pasting the schema from one keyspace to another, you ensure that the same column names are used, but that a new unique identifier is generated.
Note: In 3.x, the identifier is stored in the system_schema.tables table.
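You can check the identifiers yourself; a query like this (using the keyspace and table names from this example) shows the id in 2.2, and you should see a different value for each keyspace's copy of the table:

aploetz@cqlsh> SELECT cf_id FROM system.schema_columnfamilies WHERE keyspace_name='presentation' AND columnfamily_name='holidays';

In 3.x, the equivalent query would be SELECT id FROM system_schema.tables WHERE keyspace_name='presentation' AND table_name='holidays';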
aploetz@cqlsh:stackoverflow> CREATE TABLE holidays (
type text,
eventtime timestamp,
beginend text,
name text,
PRIMARY KEY (type, eventtime, beginend)
) WITH CLUSTERING ORDER BY (eventtime DESC, beginend DESC);
aploetz@cqlsh:stackoverflow> SELECT * FROM stackoverflow.holidays ;
type | eventtime | beginend | name
------+-----------+----------+------
(0 rows)
Stop your node(s) properly (nodetool disablegossip, nodetool drain, then stop the Cassandra process).
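On a typical package install, that shutdown sequence might look like this (the service name is an assumption for your environment):

$ nodetool disablegossip
$ nodetool drain
$ sudo service cassandra stop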
Now, find the locations of the old and new tables on disk, and copy/move the files from the old directory into the new one:
$ ls -al /var/lib/cassandra/data22/stackoverflow/holidays-77a767e0a5f111e6a2bebd9d201c4c8f/
total 12
drwxrwxr-x 3 aploetz aploetz 4096 Nov 8 14:25 .
drwxrwxr-x 17 aploetz aploetz 4096 Nov 8 14:25 ..
drwxrwxr-x 2 aploetz aploetz 4096 Nov 8 14:25 backups
$ cp /var/lib/cassandra/data22/presentation/holidays-74bcfde0139011e6a67c2575e6398503/la* /var/lib/cassandra/data22/stackoverflow/holidays-77a767e0a5f111e6a2bebd9d201c4c8f/
$ ls -al /var/lib/cassandra/data22/stackoverflow/holidays-77a767e0a5f111e6a2bebd9d201c4c8f/
drwxrwxr-x 3 aploetz aploetz 4096 Nov 8 14:26 .
drwxrwxr-x 17 aploetz aploetz 4096 Nov 8 14:25 ..
drwxrwxr-x 2 aploetz aploetz 4096 Nov 8 14:25 backups
-rw-rw-r-- 1 aploetz aploetz 43 Nov 8 14:26 la-1-big-CompressionInfo.db
-rw-rw-r-- 1 aploetz aploetz 628 Nov 8 14:26 la-1-big-Data.db
-rw-rw-r-- 1 aploetz aploetz 9 Nov 8 14:26 la-1-big-Digest.adler32
-rw-rw-r-- 1 aploetz aploetz 16 Nov 8 14:26 la-1-big-Filter.db
-rw-rw-r-- 1 aploetz aploetz 57 Nov 8 14:26 la-1-big-Index.db
-rw-rw-r-- 1 aploetz aploetz 4468 Nov 8 14:26 la-1-big-Statistics.db
-rw-rw-r-- 1 aploetz aploetz 94 Nov 8 14:26 la-1-big-Summary.db
-rw-rw-r-- 1 aploetz aploetz 94 Nov 8 14:26 la-1-big-TOC.txt
-rw-rw-r-- 1 aploetz aploetz 43 Nov 8 14:26 la-2-big-CompressionInfo.db
-rw-rw-r-- 1 aploetz aploetz 164 Nov 8 14:26 la-2-big-Data.db
-rw-rw-r-- 1 aploetz aploetz 10 Nov 8 14:26 la-2-big-Digest.adler32
-rw-rw-r-- 1 aploetz aploetz 16 Nov 8 14:26 la-2-big-Filter.db
-rw-rw-r-- 1 aploetz aploetz 26 Nov 8 14:26 la-2-big-Index.db
-rw-rw-r-- 1 aploetz aploetz 4460 Nov 8 14:26 la-2-big-Statistics.db
-rw-rw-r-- 1 aploetz aploetz 108 Nov 8 14:26 la-2-big-Summary.db
-rw-rw-r-- 1 aploetz aploetz 94 Nov 8 14:26 la-2-big-TOC.txt
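The copy step above can be generalized into a small shell helper to run on each node. Both example paths are hypothetical; the UUID suffix of each table directory will differ on your cluster:

```shell
# Copy all SSTable files from an old table directory to the new one.
# "la" is the SSTable file prefix used by the 2.2.x format.
copy_sstables() {
    src_dir="$1"   # old table dir, e.g. .../presentation/holidays-<old_uuid>
    dst_dir="$2"   # new table dir, e.g. .../stackoverflow/holidays-<new_uuid>
    cp "$src_dir"/la-* "$dst_dir"/
}
```

Run it once per table, per node, while the node is down.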
Restart your node(s).
Query via cqlsh:
Connected to SnakesAndArrows at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 2.2.8 | CQL spec 3.3.1 | Native protocol v4]
Use HELP for help.
aploetz@cqlsh> SELECT * FROM stackoverflow.holidays ;
type | eventtime | beginend | name
--------------+--------------------------+----------+------------------------
Religious | 2016-12-26 05:59:59+0000 | E | Christmas
Religious | 2016-12-25 06:00:00+0000 | B | Christmas
Religious | 2016-03-28 04:59:59+0000 | E | Easter
Religious | 2016-03-27 05:00:00+0000 | B | Easter
presentation | 2016-05-06 20:40:08+0000 | B | my presentation
presentation | 2016-05-06 20:40:03+0000 | B | my presentation
presentation | 2016-05-06 20:39:15+0000 | B | my presentation
presentation | 2016-05-06 20:38:10+0000 | B | my presentation
US | 2016-07-05 04:59:59+0000 | E | 4th of July
US | 2016-07-04 05:00:00+0000 | B | 4th of July
US | 2016-05-09 04:59:59+0000 | E | Mothers Day
US | 2016-05-08 05:00:00+0000 | B | Mothers Day
Nerd | 2016-12-22 05:59:59+0000 | E | 2112 Day
Nerd | 2016-12-21 06:00:00+0000 | B | 2112 Day
Nerd | 2016-09-26 04:59:59+0000 | E | Hobbit Day
Nerd | 2016-09-25 05:00:00+0000 | B | Hobbit Day
Nerd | 2016-09-20 04:59:59+0000 | E | Talk Like a Pirate Day
Nerd | 2016-09-19 05:00:00+0000 | B | Talk Like a Pirate Day
Nerd | 2016-05-07 04:59:59+0000 | E | Star Wars Week
Nerd | 2016-05-04 05:00:00+0000 | B | Star Wars Week
Nerd | 2016-03-14 05:00:00+0000 | E | Pi Day
Nerd | 2016-03-14 05:00:00+0000 | B | Pi Day
(22 rows)
The problem with this approach is that you need to stop the cluster and move files around on each node, whereas cqlsh COPY lets you import and export from a single node while the cluster is still running.
And I know that COPY has a reputation that limits it to smaller datasets. But 2.2.x has options to throttle COPY and keep it from timing out on large datasets. I recently got it to export and import 370 million rows without a timeout.
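For reference, a throttled export and import might look something like this (the file path and option values are illustrative, not tuned recommendations):

aploetz@cqlsh> COPY presentation.holidays TO '/tmp/holidays.csv' WITH PAGESIZE=1000 AND PAGETIMEOUT=10;
aploetz@cqlsh> COPY stackoverflow.holidays FROM '/tmp/holidays.csv' WITH CHUNKSIZE=1000 AND INGESTRATE=50000;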