What is the procedure to restore Cassandra incremental backups when the backup directory contains a large number of files?

Question

I'm asking this question because I don't see a specific method in the DataStax docs.

I enabled incremental backups after taking a snapshot, and now there are around 200k files in the backups directory. I'm not sure what the best way to restore them is.

I copied all of them to the keyspace table directory and ran nodetool refresh <ks> <tbl>, but it does not work as expected and throws a StackOverflow exception. Is there a workaround for this?

I'm using 16G Xmx as of now. I see some errors in the logs, as shown below. Is this something to do with JVM params?

ERROR [gbp-cass-49] [Reference-Reaper:1] 2020-07-29 18:49:01,704 Ref.java:223 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@156d6370) to class org.apache.cassandra.utils.concurrent.WrappedSharedCloseable$Tidy@464162733:[Memory@[0..80), Memory@[0..a00)] was not released before the reference was garbage collected

nodetool refresh printed the following errors on stdout:

error: null
-- StackTrace --
java.lang.AssertionError
    at org.apache.cassandra.io.util.FileUtils.renameWithConfirm(FileUtils.java:178)
    at org.apache.cassandra.io.util.FileUtils.renameWithConfirm(FileUtils.java:173)
    at org.apache.cassandra.io.sstable.format.SSTableWriter.rename(SSTableWriter.java:273)
    at org.apache.cassandra.db.ColumnFamilyStore.loadNewSSTables(ColumnFamilyStore.java:714)
    at org.apache.cassandra.db.ColumnFamilyStore.loadNewSSTables(ColumnFamilyStore.java:658)
    at org.apache.cassandra.service.StorageService.loadNewSSTables(StorageService.java:4555)
    at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:71)
    at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:275)
    at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)
    at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)
    at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
    at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
    at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
    at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801)
    at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1466)
    at javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:76)
    at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1307)
    at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1399)
    at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:828)
    at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:323)
    at sun.rmi.transport.Transport$1.run(Transport.java:200)
    at sun.rmi.transport.Transport$1.run(Transport.java:197)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.rmi.transport.Transport.serviceCall(Transport.java:196)
    at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:568)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:826)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$241(TCPTransport.java:683)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler$$Lambda$177/1629407070.run(Unknown Source)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:682)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Answer 1

There really isn't enough actionable information in your question to be able to provide a meaningful answer but I'll try my best to respond.

Incremental backups allow you to offload copies of the data to off-server storage. However, since Cassandra hard-links every single flushed memtable to the backups/ directory, its contents can grow very quickly, so you need to manage it. This would explain why you ended up with 200K files.

Incremental backups are meant to be used in conjunction with snapshots, which are the equivalent of full backups in the traditional sense. Think of snapshots as akin to cold backups and of incremental backups as the delta since the last snapshot.

This means that every time you take a snapshot on a node, you need to clear the incremental backups in the backups/ directory. Following on from this, when you restore incremental backups you need to restore the respective snapshot (aka full backup) then apply the incrementals (backup of "deltas" after the snapshot).
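
The ordering described above (snapshot first, then incrementals) can be sketched as a small staging script. This is only a minimal sketch, not an official restore procedure: the function name stage_restore and the three directory arguments are hypothetical, and it assumes the usual on-disk layout where a table directory has snapshots/<name>/ and backups/ subdirectories. It copies files into the table directory and leaves running nodetool refresh <ks> <tbl> to you.

```python
import shutil
from pathlib import Path


def stage_restore(snapshot_dir: Path, backups_dir: Path, table_dir: Path) -> list:
    """Copy snapshot files, then incremental backup files, into table_dir.

    Returns the names of the files copied, in the order they were copied.
    Files that already exist in table_dir are skipped, never overwritten.
    """
    copied = []
    # Full backup (snapshot) first, then the deltas (incremental backups).
    for source in (snapshot_dir, backups_dir):
        for path in sorted(source.iterdir()):
            if not path.is_file():
                continue
            target = table_dir / path.name
            if target.exists():
                # Assumption: a same-named file is the same SSTable component,
                # e.g. a backup hard link that predates the snapshot.
                continue
            shutil.copy2(path, target)
            copied.append(path.name)
    return copied
```

Skipping existing names is a deliberate choice here: if backups/ was not cleared when the snapshot was taken, it may still hold files that are also in the snapshot, and copying them twice gains nothing.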

In order to respond to the other points you raised, you will need to explain what you meant by "I don't see it working as expected". Also, what is the full error message plus full stack trace for the exception? That level of detail is required in order to make a meaningful diagnosis other than "it doesn't work".

The error you posted is safe to ignore. That's just a message that the Reference-Reaper thread was successful in finding orphaned references and released them back to the pool. It really should be logged at INFO and not ERROR level.

I hope this helps. Cheers!

[EDIT] The stack trace you posted in your update looks to me like a filesystem permissions issue. C* can't rename the files, so they probably have (a) the wrong ownership, (b) incorrect permissions, or (c) both. Cheers!
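
One way to check for the ownership and permission problems mentioned above is a short script like the following. It is a rough sketch under assumptions: find_permission_problems is a hypothetical helper, expected_uid is the numeric uid of whichever user runs the Cassandra process on your node, and it only inspects owner and owner mode bits (it ignores groups and ACLs).

```python
import os
import stat
from pathlib import Path


def find_permission_problems(table_dir: Path, expected_uid: int) -> list:
    """Return (path, reason) pairs for entries that look mis-owned or locked down."""
    problems = []
    dir_stat = table_dir.stat()
    # Renaming a file needs write access to the directory that contains it.
    if dir_stat.st_uid != expected_uid:
        problems.append((table_dir, "directory owned by another user"))
    elif not dir_stat.st_mode & stat.S_IWUSR:
        problems.append((table_dir, "directory not writable by owner"))
    for path in sorted(table_dir.iterdir()):
        st = path.stat()
        if st.st_uid != expected_uid:
            problems.append((path, "owned by another user"))
        elif not st.st_mode & stat.S_IRUSR:
            problems.append((path, "not readable by owner"))
    return problems
```

An empty result does not prove the rename will succeed, but any entry it reports is worth fixing (for example with chown or chmod) before retrying nodetool refresh.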
