KairosDB in Java - using the client to delete high volumes of data

Let me know if I posted anything incorrectly here. (Note: KairosDB runs on top of Cassandra and uses Hector.)

I'm using the KairosDB Java Client to dump large amounts of sample data into the datastore. So far I have loaded 6 million data points, and I am now attempting to delete all of them with the following method:

public static void purgeData(String metricsType, HttpClient c, int num, TimeUnit units){
    try {
        System.out.println("Beginning method");
        c = new HttpClient("http://localhost:8080/api/v1/datapoints/delete");
        QueryBuilder builder = QueryBuilder.getInstance();
        System.out.println("Preparing to delete info");
        builder.setStart(20, TimeUnit.MONTHS).setEnd(1, TimeUnit.SECONDS).addMetric(metricsType);
        System.out.println("Attempted to delete info");
        QueryResponse response = c.query(builder);
        //System.out.println("JSON: " + response.getJson());

    } catch (Exception e) {
        System.out.println("Adding data points produced an error");
        e.printStackTrace();
    }
}

Note that I removed the time interval parameters simply to try and delete all of the data at once.

When I execute this method, no points appear to be deleted. I then sent the same query with curl, using the JSON form of the data, and received a HectorException stating "all host pools marked down. Retry burden pushed out to client".

My personal conclusion is that 6 million is too many to delete at once. I was thinking about deleting pieces at a time, but I don't know how to restrict how many rows I delete from the KairosDB Java client side. I know that KairosDB is used in production. How do people effectively delete large amounts of data with the Java Client?

Thanks very much for your time!

You can use cqlsh or cassandra-cli to truncate KairosDB's tables (data_points, row_key_index, string_index). I am not familiar enough with KairosDB to know whether that will cause issues, though.

> truncate {your keyspace}.data_points;

It might take a few seconds to complete.

Deleting 6 million datapoints at once should not cause any problem.

This exception is weird; it usually means that Hector could not communicate with Cassandra. Did you check that everything is all right in the KairosDB and Cassandra log files? Are all the cluster coordinators configured in kairosdb.properties alive?

If it's not due to Cassandra, I recommend raising an issue on the KairosDB GitHub for your problem, attaching the JSON of your query and the KairosDB log.
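As a quick first check before digging through the logs, the sketch below tries to open a TCP connection to each Cassandra host. It is only an illustration under assumptions: the host list `localhost:9160` is a placeholder (9160 is the default Thrift port that Hector talks to; substitute the hosts from your own kairosdb.properties), and the class and method names are hypothetical. A successful connection only shows that the port is open, not that the node is healthy.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

final class CassandraHostCheck {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMs. */
    static boolean isReachable(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or host could not be resolved.
            return false;
        }
    }

    public static void main(String[] args) {
        // Placeholder host list; replace with the hosts from kairosdb.properties.
        String[] hosts = {"localhost:9160"};
        for (String hostPort : hosts) {
            int sep = hostPort.lastIndexOf(':');
            String host = hostPort.substring(0, sep);
            int port = Integer.parseInt(hostPort.substring(sep + 1));
            System.out.println(hostPort + " reachable: " + isReachable(host, port, 2000));
        }
    }
}
```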

There are two ways of deleting data in KairosDB.

A) If you need to delete all datapoints for a given metric, you can just use the delete metric API. It calls the same method in the background, so expect the same results. However, it should be much faster, because whole matching rows are deleted from Cassandra instead of individual cells.

B) If you need to delete only some datapoints for one metric, then you are already using the right method.
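For option A, here is a minimal sketch of the delete metric call made directly against the REST endpoint (`DELETE /api/v1/metric/{metric_name}`) with the JDK's own HTTP classes (Java 11+). The base URL and the metric name `cpu load` are placeholders, and the class name is hypothetical. The sketch only builds the request, so you can inspect it before sending anything.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

final class DeleteMetricRequest {

    /** Builds (but does not send) a DELETE request for one metric. */
    static HttpRequest build(String baseUrl, String metric) {
        // URLEncoder targets form encoding, so turn '+' back into '%20' for a path segment.
        String encoded = URLEncoder.encode(metric, StandardCharsets.UTF_8).replace("+", "%20");
        URI uri = URI.create(baseUrl + "/api/v1/metric/" + encoded);
        return HttpRequest.newBuilder(uri).DELETE().build();
    }

    public static void main(String[] args) {
        // Placeholder base URL and metric name.
        HttpRequest request = build("http://localhost:8080", "cpu load");
        System.out.println(request.method() + " " + request.uri());
        // To actually send it:
        // java.net.http.HttpClient.newHttpClient()
        //     .send(request, java.net.http.HttpResponse.BodyHandlers.ofString());
    }
}
```

Remember that this removes every datapoint of the metric, across all tags and all time.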

Before going further, I see that you don't define tags in your delete query, so you would delete all datapoints for all series of this metric during the time interval. Is that what you want to do?

Last, to answer your question: we do run delete operations on large amounts of data. For batch reinserts of millions of samples, we delete all the matching series for the time interval and then reinsert. Our operations cover a large number of metrics (thousands of them), so the delete query is very large, but it works pretty well. We have not handled millions of points on the same metric, but unless you really have only one series, the results should be the same.

If the millions of samples to delete turn out to be the problem (I doubt it), you can try the following: split your delete query into several time intervals (put the same metric in your delete query several times, each with a fraction of the total time interval), so that you reduce the number of samples deleted in one batch.
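One way to sketch the splitting idea is to cut the total range into fixed-size windows and build one delete body per window, each to be POSTed to `/api/v1/datapoints/delete`. This version is a variation that sends one request per window rather than one large query, because `start_absolute` and `end_absolute` sit at the top level of the request body. The class name, the metric name `my.metric`, and the one-day window size are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class ChunkedDelete {

    /** Splits [startMs, endMs) into consecutive windows of at most stepMs milliseconds. */
    static List<long[]> windows(long startMs, long endMs, long stepMs) {
        if (stepMs <= 0 || endMs < startMs) {
            throw new IllegalArgumentException("need stepMs > 0 and endMs >= startMs");
        }
        List<long[]> result = new ArrayList<>();
        for (long from = startMs; from < endMs; from += stepMs) {
            long to = Math.min(from + stepMs, endMs);
            result.add(new long[] {from, to});
        }
        return result;
    }

    /** Builds the JSON body for one window. Assumes the metric name needs no JSON escaping. */
    static String deleteBody(String metric, long fromMs, long toMs) {
        return "{\"start_absolute\":" + fromMs
                + ",\"end_absolute\":" + toMs
                + ",\"metrics\":[{\"name\":\"" + metric + "\"}]}";
    }

    public static void main(String[] args) {
        long dayMs = 24L * 60 * 60 * 1000;
        long end = System.currentTimeMillis();
        long start = end - 7 * dayMs; // placeholder range: the last 7 days
        for (long[] w : windows(start, end, dayMs)) {
            // Each body would be POSTed to /api/v1/datapoints/delete, one request at a time.
            System.out.println(deleteBody("my.metric", w[0], w[1]));
        }
    }
}
```

Adjacent windows share a boundary timestamp, which is harmless for deletes. Adding a `tags` object to each metric entry would narrow the delete to specific series.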

I hope this helps.

Loic
