Unable to update 6M+ documents on Couchbase Server Community Edition 3.0.1

I am trying to update 6 million+ documents in a Couchbase Server Community Edition 3.0.1 cluster. I am using the latest Java SDK and have tried various ways to read a batch of documents from a view, update them, and replace them in the bucket.

As the process progresses, the throughput drops so far that it does not even reach 300 ops/s. I tried several approaches, including the bulk operation method (using Observable), to speed it up, but in vain. I even let the process run for hours, only to get a timeout exception later.

The last option I tried was to read all the document IDs from the view into a temp file, so that I could read the file back and update the records. But after 3 hours, with only 1.7 million IDs read from the view (just ~157 items/sec!), the DB throws a timeout exception.

Note that the Couchbase cluster contains 3 servers (Ubuntu 14.04) with 8 cores, 24 GB RAM and a 1 TB SSD each, and the Java code that updates the data runs on a machine in the same network with 4 cores, 16 GB RAM and a 1 TB SSD. There is no other load on this cluster.

It seems that even reading all the IDs from the view is impossible. I checked the network throughput, and the DB server was delivering data at barely 1 Mbps.

Below is the sample code used to read all the doc IDs from the view:

final Bucket statsBucket = db.getStatsBucket();
int skipCount = 0;
int limitCount = 10000;

System.out.println("reading stats ids ...");

try (DataOutputStream out = new DataOutputStream(new FileOutputStream("rowIds.tmp")))
{
    while (true)
    {
        ViewResult result = statsBucket.query(ViewQuery.from("Stats", "AllLogs").skip(skipCount).limit(limitCount).stale(Stale.TRUE));

        Iterator<ViewRow> rows = result.iterator();

        if (!rows.hasNext())
        {
            break;
        }

        while (rows.hasNext())
        {
            out.writeUTF(rows.next().id());
        }

        skipCount += limitCount;
        System.out.println(skipCount);
    }
}

I have also tried this with the bulk operation (Observable) method, without any success. I have tried changing the limit count to 1000 as well (without a limit, the Java app goes nuts after some time and even SSH stops responding).

Is there a way to do this?

I found the solution. ViewQuery.skip() does not jump straight to the requested offset and should not be used for pagination. It reads the view from the beginning and only starts returning rows once the given number of rows has been passed, much like walking a linked list.
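To put a rough number on that cost, here is a back-of-the-envelope sketch. It assumes a simplified model in which a query with skip(n) walks n index rows before returning anything, and it ignores the final empty query that ends the loop. The class and method names are made up for illustration and are not part of the Couchbase SDK.

```java
class SkipPaginationCost {
    // Rows walked when paging with skip(): each page walks the rows it
    // skips plus the rows it actually returns.
    static long rowsWalkedWithSkip(long totalRows, int pageSize) {
        long walked = 0;
        for (long skip = 0; skip < totalRows; skip += pageSize) {
            long returned = Math.min(pageSize, totalRows - skip);
            walked += skip + returned;
        }
        return walked;
    }

    // Rows walked when each page resumes from the last row read:
    // every row is returned once, and each page after the first also
    // walks the single boundary row that skip(1) drops.
    static long rowsWalkedWithStartKey(long totalRows, int pageSize) {
        long pages = (totalRows + pageSize - 1) / pageSize;
        return totalRows + Math.max(0, pages - 1);
    }

    public static void main(String[] args) {
        long total = 6_000_000L;
        int pageSize = 10_000;
        System.out.println("skip paging:     " + rowsWalkedWithSkip(total, pageSize));
        System.out.println("startKey paging: " + rowsWalkedWithStartKey(total, pageSize));
    }
}
```

Under this model, paging through 6 million rows in pages of 10,000 walks about 1.8 billion rows with skip(), against about 6 million when each page resumes from the last row read. That would explain why every page got slower than the one before it.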

The solution is to use startKey() and startKeyDocId(). The values passed to these methods are the key and the document ID of the last row you read. I got this solution from here: http://tugdualgrall.blogspot.in/2013/10/pagination-with-couchbase.html
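The idea can be tried without a cluster. The sketch below is an in-memory model, not Couchbase code: Row, query() and readAllIds() are hypothetical names, the "index" is just a sorted list, and plain String comparison stands in for the server's collation rules. It only illustrates why a page is resumed from the last row's key and document ID, followed by a skip of one row.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class KeysetPagingSketch {
    // Hypothetical stand-in for a view row: the emitted key plus the document ID.
    static final class Row {
        final String key;
        final String id;
        Row(String key, String id) { this.key = key; this.id = id; }
    }

    // Rows are ordered by key first, then by document ID.
    static final Comparator<Row> ORDER =
            Comparator.comparing((Row r) -> r.key).thenComparing(r -> r.id);

    // Models a query: start at (startKey, startDocId) inclusive, drop `skip` rows,
    // then return at most `limit` rows. A null startKey means "from the beginning".
    static List<Row> query(List<Row> sortedIndex, String startKey, String startDocId,
                           int skip, int limit) {
        List<Row> page = new ArrayList<>();
        int skipped = 0;
        for (Row r : sortedIndex) {
            if (startKey != null) {
                int c = r.key.compareTo(startKey);
                boolean beforeStart = c < 0
                        || (c == 0 && startDocId != null && r.id.compareTo(startDocId) < 0);
                if (beforeStart) continue;
            }
            if (skipped < skip) { skipped++; continue; }
            if (page.size() == limit) break;
            page.add(r);
        }
        return page;
    }

    // Reads every ID page by page, resuming each page from the last row seen.
    static List<String> readAllIds(List<Row> sortedIndex, int limit) {
        List<String> ids = new ArrayList<>();
        Row last = null;
        while (true) {
            List<Row> page = (last == null)
                    ? query(sortedIndex, null, null, 0, limit)
                    : query(sortedIndex, last.key, last.id, 1, limit);
            if (page.isEmpty()) break;
            for (Row r : page) ids.add(r.id);
            last = page.get(page.size() - 1);
        }
        return ids;
    }
}
```

In this model, leaving out the document ID would make pages overlap whenever several rows share the same key, which is why both values are carried from one page to the next.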

So the final code to read all items in a view is:

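import java.io.DataOutputStream;
import java.io.FileOutputStream;
import java.util.Iterator;

import com.couchbase.client.java.Bucket;
import com.couchbase.client.java.view.Stale;
import com.couchbase.client.java.view.ViewQuery;
import com.couchbase.client.java.view.ViewResult;
import com.couchbase.client.java.view.ViewRow;

final Bucket statsBucket = db.getStatsBucket();
int limitCount = 10000;
int readCount = 0; // number of IDs written so far, for progress output

System.out.println("reading stats ids ...");

try (DataOutputStream out = new DataOutputStream(new FileOutputStream("rowIds.tmp")))
{
    String lastKeyDocId = null;

    while (true)
    {
        ViewResult result;

        if (lastKeyDocId == null)
        {
            // First page: Stale.FALSE asks the server to update the index before answering.
            result = statsBucket.query(ViewQuery.from("Stats", "AllLogs").limit(limitCount).stale(Stale.FALSE));
        }
        else
        {
            // Later pages: resume at the last row already read and skip only that one row.
            // Passing the document ID to startKey() assumes the view emits the document ID
            // as its key; otherwise pass the last row's key to startKey() instead.
            result = statsBucket.query(ViewQuery.from("Stats", "AllLogs").limit(limitCount).stale(Stale.TRUE)
                    .startKey(lastKeyDocId).startKeyDocId(lastKeyDocId).skip(1));
        }

        Iterator<ViewRow> rows = result.iterator();

        if (!rows.hasNext())
        {
            break; // no rows left in the view
        }

        while (rows.hasNext())
        {
            lastKeyDocId = rows.next().id();
            out.writeUTF(lastKeyDocId);
            readCount++;
        }

        System.out.println(readCount);
    }
}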
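For the second half of the job, the IDs have to be read back from the file before the documents can be updated. A minimal sketch of that step, using only the JDK: readUTF() is the counterpart of the writeUTF() call used when writing the IDs, and it signals the end of the file with an EOFException. The class and method names are hypothetical, and the whole list is held in memory, which I am assuming is acceptable for a few million short IDs.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class RowIdFile {
    // Writes the IDs in the same format as the view-reading loop (one writeUTF per ID).
    static void writeIds(String path, List<String> ids) {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(path)))) {
            for (String id : ids) {
                out.writeUTF(id);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads IDs until the stream runs out; readUTF() throws EOFException at end of file.
    static List<String> readIds(String path) {
        List<String> ids = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(path)))) {
            while (true) {
                try {
                    ids.add(in.readUTF());
                } catch (EOFException endOfFile) {
                    break;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return ids;
    }
}
```

Calling RowIdFile.readIds("rowIds.tmp") would then return the IDs in the order they were read from the view, ready to be fed to whatever get-and-replace logic performs the update.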
