
JSON to SSTable tool out-of-memory failure

The json2sstable tool supplied with Cassandra 1.2.15 fails with an out-of-memory error. Back in 2011 a similar issue was reported as a bug and fixed: https://issues.apache.org/jira/browse/CASSANDRA-2189

Either I am missing some steps in the tool configuration/usage or the bug has re-emerged. Please point out what I am missing.

Repro steps:

1) Cassandra 1.2.15, one table with varchar key and one varchar column filled with random uuids, 6x10^6 records.

2) JSON file generated with sstable2json tool (~1G).

3) Cassandra restarted with a new configuration (new data/cache/commit dirs, new partitioner).

4) Keyspace re-created.

5) json2sstable fails after several minutes of processing:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOfRange(Arrays.java:2694)
    at java.lang.String.<init>(String.java:203)
    at org.codehaus.jackson.util.TextBuffer.contentsAsString(TextBuffer.java:350)
    at org.codehaus.jackson.impl.Utf8StreamParser.getText(Utf8StreamParser.java:278)
    at org.codehaus.jackson.map.deser.std.UntypedObjectDeserializer.deserialize(UntypedObjectDeserializer.java:59)
    at org.codehaus.jackson.map.deser.std.UntypedObjectDeserializer.mapArray(UntypedObjectDeserializer.java:165)
    at org.codehaus.jackson.map.deser.std.UntypedObjectDeserializer.deserialize(UntypedObjectDeserializer.java:51)
    at org.codehaus.jackson.map.deser.std.UntypedObjectDeserializer.mapArray(UntypedObjectDeserializer.java:165)
    at org.codehaus.jackson.map.deser.std.UntypedObjectDeserializer.deserialize(UntypedObjectDeserializer.java:51)
    at org.codehaus.jackson.map.deser.std.UntypedObjectDeserializer.mapObject(UntypedObjectDeserializer.java:204)
    at org.codehaus.jackson.map.deser.std.UntypedObjectDeserializer.deserialize(UntypedObjectDeserializer.java:47)
    at org.codehaus.jackson.map.deser.std.ObjectArrayDeserializer.deserialize(ObjectArrayDeserializer.java:104)
    at org.codehaus.jackson.map.deser.std.ObjectArrayDeserializer.deserialize(ObjectArrayDeserializer.java:18)
    at org.codehaus.jackson.map.ObjectMapper._readValue(ObjectMapper.java:2695)
    at org.codehaus.jackson.map.ObjectMapper.readValue(ObjectMapper.java:1294)
    at org.codehaus.jackson.JsonParser.readValueAs(JsonParser.java:1368)
    at org.apache.cassandra.tools.SSTableImport.importUnsorted(SSTableImport.java:344)
    at org.apache.cassandra.tools.SSTableImport.importJson(SSTableImport.java:328)
    at org.apache.cassandra.tools.SSTableImport.main(SSTableImport.java:547)

Judging from the json2sstable source code, the tool loads all the records from the JSON file into memory and then sorts them by key:

        private int importUnsorted(String jsonFile, ColumnFamily columnFamily, String ssTablePath, IPartitioner<?> partitioner) throws IOException
        {
            int importedKeys = 0;
            long start = System.currentTimeMillis();

            JsonParser parser = getParser(jsonFile);

            Object[] data = parser.readValueAs(new TypeReference<Object[]>(){});

            keyCountToImport = (keyCountToImport == null) ? data.length : keyCountToImport;
            SSTableWriter writer = new SSTableWriter(ssTablePath, keyCountToImport);

            System.out.printf("Importing %s keys...%n", keyCountToImport);

            // sort by dk representation, but hold onto the hex version
            SortedMap<DecoratedKey,Map<?, ?>> decoratedKeys = new TreeMap<DecoratedKey,Map<?, ?>>();

            for (Object row : data)
            {
                Map<?,?> rowAsMap = (Map<?, ?>)row;
                decoratedKeys.put(partitioner.decorateKey( hexToBytes((String)rowAsMap.get("key"))), rowAsMap);
....

According to Jonathan Ellis's comment on the CASSANDRA-2322 issue, this behavior is by design.

Thus json2sstable is poorly suited to importing production-size data into Cassandra: because it holds the entire JSON file in the heap before writing anything, it is likely to run out of memory on large datasets.
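
To illustrate why importUnsorted buffers everything, here is a minimal sketch of the same pattern. It is not Cassandra's code: the class TokenOrderSketch and the methods token and writeOrder are hypothetical names, and the MD5-derived token is only an assumed stand-in for what partitioner.decorateKey does. The point is that rows must be emitted in token order, and when the partitioner changes (as in step 3) the order of the rows in the JSON file no longer matches, so no row can be written until every key has been seen and sorted.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

class TokenOrderSketch {
    // Hypothetical stand-in for partitioner.decorateKey(): derive a
    // non-negative token from the MD5 digest of the key (an assumption
    // made for illustration, not Cassandra's actual implementation).
    static BigInteger token(String key) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(key.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(digest).abs();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Same shape as the loop in importUnsorted: every row goes into a
    // TreeMap keyed by token, so all rows are resident in memory before
    // the first one can be handed to a writer.
    static List<String> writeOrder(List<String> keysInFileOrder) {
        TreeMap<BigInteger, String> sorted = new TreeMap<>();
        for (String key : keysInFileOrder) {
            sorted.put(token(key), key);
        }
        return new ArrayList<>(sorted.values());
    }

    public static void main(String[] args) {
        List<String> fileOrder = Arrays.asList("a", "b", "c", "d");
        System.out.println("file order:  " + fileOrder);
        System.out.println("write order: " + writeOrder(fileOrder));
    }
}
```

In this sketch only the keys are held, but the real tool keeps the full parsed row (nested maps and lists of strings) for each of the 6x10^6 records, which is what exhausts the heap in the stack trace above.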
