
neo4j - batch insertion using neo4j rest graph db

I'm using Neo4j version 2.0.1.

I have hundreds of thousands of nodes that need to be inserted. My Neo4j graph database runs on a standalone server, and I'm using the REST API through the neo4j-rest-graphdb library to do this.

However, performance is slow. I've split my queries into batches, sending 500 Cypher statements in a single HTTP call. The results I'm getting look like this:

10:38:10.984 INFO commit
10:38:13.161 INFO commit
10:38:13.277 INFO commit
10:38:15.132 INFO commit
10:38:15.218 INFO commit
10:38:17.288 INFO commit
10:38:19.488 INFO commit
10:38:22.020 INFO commit
10:38:24.806 INFO commit
10:38:27.848 INFO commit
10:38:31.172 INFO commit
10:38:34.767 INFO commit
10:38:38.661 INFO commit

And so on. The query that I'm using is as follows:

MERGE (a{main:{val1},prop2:{val2}}) MERGE (b{main:{val3}}) CREATE UNIQUE (a)-[r:relationshipname]-(b);
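Because the statement is parameterized, its text stays fixed and only the parameter map changes from row to row. The sketch below illustrates what "500 statements in one HTTP call" can look like on the wire. It builds a request body for Neo4j 2.0's transactional Cypher endpoint (POST /db/data/transaction/commit), which expects a JSON object containing a `statements` array of `{statement, parameters}` objects. This is not necessarily what the REST wrapper sends internally. The class and method names are hypothetical, only string parameters with minimal JSON escaping are handled, and the trailing semicolon is omitted from the statement text.

```java
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

class TransactionalBatchBody {

    // The question's query, without the trailing semicolon.
    static final String QUERY =
        "MERGE (a{main:{val1},prop2:{val2}}) MERGE (b{main:{val3}}) "
        + "CREATE UNIQUE (a)-[r:relationshipname]-(b)";

    // Minimal JSON string escaping: quotes, backslashes and control characters.
    static String jsonString(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    // Builds {"statements":[{"statement":...,"parameters":{...}}, ...]} for one batch of rows.
    static String build(List<Map<String, String>> rows) {
        StringJoiner statements = new StringJoiner(",", "{\"statements\":[", "]}");
        for (Map<String, String> row : rows) {
            StringJoiner params = new StringJoiner(",", "{", "}");
            for (Map.Entry<String, String> e : row.entrySet()) {
                params.add(jsonString(e.getKey()) + ":" + jsonString(e.getValue()));
            }
            statements.add("{\"statement\":" + jsonString(QUERY) + ",\"parameters\":" + params + "}");
        }
        return statements.toString();
    }

    public static void main(String[] args) {
        List<Map<String, String>> batch = List.of(
            Map.of("val1", "a", "val2", "b", "val3", "c"),
            Map.of("val1", "d", "val2", "e", "val3", "f"));
        System.out.println(build(batch));
    }
}
```

One request built this way carries the whole batch, and the server commits it as a single transaction.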

My code is this:

private RestAPI restAPI;
private RestCypherQueryEngine engine;
private GraphDatabaseService graphDB = new RestGraphDatabase("http://localdomain.com:7474/db/data/");

...

restAPI = ((RestGraphDatabase) graphDB).getRestAPI();
engine = new RestCypherQueryEngine(restAPI);

...

    Transaction tx = restAPI.beginTx();

    try {
        int ctr = 0;
        while (isExists) {
            ctr++;
            // execute the MERGE query here through engine.query(query, params)
            if (ctr % 500 == 0) {
                // commit every 500 iterations and start a new transaction
                tx.success();
                tx.close();
                tx = restAPI.beginTx();
                LOGGER.info("commit");
            }
        }
        tx.success();
    } catch (FileNotFoundException | NumberFormatException | ArrayIndexOutOfBoundsException e) {
        tx.failure();
    } finally {
        tx.close();
    }
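Computing the gaps between consecutive "commit" log lines makes the slowdown measurable. Below is a minimal sketch that uses only `java.time`. The class name `CommitIntervals` is hypothetical, and the input is the timestamps from the log above.

```java
import java.time.Duration;
import java.time.LocalTime;
import java.util.ArrayList;
import java.util.List;

class CommitIntervals {

    // Returns the milliseconds elapsed between each pair of consecutive timestamps.
    static List<Long> intervalsMillis(List<String> timestamps) {
        List<Long> result = new ArrayList<>();
        for (int i = 1; i < timestamps.size(); i++) {
            LocalTime previous = LocalTime.parse(timestamps.get(i - 1));
            LocalTime current = LocalTime.parse(timestamps.get(i));
            result.add(Duration.between(previous, current).toMillis());
        }
        return result;
    }

    public static void main(String[] args) {
        // Timestamps copied from the "INFO commit" log lines in the question.
        List<String> log = List.of(
            "10:38:10.984", "10:38:13.161", "10:38:13.277", "10:38:15.132",
            "10:38:15.218", "10:38:17.288", "10:38:19.488", "10:38:22.020",
            "10:38:24.806", "10:38:27.848", "10:38:31.172", "10:38:34.767",
            "10:38:38.661");
        System.out.println(intervalsMillis(log));
    }
}
```

For that log it prints [2177, 116, 1855, 86, 2070, 2200, 2532, 2786, 3042, 3324, 3595, 3894]. The last eight gaps grow steadily from about 2.1 s to 3.9 s.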

Thanks!

UPDATED BENCHMARK: Sorry for the confusion. The benchmark I posted above isn't accurate and doesn't correspond to 500 queries per commit, because my ctr variable doesn't actually count Cypher queries.

I'm now getting about 500 queries every 3 seconds, and that 3-second figure keeps increasing. It's still much slower than embedded Neo4j.
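To put that rate in perspective: 500 statements per 3 seconds is about 167 statements per second. The sketch below does the arithmetic for a hypothetical 300,000 rows, since the question only says "hundreds of thousands". It assumes a constant rate, which is optimistic because the per-batch time is reported to keep growing.

```java
class ImportEstimate {

    // Statements per second when a batch of batchSize statements takes secondsPerBatch.
    static double statementsPerSecond(int batchSize, double secondsPerBatch) {
        return batchSize / secondsPerBatch;
    }

    // Total seconds for `rows` statements at that constant rate.
    static double totalSeconds(long rows, int batchSize, double secondsPerBatch) {
        return rows / statementsPerSecond(batchSize, secondsPerBatch);
    }

    public static void main(String[] args) {
        long rows = 300_000; // hypothetical row count
        double rate = statementsPerSecond(500, 3.0);
        double seconds = totalSeconds(rows, 500, 3.0);
        System.out.printf("%.1f statements/s, %.0f s (%.0f min)%n", rate, seconds, seconds / 60);
    }
}
```

At that rate, 300,000 rows take roughly 1,800 s (30 minutes) even before accounting for the slowdown.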

If you are able to use Neo4j 2.1.0-M01 (don't use it in production yet!), you can benefit from its new LOAD CSV feature. If you create or generate a CSV file like this:

val1,val2,val3
a_value,another_value,yet_another_value
a,b,c
....

you only need to run the following code:

final GraphDatabaseService graphDB = new RestGraphDatabase("http://server:7474/db/data/");
final RestAPI restAPI = ((RestGraphDatabase) graphDB).getRestAPI();
final RestCypherQueryEngine engine = new RestCypherQueryEngine(restAPI);
final String filePath = "file://C:/your_file_path.csv";
engine.query("USING PERIODIC COMMIT 500 LOAD CSV WITH HEADERS FROM '" + filePath
    + "' AS csv MERGE (a{main:csv.val1,prop2:csv.val2}) MERGE (b{main:csv.val3})"
    + " CREATE UNIQUE (a)-[r:relationshipname]->(b);", null);

Make sure the file is accessible from the machine on which the Neo4j server is installed.
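The CSV file has to be generated from your data before LOAD CSV can read it. Below is a minimal sketch that produces the `val1,val2,val3` header and one line per row. `CsvExport` and `toCsv` are hypothetical names. If a value contains a comma, double quote or line break, the sketch wraps the field in quotes and doubles any embedded quotes. This is the common RFC 4180 convention, so check that your Neo4j version's LOAD CSV parser accepts it.

```java
import java.util.List;

class CsvExport {

    // Quotes a field if it contains a comma, quote or line break; doubles embedded quotes.
    static String field(String value) {
        if (value.contains(",") || value.contains("\"") || value.contains("\n") || value.contains("\r")) {
            return "\"" + value.replace("\"", "\"\"") + "\"";
        }
        return value;
    }

    // Header line expected by LOAD CSV WITH HEADERS, then one line per {val1, val2, val3} row.
    static String toCsv(List<String[]> rows) {
        StringBuilder sb = new StringBuilder("val1,val2,val3\n");
        for (String[] row : rows) {
            sb.append(field(row[0])).append(',')
              .append(field(row[1])).append(',')
              .append(field(row[2])).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(toCsv(List.of(
            new String[] {"a_value", "another_value", "yet_another_value"},
            new String[] {"a", "b", "c"})));
    }
}
```

To create the actual file, write the returned string to a path the server can read, for example with `Files.writeString(Path.of(...), csv)` on Java 11+.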

Take a look at my server plugin, which does this for you on the server. If you build it and put it in the plugins folder, you can call the plugin from Java as follows:

final RestAPI restAPI = new RestAPIFacade("http://server:7474/db/data");
final Map<String, Object> params = new HashMap<>();
params.put("path", "file://C:/.../neo4j.csv");
final RequestResult result = restAPI.execute(RequestType.POST,
    "ext/CSVBatchImport/graphdb/csv_batch_import", params);
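The plugin can also be called without the REST wrapper. This sketch assumes the usual Neo4j 2.x server-plugin convention: the parameters are POSTed as a JSON object to /db/data/ext/<PluginName>/graphdb/<method>. It reuses the `CSVBatchImport`/`csv_batch_import` names and the `path` parameter from the snippet above. It uses the JDK 11+ `java.net.http` client, and the class name and CSV path are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class PluginCall {

    // Builds a POST to the plugin's extension URL, passing "path" as a JSON parameter.
    static HttpRequest request(String baseUrl, String csvPath) {
        String escaped = csvPath.replace("\\", "\\\\").replace("\"", "\\\"");
        String body = "{\"path\":\"" + escaped + "\"}";
        return HttpRequest.newBuilder(URI.create(baseUrl + "/ext/CSVBatchImport/graphdb/csv_batch_import"))
            .header("Content-Type", "application/json")
            .header("Accept", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(body))
            .build();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical file location; it must be readable by the Neo4j server.
        HttpRequest req = request("http://server:7474/db/data", "file:///tmp/neo4j.csv");
        HttpResponse<String> resp = HttpClient.newHttpClient().send(req, HttpResponse.BodyHandlers.ofString());
        System.out.println(resp.statusCode() + " " + resp.body());
    }
}
```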

EDIT:

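You can also use a BatchCallback in the Java REST wrapper to boost performance, and it removes the transactional boilerplate code as well. You could write your script along these lines:

final RestAPI restAPI = new RestAPIFacade("http://server:7474/db/data");
int counter = 0;
List<Map<String, Object>> statements = new ArrayList<>();
while (isExists) {
    // one parameter map per row; "abc" stands in for the real values
    Map<String, Object> params = new HashMap<>();
    params.put("val1", "abc");
    params.put("val2", "abc");
    params.put("val3", "abc");
    statements.add(params);
    if (++counter % 500 == 0) {
        // send 500 parameterized queries in a single batch request
        restAPI.executeBatch(new Process(statements));
        statements = new ArrayList<>();
    }
}
// flush the last partial batch (fewer than 500 rows)
if (!statements.isEmpty()) {
    restAPI.executeBatch(new Process(statements));
}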

static class Process implements BatchCallback<Object> {

    private static final String QUERY = "MERGE (a{main:{val1},prop2:{val2}}) MERGE (b{main:{val3}}) CREATE UNIQUE (a)-[r:relationshipname]-(b);";

    private List<Map<String, Object>> params;

    Process(final List<Map<String, Object>> params) {
        this.params = params;
    }

    @Override
    public Object recordBatch(final RestAPI restApi) {
        for (final Map<String, Object> param : params) {
            restApi.query(QUERY, param);
        }
        return null;
    }    
}
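Whatever sends each batch, the last partial batch (fewer than 500 rows) still has to be sent after the loop ends. Below is a generic, stdlib-only sketch of chunking that includes that final flush. `Batches` and `forEachBatch` are hypothetical names. In the code above, the consumer would be something like `b -> restAPI.executeBatch(new Process(b))`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class Batches {

    // Feeds items to sink in chunks of at most `size`, including the final partial chunk.
    static <T> void forEachBatch(Iterable<T> items, int size, Consumer<List<T>> sink) {
        if (size <= 0) throw new IllegalArgumentException("size must be positive");
        List<T> batch = new ArrayList<>(size);
        for (T item : items) {
            batch.add(item);
            if (batch.size() == size) {
                sink.accept(batch);
                batch = new ArrayList<>(size);
            }
        }
        if (!batch.isEmpty()) {
            sink.accept(batch); // flush whatever is left over
        }
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 1_234; i++) rows.add(i);
        forEachBatch(rows, 500, b -> System.out.println("batch of " + b.size()));
    }
}
```

With 1,234 rows this sends batches of 500, 500 and 234. Without the final flush, the last 234 rows would be silently skipped.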
