
Neo4j: Creating nodes via Cypher/REST is slow

I'm trying to create/update nodes via the REST API with Cypher's MERGE statement. Each node has about 1 KB of attributes in total. I create/update one node per request. (I know there are other ways to create lots of nodes in a batch, but that is not the question here.)

I use Neo4j Community 2.1.6 on Windows Server 2008 R2 Enterprise (24 CPUs, 64 GB), and the database directory resides on a SAN drive. I get a rate of 4-6 nodes per second; in other words, a single create or update takes around 200 ms. This seems rather slow to me.

The query looks like this:

MERGE (a:TYP1 { name: {name}, version: {version} }) 
SET 
    a.ATTR1={param1},
    a.ATTR2={param2},
    a.ATTR3={param3},
    a.ATTR4={param4},
    a.ATTR5={param5} 
return id(a)
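The question does not show the client code. As a minimal sketch, assuming Neo4j 2.x's transactional endpoint (POST /db/data/transaction/commit, the URL that also appears in the Node.js answer) and a hypothetical node object with name, version and an attrs array of five values, this is the JSON body that one request would carry for the query above:

```javascript
// Sketch: JSON body for one MERGE per request against Neo4j 2.x's
// transactional endpoint. The query text is the one from the question.
const MERGE_QUERY =
  "MERGE (a:TYP1 { name: {name}, version: {version} }) " +
  "SET a.ATTR1 = {param1}, a.ATTR2 = {param2}, a.ATTR3 = {param3}, " +
  "a.ATTR4 = {param4}, a.ATTR5 = {param5} " +
  "RETURN id(a)";

// Hypothetical input shape: { name, version, attrs: [v1, v2, v3, v4, v5] }.
function buildMergeBody(node) {
  const parameters = { name: node.name, version: node.version };
  // Map attrs[0..4] onto the {param1}..{param5} placeholders.
  node.attrs.forEach((value, i) => {
    parameters["param" + (i + 1)] = value;
  });
  // The endpoint expects an array of statements, even for a single query.
  return { statements: [{ statement: MERGE_QUERY, parameters: parameters }] };
}

const body = buildMergeBody({
  name: "node-1",
  version: 3,
  attrs: ["a", "b", "c", "d", "e"],
});
console.log(JSON.stringify(body, null, 2));
```

Passing the values as parameters instead of splicing them into the query string lets Neo4j reuse the cached query plan; it does not by itself remove the per-request round trip and transaction commit.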

There is an index on name, version and two of the attributes.

Why does it take so long? And what can I try to improve the situation?

Could part of the problem be that every request has to open a new connection? Is there a way to keep the HTTP connection open for multiple requests?

For a query, I'm pretty sure you can only use one index per label per query, so depending on your data the index usage might not be efficient.

As for a persistent connection, that is possible, though I think it depends on the library you're using to connect to the REST API. In the Ruby neo4j gem we use the Faraday gem, which has a NetHttpPersistent adapter.
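For Node.js clients, a minimal sketch of the same idea uses the core http module's Agent with keepAlive: true, so consecutive requests can reuse an open TCP connection instead of opening a new one each time. The host, port, socket limit and the helper name commitRequestOptions are assumptions for illustration; libraries built on http, such as the request package, accept such an agent through their own agent option.

```javascript
// Sketch using Node's core http module: one shared keep-alive agent, so
// consecutive requests to the same server can reuse an open TCP connection.
const http = require("http");

// keepAlive keeps idle sockets open for reuse; maxSockets caps how many
// sockets the agent opens per host at the same time.
const keepAliveAgent = new http.Agent({ keepAlive: true, maxSockets: 4 });

// Hypothetical helper: request options for one commit to a local Neo4j 2.x.
function commitRequestOptions(bodyLength) {
  return {
    host: "localhost",
    port: 7474,
    path: "/db/data/transaction/commit",
    method: "POST",
    agent: keepAliveAgent, // all requests built here share one connection pool
    headers: {
      "Content-Type": "application/json",
      Accept: "application/json; charset=UTF-8",
      "Content-Length": bodyLength,
    },
  };
}

// Usage (needs a running server, so it is left commented out):
// const json = JSON.stringify({ statements: [/* ... */] });
// const req = http.request(commitRequestOptions(Buffer.byteLength(json)), (res) => {
//   res.resume(); // consume the response so the socket returns to the pool
// });
// req.end(json);
```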

  1. The index is only used when you use ONE attribute with MERGE.
  2. If you need to merge on both, create a compound property, index it (or, better, use a uniqueness constraint) and merge on that compound property.
  3. Use ON CREATE SET; otherwise you (over)write the attributes every time, even if you didn't actually create the node.

Adapted Statement

MERGE (a:TYP1 { name_version: {name_version} })
ON CREATE SET
    a.version = {version},
    a.name = {name},
    a.ATTR1={param1},
    a.ATTR2={param2},
    a.ATTR3={param3},
    a.ATTR4={param4},
    a.ATTR5={param5}
return id(a)
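A minimal sketch of the compound-key approach in Node.js. The answer does not say how name_version is built; encoding the pair with JSON.stringify([name, version]) is an assumption chosen so that different pairs cannot collapse into the same key, which plain concatenation allows ("foo1" + 2 and "foo" + 12 both give "foo12"). The input object shape and the helper names are hypothetical.

```javascript
// Compound key for the MERGE; the JSON array encoding keeps name and version apart.
function nameVersionKey(name, version) {
  return JSON.stringify([name, version]);
}

// Run once (Neo4j 2.x syntax): the uniqueness constraint is backed by an index
// that MERGE on name_version can use.
const CONSTRAINT_QUERY =
  "CREATE CONSTRAINT ON (a:TYP1) ASSERT a.name_version IS UNIQUE";

// The adapted statement from the answer above.
const ADAPTED_MERGE =
  "MERGE (a:TYP1 { name_version: {name_version} }) " +
  "ON CREATE SET a.version = {version}, a.name = {name}, " +
  "a.ATTR1 = {param1}, a.ATTR2 = {param2}, a.ATTR3 = {param3}, " +
  "a.ATTR4 = {param4}, a.ATTR5 = {param5} " +
  "RETURN id(a)";

// Hypothetical input: { name, version, param1, ..., param5 }.
function buildAdaptedStatement(node) {
  return {
    statement: ADAPTED_MERGE,
    // Copy the node fields as parameters and add the derived key.
    parameters: Object.assign(
      { name_version: nameVersionKey(node.name, node.version) },
      node
    ),
  };
}

const stmt = buildAdaptedStatement({
  name: "foo", version: 12,
  param1: 1, param2: 2, param3: 3, param4: 4, param5: 5,
});
console.log(stmt.parameters.name_version); // ["foo",12]
```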

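This is an example of how you can execute a batch of Cypher queries from Node.js in a single request to Neo4j.

Prerequisites: the request and fbgraph npm packages, a Neo4j server on localhost:7474, and a Facebook access token passed as the first command-line argument: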

var request = require("request");
var graph = require('fbgraph');
graph.setAccessToken(process.argv[2]);

// Current time as H:M:S.ms (the fields are not zero-padded)
function now() {
    var instant = new Date();
    return instant.getHours()
        +':'+ instant.getMinutes()
        +':'+ instant.getSeconds()
        +'.'+ instant.getMilliseconds();
}
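Because now() prints the milliseconds without padding, its stamps are easy to misread. A hedged alternative, not part of the original answer: the hypothetical helpers below pad every field and measure elapsed time as a plain number.

```javascript
// Zero-padded HH:MM:SS.mmm, so that e.g. 19 ms prints as ".019", not ".19".
function formatTime(date) {
  const pad = (n, width) => String(n).padStart(width, "0");
  return (
    pad(date.getHours(), 2) + ":" +
    pad(date.getMinutes(), 2) + ":" +
    pad(date.getSeconds(), 2) + "." +
    pad(date.getMilliseconds(), 3)
  );
}

// Milliseconds elapsed since a Date.now() reading; no string arithmetic needed.
function elapsedMs(startMs) {
  return Date.now() - startMs;
}

const start = Date.now();
console.log(formatTime(new Date(2015, 0, 1, 20, 38, 16, 19))); // 20:38:16.019
console.log("elapsed: " + elapsedMs(start) + " ms");
```

In the callbacks, taking Date.now() before request.post and logging elapsedMs() inside the callback would print each duration directly.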

Get the Facebook data:

graph.get('me?fields=groups,friends', function(err,res) {
    if (err) {
        console.log(err);
        throw new Error(now() + ' Could not get groups from Facebook');
    }

Create the Cypher statements:

    var batchCypher = [];
    res.groups.data.forEach(function(group) {
        var singleCypher = {
            "statement" : "CREATE (n:group{group}) RETURN n, id(n)",
            "parameters" : { "group" : group }
        }
        batchCypher.push(singleCypher);

Run them one by one (one transaction per group):

        var fromNow = now();
        request.post({
            uri:"http://localhost:7474/db/data/transaction/commit", 
            json:{statements:[singleCypher]}
        }, function(err,res) { 
            if (err) {
                console.log('Could not commit '+ group.name);
                throw err;
            }
            console.log('Used '+ fromNow +' - '+ now() +' to commit '+ group.name);
            res.body.results.forEach(function(cypherRes) {
                console.log(cypherRes.data[0].row);
            });
        })
    });

Run them all in one batch (a single transaction):

    var fromNow = now();
    request.post({
        uri:"http://localhost:7474/db/data/transaction/commit", 
        json:{statements:batchCypher}
    }, function(err,res) { 
        if (err) {
            console.log('Could not commit the batch');
            throw err;
        }
        console.log('Used '+ fromNow +' - '+ now() +' to commit the batch');
    }) 
});
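The answer compares two extremes: one statement per transaction and all statements in a single transaction. A common middle ground is to send fixed-size batches. The helper below is a hypothetical sketch (its name and the batch size are assumptions, not something the answer measured); it only builds the request bodies for /db/data/transaction/commit, so it runs without a server.

```javascript
// Split a list of Cypher statements into transaction bodies of at most
// batchSize statements each, one body per POST to the commit endpoint.
function toTransactionBodies(statements, batchSize) {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const bodies = [];
  for (let i = 0; i < statements.length; i += batchSize) {
    bodies.push({ statements: statements.slice(i, i + batchSize) });
  }
  return bodies;
}

// Usage: five group statements shaped like the ones in the answer.
const demo = [1, 2, 3, 4, 5].map((i) => ({
  statement: "CREATE (n:group{group}) RETURN n, id(n)",
  parameters: { group: { name: "group " + i } },
}));
console.log(toTransactionBodies(demo, 2).map((b) => b.statements.length)); // [ 2, 2, 1 ]
```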

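The log shows that a transaction for 5 groups is significantly slower than a transaction for 1 group, but significantly faster than 5 transactions for 1 group each. (Note that now() does not zero-pad the milliseconds: 16.19 is 16 s and 19 ms, 16.150 is 16 s and 150 ms.)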

Used 20:38:16.19 - 20:38:16.77 to commit Voiture occasion Belgique
Used 20:38:16.29 - 20:38:16.82 to commit  Marches & Randonnées
Used 20:38:16.31 - 20:38:16.86 to commit Vlazarus
Used 20:38:16.34 - 20:38:16.87 to commit Wijk voor de fiets
Used 20:38:16.33 - 20:38:16.91 to commit Niet de bestemming maar de route maakt de tocht goed.
Used 20:38:16.35 - 20:38:16.150 to commit the batch
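To read the durations off the log above, subtract each start stamp from its end stamp. A small sketch that does this for the six lines, assuming the stamps come from the unpadded now() helper, so the part after the dot is a millisecond count:

```javascript
// Parse an unpadded "H:M:S.ms" stamp (as printed by now()) into milliseconds.
function toMs(stamp) {
  const [h, m, rest] = stamp.split(":");
  const [s, ms] = rest.split(".");
  return ((Number(h) * 60 + Number(m)) * 60 + Number(s)) * 1000 + Number(ms);
}

// Label, start and end stamps copied from the log above.
const entries = [
  ["Voiture occasion Belgique", "20:38:16.19", "20:38:16.77"],
  ["Marches & Randonnées", "20:38:16.29", "20:38:16.82"],
  ["Vlazarus", "20:38:16.31", "20:38:16.86"],
  ["Wijk voor de fiets", "20:38:16.34", "20:38:16.87"],
  ["Niet de bestemming maar de route maakt de tocht goed.", "20:38:16.33", "20:38:16.91"],
  ["the batch", "20:38:16.35", "20:38:16.150"],
];

const durations = entries.map(([label, from, to]) => [label, toMs(to) - toMs(from)]);
durations.forEach(([label, ms]) => console.log(ms + " ms  " + label));
// 58, 53, 55, 53 and 58 ms for the single groups; 115 ms for the batch
```

Because the single-group requests are issued inside forEach without waiting for one another, they ran concurrently (all five windows overlap), so their times are latencies of overlapping requests rather than a sequential total.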

I just read your comment, Andreas, so this is not directly applicable to you, but you might use it to find out whether the time is spent in the communication or in the updates.
