
neo4j - batch insertion using neo4j rest graph db

I'm using version 2.0.1.

I have hundreds of thousands of nodes that need to be inserted. My neo4j graph db is on a standalone server, and I'm using the REST API through the neo4j rest graph db library to achieve this.

However, I'm seeing slow performance. I've chopped my queries into batches, sending 500 Cypher statements in a single HTTP call. The results I'm getting look like:

10:38:10.984 INFO commit
10:38:13.161 INFO commit
10:38:13.277 INFO commit
10:38:15.132 INFO commit
10:38:15.218 INFO commit
10:38:17.288 INFO commit
10:38:19.488 INFO commit
10:38:22.020 INFO commit
10:38:24.806 INFO commit
10:38:27.848 INFO commit
10:38:31.172 INFO commit
10:38:34.767 INFO commit
10:38:38.661 INFO commit

And so on. The query that I'm using is as follows:

MERGE (a{main:{val1},prop2:{val2}}) MERGE (b{main:{val3}}) CREATE UNIQUE (a)-[r:relationshipname]-(b);
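(A side observation, not part of the original question: the MERGE patterns above carry no node labels, so in Neo4j 2.0 each MERGE has to scan every node in the store. A labeled variant backed by a schema index could look like the sketch below; `:Entity` is a hypothetical label chosen for illustration.)

```java
public class LabeledQueries {

    // ":Entity" is a hypothetical label; pick whatever fits your data model.
    // With a label plus a schema index on "main", MERGE can use an index
    // lookup instead of a full node scan. Run the index statement once,
    // before the bulk insert starts.
    static final String CREATE_INDEX = "CREATE INDEX ON :Entity(main)";

    static final String MERGE_QUERY =
        "MERGE (a:Entity {main:{val1},prop2:{val2}}) "
      + "MERGE (b:Entity {main:{val3}}) "
      + "CREATE UNIQUE (a)-[r:relationshipname]-(b)";

    public static void main(String[] args) {
        System.out.println(CREATE_INDEX);
        System.out.println(MERGE_QUERY);
    }
}
```

The same parameter maps the existing code passes to `engine.query()` would work unchanged with the labeled query.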

My code is this:

private RestAPI restAPI;
private RestCypherQueryEngine engine;
private GraphDatabaseService graphDB = new RestGraphDatabase("http://localdomain.com:7474/db/data/");

...

restAPI = ((RestGraphDatabase) graphDB).getRestAPI();
engine = new RestCypherQueryEngine(restAPI);

...

    Transaction tx = graphDB.getRestAPI().beginTx();

    try {
        int ctr = 0;
        while (isExists) {
            ctr++;
            // execute query here through engine.query()
            if (ctr % 500 == 0) {
                tx.success();
                tx.close();
                tx = graphDB.getRestAPI().beginTx();
                LOGGER.info("commit");
            }
        }
        tx.success();
    } catch (FileNotFoundException | NumberFormatException | ArrayIndexOutOfBoundsException e) {
        tx.failure();
    } finally {
        tx.close();            
    }

Thanks!

UPDATED BENCHMARK. Sorry for the confusion: the benchmark I posted isn't accurate, and is not for 500 queries. My ctr variable isn't actually referring to the number of Cypher queries.

So now I'm getting about 500 queries per 3 seconds, and that 3 seconds keeps increasing as well. It's still very slow compared to embedded neo4j.

If you have the ability to use Neo4j 2.1.0-M01 (don't use it in prod yet!), you could benefit from its new features. If you'd create/generate a CSV file like this:

val1,val2,val3
a_value,another_value,yet_another_value
a,b,c
....

you'd only need to launch the following code:

final GraphDatabaseService graphDB = new RestGraphDatabase("http://server:7474/db/data/");
final RestAPI restAPI = ((RestGraphDatabase) graphDB).getRestAPI();
final RestCypherQueryEngine engine = new RestCypherQueryEngine(restAPI);
final String filePath = "file://C:/your_file_path.csv";
engine.query("USING PERIODIC COMMIT 500 LOAD CSV WITH HEADERS FROM '" + filePath
    + "' AS csv MERGE (a{main:csv.val1,prop2:csv.val2}) MERGE (b{main:csv.val3})"
    + " CREATE UNIQUE (a)-[r:relationshipname]->(b);", null);

You'd have to make sure that the file can be accessed from the machine where your server is installed.
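When the client happens to run on the same machine as the server, a quick local sanity check before firing LOAD CSV can save a confusing server-side error. A minimal sketch (the path is the placeholder from above; remember that LOAD CSV resolves `file://` URLs on the server, not the client, so this check is only meaningful when the two are the same machine):

```java
import java.io.File;

public class CsvCheck {

    // Returns true if the path points at a readable regular file.
    static boolean isReadable(String path) {
        File f = new File(path);
        return f.isFile() && f.canRead();
    }

    public static void main(String[] args) {
        // hypothetical path, mirroring the placeholder used in the answer
        System.out.println(isReadable("C:/your_file_path.csv"));
    }
}
```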

Take a look at my server plugin, which does this for you on the server. If you build it and put it in the plugins folder, you can use the plugin from Java as follows:

final RestAPI restAPI = new RestAPIFacade("http://server:7474/db/data");
final RequestResult result = restAPI.execute(RequestType.POST, "ext/CSVBatchImport/graphdb/csv_batch_import",
    new HashMap<String, Object>() {
        {
            put("path", "file://C:/.../neo4j.csv");
        }
    });

EDIT:

You can also use a BatchCallback in the Java REST wrapper to boost performance; it removes the transactional boilerplate code as well. You could write your script similar to:

final RestAPI restAPI = new RestAPIFacade("http://server:7474/db/data");
int counter = 0;
List<Map<String, Object>> statements = new ArrayList<>();
while (isExists) {
    statements.add(new HashMap<String, Object>() {
        {
            put("val1", "abc");
            put("val2", "abc");
            put("val3", "abc");
        }
    });
    if (++counter % 500 == 0) {
        restAPI.executeBatch(new Process(statements));
        statements = new ArrayList<>();
    }
}
// flush the statements left over after the last full batch of 500
if (!statements.isEmpty()) {
    restAPI.executeBatch(new Process(statements));
}

static class Process implements BatchCallback<Object> {

    private static final String QUERY = "MERGE (a{main:{val1},prop2:{val2}}) MERGE (b{main:{val3}}) CREATE UNIQUE (a)-[r:relationshipname]-(b);";

    private List<Map<String, Object>> params;

    Process(final List<Map<String, Object>> params) {
        this.params = params;
    }

    @Override
    public Object recordBatch(final RestAPI restApi) {
        for (final Map<String, Object> param : params) {
            restApi.query(QUERY, param);
        }
        return null;
    }    
}
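(An additional point of comparison, not from the original answer: since 2.0 the server also exposes a transactional Cypher endpoint at `POST /db/data/transaction/commit`, which accepts many statements in one request body and so avoids one HTTP round trip per query. A minimal sketch of assembling such a payload by hand follows; real code should use a JSON library instead of string concatenation, and this naive version does no JSON escaping of values.)

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class BatchPayload {

    // Builds the JSON body the transactional endpoint expects:
    // {"statements":[{"statement":"...","parameters":{...}}, ...]}
    // Naive string building for illustration only -- no JSON escaping.
    static String buildPayload(String cypher, List<Map<String, String>> paramSets) {
        StringBuilder sb = new StringBuilder("{\"statements\":[");
        for (int i = 0; i < paramSets.size(); i++) {
            if (i > 0) sb.append(',');
            sb.append("{\"statement\":\"").append(cypher).append("\",\"parameters\":{");
            int j = 0;
            for (Map.Entry<String, String> e : paramSets.get(i).entrySet()) {
                if (j++ > 0) sb.append(',');
                sb.append('"').append(e.getKey()).append("\":\"").append(e.getValue()).append('"');
            }
            sb.append("}}");
        }
        return sb.append("]}").toString();
    }

    public static void main(String[] args) {
        String cypher = "MERGE (a {main:{val1}}) MERGE (b {main:{val3}})"
            + " CREATE UNIQUE (a)-[r:relationshipname]-(b)";
        List<Map<String, String>> batch = new ArrayList<>();
        Map<String, String> params = new LinkedHashMap<>();
        params.put("val1", "x");
        params.put("val3", "y");
        batch.add(params);
        System.out.println(buildPayload(cypher, batch));
    }
}
```

One request carrying 500 statements then costs a single round trip, which is the same effect the BatchCallback above is aiming for.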
