
(Neo4j unmanaged extension API) Why does query speed depend on the size of the dataset in Neo4j?

I'm trying to build a simple unmanaged extension for a Neo4j server (Community Edition).

I have several versions of the same dataset (a small one with 11k nodes and a larger one with 85k nodes). The small one is a subset of the large one. My nodes have an "id" property, which is not Neo4j's internal <id> but a separate property named "id". I pick a node's id from the small dataset and run the following query against each dataset:

  1. Retrieve the node from its id
  2. Get all of the node's relationships

I repeat this several times to reduce noise in the timing measurement. The code is:

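import java.util.List;

import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;
import javax.ws.rs.QueryParam;
import javax.ws.rs.core.Context;
import javax.ws.rs.core.MediaType;
import javax.ws.rs.core.Response;
import javax.ws.rs.core.Response.Status;

import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Label;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Relationship;
import org.neo4j.graphdb.Transaction;

@Path("/test")
public class QueryTest {
    private final GraphDatabaseService graphdb;

    public QueryTest(@Context GraphDatabaseService graphdb) {
        this.graphdb = graphdb;
    }

    @GET
    @Produces(MediaType.APPLICATION_JSON)
    public Response test(final @QueryParam("any") List<Long> any, final @QueryParam("iter") int iter) {
        JsonGenerator result = new JsonGenerator();

        result.writeStartObject();
        result.writeKeyValue("iteration", iter);
        result.writeKey("time");
        result.writeStartArray();

        try (Transaction tx = graphdb.beginTx()) {
            for (long id : any) {
                // Reset the accumulated time and the loop counter for every id;
                // if they are shared across ids, only the first id is really measured.
                long total = 0;
                for (int i = 0; i < iter; i++) {
                    long startTime = System.nanoTime();
                    // Step 1: look the node up by label and "id" property.
                    Node node = graphdb.findNode(Label.label("Movie"), "id", id);
                    // Step 2: fetch the node's relationships.
                    Iterable<Relationship> t = node.getRelationships();
                    long stopTime = System.nanoTime();
                    total += stopTime - startTime;
                }
                // Mean time in nanoseconds for this id.
                result.writeLong(total / iter);
            }
            tx.success();
        }
        result.writeEndArray();
        result.writeEndObject();
        return Response.status(Status.OK).entity(result.getJson()).build();
    }
}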

Here JsonGenerator is a helper class that builds the JSON output.
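The averaging described above can be pulled out into a small helper so that each id gets its own counter and its own accumulated time. The following is only a sketch in plain Java, not Neo4j code: the class name `MeanTimingSketch`, the method `meanNanos`, the warm-up parameter, and the placeholder task in `main` are all assumptions made for illustration.

```java
/** Sketch of per-task mean timing; all names here are hypothetical. */
final class MeanTimingSketch {

    /**
     * Runs the task `warmup` times without timing it, then `iterations`
     * times with timing, and returns the mean duration in nanoseconds.
     */
    static long meanNanos(Runnable task, int warmup, int iterations) {
        if (iterations <= 0) {
            throw new IllegalArgumentException("iterations must be positive");
        }
        // Untimed warm-up runs, so JIT compilation and cold caches
        // do not dominate the measured iterations.
        for (int i = 0; i < warmup; i++) {
            task.run();
        }
        long total = 0; // fresh accumulator for every call
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            task.run();
            total += System.nanoTime() - start;
        }
        return total / iterations;
    }

    public static void main(String[] args) {
        long[] ids = {1L, 2L, 3L}; // hypothetical ids
        for (long id : ids) {
            // Placeholder work; in the extension this would be the
            // findNode + getRelationships pair for the given id.
            long mean = meanNanos(() -> Long.toString(id).hashCode(), 1_000, 10_000);
            System.out.println("id " + id + ": mean " + mean + " ns");
        }
    }
}
```

Because the accumulator and the loop counter are local to `meanNanos`, one id's measurements cannot leak into the next.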

When I hit this endpoint with a GET request, each lookup takes approximately 0.65 to 0.7 ms on the small dataset and around 10 ms on the larger one.

This seems odd to me: does it really take more than 10x longer to find a node or its relationships? I'm using this in a larger project where I do not want the size of the dataset to influence performance (which is why I picked a graph database). I've read this in the documentation about unmanaged extensions:

This is a sharp tool, allowing users to deploy arbitrary JAX-RS classes to the server so be careful when using this. In particular it's easy to consume lots of heap space on the server and degrade performance. If in doubt, please ask for help via one of the community channels.

Could this be my problem? Could it be that, by not clearing anything within the transaction, I consume too much heap? Does anyone have an idea, or a comment on the quote above, in particular on why it is so easy to consume too much heap?

Thanks

If you don't create an index on the label/property combination, Neo4j has to go through every node with that label and check its id property, so the lookup time grows with the number of nodes. If you index it, Neo4j can do the inverse: given the value of the id property, it looks up the matching nodes directly, which is much faster and largely independent of database size.
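The difference between the two lookup strategies can be illustrated with a toy model in plain Java. This is a sketch, not Neo4j code: the `MovieNode` class, the helper names, and the use of a `HashMap` as a stand-in for Neo4j's real index structure are all assumptions chosen for illustration. The dataset sizes mirror the 11k and 85k nodes from the question.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of a full scan versus an index lookup; not Neo4j code. */
final class IndexLookupSketch {

    /** Hypothetical stand-in for a node with a user-defined "id" property. */
    static final class MovieNode {
        final long id;
        MovieNode(long id) { this.id = id; }
    }

    /** Builds a dataset whose nodes carry ids 0 .. size-1. */
    static List<MovieNode> dataset(int size) {
        List<MovieNode> nodes = new ArrayList<>(size);
        for (long id = 0; id < size; id++) {
            nodes.add(new MovieNode(id));
        }
        return nodes;
    }

    /** No index: check nodes one by one; returns how many were inspected. */
    static int scanComparisons(List<MovieNode> nodes, long wantedId) {
        int comparisons = 0;
        for (MovieNode node : nodes) {
            comparisons++;
            if (node.id == wantedId) {
                break;
            }
        }
        return comparisons;
    }

    /** With an index: a map from property value to node (HashMap as a stand-in). */
    static Map<Long, MovieNode> buildIndex(List<MovieNode> nodes) {
        Map<Long, MovieNode> index = new HashMap<>();
        for (MovieNode node : nodes) {
            index.put(node.id, node);
        }
        return index;
    }

    public static void main(String[] args) {
        for (int size : new int[] {11_000, 85_000}) {
            List<MovieNode> nodes = dataset(size);
            Map<Long, MovieNode> index = buildIndex(nodes);
            long wanted = size - 1; // worst case for the scan: the last node
            System.out.println(size + " nodes: scan inspected "
                    + scanComparisons(nodes, wanted)
                    + " nodes, index returned id " + index.get(wanted).id);
        }
    }
}
```

In the worst case the scan inspects every node, so its cost grows with the dataset, while the map lookup does not walk the node list at all. On a Neo4j 3.x server, which the `Label.label` call in the question suggests, an index of this kind is typically created once with the Cypher statement `CREATE INDEX ON :Movie(id)`; newer versions use a different syntax, so check the documentation for the version in use.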

See this.

