
How to optimize a Neo4J Cypher query?

I have an app that converts text into a network: when a sentence is added, every word becomes a node and every co-occurrence of words becomes a connection between them. This background is important for understanding the question below.

To add each sentence to the Neo4J database, I use the following Cypher query. Following my data structure, it first matches the user who is adding the nodes, then merges the context (or list) in which the statement is made and links it to the user. It then creates the statement and links it to the user and to the context. Finally, it creates relationships (with properties) between every concept node added and the user, the statement, the context (list), and the other concept nodes.

The problem is that this query is about 100 times longer than the sentence itself, so if a text is 400 bytes, the query is about 40 KB. When I add a long text, Neo4J becomes very slow.

So my question is: what is the best way to optimize this query? Would you recommend using a set of transactions instead?

Can I, for example, cut each long query into many parts and then send a few transactions concurrently to save time?

I'm talking about a text of about 100 KB, maybe longer, which means the total request would be about 10 MB.

MATCH (u:User {uid: "6e228580-1cb3-11e8-8271-891867c15336"}) 
MERGE (c_list:Context {name:"list",by:"6e228580-1cb3-11e8-8271-891867c15336",
uid:"0b4fa320-1dfd-11e8-802e-b5cbdf950c47"}) 
ON CREATE SET c_list.timestamp="15199833288930000" 
MERGE (c_list)-[:BY{timestamp:"15199833288930000"}]->(u) 
CREATE (s:Statement {name:"#apple #orange #fruit", 
text:"apples and oranges are fruits", 
uid:"0b56a800-1dfd-11e8-802e-b5cbdf950c47", timestamp:"15199833288930000"}) 
CREATE (s)-[:BY {context:"0b4fa320-1dfd-11e8-802e-b5cbdf950c47",
timestamp:"15199833288930000"}]->(u) 
CREATE (s)-[:IN {user:"6e228580-1cb3-11e8-8271-891867c15336",
timestamp:"15199833288930000"}]->(c_list) 
MERGE (cc_apple:Concept {name:"apple"}) 
ON CREATE SET cc_apple.timestamp="15199833288930000", cc_apple.uid="0b56a801-1dfd-11e8-802e-b5cbdf950c47" 
MERGE (cc_orange:Concept {name:"orange"}) 
ON CREATE SET cc_orange.timestamp="15199833288930000", cc_orange.uid="0b56cf10-1dfd-11e8-802e-b5cbdf950c47" 
MERGE (cc_fruit:Concept {name:"fruit"}) 
ON CREATE SET cc_fruit.timestamp="15199833288930002", cc_fruit.uid="0b56cf13-1dfd-11e8-802e-b5cbdf950c47" 
CREATE (cc_apple)-[:BY {context:"0b4fa320-1dfd-11e8-802e-b5cbdf950c47",timestamp:"15199833288930000",
statement:"0b56a800-1dfd-11e8-802e-b5cbdf950c47"}]->(u) 
CREATE (cc_apple)-[:OF {context:"0b4fa320-1dfd-11e8-802e-b5cbdf950c47",user:"6e228580-1cb3-11e8-8271-891867c15336",timestamp:"15199833288930000"}]->(s)  
CREATE (cc_apple)-[:AT {user:"6e228580-1cb3-11e8-8271-891867c15336",timestamp:"15199833288930000",
context:"0b4fa320-1dfd-11e8-802e-b5cbdf950c47",statement:"0b56a800-1dfd-11e8-802e-b5cbdf950c47"}]->(c_list) 
CREATE (cc_apple)-[:TO {context:"0b4fa320-1dfd-11e8-802e-b5cbdf950c47",
statement:"0b56a800-1dfd-11e8-802e-b5cbdf950c47",user:"6e228580-1cb3-11e8-8271-891867c15336",
timestamp:"15199833288930000",uid:"0b56cf11-1dfd-11e8-802e-b5cbdf950c47",gapscan:"2",weight:"3"}]->(cc_orange) 
CREATE (cc_orange)-[:BY {context:"0b4fa320-1dfd-11e8-802e-b5cbdf950c47",timestamp:"15199833288930000",statement:"0b56a800-1dfd-11e8-802e-b5cbdf950c47"}]->(u) 
CREATE (cc_orange)-[:OF {context:"0b4fa320-1dfd-11e8-802e-b5cbdf950c47",user:"6e228580-1cb3-11e8-8271-891867c15336",timestamp:"15199833288930000"}]->(s) 
CREATE (cc_orange)-[:AT {user:"6e228580-1cb3-11e8-8271-891867c15336",timestamp:"15199833288930000",
context:"0b4fa320-1dfd-11e8-802e-b5cbdf950c47",statement:"0b56a800-1dfd-11e8-802e-b5cbdf950c47"}]->(c_list) 
CREATE (cc_orange)-[:TO {context:"0b4fa320-1dfd-11e8-802e-b5cbdf950c47",
statement:"0b56a800-1dfd-11e8-802e-b5cbdf950c47",user:"6e228580-1cb3-11e8-8271-891867c15336",
timestamp:"15199833288930002",uid:"0b56cf14-1dfd-11e8-802e-b5cbdf950c47",gapscan:"2",weight:"3"}]->(cc_fruit) 
CREATE (cc_apple)-[:TO {context:"0b4fa320-1dfd-11e8-802e-b5cbdf950c47",
statement:"0b56a800-1dfd-11e8-802e-b5cbdf950c47",user:"6e228580-1cb3-11e8-8271-891867c15336",
timestamp:"15199833288930002",uid:"0b56cf16-1dfd-11e8-802e-b5cbdf950c47",gapscan:"4",weight:"2"}]->(cc_fruit) 
CREATE (cc_fruit)-[:BY {context:"0b4fa320-1dfd-11e8-802e-b5cbdf950c47",
timestamp:"15199833288930002",statement:"0b56a800-1dfd-11e8-802e-b5cbdf950c47"}]->(u) 
CREATE (cc_fruit)-[:OF {context:"0b4fa320-1dfd-11e8-802e-b5cbdf950c47",user:"6e228580-1cb3-11e8-8271-891867c15336",timestamp:"15199833288930002"}]->(s) 
CREATE (cc_fruit)-[:AT {user:"6e228580-1cb3-11e8-8271-891867c15336",
timestamp:"15199833288930002",context:"0b4fa320-1dfd-11e8-802e-b5cbdf950c47",
statement:"0b56a800-1dfd-11e8-802e-b5cbdf950c47"}]->(c_list)  
RETURN s.uid;

1) Use input parameters:

var params = {
    userId: "6e228580-1cb3-11e8-8271-891867c15336",
    contextName: "list",
    time: "15199833288930000",
    statementName: "#apple #orange #fruit",
    statementText: "apples and oranges are fruits",
    concepts: ["apple", "orange", "fruit"],
    conceptsRelations: [
        {from: "apple",  to: "orange", gapscan: 2, weight: 3},
        {from: "orange", to: "fruit",  gapscan: 2, weight: 3},
        {from: "apple",  to: "fruit",  gapscan: 4, weight: 2}
    ]
}
session.run(cypherQuery, params)
    .then(result => console.log(result.records[0].get("s.uid")))
    .catch(error => console.error(error));
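
With parameters, the query text stays the same for every sentence and only the parameter object grows with the input. A minimal sketch of a helper that assembles that object is given here. `buildParams` and its argument names are hypothetical (they are not part of the Neo4j driver), and the concept and relation lists are assumed to be computed elsewhere by the app.

```javascript
// Hypothetical helper: assembles the parameter object for one statement.
// It only shapes plain data and does not talk to Neo4j.
function buildParams(userId, contextName, statementText, concepts, relations, time) {
    return {
        userId: userId,
        contextName: contextName,
        time: time,
        // Statement name in the "#apple #orange #fruit" style used above.
        statementName: concepts.map(name => "#" + name).join(" "),
        statementText: statementText,
        concepts: concepts,
        conceptsRelations: relations
    };
}

const exampleParams = buildParams(
    "6e228580-1cb3-11e8-8271-891867c15336",
    "list",
    "apples and oranges are fruits",
    ["apple", "orange", "fruit"],
    [
        {from: "apple",  to: "orange", gapscan: 2, weight: 3},
        {from: "orange", to: "fruit",  gapscan: 2, weight: 3},
        {from: "apple",  to: "fruit",  gapscan: 4, weight: 2}
    ],
    "15199833288930000"
);
console.log(JSON.stringify(exampleParams, null, 2));
```

The resulting object can be passed as the second argument to session.run, so a longer sentence only adds entries to `concepts` and `conceptsRelations` rather than lines to the query.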

2) Use the APOC library to generate unique identifiers on the database side: apoc.create.uuid()
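
If APOC is not available, one possible alternative (an assumption, not something this step requires) is to generate the identifiers in the client and pass them in as parameters. A small sketch using `crypto.randomUUID()`, which is built into recent Node.js versions, follows. The parameter names `statementUid` and `contextUid` are hypothetical.

```javascript
const { randomUUID } = require("crypto");

// Hypothetical parameter names: a query could reference $statementUid and
// $contextUid instead of calling apoc.create.uuid() on the database side.
const uidParams = {
    statementUid: randomUUID(),
    contextUid: randomUUID()
};
console.log(uidParams);
```

Generating identifiers on the database side keeps the parameter object smaller, which is why the APOC call is the option used in the query for step 3.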

3) Use loops (FOREACH and UNWIND) for repetitive operations:

MATCH (u:User {uid: $userId})
MERGE (c_list:Context {name: $contextName, by: $userId})
    ON CREATE SET c_list.timestamp = $time,
                  c_list.uid = apoc.create.uuid()
MERGE (c_list)-[:BY{timestamp: $time}]->(u)

CREATE (s:Statement {name: $statementName, 
                     text: $statementText, uid:apoc.create.uuid(), timestamp: $time})
CREATE (s)-[:BY {context: c_list.uid, timestamp: $time}]->(u)
CREATE (s)-[:IN {user: u.uid, timestamp: $time}]->(c_list)

FOREACH (conceptName in $concepts|
    MERGE (concept:Concept {name: conceptName})
        ON CREATE SET concept.timestamp = $time,
                      concept.uid = apoc.create.uuid()
    CREATE (concept)-[:BY {context: c_list.uid, timestamp: $time, statement: s.uid}]->(u)
    CREATE (concept)-[:OF {context: c_list.uid, user: u.uid, timestamp: $time}]->(s)
    CREATE (concept)-[:AT {user: u.uid, timestamp: $time, 
                           context: c_list.uid, statement: s.uid}]->(c_list)
)

WITH u, c_list, s

UNWIND $conceptsRelations as conceptsRelation
  MATCH (c_from:Concept {name: conceptsRelation.from})
  MATCH (c_to:Concept {name: conceptsRelation.to})
  CREATE (c_from)-[:TO {context: c_list.uid, statement: s.uid, user: u.uid,
                        timestamp: $time, uid: apoc.create.uuid(), 
                        gapscan: conceptsRelation.gapscan, 
                        weight: conceptsRelation.weight}]->(c_to)
RETURN distinct s.uid;
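
For a long text, the same parameterized query can be reused for every sentence, so the text can be sent as a series of small requests instead of one very large query. Here is a minimal sketch of that idea, assuming the app has already produced one parameter object per sentence. `chunk` and `importStatements` are hypothetical helpers, and `session` is assumed to be a neo4j-driver session exposing `beginTransaction()`. Batches run one after another; whether sending them concurrently is safe depends on how the MERGE on shared Concept nodes behaves under contention, which this sketch does not address.

```javascript
// Split an array into consecutive batches of at most `size` items.
function chunk(items, size) {
    const batches = [];
    for (let i = 0; i < items.length; i += size) {
        batches.push(items.slice(i, i + size));
    }
    return batches;
}

// Hypothetical helper: runs the same query once per sentence and
// commits one transaction per batch of sentences.
async function importStatements(session, cypherQuery, paramsList, batchSize) {
    for (const batch of chunk(paramsList, batchSize)) {
        const tx = session.beginTransaction();
        try {
            for (const params of batch) {
                await tx.run(cypherQuery, params);
            }
            await tx.commit();
        } catch (error) {
            // Undo the partial batch, then let the caller see the failure.
            await tx.rollback();
            throw error;
        }
    }
}

console.log(chunk(["s1", "s2", "s3", "s4", "s5"], 2));
```

The batch size is a tuning knob: smaller batches keep each transaction light, while larger ones reduce the number of commits.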


 