How to optimize a Neo4J Cypher query?

Question

I have an app that converts text into a network: when a sentence is added, every word becomes a node and every co-occurrence of two words becomes a connection between them. This background matters for the question below.

To add each sentence to the Neo4J database, I use the Cypher query below. Following my data structure, it first matches the user who is adding the nodes, then merges the context (or list) in which the statement is made and links it to the user. It then creates the statement and links it to the user and to the context. Finally, it creates relationships (with properties) from every concept node added to the user, the statement, the context (list), and the other concepts it co-occurs with.

The problem is that this query is about 100 times longer than the sentence itself: if a text is 400 bytes, the query is about 40 KB. When I add a long text, Neo4J becomes very slow.

Hence my question: what is the best way to optimize this query? Would you recommend splitting it into a set of transactions instead?

Can I, for example, cut each long query into many parts and then send a few transactions concurrently to save time?

I'm talking about a text of about 100 KB, maybe longer, which means the total request would be about 10 MB.

MATCH (u:User {uid: "6e228580-1cb3-11e8-8271-891867c15336"}) 
MERGE (c_list:Context {name:"list",by:"6e228580-1cb3-11e8-8271-891867c15336",
uid:"0b4fa320-1dfd-11e8-802e-b5cbdf950c47"}) 
ON CREATE SET c_list.timestamp="15199833288930000" 
MERGE (c_list)-[:BY{timestamp:"15199833288930000"}]->(u) 
CREATE (s:Statement {name:"#apple #orange #fruit", 
text:"apples and oranges are fruits", 
uid:"0b56a800-1dfd-11e8-802e-b5cbdf950c47", timestamp:"15199833288930000"}) 
CREATE (s)-[:BY {context:"0b4fa320-1dfd-11e8-802e-b5cbdf950c47",
timestamp:"15199833288930000"}]->(u) 
CREATE (s)-[:IN {user:"6e228580-1cb3-11e8-8271-891867c15336",
timestamp:"15199833288930000"}]->(c_list) 
MERGE (cc_apple:Concept {name:"apple"}) 
ON CREATE SET cc_apple.timestamp="15199833288930000", cc_apple.uid="0b56a801-1dfd-11e8-802e-b5cbdf950c47" 
MERGE (cc_orange:Concept {name:"orange"}) 
ON CREATE SET cc_orange.timestamp="15199833288930000", cc_orange.uid="0b56cf10-1dfd-11e8-802e-b5cbdf950c47" 
MERGE (cc_fruit:Concept {name:"fruit"}) 
ON CREATE SET cc_fruit.timestamp="15199833288930002", cc_fruit.uid="0b56cf13-1dfd-11e8-802e-b5cbdf950c47" 
CREATE (cc_apple)-[:BY {context:"0b4fa320-1dfd-11e8-802e-b5cbdf950c47",timestamp:"15199833288930000",
statement:"0b56a800-1dfd-11e8-802e-b5cbdf950c47"}]->(u) 
CREATE (cc_apple)-[:OF {context:"0b4fa320-1dfd-11e8-802e-b5cbdf950c47",user:"6e228580-1cb3-11e8-8271-891867c15336",timestamp:"15199833288930000"}]->(s)  
CREATE (cc_apple)-[:AT {user:"6e228580-1cb3-11e8-8271-891867c15336",timestamp:"15199833288930000",
context:"0b4fa320-1dfd-11e8-802e-b5cbdf950c47",statement:"0b56a800-1dfd-11e8-802e-b5cbdf950c47"}]->(c_list) 
CREATE (cc_apple)-[:TO {context:"0b4fa320-1dfd-11e8-802e-b5cbdf950c47",
statement:"0b56a800-1dfd-11e8-802e-b5cbdf950c47",user:"6e228580-1cb3-11e8-8271-891867c15336",
timestamp:"15199833288930000",uid:"0b56cf11-1dfd-11e8-802e-b5cbdf950c47",gapscan:"2",weight:"3"}]->(cc_orange) 
CREATE (cc_orange)-[:BY {context:"0b4fa320-1dfd-11e8-802e-b5cbdf950c47",timestamp:"15199833288930000",statement:"0b56a800-1dfd-11e8-802e-b5cbdf950c47"}]->(u) 
CREATE (cc_orange)-[:OF {context:"0b4fa320-1dfd-11e8-802e-b5cbdf950c47",user:"6e228580-1cb3-11e8-8271-891867c15336",timestamp:"15199833288930000"}]->(s) 
CREATE (cc_orange)-[:AT {user:"6e228580-1cb3-11e8-8271-891867c15336",timestamp:"15199833288930000",
context:"0b4fa320-1dfd-11e8-802e-b5cbdf950c47",statement:"0b56a800-1dfd-11e8-802e-b5cbdf950c47"}]->(c_list) 
CREATE (cc_orange)-[:TO {context:"0b4fa320-1dfd-11e8-802e-b5cbdf950c47",
statement:"0b56a800-1dfd-11e8-802e-b5cbdf950c47",user:"6e228580-1cb3-11e8-8271-891867c15336",
timestamp:"15199833288930002",uid:"0b56cf14-1dfd-11e8-802e-b5cbdf950c47",gapscan:"2",weight:"3"}]->(cc_fruit) 
CREATE (cc_apple)-[:TO {context:"0b4fa320-1dfd-11e8-802e-b5cbdf950c47",
statement:"0b56a800-1dfd-11e8-802e-b5cbdf950c47",user:"6e228580-1cb3-11e8-8271-891867c15336",
timestamp:"15199833288930002",uid:"0b56cf16-1dfd-11e8-802e-b5cbdf950c47",gapscan:"4",weight:"2"}]->(cc_fruit) 
CREATE (cc_fruit)-[:BY {context:"0b4fa320-1dfd-11e8-802e-b5cbdf950c47",
timestamp:"15199833288930002",statement:"0b56a800-1dfd-11e8-802e-b5cbdf950c47"}]->(u) 
CREATE (cc_fruit)-[:OF {context:"0b4fa320-1dfd-11e8-802e-b5cbdf950c47",user:"6e228580-1cb3-11e8-8271-891867c15336",timestamp:"15199833288930002"}]->(s) 
CREATE (cc_fruit)-[:AT {user:"6e228580-1cb3-11e8-8271-891867c15336",
timestamp:"15199833288930002",context:"0b4fa320-1dfd-11e8-802e-b5cbdf950c47",
statement:"0b56a800-1dfd-11e8-802e-b5cbdf950c47"}]->(c_list)  
RETURN s.uid;

Answer 1

1) Use query parameters instead of embedding the values in the query text:

var params = {
    userId: "6e228580-1cb3-11e8-8271-891867c15336",
    contextName: "list",
    time: "15199833288930000",
    statementName: "#apple #orange #fruit",
    statementText: "apples and oranges are fruits",
    concepts: ["apple", "orange", "fruit"],
    conceptsRelations: [
        {from: "apple",  to: "orange", gapscan: 2, weight: 3},
        {from: "orange", to: "fruit",  gapscan: 2, weight: 3},
        {from: "apple",  to: "fruit",  gapscan: 4, weight: 2}
    ]
}
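// cypherQuery is the parameterized query from step 3
session.run(cypherQuery, params)
    .then(result => console.log(result.records.map(record => record.get(0))))
    .catch(error => console.error(error));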
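
As a rough sketch of how the params object above might be assembled in application code, the helper below de-duplicates the concept names and copies only the relation fields the query reads. The function name buildParams and its argument shape are hypothetical and not part of the original app. It also assumes that the statement name is simply the concepts written as hashtags, as in the example. How gapscan and weight are computed is left to the caller.

```javascript
// Hypothetical helper: assembles the parameter object for the query.
// Field names mirror the params object above; the function itself and
// its argument shape are assumptions made for illustration.
function buildParams({ userId, contextName, time, statementText, concepts, relations }) {
    // De-duplicate concept names while keeping first-seen order.
    const uniqueConcepts = [...new Set(concepts)];

    return {
        userId,
        contextName,
        time,
        // Assumption: the statement name is the concepts as hashtags.
        statementName: uniqueConcepts.map(name => '#' + name).join(' '),
        statementText,
        concepts: uniqueConcepts,
        // Copy only the fields the query uses from each relation.
        conceptsRelations: relations.map(({ from, to, gapscan, weight }) =>
            ({ from, to, gapscan, weight }))
    };
}
```

Because the values travel as parameters, the query text stays the same size however long the input is; only the parameter payload grows.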

2) Use the APOC library to generate unique identifiers on the database side: apoc.create.uuid()

3) Use loops (FOREACH and UNWIND) for repetitive operations:

MATCH (u:User {uid: $userId})
MERGE (c_list:Context {name: $contextName, by: $userId})
    ON CREATE SET c_list.timestamp = $time,
                  c_list.uid = apoc.create.uuid()
MERGE (c_list)-[:BY{timestamp: $time}]->(u)

CREATE (s:Statement {name: $statementName, 
                     text: $statementText, uid:apoc.create.uuid(), timestamp: $time})
CREATE (s)-[:BY {context: c_list.uid, timestamp: $time}]->(u)
CREATE (s)-[:IN {user: u.uid, timestamp: $time}]->(c_list)

FOREACH (conceptName in $concepts|
    MERGE (concept:Concept {name: conceptName})
        ON CREATE SET concept.timestamp = $time,
                      concept.uid = apoc.create.uuid()
    CREATE (concept)-[:BY {context: c_list.uid, timestamp: $time, statement: s.uid}]->(u)
    CREATE (concept)-[:OF {context: c_list.uid, user: u.uid, timestamp: $time}]->(s)
    CREATE (concept)-[:AT {user: u.uid, timestamp: $time, 
                           context: c_list.uid, statement: s.uid}]->(c_list)
)

WITH u, c_list, s

UNWIND $conceptsRelations as conceptsRelation
  MATCH (c_from:Concept {name: conceptsRelation.from})
  MATCH (c_to:Concept {name: conceptsRelation.to})
  CREATE (c_from)-[:TO {context: c_list.uid, statement: s.uid, user: u.uid,
                        timestamp: $time, uid: apoc.create.uuid(), 
                        gapscan: conceptsRelation.gapscan, 
                        weight: conceptsRelation.weight}]->(c_to)
RETURN distinct s.uid;
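
On the question of splitting the work into several transactions, one possible approach is to create the statement in a single query and then send the relations in fixed-size batches. The sketch below is a minimal illustration under stated assumptions, not a tested recipe. The names chunk and runInBatches, the $rows parameter and the batch size are all hypothetical. It assumes the per-batch query has been rewritten to look up the user, context and statement by uid and to UNWIND $rows instead of $conceptsRelations.

```javascript
// Split an array into consecutive chunks of at most `size` items.
function chunk(items, size) {
    if (!Number.isInteger(size) || size < 1) {
        throw new RangeError('size must be a positive integer');
    }
    const chunks = [];
    for (let i = 0; i < items.length; i += size) {
        chunks.push(items.slice(i, i + size));
    }
    return chunks;
}

// Run `query` once per chunk, one after another. `session` only needs a
// run(query, params) method that returns a promise, as the driver's
// session does. `rows` is the assumed parameter the query UNWINDs over;
// `sharedParams` holds values every batch needs (user, statement uid...).
async function runInBatches(session, query, rows, batchSize, sharedParams = {}) {
    let batches = 0;
    for (const rowsChunk of chunk(rows, batchSize)) {
        await session.run(query, { ...sharedParams, rows: rowsChunk });
        batches += 1;
    }
    return batches; // number of queries sent
}
```

The batches run sequentially here. Sending them concurrently may not help, because every batch attaches relationships to the same user, statement and context nodes, and concurrent writes to the same nodes can contend for locks. It is worth measuring before relying on it.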

solution1
1 2018-03-02 12:04:11
