简体   繁体   中英

Slow aggregation on big neo4j graph

Configuration:
Windows 8.1
neo4j-enterprise-2.2.0-M03
cache type: hpc
8Gb RAM
6Gb for JVM Heap (wrapper.java.initmemory=6144 wrapper.java.maxmemory=6144)
5Gb out of 6Gb of JVM Heap for mapped memory (dbms.pagecache.memory=5G)

Model:
Model represents how users navigate through website.
27 522 896 nodes (394Mb)
111 294 796 relationships (3609Mb)
33 906 363 properties (1326Mb)

293 (:Page) nodes
27522603 (:PageView) nodes
0 (:User) nodes (not load yet)
each (:PageView) node connected with (:Page) node
each (:PageView) node connected with next (:PageView) node
each (:PageView) node connected with (:User) node (not yet)

Query

match (:Page {Name:'#########.aspx'})<-[:At]-(:PageView)-[:Next]->(:PageView)-[:At]->(p:Page)
return p.Name,count(*) as count
order by count desc
limit 10;

Profile info:

+------------------------------------------------+
| p.Name                               | count   |
+------------------------------------------------+
| "#####################.aspx"         | 5172680 |
| "###############.aspx"               | 3846455 |
| "#########.aspx"                     | 3579022 |
| "###########.aspx"                   | 3051043 |
| "#############################.aspx" | 1713004 |
| "############.aspx"                  | 1373928 |
| "############.aspx"                  | 1338063 |
| "#####.aspx"                         | 1285447 |
| "###################.aspx"           | 884077  |
| "##############.aspx"                | 759665  |
+------------------------------------------------+
10 rows
195363 ms

Compiler CYPHER 2.2

Planner COST

Projection(0)
  |
  +Top
    |
    +EagerAggregation
      |
      +Projection(1)
        |
        +Filter(0)
          |
          +Expand(All)(0)
            |
            +Filter(1)
              |
              +Expand(All)(1)
                |
                +Filter(2)
                  |
                  +Expand(All)(2)
                    |
                    +NodeUniqueIndexSeek

+---------------------+---------------+----------+----------+-------------------------------------------+--------------------------------------------------+
|            Operator | EstimatedRows |     Rows |   DbHits |                               Identifiers |                                            Other |
+---------------------+---------------+----------+----------+-------------------------------------------+--------------------------------------------------+
|       Projection(0) |           881 |       10 |        0 |   FRESHID105,   FRESHID110, count, p.Name |                                    p.Name, count |
|                 Top |           881 |       10 |        0 |                  FRESHID105,   FRESHID110 |                                   {  AUTOINT1};  |
|    EagerAggregation |           881 |      173 |        0 |                  FRESHID105,   FRESHID110 |                                                  |
|       Projection(1) |        776404 | 35941815 | 71883630 |                             FRESHID105, p |                                                  |
|           Filter(0) |        776404 | 35941815 | 35941815 |                                         p | (NOT(anon[38] == anon[78]) AND hasLabel(p:Page)) |
|      Expand(All)(0) |        776404 | 35941815 | 49287436 |                                         p |                                    ()-[:At]->(p) |
|           Filter(1) |        384001 | 13345621 | 13345621 |                                           |                      hasLabel(anon[67]:PageView) |
|      Expand(All)(1) |        384001 | 13345621 | 19478500 |                                           |                                   ()-[:Next]->() |
|           Filter(2) |        189923 |  6132879 |  6132879 |                                           |                      hasLabel(anon[46]:PageView) |
|      Expand(All)(2) |        189923 |  6132879 |  6132880 |                                           |                                     ()<-[:At]-() |
| NodeUniqueIndexSeek |             1 |        1 |        1 |                                           |                                      :Page(Name) |
+---------------------+---------------+----------+----------+-------------------------------------------+--------------------------------------------------+

Total database accesses: 202202762

Query without unnecessary labels

match (:Page {Name:'Dashboard.aspx'})<-[:At]-()-[:Next]->()-[:At]->(p)
return p.Name,count(*) as count
order by count desc
limit 10;

Profile info:

+------------------------------------------------+
| p.Name                               | count   |
+------------------------------------------------+
| "#####################.aspx"         | 5172680 |
| "###############.aspx"               | 3846455 |
| "#########.aspx"                     | 3579022 |
| "###########.aspx"                   | 3051043 |
| "#############################.aspx" | 1713004 |
| "############.aspx"                  | 1373928 |
| "############.aspx"                  | 1338063 |
| "#####.aspx"                         | 1285447 |
| "###################.aspx"           | 884077  |
| "##############.aspx"                | 759665  |
+------------------------------------------------+
10 rows
166751 ms

Compiler CYPHER 2.2

Planner COST

Projection(0)
  |
  +Top
    |
    +EagerAggregation
      |
      +Projection(1)
        |
        +Filter
          |
          +Expand(All)(0)
            |
            +Expand(All)(1)
              |
              +Expand(All)(2)
                |
                +NodeUniqueIndexSeek

+---------------------+---------------+----------+----------+-----------------------------------------+---------------------------+
|            Operator | EstimatedRows |     Rows |   DbHits |                             Identifiers |                     Other |
+---------------------+---------------+----------+----------+-----------------------------------------+---------------------------+
|       Projection(0) |           881 |       10 |        0 |   FRESHID82,   FRESHID87, count, p.Name |             p.Name, count |
|                 Top |           881 |       10 |        0 |                  FRESHID82,   FRESHID87 |            {  AUTOINT1};  |
|    EagerAggregation |           881 |      173 |        0 |                  FRESHID82,   FRESHID87 |                           |
|       Projection(1) |        776388 | 35941815 | 71883630 |                            FRESHID82, p |                           |
|              Filter |        776388 | 35941815 |        0 |                                       p | NOT(anon[38] == anon[60]) |
|      Expand(All)(0) |        776388 | 35941815 | 49287436 |                                       p |             ()-[:At]->(p) |
|      Expand(All)(1) |        383997 | 13345621 | 19478500 |                                         |            ()-[:Next]->() |
|      Expand(All)(2) |        189923 |  6132879 |  6132880 |                                         |              ()<-[:At]-() |
| NodeUniqueIndexSeek |             1 |        1 |        1 |                                         |               :Page(Name) |
+---------------------+---------------+----------+----------+-----------------------------------------+---------------------------+

Total database accesses: 146782447

Message.log
Question
How can I perform this query much faster? (more RAM, refactor query, distributed cache, use another language/shell/method, ...)

UPD:
Profile info for last query in answer

neo4j-sh (?)$ profile match (:Page {Name:'Dashboard.aspx'})<-[:At]-()-[:Next]->()-[:At]->(p)
with p,count(*) as count
order by count desc
limit 10 return p.Name, count;
+------------------------------------------------+
| p.Name                               | count   |
+------------------------------------------------+
| "OutgoingDocumentsList.aspx"         | 5172680 |
| "DocumentPreview.aspx"               | 3846455 |
| "Dashboard.aspx"                     | 3579022 |
| "ActualTasks.aspx"                   | 3051043 |
| "DocumentFillMissingRequisites.aspx" | 1713004 |
| "EditDocument.aspx"                  | 1373928 |
| "PaymentsList.aspx"                  | 1338063 |
| "Login.aspx"                         | 1285447 |
| "ReportingRequisites.aspx"           | 884077  |
| "ContractorInfo.aspx"                | 759665  |
+------------------------------------------------+
10 rows
151328 ms

Compiler CYPHER 2.2

Planner COST

Projection
  |
  +Top
    |
    +EagerAggregation
      |
      +Filter
        |
        +Expand(All)(0)
          |
          +Expand(All)(1)
            |
            +Expand(All)(2)
              |
              +NodeUniqueIndexSeek

+---------------------+---------------+----------+----------+------------------+---------------------------+
|            Operator | EstimatedRows |     Rows |   DbHits |      Identifiers |                     Other |
+---------------------+---------------+----------+----------+------------------+---------------------------+
|          Projection |           881 |       10 |       20 | count, p, p.Name |             p.Name, count |
|                 Top |           881 |       10 |        0 |         count, p |       {  AUTOINT1}; count |
|    EagerAggregation |           881 |      173 |        0 |         count, p |                         p |
|              Filter |        776388 | 35941815 |        0 |                p | NOT(anon[38] == anon[60]) |
|      Expand(All)(0) |        776388 | 35941815 | 49287436 |                p |             ()-[:At]->(p) |
|      Expand(All)(1) |        383997 | 13345621 | 19478500 |                  |            ()-[:Next]->() |
|      Expand(All)(2) |        189923 |  6132879 |  6132880 |                  |              ()<-[:At]-() |
| NodeUniqueIndexSeek |             1 |        1 |        1 |                  |               :Page(Name) |
+---------------------+---------------+----------+----------+------------------+---------------------------+

Total database accesses: 74898837

As I mentioned before, in your other question, if you can write a Java based server extension you can do it pretty easily.

// initialize counters
Map<Node,AtomicInteger> pageCounts = new HashMap<>(300);
for (Node page : graphDb.findNode(Page)) pageCounts.put(page,new AtomicInteger());

// find start page
Label Page = DynamicLabel.label("Page");
Node page = graphDB.findNode(Page,"Name",pageName).iterator().next();

// follow page-view relationships
for (Relationship at : page.getRelationships(At, INCOMING)) {
   // follow singular next relationship
   Relationship at2 = at.getStartNode().getSingleRelationship(Next,OUTGOING);
   if (at2==null) continue;
   // follow singular page-view relationship to end-page
   Node page2 = at2.getSingleRelationship(At,OUTGOING).getEndNode();
   // increment counter
   pageCounts.get(page2).incrementAndGet();
}

// sort pages by count descending
List pages = new ArrayList(pageCounts.entrySet())
Collections.sort(pages,new Comparator<Map.Entry<Node,Integer>>() {
    public int compare(Map.Entry<Node,Integer> e1, Map.Entry<Node,Integer> e2) {
       return - Integer.compare(e1.getValue(),e2.getValue());
    }
});
// return top 10
return pages.subList(0,10);

For Cypher I would try something like this:

match (:Page {Name:'#########.aspx'})<-[:At]-(pv:PageView)
WITH distinct pv 
MATCH (pv)-[:Next]->(pv2:PageView)
with distinct pv2
match (pv2)-[:At]->(p:Page)
return p.Name,count(*) as count
order by count desc
limit 10;

Update

I wrote a test for it and ran it on my bigger linux machine, the results there are much more sensible: between 1.6s in Java and 5s max in Cypher.

Here is the code and the results: https://gist.github.com/jexp/94f75ddb849f8c41c97c

In Cypher:

-------------------
match (:Page {Name:'Page1'})<-[:At]-()-[:Next]->()-[:At]->(p)
return p.Name,count(*) as count
order by count desc
limit 10;
+-------------------+
| p.Name    | count |
+-------------------+
| "Page169" | 975   |
| "Page125" | 959   |
| "Page106" | 955   |
| "Page274" | 951   |
| "Page176" | 947   |
| "Page241" | 944   |
| "Page30"  | 942   |
| "Page44"  | 938   |
| "Page1"   | 938   |
| "Page118" | 938   |
+-------------------+
10 rows

in 3212 ms
[Compiler CYPHER 2.2

Planner COST

+---------------------+---------------+--------+--------+--------------------------+---------------------------+
|            Operator | EstimatedRows |   Rows | DbHits |              Identifiers |                     Other |
+---------------------+---------------+--------+--------+--------------------------+---------------------------+
|                 Top |           488 |     10 |      0 |   FRESHID71,   FRESHID76 |            {  AUTOINT1};  |
|    EagerAggregation |           488 |    300 |      0 |   FRESHID71,   FRESHID76 |                           |
|          Projection |        238460 | 264828 | 529656 |             FRESHID71, p |                           |
|              Filter |        238460 | 264828 |      0 |                        p | NOT(anon[29] == anon[51]) |
|      Expand(All)(0) |        238460 | 264828 | 529656 |                        p |             ()-[:At]->(p) |
|      Expand(All)(1) |        238460 | 264828 | 778522 |                          |            ()-[:Next]->() |
|      Expand(All)(2) |        476922 | 513694 | 513695 |                          |              ()<-[:At]-() |
| NodeUniqueIndexSeek |             1 |      1 |      1 |                          |               :Page(Name) |
+---------------------+---------------+--------+--------+--------------------------+---------------------------+

Total database accesses: 2351530]

And in Java:

-------------------
Java took 1618 ms
Node[169]=975
Node[125]=959
Node[106]=955
Node[274]=951
Node[176]=947
Node[241]=944
Node[30]=942
Node[1]=938
Node[44]=938
Node[118]=938

Something you can also do to speed up your Cypher query, is to only aggregate on the nodes, and only return the page.Name property for the last 10 rows, much faster.

match (:Page {Name:'Page1'})<-[:At]-()-[:Next]->()-[:At]->(p)
with p,count(*) as count
order by count desc
limit 10 return p.Name, count

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM