How can I call this APOC procedure selectively? (only on a subset of nodes)

Question

I have a Neo4j database with a number of nodes labeled com. Each node has a key property, which groups the nodes in the way I want. They also have a timestamp property, as well as a number of other integer properties.

Here's the issue I'm facing: I want to use the APOC graph grouping procedure to aggregate these nodes based on their key property. However, I want to do so selectively, so that only nodes whose timestamp falls within a given time window are aggregated.

I have tried to MATCH and filter the nodes with a WHERE clause based on their timestamp, but I am unable to specifically pass those nodes to the nodes.group procedure. Basically, I need to figure out how to CALL nodes.group only on a specific subset of nodes. I'd appreciate any help.

Here is the CALL I'm performing:

CALL apoc.nodes.group(['com'], ['key'], [{val1: 'sum', val2: 'sum', val3: 'sum', time_start: 'collect'}]) YIELD node

As I mentioned above, I tried performing a

MATCH (c:com) WHERE c.time_start >= datetime('2020-12-16T21:45:05Z')

...prior to the procedure and then chaining the queries, but it did not work.

The procedure still got called on ALL nodes with the com label, not just the ones I filtered.

Answer 1

The procedure itself does not allow you to pass such filters. There are however two possibilities to circumvent this:

build the virtual graph yourself with vNode and vRelationship
set a temporary label after your node selection and group on that
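
For reference, option 1 could look roughly like the following. This is only a sketch: it assumes the apoc.create.vNode function is available in your APOC version, it reuses the example window and timestamp property from further down, and it builds virtual nodes only (relationships between groups would need apoc.create.vRelationship on top). The count_* property name simply mirrors what apoc.nodes.group produces by default.

// Sketch of option 1: group by key in plain Cypher, then wrap each group in a virtual node
WITH [30, 70] AS window
MATCH (n:com)
WHERE n.timestamp > window[0]
AND n.timestamp < window[1]
// One row per key, with the number of matching nodes
WITH n.key AS key, count(*) AS members
// Virtual nodes exist only in the query result; nothing is written to the database
RETURN apoc.create.vNode(['com'], {key: key, `count_*`: members}) AS node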

I will focus on option 2:

Take the following graph as an example:

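// Candidate keys a to f, matching the keys in the grouped output at the end
WITH ['a', 'b', 'c', 'd', 'e', 'f'] AS items
UNWIND range(1, 200) AS i
CREATE (n:com)
SET n.timestamp = i,
    n.key = apoc.coll.randomItem(items)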

Let's say the hypothetical window is 30 to 70, exclusive on both ends. I can find only the nodes matching the window predicate:

WITH [30, 70] AS window
MATCH (n:com) 
WHERE n.timestamp > window[0] 
AND n.timestamp < window[1]
RETURN count(n)

╒══════════╕
│"count(n)"│
╞══════════╡
│39        │
└──────────┘

Before jumping into the grouping query, I just want to show that you can set a label and remove it again in the same query, using the predicate.

WITH [30, 70] AS window
MATCH (n:com) 
WHERE n.timestamp > window[0] 
AND n.timestamp < window[1]
SET n:temporary
WITH count(n) AS doSomething
MATCH (n:temporary)
REMOVE n:temporary
WITH count(*) AS break, doSomething
RETURN doSomething

The last WITH count(*) is necessary to not return one row per temporary node.
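
To confirm that the cleanup worked, a follow-up query along these lines can be run afterwards. It assumes nothing else in the database uses the temporary label, in which case it should report 0.

// Counts nodes that still carry the temporary label
MATCH (n:temporary)
RETURN count(n) AS leftover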

Now, using this logic, we can:

MATCH nodes using the window predicate
Assign them a new temporary label
Use apoc.nodes.group on the temporary label instead
Remove the temporary label
Return the grouped nodes

WITH [30, 70] AS window
MATCH (n:com) WHERE n.timestamp > window[0] AND n.timestamp < window[1]
SET n:temporary
WITH window, count(*) AS x
CALL apoc.nodes.group(['temporary'], ['key'], null, {})
YIELD node, relationship
WITH collect(node) AS elements
MATCH (n:temporary) REMOVE n:temporary
WITH count(*) AS break, elements
UNWIND elements AS element
RETURN element

╒════════════════════════╕
│"element"               │
╞════════════════════════╡
│{"count_*":6,"key":"f"} │
├────────────────────────┤
│{"count_*":6,"key":"e"} │
├────────────────────────┤
│{"count_*":12,"key":"d"}│
├────────────────────────┤
│{"count_*":1,"key":"c"} │
├────────────────────────┤
│{"count_*":5,"key":"b"} │
├────────────────────────┤
│{"count_*":9,"key":"a"} │
└────────────────────────┘
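
Applied to the data from the question, the same pattern might look like the sketch below. The $from and $to parameters are hypothetical datetime values for the window bounds (the question only gives a lower bound), and the aggregation map is copied from the question's own call, so whether 'collect' behaves as expected on datetime values is worth checking against your APOC version.

// $from and $to are hypothetical query parameters holding datetime values
MATCH (c:com)
WHERE c.time_start >= $from AND c.time_start < $to
SET c:temporary
// Collapse to a single row so the procedure is called once
WITH count(*) AS tagged
CALL apoc.nodes.group(['temporary'], ['key'],
  [{val1: 'sum', val2: 'sum', val3: 'sum', time_start: 'collect'}])
YIELD node
WITH collect(node) AS groups
// Remove the temporary label again before returning
MATCH (t:temporary) REMOVE t:temporary
WITH count(*) AS cleaned, groups
UNWIND groups AS grp
RETURN grp

One caveat with this approach: the temporary label is a real write, so concurrent queries using the same label name could interfere with each other.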

1 answer

solution1
0 ACCEPTED 2020-12-19 18:41:11