
Neo4j Cypher query for finding nodes based on characteristic deltas

In my Neo4j/Spring Data Neo4j 4 project I have an entity: Product.

Every Product has an Integer property, price.

For example, I have the following products with prices:

Product1.price = 100
Product2.price = 305
Product3.price = 10000
Product4.price = 1000
Product5.price = 220

Products are not connected to each other by any relationships.

Based on an initial price value (a Cypher query parameter), I need to find a set (path) of products that differ from one another by no more than a maximum price delta (also a Cypher query parameter).

For example, I need to find all products in the Neo4j database starting from price = 50 with a price delta of 150. As output I expect to get the following products:

Product1.price = 100
Product5.price = 220
Product2.price = 305

The computation looks like:

The starting price is 50, so the first product should have a price not less than 50 and not more than 200 (50 + 150). Based on this, we find the product in our catalog with price = 100. The second product should have a price not less than 100 and not more than 250 (100 + 150); this is the product with price = 220. The third product should have a price not less than 220 and not more than 370 (220 + 150); this is the product with price = 305.
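To make the arithmetic above concrete, here is a minimal Python sketch that replays the computation over a plain list of prices (Python only because Cypher needs a running database). The `PRODUCTS` dict and the `price_chain` function are hypothetical stand-ins, and the sketch assumes that when several products fall into the same window they are all taken in ascending order, which the question does not spell out.

```python
# Hypothetical in-memory stand-in for the Product nodes in the question.
PRODUCTS = {
    "Product1": 100,
    "Product2": 305,
    "Product3": 10000,
    "Product4": 1000,
    "Product5": 220,
}


def price_chain(prices, start_price, delta):
    """Return the chain of prices described in the question.

    Each next price must lie in [last, last + delta], where `last` starts as
    `start_price` and then becomes the most recently accepted price.
    """
    chain = []
    last = start_price
    for price in sorted(prices):
        if price < start_price:
            continue  # below the starting point, never eligible
        if price > last + delta:
            break  # prices are sorted, so no later price can fit either
        chain.append(price)
        last = price
    return chain


print(price_chain(PRODUCTS.values(), 50, 150))  # [100, 220, 305]
```

The windows it checks are exactly [50, 200], [100, 250], [220, 370] and then [305, 455], where the next price (1000) falls outside and the walk stops.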

Could you please show a Cypher query that will find these kinds of products?

This is rather complex to do in Cypher. The only approach I can think of is to use the REDUCE() function together with a CASE expression, conditionally appending a product to the end of the list if its price is within the delta of the last product already in the list.

Keep in mind that there is no way to short-circuit the processing of products with this approach. If there are 1 million total products, and we find in the ordered list of products that only the first two products are within that delta pattern, this query will continue to check every single remaining one of those million products, although none of them will be added to our list.
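The cost difference can be illustrated outside the database. In this Python sketch (the helper `chain_with_counts` is hypothetical), a `functools.reduce` step function plays the role of REDUCE with CASE and is evaluated for every element, while an ordinary loop can stop at the first gap larger than the delta. The early exit is safe because the input is sorted, so every later price is larger still. The data set is scaled down to 100,000 far-away prices to keep it quick.

```python
from functools import reduce


def chain_with_counts(sorted_prices, first, delta):
    """Compare a REDUCE-style full scan with an early-exit loop."""
    checks = {"reduce": 0, "early_exit": 0}

    def step(prods, price):
        checks["reduce"] += 1  # REDUCE evaluates the CASE for every element
        return prods + [price] if price <= prods[-1] + delta else prods

    full = reduce(step, sorted_prices, [first])

    early = [first]
    for price in sorted_prices:
        checks["early_exit"] += 1
        if price > early[-1] + delta:
            break  # sorted input: no later price can be within delta
        early.append(price)

    return full, early, checks


# One product within delta of `first`, followed by 100,000 far-away prices.
tail = [220] + list(range(10_000, 110_000))
full, early, checks = chain_with_counts(tail, 100, 150)
print(full == early, checks)  # True {'reduce': 100001, 'early_exit': 2}
```

Both strategies return the same list; only the amount of work differs.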

This query should work for you.

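```cypher
// Query parameters: starting price and maximum allowed delta
WITH {startPrice:50, delta:150} as params
// Candidate products at or above the starting price, in ascending price order
MATCH (p:Product)
WHERE p.price >= params.startPrice
WITH params, p
ORDER BY p.price asc
WITH params, COLLECT(p) as products
// The cheapest candidate must itself be within delta of the starting price
WITH params, TAIL(products) as products, HEAD(products) as first
WHERE first.price <= params.startPrice + params.delta
// Append each product only if it is within delta of the last product kept so far
WITH REDUCE(prods = [first], prod in products |
  CASE WHEN prod.price <= LAST(prods).price + params.delta
       THEN prods + prod
       ELSE prods END) as products
RETURN products
```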
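To check the query's logic without a database, the Python function below mirrors it step by step. `reduce_query` is a hypothetical name, plain integers stand in for Product nodes, and returning `None` stands in for the case where the `WHERE first.price <= ...` filter leaves no row.

```python
from functools import reduce


def reduce_query(prices, start_price, delta):
    """Python mirror of the REDUCE-based Cypher query (illustration only)."""
    # MATCH ... WHERE p.price >= startPrice, ORDER BY p.price ASC, COLLECT(p)
    products = sorted(p for p in prices if p >= start_price)
    if not products:
        return None  # HEAD([]) is null, so the WHERE filters the row out
    # HEAD(products) / TAIL(products)
    first, rest = products[0], products[1:]
    # WHERE first.price <= startPrice + delta
    if first > start_price + delta:
        return None
    # REDUCE(prods = [first], prod IN rest | CASE WHEN ... THEN prods + prod ELSE prods END)
    return reduce(
        lambda prods, p: prods + [p] if p <= prods[-1] + delta else prods,
        rest,
        [first],
    )


print(reduce_query([100, 305, 10000, 1000, 220], 50, 150))  # [100, 220, 305]
print(reduce_query([500, 600], 50, 150))  # None: cheapest candidate is above 200
```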

The solution requires carrying an intermediate result from one iteration to the next. This is an interesting problem, because Cypher currently offers no direct way to do it. As an exercise (a sketch), you can use the apoc.periodic.commit procedure from the APOC library:

```cypher
// Generate a unique id for the temporary state node
CALL apoc.create.uuid() YIELD uuid
// Re-run the inner statement; each run extends the price chain by one step
CALL apoc.periodic.commit("
  MERGE (H:tmpVars {id: {tmpId}})
  ON CREATE SET H.prices = [],
                H.lastPrice = {lastPrice}, 
                H.delta = {delta} 
  WITH H
  MATCH (P:Product) WHERE P.price > H.lastPrice AND 
                          P.price < H.lastPrice + H.delta
  WITH H, max(P.price) as lastPrice
  SET H.lastPrice = lastPrice, 
      H.prices = H.prices + lastPrice
  RETURN 1
  ", {tmpId: uuid, delta: 150, lastPrice: 50}
) YIELD updates, executions, runtime
// Read the collected prices and remove the temporary node
MATCH (T:tmpVars {id: uuid}) 
WITH T, T.prices as prices DETACH DELETE T
WITH prices 
// Return the matching products in ascending price order
UNWIND prices as price
MATCH (P:Product) WHERE P.price = price
RETURN P ORDER BY P.price ASC
```
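The loop that apoc.periodic.commit drives can be simulated in plain Python to see what the sketch computes. Here the dict `state` stands in for the temporary `:tmpVars` node, and each pass of the `while` loop stands in for one execution of the inner statement. As I understand APOC, the procedure stops once the statement returns 0, which in this sketch happens when the MATCH finds nothing and the aggregation yields no row; treat that as an assumption. The simulation makes two properties visible: the bounds are strict (`>` and `<`), so a product priced exactly at lastPrice + delta is not picked even though the question says "not more than", and `max(P.price)` jumps straight to the most expensive product in each window, so cheaper products in the same window are skipped. `periodic_commit_chain` is a hypothetical name.

```python
def periodic_commit_chain(prices, last_price, delta):
    """Simulate the apoc.periodic.commit sketch (illustration only)."""
    # Plays the role of the temporary :tmpVars node created by MERGE
    state = {"prices": [], "lastPrice": last_price, "delta": delta}
    while True:
        # MATCH (P:Product) WHERE P.price > H.lastPrice AND P.price < H.lastPrice + H.delta
        window = [p for p in prices
                  if state["lastPrice"] < p < state["lastPrice"] + state["delta"]]
        if not window:
            break  # no row is returned, so the repetition stops
        best = max(window)  # WITH H, max(P.price) AS lastPrice
        state["lastPrice"] = best
        state["prices"].append(best)
    return state["prices"]


print(periodic_commit_chain([100, 305, 10000, 1000, 220], 50, 150))  # [100, 220, 305]
print(periodic_commit_chain([100, 150, 240], 50, 150))  # [150, 240]: 100 is skipped
```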

As an alternative solution, which should be much faster to query but requires more maintenance and care to keep working properly (especially with rapidly changing price data), you can create relationships between your Product nodes in ascending price order and store the deltas as relationship properties.

Here's how you might create this using APOC Procedures:

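```cypher
// Sort all products by price and pair each one with its successor
MATCH (p:Product)
WITH p 
ORDER BY p.price ASC
WITH apoc.coll.pairsMin(COLLECT(p)) as products
UNWIND products as prodPairs
WITH prodPairs[0] as prod1, prodPairs[1] as prod2
// Link each pair and store the price gap on the relationship
CREATE (prod1)-[r:NextProd]->(prod2)
SET r.delta = prod2.price - prod1.price
```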
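The structure this creates is easy to preview in Python: sort the prices, pair each with its successor (which is what apoc.coll.pairsMin does to the collected list), and record the gap. `build_next_prod` is a hypothetical helper, and the tuples `(from_price, to_price, delta)` stand in for NextProd relationships.

```python
def build_next_prod(prices):
    """Mirror of the NextProd linking query (illustration only)."""
    ordered = sorted(prices)
    # Consecutive pairs: [a, b, c] -> (a, b), (b, c)
    pairs = zip(ordered, ordered[1:])
    # CREATE (prod1)-[r:NextProd]->(prod2) SET r.delta = prod2.price - prod1.price
    return [(a, b, b - a) for a, b in pairs]


for edge in build_next_prod([100, 305, 10000, 1000, 220]):
    print(edge)
# (100, 220, 120)
# (220, 305, 85)
# (305, 1000, 695)
# (1000, 10000, 9000)
```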

And here's how you might query this once it's set up.

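```cypher
// Query parameters; the ceiling bounds the first product's price
WITH {startPrice:50, delta:150} as params
WITH params, params.startPrice + params.delta as ceiling
// Cheapest product within [startPrice, startPrice + delta]
MATCH (start:Product)
WHERE params.startPrice <= start.price <= ceiling
WITH start, params
ORDER BY start.price ASC
LIMIT 1
// Follow NextProd links (including zero hops) while every hop stays within delta
MATCH (start)-[r:NextProd*0..]->(product:Product)
WHERE ALL(rel in r WHERE rel.delta <= params.delta)
RETURN DISTINCT product
```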

This should be a fairly fast query, as the ALL() predicate should cut off the variable match when it reaches a relationship that exceeds the desired delta.
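The cut-off behaviour can be pictured as a walk along the sorted list that stops at the first hop whose delta is too large. The Python sketch below is a hypothetical illustration (`walk_next_prod` is not part of the answer), treating consecutive sorted prices as the NextProd links built earlier.

```python
import bisect


def walk_next_prod(prices, start_price, delta):
    """Simulate the linked-list query over sorted prices (illustration only)."""
    ordered = sorted(prices)
    # Cheapest product with startPrice <= price <= startPrice + delta (LIMIT 1)
    i = bisect.bisect_left(ordered, start_price)
    if i == len(ordered) or ordered[i] > start_price + delta:
        return []
    result = [ordered[i]]  # *0.. includes the start node itself
    # Follow NextProd links while each hop's delta stays within the limit
    while i + 1 < len(ordered) and ordered[i + 1] - ordered[i] <= delta:
        i += 1
        result.append(ordered[i])
    return result


print(walk_next_prod([100, 305, 10000, 1000, 220], 50, 150))  # [100, 220, 305]
```

The walk touches only the products in the chain plus one extra hop, instead of every product in the catalog.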

The downside, of course, is that you'll need to make sure every operation that affects this linked-list structure (adding or removing products, and changing product prices) adjusts the structure properly, and you might need to consider locking approaches to ensure thread safety so that you don't mangle the linked list if products and/or prices are updated concurrently.
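To show what "adjusting the structure" involves, here is a hypothetical in-memory model in Python. `PriceChain` is not Neo4j code: the `threading.Lock` only stands in for whatever locking you choose on the database side, and the comments describe the equivalent relationship changes in the graph. A price change can be modelled as `remove` followed by `insert`.

```python
import bisect
import threading


class PriceChain:
    """Hypothetical in-memory model of the NextProd linked list."""

    def __init__(self, prices):
        self._prices = sorted(prices)
        self._lock = threading.Lock()  # serialises structural changes

    def insert(self, price):
        with self._lock:
            i = bisect.bisect_left(self._prices, price)
            # Graph equivalent (when both neighbours exist): delete the NextProd
            # between them, then link predecessor -> new and new -> successor.
            self._prices.insert(i, price)

    def remove(self, price):
        with self._lock:
            # Graph equivalent: delete the node's links, then link its former
            # neighbours directly with a recomputed delta.
            self._prices.remove(price)

    def links(self):
        with self._lock:
            p = self._prices
            return [(a, b, b - a) for a, b in zip(p, p[1:])]


chain = PriceChain([100, 305, 10000])
threads = [threading.Thread(target=chain.insert, args=(price,)) for price in (220, 1000)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(chain.links())
# [(100, 220, 120), (220, 305, 85), (305, 1000, 695), (1000, 10000, 9000)]
```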
