How can I use a Neo4j Cypher query to create a histogram of nodes bucketed by number of relationships?

I have a large number of nodes which match the following Cypher pattern:

(:Word)<-[:Searched]-(:Session)

I want to make a histogram of the number of Word nodes at each frequency of Searched relationships.

I want to make this sort of chart:

Searches Words
0        100
1-5      200
6-10     150
11-15    50
16-20    25

I'm just starting out with neo4j, and I'm not sure how to approach this, or even whether there is a way to specify it in Cypher. The closest I've got is to count the relationships and get averages:

MATCH (n:Word) 
RETURN
DISTINCT labels(n),
count(*) AS NumofNodes,
avg(size((n)<-[:Searched]-())) AS AvgNumOfRelationships,
min(size((n)<-[:Searched]-())) AS MinNumOfRelationships,
max(size((n)<-[:Searched]-())) AS MaxNumOfRelationships

That is based on an example here: https://neo4j.com/developer/kb/how-do-i-produce-an-inventory-of-statistics-on-nodes-relationships-properties/

I've also seen the modulus operator used for grouping to get buckets, though I'm not sure how to apply that to the count: "Neo4j cypher time interval histogram query of time tree".

Is there a "best" way to do this?

The following should work:

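WITH 5 AS gSize
MATCH (w:Word)
// OPTIONAL MATCH keeps words that have no Searched relationships (COUNT(s) = 0)
OPTIONAL MATCH (w)<-[s:Searched]-()
// Integer division rounds the count up to the next multiple of gSize
WITH gSize, w, toInteger((COUNT(s) + (gSize-1))/gSize * gSize) AS m
RETURN
  CASE m WHEN 0 THEN '0' ELSE toString(m-gSize+1)+'-'+toString(m) END AS range,
  COUNT(*) AS ct
// range is a string, so this sorts lexicographically ('11-15' comes before '6-10')
ORDER BY range;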

With the sample data provided by @GaborSzarnyas, the output is:

+-------------+
| range  | ct |
+-------------+
| "0"    | 1  |
| "1-5"  | 1  |
| "6-10" | 1  |
+-------------+
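
The arithmetic behind m can be checked outside the database. Below is a minimal Python sketch, assuming the per-word search counts are already available as plain integers; bucket_upper and bucket_label are hypothetical helper names, not part of the query above.

```python
def bucket_upper(count: int, g_size: int = 5) -> int:
    """Round count up to the nearest multiple of g_size (0 stays 0)."""
    # Mirrors (COUNT(s) + (gSize-1)) / gSize * gSize with integer division.
    return (count + g_size - 1) // g_size * g_size


def bucket_label(count: int, g_size: int = 5) -> str:
    """Return '0' for no searches, otherwise 'lower-upper'."""
    m = bucket_upper(count, g_size)
    return "0" if m == 0 else f"{m - g_size + 1}-{m}"


# Counts from the three-word sample data: 0, 3 and 6 searches.
labels = [bucket_label(c) for c in (0, 3, 6)]
print(labels)  # ['0', '1-5', '6-10']
```

Adding g_size - 1 before the division is what turns rounding down into rounding up, so 1 through 5 all land on the upper limit 5.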

I was able to figure out a query which I think gets me the data I want:

MATCH (n:Word) 
WITH n, 5 AS bucketsize
WITH (FLOOR(SIZE( (n)<-[:Searched]-() ) / bucketsize) * bucketsize) AS numRels
RETURN numRels, COUNT(*)
ORDER BY numRels ASC

It doesn't get the zero row, which I'd like to have, but otherwise it seems to work. Hopefully someone else has a better solution.
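
A quick way to see why the zero row goes missing: rounding down puts a count of 0 into the same bucket as counts 1 to 4. Here is a minimal Python sketch of the same arithmetic, assuming non-negative integer counts; floor_bucket is a hypothetical name and the counts are made up for illustration.

```python
from collections import Counter


def floor_bucket(count: int, bucket_size: int = 5) -> int:
    """Round count down to the nearest multiple of bucket_size."""
    return count // bucket_size * bucket_size


# Hypothetical search counts for a handful of words.
counts = [0, 0, 3, 4, 5, 9, 12]
histogram = Counter(floor_bucket(c) for c in counts)
for lower in sorted(histogram):
    print(lower, histogram[lower])
```

The two words with no searches end up in bucket 0 together with the words that have 3 and 4 searches, so there is no separate row for zero.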

I created a simple example dataset of three words: w1 with no searches, w2 with 3 searches and w3 with 6.

CREATE (w1:Word {w: '1'})
WITH count(*) AS dummy

CREATE (w2:Word {w: '2'}) WITH w2
UNWIND range(1, 3) AS i
CREATE (w2)<-[:Searched]-(:Session)
WITH count(*) AS dummy

CREATE (w3:Word {w: '3'}) WITH w3
UNWIND range(1, 6) AS i
CREATE (w3)<-[:Searched]-(:Session)

I would approach it like this: first, let's create a list with the upper limits for each bucket:

RETURN [i IN range(0, 4) | i*5] AS upperLimits

╒══════════════╕
│"upperLimits" │
╞══════════════╡
│[0,5,10,15,20]│
└──────────────┘

Second, use this with a list comprehension that selects the elements of the list that are sufficiently large upper limits. The first of these is our bucket, so we select it with the [0] list indexer. The rest is just calculating the lower limit and ordering the rows:

WITH [i IN range(0, 4) | i*5] AS upperLimits
MATCH (n:Word) 
WITH upperLimits, ID(n) AS n, size((n)<-[:Searched]-()) AS numOfRelationships
WITH
  [upperLimit IN upperLimits WHERE numOfRelationships <= upperLimit][0] AS upperLimit,
  count(n) AS count
RETURN
  upperLimit - 4 AS lowerLimit,
  upperLimit,
  count
ORDER BY lowerLimit

The query gives the following results:

╒════════════╤════════════╤═══════╕
│"lowerLimit"│"upperLimit"│"count"│
╞════════════╪════════════╪═══════╡
│-4          │0           │1      │
├────────────┼────────────┼───────┤
│1           │5           │1      │
├────────────┼────────────┼───────┤
│6           │10          │1      │
└────────────┴────────────┴───────┘
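
The bucket selection can be sketched outside Cypher as well. The Python below is an illustration only, assuming the counts 0, 3 and 6 from the sample data; first_upper_limit is a hypothetical helper standing in for the list comprehension with the [0] indexer.

```python
from collections import Counter
from typing import Optional

# Cypher's range(0, 4) is inclusive, so it corresponds to range(0, 5) here.
upper_limits = [i * 5 for i in range(0, 5)]  # [0, 5, 10, 15, 20]


def first_upper_limit(count: int) -> Optional[int]:
    """First limit that is >= count, or None if no limit is large enough."""
    candidates = [u for u in upper_limits if count <= u]
    return candidates[0] if candidates else None


# Search counts of w1, w2 and w3 in the sample data.
rows = Counter(first_upper_limit(c) for c in (0, 3, 6))
for upper, n in sorted(rows.items()):
    print(upper - 4, upper, n)
```

The printed rows match the table above, including the -4 lower limit for the zero bucket.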

Potential improvements:

(1) If the value of numOfRelationships is larger than the largest upper limit, the query above will return the first element of an empty list, which is null. To avoid that, you can set a sufficiently large upper limit, e.g.:

MATCH (n:Word) 
WITH max(size((n)<-[:Searched]-())) AS maxNumberOfRelationShips
WITH [i IN range(-1, maxNumberOfRelationShips/5+1) | {lower: i*5-4, upper: i*5}] AS limits
RETURN *

Alternatively, you can give the top bucket "16 or larger" semantics by using coalesce.
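
One way to read the coalesce suggestion is to fall back to the largest limit whenever no limit is large enough. Here is a hedged Python sketch of that idea, with bucket_or_top as a hypothetical name and the limits [0, 5, 10, 15, 20] taken from the earlier example:

```python
upper_limits = [0, 5, 10, 15, 20]


def bucket_or_top(count: int) -> int:
    """Pick the first limit >= count; fall back to the top bucket otherwise."""
    # next(..., default) plays the role of coalesce(<lookup>, <top bucket>).
    return next((u for u in upper_limits if count <= u), upper_limits[-1])


for count in (0, 7, 20, 21, 500):
    print(count, bucket_or_top(count))
```

With this fallback, counts of 21 or 500 land in the bucket whose upper limit is 20, so that bucket effectively means "16 or larger".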

(2) -4 as a lower limit is not very nice; we can use CASE to get rid of it.

Putting all this together, we get this:

MATCH (n:Word) 
WITH max(size((n)<-[:Searched]-())) AS maxNumberOfRelationShips
WITH [i IN range(0, maxNumberOfRelationShips/5+1) | i*5] AS upperLimits
MATCH (n:Word) 
WITH upperLimits, ID(n) AS n, size((n)<-[:Searched]-()) AS numOfRelationships
WITH
  [upperLimit IN upperLimits WHERE numOfRelationships <= upperLimit][0] AS upperLimit,
  count(n) AS count
RETURN 
  CASE WHEN upperLimit - 4 < 0 THEN 0 ELSE upperLimit - 4 END AS lowerLimit,
  upperLimit,
  count
ORDER BY lowerLimit

Which results in:

╒════════════╤════════════╤═══════╕
│"lowerLimit"│"upperLimit"│"count"│
╞════════════╪════════════╪═══════╡
│0           │0           │1      │
├────────────┼────────────┼───────┤
│1           │5           │1      │
├────────────┼────────────┼───────┤
│6           │10          │1      │
└────────────┴────────────┴───────┘

What I usually do in this scenario is rely on the fact that in neo4j, dividing an integer by an integer gives back an integer. This simplifies the query a lot. We add a special case for 0 and it all fits in one line.

WITH [0,1,5,7,9,11] as list
UNWIND list as x
WITH CASE WHEN x = 0 THEN -1 ELSE  (x / 5) * 5 END as results
return results

This returns

-1, 0, 5, 5, 5, 10

Which is not ideal given that you want to group 1-5 together, but good enough I guess.
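
The same integer arithmetic can be tried in Python, where // is integer division and agrees with Cypher for non-negative values. The second function is not part of the query above; it is one possible adjustment, shown as an assumption, that shifts the value by one before dividing so that 1-5 fall into the same bucket. Both function names are hypothetical.

```python
def zero_special_bucket(x: int, size: int = 5) -> int:
    """Bucket as in the query: -1 for zero, otherwise round down."""
    return -1 if x == 0 else x // size * size


def shifted_bucket(x: int, size: int = 5) -> int:
    """Variant: 0 stays 0, and 1..5, 6..10, ... map to their upper limit."""
    return 0 if x == 0 else ((x - 1) // size + 1) * size


values = [0, 1, 5, 7, 9, 11]
print([zero_special_bucket(x) for x in values])  # [-1, 0, 5, 5, 5, 10]
print([shifted_bucket(x) for x in values])       # [0, 5, 5, 10, 10, 15]
```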
