Find, group by and count relationships in Neo4j using Cypher
Hi, I have two sets of labels in Neo4j 3.03:
INTERACTIONS
uidpid 100000060085836_170782808933_10154454374183934
name Dean Hohaia
postid 170782808933_10154454374183934
pageid 170782808933
userid 100000060085836
POSTS
shares 0
comments 0
postid 100129044360_100138063361365
pageid 100129044360
type link
createdtime 2010-03-30 00:43:23
pagename Study in New Zealand
likes 4
I have a relationship called LIKES which has been created like this:
MATCH (i:interactions),(p:posts)
WHERE i.userid = p.userid
CREATE (i)-[:likes]->(p)
which looks like this:
uidpid 613637235481924_125251397514429_1000501533322740
name Toth Mariann
postid 125251397514429_1000501533322740
pageid 125251397514429
userid 613637235481924
Same as interactions, basically.
I need to find a way to create a query that shows, for each pagename in posts, the count of userid interactions by pagename:
Source Pagename     Matched Pagename    Userids count #
Air New Zealand     Rialto Channel      12494
Air New Zealand     RNZ                 2979
Air New Zealand     SKY TV              4651
In essence: for each pagename in posts, show the count of all other pages that each user has engaged with.
Do I need to create any other relationships to achieve this?
Here's the exact example data I'm using, as CSVs: https://www.wetransfer.com/downloads/37e89c65f029344a2205ca717f04b6fe20161024051807/0d4ab3
First, as you mentioned, we connect the interactions and the posts based on the postid (1):
MATCH (i:interactions), (p:posts)
WHERE i.postid = p.postid
CREATE (i)-[:likes]->(p)
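Without an index, the MATCH in step (1) has to compare every interaction with every post. Creating an index beforehand may help. This is a minimal sketch assuming the Neo4j 3.x index syntax (newer versions use a different CREATE INDEX form), and I have not timed it on this dataset:

```cypher
// Assumption: Neo4j 3.x syntax; run this before step (1).
// Lets the planner look up posts by postid instead of scanning all posts
// for every interaction.
CREATE INDEX ON :posts(postid)
```

The index is populated in the background, so it is worth waiting until it is online before running the CREATE statement.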
Then we create a node for each user (2):
MATCH (i:interactions)
WITH DISTINCT i.userid AS userid
CREATE (u:users {userid: userid})
And connect them to the interactions (3):
MATCH (u:users), (i:interactions)
WHERE u.userid = i.userid
CREATE (u)-[:performed]->(i)
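It's possible to combine these two CREATE operations (2 and 3) into a single statement using MERGE, but the performance seems to be much worse; I'm not sure why. Note that MERGE on a whole pattern either matches the entire pattern or creates all of its unbound parts, so the user node is merged on its own first; otherwise a new user node would be created for every interaction.
MATCH (i:interactions)
MERGE (u:users {userid: i.userid})
MERGE (u)-[:performed]->(i)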
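One possible reason for the slow MERGE is that, without an index, each lookup by userid scans all nodes with the users label. A uniqueness constraint may help, since it also creates an index. This is a sketch assuming the Neo4j 3.x constraint syntax and the users label used in the MERGE above; I have not measured it on this data:

```cypher
// Assumption: Neo4j 3.x syntax; run once, before creating the user nodes.
// The constraint is backed by an index on users(userid), which MERGE can use
// to find an existing user instead of scanning all users nodes.
CREATE CONSTRAINT ON (u:users) ASSERT u.userid IS UNIQUE
```

Creating the constraint fails if duplicate userid values already exist on users nodes, so it is best added while the label is still empty.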
Having created the likes and performed relationships, we can now formulate the query like this (4):
MATCH (source:posts)<-[:likes]-(:interactions)<-[:performed]-(:users)-[:performed]->(:interactions)-[:likes]->(matched:posts)
RETURN source.pagename, matched.pagename, COUNT(matched)
LIMIT 10
Warning: this took two minutes to run on my laptop (late-2011 quad-core i7 CPU + SSD).
The query starts from a post (source) and navigates through likes and performed edges to each user that performed the interaction. It then navigates to those users' other interactions (again through performed and likes edges), ending in a node that represents a post (matched). The number of matched nodes is aggregated with the COUNT function and returned along with the pagename properties.
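COUNT(matched) counts one row per matched path. A user with several interactions on the same page is therefore counted several times, and other posts on the source page also show up as matches. If the goal is the number of distinct users per pair of different pages, a variant might look like the sketch below. The variable u and the aliases sourcePage, matchedPage and userCount are my own naming, and the query has not been run against the linked dataset:

```cypher
// Sketch: count distinct users per (source page, matched page) pair.
MATCH (source:posts)<-[:likes]-(:interactions)<-[:performed]-(u:users)-[:performed]->(:interactions)-[:likes]->(matched:posts)
// Assumption: "other pages" means pages with a different pagename.
WHERE source.pagename <> matched.pagename
RETURN source.pagename AS sourcePage,
       matched.pagename AS matchedPage,
       count(DISTINCT u) AS userCount
ORDER BY sourcePage, userCount DESC
LIMIT 10
```

Grouping happens implicitly on the non-aggregated columns, so each row corresponds to one pair of page names.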
A related suggestion: label names should start with an uppercase letter and should be singular, i.e. Post, Interaction and User.
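If the data is already loaded, the labels can be changed in place rather than re-importing. This is a minimal sketch for the posts label; the same pattern would apply to the other two labels, and every query above would then need to use the new label names:

```cypher
// Sketch only: add the new label and remove the old one on every post node.
// Repeat as separate statements for interactions -> Interaction and users -> User.
MATCH (n:posts)
SET n:Post
REMOVE n:posts
```

On a large graph this runs as one big transaction, so it may be worth doing in batches.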