Find, group by and count relationships in Neo4j using Cypher

Hi, I have two sets of labels in Neo4j 3.03:

INTERACTIONS

uidpid  100000060085836_170782808933_10154454374183934
name    Dean Hohaia
postid  170782808933_10154454374183934
pageid  170782808933
userid  100000060085836

POSTS

shares      0
comments    0
postid      100129044360_100138063361365
pageid      100129044360
type        link
createdtime 2010-03-30 00:43:23
pagename    Study in New Zealand
likes       4

I have a relationship called LIKES, which was created like this:

MATCH (i:interactions), (p:posts)
WHERE i.postid = p.postid
CREATE (i)-[:likes]->(p)

which looks like this:

uidpid  613637235481924_125251397514429_1000501533322740
name    Toth Mariann
postid  125251397514429_1000501533322740
pageid  125251397514429
userid  613637235481924

Basically the same as interactions.

I need to find a way to write a query that shows, for each pagename in posts, the count of userid interactions by pagename:

Source Pagename  Matched Pagename   Userids count #
Air New Zealand  Rialto Channel     12494
Air New Zealand  RNZ                2979
Air New Zealand  SKY TV             4651

In essence: for each pagename in posts, show the count of all other pages that each user has engaged with.

Do I need to create any other relationships to achieve this?

Here's the exact example data I'm using, as CSVs: https://www.wetransfer.com/downloads/37e89c65f029344a2205ca717f04b6fe20161024051807/0d4ab3

First, as you mentioned, we connect the interactions and the posts based on the postid (1).

MATCH (i:interactions), (p:posts)
WHERE i.postid = p.postid
CREATE (i)-[:likes]->(p)
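
Without an index, a join like the one above compares every interaction with every post. As a hedged sketch, assuming the Neo4j 3.x index syntax, creating indexes on the join properties before running the CREATE statements may let the planner use index lookups instead. The first index targets the postid join in (1); the second is an assumption about what helps any step that looks up interactions by userid.

```cypher
// Assumption: Neo4j 3.x syntax; run each statement separately.
// Index the property used to join interactions to posts in (1).
CREATE INDEX ON :posts(postid);
// Index the property used whenever interactions are looked up by user.
CREATE INDEX ON :interactions(userid);
```

Prefixing a query with EXPLAIN shows whether the planner actually picks up these indexes.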

Then we create a node for each user (2):

MATCH (i:interactions)
WITH DISTINCT i.userid AS userid
CREATE (u:users {userid: userid})

And connect them to the interactions (3):

MATCH (u:users), (i:interactions)
WHERE u.userid = i.userid
CREATE (u)-[:performed]->(i)

It's possible to perform these two CREATE operations (2 and 3) with MERGE in a single query, but the performance seems to be much worse, and I'm not sure why.

// MERGE the node first, then the relationship; merging the whole pattern at once would create one users node per interaction
MATCH (i:interactions)
MERGE (u:users {userid: i.userid})
MERGE (u)-[:performed]->(i)

Having created the likes and performed relationships, we can now formulate the query like this (4):

MATCH (source:posts)<-[:likes]-(:interactions)<-[:performed]-(:users)-[:performed]->(:interactions)-[:likes]->(matched:posts)
RETURN source.pagename, matched.pagename, COUNT(matched)
LIMIT 10

Warning: this took two minutes to run on my laptop (late-2011 quad-core i7 CPU + SSD).

The query starts from a post (source) and navigates through the likes and performed edges to each user that performed an interaction on it. It then navigates to those users' other interactions (again through performed and likes edges), ending in a node that represents a post (matched). The matched nodes are aggregated with the COUNT function, grouped by the two pagename properties that are returned alongside the count.
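
Note that the query in (4) counts one row per matching path, so a user with several interactions on either page is counted several times, and pairs of posts from the same page are included. If the goal is the number of distinct users shared between two different pages, a variant along the following lines may be closer. It is a sketch, assuming that pageid identifies a page and that user nodes carry the users label used in (4); the variable u and the column aliases are introduced here for illustration.

```cypher
// Same traversal as (4), but the user node is bound to u so it can be counted.
MATCH (source:posts)<-[:likes]-(:interactions)<-[:performed]-(u:users)-[:performed]->(:interactions)-[:likes]->(matched:posts)
// Assumption: skip pairs where both posts belong to the same page.
WHERE source.pageid <> matched.pageid
RETURN source.pagename AS source_pagename,
       matched.pagename AS matched_pagename,
       COUNT(DISTINCT u) AS userids_count  // each user counted once per page pair
ORDER BY source_pagename, userids_count DESC
LIMIT 10
```

COUNT(DISTINCT u) collapses repeated paths through the same user, which is usually what a "Userids count" column is meant to report.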

A related suggestion: label names should start with an uppercase letter and should be singular, i.e. Post, Interaction and User.
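
If you want to apply that convention to data that is already loaded, one possible approach is to add the new label and remove the old one. This is only a sketch: the new label names are the ones suggested above, and every query that refers to the old labels would have to be updated to match.

```cypher
// Add the singular, capitalized label and drop the old one; run each statement separately.
MATCH (p:posts) SET p:Post REMOVE p:posts;
MATCH (i:interactions) SET i:Interaction REMOVE i:interactions;
```

The user nodes can be relabeled the same way, using whichever label they were created with.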
