
neo4j CYPHER - at ON MATCH SET create new nodes on condition

To import XML data into a Neo4j database, I first parse the XML into a Python dictionary and then run Cypher queries:

WITH $pubmed_dict as pubmed_article
UNWIND pubmed_article as particle
...
FOREACH (author IN particle.MedlineCitation.Article.AuthorList.Author |
  MERGE (a:Author {last_name: COALESCE(author.LastName, 'LAST NAME MISSING!')})
  ON CREATE SET a.first_name = author.ForeName, a.affiliation = author.AffiliationInfo.Affiliation
  ON MATCH SET a.first_name = author.ForeName, a.affiliation = author.AffiliationInfo.Affiliation
  MERGE (p)<-[:WROTE]-(a)      
)

Unfortunately, authors don't have unique IDs in the database, so different authors may share the same last name but have different initials or affiliations.

...
                <Author ValidYN="Y">
                    <LastName>Smith</LastName>
                    <ForeName>A L</ForeName>
                    <Initials>AL</Initials>
                    <AffiliationInfo>
                        <Affiliation>University X</Affiliation>
                    </AffiliationInfo>
                </Author>
...
                <Author ValidYN="Y">
                    <LastName>Smith</LastName>
                    <ForeName>A L</ForeName>
                    <Initials>AL</Initials>
                    <AffiliationInfo>
                        <Affiliation>University BUMBABU</Affiliation>
                    </AffiliationInfo>
                </Author>
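For illustration, here is a minimal sketch of how the two author records above could be flattened into Python dictionaries. It uses the standard library's `xml.etree.ElementTree`, which is an assumption (the parser actually used is not stated), and the `AuthorList` wrapper element, the helper name `parse_authors`, and the output keys are hypothetical.

```python
import xml.etree.ElementTree as ET

# The two sample records from above, wrapped in an assumed <AuthorList> root.
SAMPLE = """
<AuthorList>
  <Author ValidYN="Y">
    <LastName>Smith</LastName>
    <ForeName>A L</ForeName>
    <Initials>AL</Initials>
    <AffiliationInfo><Affiliation>University X</Affiliation></AffiliationInfo>
  </Author>
  <Author ValidYN="Y">
    <LastName>Smith</LastName>
    <ForeName>A L</ForeName>
    <Initials>AL</Initials>
    <AffiliationInfo><Affiliation>University BUMBABU</Affiliation></AffiliationInfo>
  </Author>
</AuthorList>
"""


def parse_authors(xml_text):
    """Return one flat dict per <Author>; missing fields become None."""
    root = ET.fromstring(xml_text)
    authors = []
    for node in root.iter("Author"):
        authors.append({
            "last_name": node.findtext("LastName"),
            "first_name": node.findtext("ForeName"),
            "affiliation": node.findtext("AffiliationInfo/Affiliation"),
        })
    return authors


authors = parse_authors(SAMPLE)
for author in authors:
    print(author)
```

Both records share the same last name and fore name and differ only in affiliation, which is exactly the case a MERGE on `last_name` alone cannot tell apart.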

My intention was to MERGE on author.LastName, but ON MATCH check whether the author has the same ForeName or the same Affiliation and, if not, create a new node instead.

How would I do that using Cypher queries?

EDIT 1

Node Key constraints are the solution, but they are an Enterprise Edition feature. I am looking for a workaround.

EDIT 2

This code works almost perfectly:

WITH $pubmed_dict as pubmed_article
    UNWIND pubmed_article as particle
        MERGE (p:Publication {pmid: particle.MedlineCitation.PMID.text})
        ON CREATE SET p.title = COALESCE (particle.MedlineCitation.Article.Journal.Title, particle.MedlineCitation.Article.ArticleTitle)
        ON MATCH SET p.title = COALESCE (particle.MedlineCitation.Article.Journal.Title, particle.MedlineCitation.Article.ArticleTitle)

    FOREACH (author IN particle.MedlineCitation.Article.AuthorList.Author |
      MERGE (a:Author {last_name: COALESCE(author.LastName, 'LAST NAME MISSING!'), first_name: COALESCE(author.ForeName, 'FIRST NAME MISSING!')})
      MERGE (p)<-[:WROTE]-(a)      
    )

To sum it up: for every author, I want to create a new Author node if LastName or ForeName or Affiliation differs from the existing node. I also need new nodes for authors marked 'LAST NAME MISSING!' or 'FIRST NAME MISSING!'.

Is it possible to achieve this result without Node Key constraints (since they are an Enterprise Edition feature)?

You can use constraints; Neo4j will then check uniqueness for you.

From the documentation:

To create a Node Key ensuring that all nodes with a particular label have a set of defined properties whose combined value is unique, and where all properties in the set are present

CREATE CONSTRAINT ON (author:Author)  ASSERT (author.first_name, author.last_name, author.affiliation) IS NODE KEY
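Where Node Keys are unavailable, one possible workaround is to compute a single composite key on the Python side and MERGE on that one property, optionally backed by an ordinary single-property uniqueness constraint (for example `CREATE CONSTRAINT ON (a:Author) ASSERT a.author_key IS UNIQUE`). The sketch below is only an illustration under assumptions: the property name `author_key`, the helper `author_key()`, the dictionary keys, and the `MERGE_AUTHORS` query are hypothetical and not part of the original question.

```python
import hashlib
import uuid


def author_key(author):
    """Build one string that identifies an author.

    `author` is a dict with optional 'last_name', 'first_name' and
    'affiliation' entries (hypothetical names). If the last or first name
    is missing, a random key is returned so the author never merges with
    an existing node.
    """
    last = author.get("last_name")
    first = author.get("first_name")
    if not last or not first:
        return "unknown-" + uuid.uuid4().hex
    parts = [last, first, author.get("affiliation") or ""]
    # Join with a separator that is unlikely to occur in names, then hash.
    joined = "\x1f".join(part.strip() for part in parts)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


# Hypothetical Cypher that merges on the precomputed key only.
MERGE_AUTHORS = """
UNWIND $authors AS author
MERGE (a:Author {author_key: author.author_key})
ON CREATE SET a.last_name = author.last_name,
              a.first_name = author.first_name,
              a.affiliation = author.affiliation
"""

smith_x = {"last_name": "Smith", "first_name": "A L", "affiliation": "University X"}
smith_b = {"last_name": "Smith", "first_name": "A L", "affiliation": "University BUMBABU"}
print(author_key(smith_x) == author_key(smith_b))  # different affiliation, different key
```

Because the key covers last name, first name and affiliation together, a change in any one of them yields a new node, while records with a missing name always get their own node.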

The authors do have a unique ID in Neo4j: the node ID. It can be used to identify the node and then set the properties. Maybe something like this:

// $author is assumed to be a query parameter holding one parsed author record
MATCH (a:Author {last_name: 'xxx', first_name: 'yyy'})
WITH id(a) AS ID
MATCH (b) WHERE id(b) = ID
SET b.first_name = $author.ForeName, b.affiliation = $author.AffiliationInfo.Affiliation

The node's ID is not necessarily stable or predictable, so you have to look it up immediately before using it.

Because you are using Python code, you might do better with a global query to pull down the author node data:

MATCH (a:Author {last_name: 'xxx', first_name: 'yyy'}) RETURN a.last_name, a.first_name, id(a) AS ID

Then you can write a CSV file to bulk-upload the desired info. The CSV could look like this:

"ID","ForeName","LastName","Affiliation"
"26","David","Smith","Johns Hopkins"
etc.

The Python code could filter out the nodes that do not need processing.

Then load the file:
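A minimal sketch of that filtering step, using only the standard `csv` module. The function name `write_updates`, the shape of `existing` (rows returned by the lookup query) and `incoming` (new affiliations keyed by name) are assumptions made for illustration.

```python
import csv
import io

FIELDS = ["ID", "ForeName", "LastName", "Affiliation"]


def write_updates(existing, incoming, out):
    """Write a CSV row only for nodes whose affiliation would change.

    existing: iterable of dicts with the keys in FIELDS (from the lookup query)
    incoming: dict mapping (LastName, ForeName) to the new affiliation
    out:      any writable text file object
    Returns the number of data rows written.
    """
    writer = csv.DictWriter(out, fieldnames=FIELDS, quoting=csv.QUOTE_ALL)
    writer.writeheader()
    written = 0
    for row in existing:
        new_affiliation = incoming.get((row["LastName"], row["ForeName"]))
        # Skip nodes with no new data or with an unchanged affiliation.
        if new_affiliation is None or new_affiliation == row.get("Affiliation"):
            continue
        writer.writerow({
            "ID": row["ID"],
            "ForeName": row["ForeName"],
            "LastName": row["LastName"],
            "Affiliation": new_affiliation,
        })
        written += 1
    return written


existing = [
    {"ID": 26, "ForeName": "David", "LastName": "Smith", "Affiliation": None},
    {"ID": 27, "ForeName": "Ann", "LastName": "Lee", "Affiliation": "University X"},
]
incoming = {("Smith", "David"): "Johns Hopkins", ("Lee", "Ann"): "University X"}

buffer = io.StringIO()
count = write_updates(existing, incoming, buffer)
print(buffer.getvalue())
```

In real use, `out` would be a file opened with `newline=""` inside Neo4j's import directory rather than an in-memory buffer.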

LOAD CSV WITH HEADERS FROM 'file:///xxx.csv' AS line
MATCH (a) WHERE id(a) = toInteger(line.ID)
SET a.affiliation = line.Affiliation
