Loading data in Neo4j from CSV
I'm iteratively loading nodes and edges from several CSV files, one per node type. Loading nodes works well, but loading edges does not always work.
Some of the nodes have numbers as identifiers (if so, all nodes of that type have numbers), but they are loaded as strings, so creating edges fails. Manually creating an edge with the identifier wrapped in double quotes works well.
How can I either force LOAD CSV to use numbers for these identifiers while creating nodes, or force LOAD CSV to wrap identifiers in double quotes?
Article nodes:
Type PMID ArticleTitle AbstractText Date Pages
Article 25358116 Synthesis of... Abstract 2014-10-30
Article 25358093 Putting theory... In this study... 2014-10-30 e1003910
Issue nodes:
Type Name Year Month Volume Issue
Issue J. Med. Chem., 2014 2014 Oct
Issue PLoS Comput. Biol., 2014, 10, 10 2014 Oct 10 10
Issue PLoS ONE, 2014, 9, 10 2014 9 10
Edges:
Name PMID
J. Med. Chem., 2014 25358116
PLoS Comput. Biol., 2014, 10, 10 25358093
Cypher commands:
CREATE INDEX ON :Article(PMID);
LOAD CSV WITH HEADERS FROM 'article.nodes' as csvLine FIELDTERMINATOR '\t' CREATE (:Article { PMID: toInt(csvLine.PMID), Title: csvLine.ArticleTitle, Date: csvLine.Date, Pages: csvLine.Pages, AbstractText: csvLine.Abstract }) return count(*);
CREATE INDEX ON :Journal(Abbreviate);
CREATE INDEX ON :Issue(Name);
LOAD CSV WITH HEADERS FROM 'issue.nodes' as csvLine FIELDTERMINATOR '\t' CREATE (:Issue { Name: csvLine.Name, Volume: csvLine.Volume, Issue: csvLine.Issue, Year: csvLine.Year, Month: csvLine.Month}) return count(*);
LOAD CSV WITH HEADERS FROM 'article.edges' as csvLine FIELDTERMINATOR '\t' MATCH (src:Issue { Name: csvLine.Name }), (tgt:Article { PMID: toInt(csvLine.PMID) }) CREATE (src) -[:hasArticle]-> (tgt) return count(*);
You can use toInt(csvLine.id) to convert the identifier to an integer, for example. See:
http://neo4j.com/docs/stable/query-functions-scalar.html#functions-toint
After receiving your files, I found a formatting problem with the TSV.
Returning the whole csvLine in Neo4j showed null for the Issue name, so I fixed the format and re-exported the file with Google Drive.
You can also check for errors with CSVLint: http://csvlint.io/validation/545681456373761303020000
LOAD CSV WITH HEADERS FROM 'file:///Users/ikwattro/dev/playbox/pierre/article.edges' as csvLine FIELDTERMINATOR '\t' WITH csvLine LIMIT 10 RETURN csvLine
I get this:
Name PMID J. Med. Chem., 2014 25358116
Name PMID PLoS Comput. Biol., 2014, 10, 10
Name PMID J. Med. Chem., 2014
Name PMID J. Med. Chem., 2014
Name and PMID are under the same key, which suggests the fields are not being split on the tab separator.
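A quick way to spot this kind of problem outside Neo4j is to count the tab-separated fields on each line. The following is only a minimal sketch: the function name check_tsv and the sample strings are hypothetical, and it assumes the file is plain tab-delimited text with no quoted fields.

```python
import csv
import io


def check_tsv(text, delimiter="\t"):
    """Split delimited text and report rows that do not match the header.

    Returns (header, problems), where problems is a list of
    (line_number, field_count) for rows whose field count differs
    from the header's field count.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    problems = []
    for line_number, row in enumerate(reader, start=2):
        if len(row) != len(header):
            problems.append((line_number, len(row)))
    return header, problems


# Hypothetical samples: the first uses real tabs, the second uses spaces.
good = "Name\tPMID\nJ. Med. Chem., 2014\t25358116\n"
bad = "Name PMID\nJ. Med. Chem., 2014 25358116\n"

print(check_tsv(good))  # header splits into two fields
print(check_tsv(bad))   # header stays as one field: the "same key" symptom
```

If the header comes back as a single field such as "Name PMID", the file is not really tab-separated. To check a file on disk, its contents can be passed in, for instance check_tsv(open("article.edges", newline="").read()).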
Chris