Loading data in Neo4j from CSV

I'm iteratively loading nodes and edges from several CSV files, one per node type. Loading nodes works well, but loading edges does not always work. Some node types use numbers as identifiers (when they do, every node of that type has a numeric identifier), but these are loaded as strings, so creating the edges fails. Manually creating an edge with the identifier wrapped in double quotes works well.

How can I either force LOAD CSV to use numbers for these identifiers when creating nodes, or force LOAD CSV to wrap identifiers in double quotes?

Article nodes:

Type    PMID    ArticleTitle    AbstractText    Date    Pages
Article 25358116    Synthesis of... Abstract    2014-10-30  
Article 25358093    Putting theory...   In this study...    2014-10-30  e1003910

Issue nodes:

Type    Name    Year    Month   Volume  Issue
Issue   J. Med. Chem., 2014 2014    Oct     
Issue   PLoS Comput. Biol., 2014, 10, 10    2014    Oct 10  10
Issue   PLoS ONE, 2014, 9, 10   2014        9   10

Edges:

Name    PMID
J. Med. Chem., 2014 25358116
PLoS Comput. Biol., 2014, 10, 10    25358093

Cypher commands:

CREATE INDEX ON :Article(PMID);
LOAD CSV WITH HEADERS FROM 'article.nodes' AS csvLine FIELDTERMINATOR '\t'
CREATE (:Article { PMID: toInt(csvLine.PMID), Title: csvLine.ArticleTitle, Date: csvLine.Date, Pages: csvLine.Pages, AbstractText: csvLine.AbstractText })
RETURN count(*);
CREATE INDEX ON :Journal(Abbreviate);
CREATE INDEX ON :Issue(Name);
LOAD CSV WITH HEADERS FROM 'issue.nodes' AS csvLine FIELDTERMINATOR '\t'
CREATE (:Issue { Name: csvLine.Name, Volume: csvLine.Volume, Issue: csvLine.Issue, Year: csvLine.Year, Month: csvLine.Month })
RETURN count(*);
LOAD CSV WITH HEADERS FROM 'article.edges' AS csvLine FIELDTERMINATOR '\t'
MATCH (src:Issue { Name: csvLine.Name }), (tgt:Article { PMID: toInt(csvLine.PMID) })
CREATE (src)-[:hasArticle]->(tgt)
RETURN count(*);

After receiving your files, I found a formatting problem with the TSV.

Displaying the whole csvLine in Neo4j showed null for the Issue name, so I modified the format and re-exported the file with Google Drive.

You can also check the file for errors with CSVLint: http://csvlint.io/validation/545681456373761303020000

When I run this query:
LOAD CSV WITH HEADERS FROM 'file:///Users/ikwattro/dev/playbox/pierre/article.edges' AS csvLine FIELDTERMINATOR '\t' WITH csvLine LIMIT 10 RETURN csvLine
I get this:
Name PMID   J. Med. Chem., 2014 25358116 
Name PMID  PLoS Comput. Biol., 2014, 10, 10 
Name PMID  J. Med. Chem., 2014 
Name PMID  J. Med. Chem., 2014
Name and PMID are under the same key, which suggests the header row is not being split on the tab character.
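
A quick way to check for this kind of header problem, as a minimal sketch: the query below counts rows in which either expected key is missing. The file path is a placeholder (an assumption, not the real location), and the sketch assumes the columns should be named Name and PMID, as in the edge file from the question.

```cypher
// Count rows where an expected column is missing.
// 'file:///path/to/article.edges' is a placeholder path, not the real one.
LOAD CSV WITH HEADERS FROM 'file:///path/to/article.edges' AS csvLine FIELDTERMINATOR '\t'
WITH csvLine
// Accessing a key that is absent from the row map yields null.
WHERE csvLine.Name IS NULL OR csvLine.PMID IS NULL
RETURN count(*) AS malformedRows;
```

If malformedRows is greater than zero, the file probably needs to be fixed or re-exported before the edge query can match anything.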

Chris
