
Loading data into Neo4j from CSV

I'm iteratively loading nodes and edges from several CSV files, one per node type. Loading nodes works well, but loading edges does not always work. Some node types use numbers as identifiers (when they do, every node of that type has a numeric identifier), but these are loaded as strings, so creating the edges fails. Creating an edge manually, with the identifier wrapped in double quotes, works fine.

How can I either force LOAD CSV to use numbers for these identifiers while creating nodes, or force LOAD CSV to wrap identifiers with double quotes?

Article nodes:

Type    PMID    ArticleTitle    AbstractText    Date    Pages
Article 25358116    Synthesis of... Abstract    2014-10-30  
Article 25358093    Putting theory...   In this study...    2014-10-30  e1003910

Issue nodes:

Type    Name    Year    Month   Volume  Issue
Issue   J. Med. Chem., 2014 2014    Oct     
Issue   PLoS Comput. Biol., 2014, 10, 10    2014    Oct 10  10
Issue   PLoS ONE, 2014, 9, 10   2014        9   10

Edges:

Name    PMID
J. Med. Chem., 2014 25358116
PLoS Comput. Biol., 2014, 10, 10    25358093

Cypher commands:

CREATE INDEX ON :Article(PMID);
LOAD CSV WITH HEADERS FROM 'article.nodes' as csvLine FIELDTERMINATOR '\t' CREATE (:Article { PMID: toInt(csvLine.PMID), Title: csvLine.ArticleTitle, Date: csvLine.Date, Pages: csvLine.Pages, AbstractText: csvLine.AbstractText })  return count(*);
CREATE INDEX ON :Journal(Abbreviate);
CREATE INDEX ON :Issue(Name);
LOAD CSV WITH HEADERS FROM 'issue.nodes' as csvLine FIELDTERMINATOR '\t' CREATE (:Issue { Name: csvLine.Name, Volume: csvLine.Volume, Issue: csvLine.Issue, Year: csvLine.Year, Month: csvLine.Month})  return count(*);
LOAD CSV WITH HEADERS FROM 'article.edges' as csvLine FIELDTERMINATOR '\t' MATCH (src:Issue { Name: csvLine.Name }), (tgt:Article { PMID: toInt(csvLine.PMID) }) CREATE (src) -[:hasArticle]-> (tgt) return count(*);

After looking at the files you sent, the problem turned out to be the formatting of the TSV.

Returning the whole csvLine in Neo4j showed null for the Issue name, so I fixed the format and re-exported the file with Google Drive.

You can also check the file for errors with CSVLint: http://csvlint.io/validation/545681456373761303020000
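If you would rather check the file locally before running LOAD CSV, something like the following Python sketch can flag rows that do not split into the same number of tab-separated fields as the header. The function name `find_malformed_rows` and the sample data are made up for illustration. It uses Python's csv module, which may not parse every edge case exactly as Neo4j does.

```python
import csv
import io


def find_malformed_rows(text, delimiter="\t"):
    """Return (line_number, field_count) for rows whose width differs from the header."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    expected = len(header)
    problems = []
    for row in reader:
        # Skip blank lines; reader.line_num is the 1-based line just read.
        if row and len(row) != expected:
            problems.append((reader.line_num, len(row)))
    return problems


# Hypothetical sample: the last row uses spaces instead of a tab.
sample = (
    "Name\tPMID\n"
    "J. Med. Chem., 2014\t25358116\n"
    "PLoS Comput. Biol., 2014, 10, 10    25358093\n"
)
print(find_malformed_rows(sample))  # [(3, 1)]
```

An empty list means every row has the same number of fields as the header. It does not prove the values themselves are valid.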

Running this query:
LOAD CSV WITH HEADERS FROM 'file:///Users/ikwattro/dev/playbox/pierre/article.edges' as csvLine FIELDTERMINATOR '\t' WITH csvLine LIMIT 10 RETURN csvLine
I get this:
Name PMID   J. Med. Chem., 2014 25358116 
Name PMID  PLoS Comput. Biol., 2014, 10, 10 
Name PMID  J. Med. Chem., 2014 
Name PMID  J. Med. Chem., 2014
Name and PMID end up under the same key, which means the columns are not being split on tabs.
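To see why a single merged key breaks the edge query, here is a small Python analogy. It uses Python's csv.DictReader, not Neo4j's parser, and the two sample strings are hypothetical. When the header has no tab, the whole header line becomes one key, so looking up PMID gives nothing. In Cypher I would expect the equivalent csvLine.PMID to be null, so the MATCH finds no Article.

```python
import csv
import io

# Hypothetical file contents: one separated by spaces, one by real tabs.
broken = "Name    PMID\nJ. Med. Chem., 2014    25358116\n"
fixed = "Name\tPMID\nJ. Med. Chem., 2014\t25358116\n"


def first_row(text):
    """Parse the text as TSV with a header and return the first data row."""
    return next(csv.DictReader(io.StringIO(text), delimiter="\t"))


broken_row = first_row(broken)
fixed_row = first_row(fixed)

print(list(broken_row))        # a single key holding the whole header line
print(broken_row.get("PMID"))  # None: there is no separate PMID column
print(list(fixed_row))         # ['Name', 'PMID']
print(int(fixed_row["PMID"]))  # the identifier converts to a number
```

Once the file really is tab-separated, the numeric conversion works on the PMID column by itself, which is what toInt(csvLine.PMID) relies on in the queries above.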

Chris

