
Importing bulk JSON data into Neo4j

I am trying to load a JSON file of size about 700k, but it is giving me a heap memory out of space error.

My query is as below:

WITH "file:///Users//arundhathi.d//Documents//Neo4j//default.graphdb//import//tjson.json" as url  
call apoc.load.json(url) yield value as article return article

As with CSV, I tried to use USING PERIODIC COMMIT 1000 with JSON, but it is not allowed when loading JSON.

How can I load bulk JSON data?

apoc.load.json now supports a json-path as a second parameter.

To get the first 1000 JSON objects from the array in the file, try this:

WITH "file:///path_to_file.json" as url  
CALL apoc.load.json(url, '[0:1000]') YIELD value AS article
RETURN article;

The [0:1000] syntax specifies a range of array indices, and the second number is exclusive (so, in this example, the last index in the range is 999).

The above should at least work in Neo4j 3.1.3 (with APOC release 3.1.3.6). Note also that the Desktop versions of Neo4j (installed via the Windows and OSX installers) have a new requirement concerning where to put plugins like APOC in order to import local files.
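
The range parameter also makes it possible to work through a large file one slice at a time instead of pulling everything into a single transaction. Below is a minimal sketch along those lines; the Article label and the id and title properties are assumptions for illustration only, not names from the question, so substitute whatever your JSON objects actually contain:

// Load the next 1000 objects (indices 1000-1999) and write them as one batch.
// :Article, article.id and article.title are placeholder names for this sketch.
WITH "file:///path_to_file.json" AS url
CALL apoc.load.json(url, '[1000:2000]') YIELD value AS article
MERGE (a:Article {id: article.id})
SET a.title = article.title
RETURN count(a);

Repeating this with successive ranges ('[2000:3000]', and so on) keeps each transaction small, which is roughly what USING PERIODIC COMMIT does for CSV.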

You can also convert JSON into CSV files using jq - a very fast command-line JSON processor. https://stedolan.github.io/jq/tutorial/

This is the recommended way according to: https://neo4j.com/blog/bulk-data-import-neo4j-3-0/

If you have many files, write a Python program or similar that iterates over the files, calling something like:

import os
os.system("cat file{}.json | jq -r '[.entity1, .entity2, .entity3] | @csv' >> concatenatedCSV.csv".format(num))

or in Go:

// run the pipeline through a shell so the pipe and >> redirection work
exec.Command("sh", "-c", "cat file"+num+".json | jq -r '[.entity1, .entity2, .entity3] | @csv' >> concatenatedCSV.csv").Run()

I recently did this for about 700GB of JSON files. It takes some thought to get the CSV files in the right format, but if you follow the jq tutorial you'll pick up how to do it. Additionally, check out how the headers need to be formatted here: https://neo4j.com/docs/operations-manual/current/tools/import/
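
For reference, the import tool described there expects typed column headers; a hypothetical nodes file built from the fields above might start like this (the Entity label and column names are assumptions for illustration):

entity1:ID,entity2,entity3,:LABEL
abc123,some value,another value,Entity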

It took about a day to convert it all, but given the transaction overhead of using APOC, and the ability to re-import at any time once the files are in the right format, it is worth it in the long run.
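
If you would rather stay inside Cypher than use the offline import tool described in the links above, the converted CSV can also be loaded with the batching the question mentions works for CSV. A minimal sketch, assuming concatenatedCSV.csv has a header row entity1,entity2,entity3 and using a placeholder Entity label:

USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///concatenatedCSV.csv" AS row
// Entity and the entity1/entity2/entity3 columns are placeholder names for this sketch
MERGE (e:Entity {id: row.entity1})
SET e.entity2 = row.entity2, e.entity3 = row.entity3;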
