
Loading a large Cypher file in Neo4J

I'm having some difficulty loading a Cypher file into Neo4J on Windows 10. The file in question is a 175 MB .cql file containing more than a million lines of Cypher statements that create nodes and edges (separated by semicolons): CREATE [node], that sort of thing. For smaller files, I have been using an APOC command in the web browser:

call apoc.cypher.runFile('file:///<file path>')

but this is too slow for a file with a million-plus queries. I've created indexes for the nodes, and am currently running it through a command:

neo4j-shell -file <file path> -path localhost

but this is still slow. Is there any way to speed up the ingestion?

Also, note that I am using a recent ONGDB build rather than straight Neo4J; I do not believe this will make any substantial difference.

If the purpose of your very large CQL file is simply to ingest data, then doing it purely in Cypher is going to be very slow (and may even cause an out-of-memory error).

If you are ingesting into a new neo4j DB, you should consider refactoring the data out of the CQL file and using the import command of the neo4j-admin tool to ingest the data efficiently.

If you are ingesting into an existing DB, you should consider refactoring the data and logic out of the CQL file and using LOAD CSV.
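
As a rough illustration of what "refactoring the data out of the CQL file" could look like, here is a minimal Python sketch. It assumes the statements have a very simple shape such as CREATE (n:PERSON { name: "Some Name" }); the real file may well be more complicated. The names NODE_PATTERN and nodes_to_csv are hypothetical helpers made up for this example, not part of Neo4J or APOC.

```python
import csv
import io
import re

# ASSUMPTION: every node statement looks like
#   CREATE (n:LABEL { name: "Some Name" });
# Anything that does not match this shape is skipped.
NODE_PATTERN = re.compile(
    r'CREATE\s*\(\s*\w*\s*:\s*(\w+)\s*\{\s*name\s*:\s*"([^"]*)"\s*\}\s*\)'
)


def nodes_to_csv(statements, out):
    """Write one (label, name) CSV row per matching CREATE statement.

    `statements` is any iterable of strings (for example an open file),
    `out` is a writable text stream. Returns the number of rows written.
    """
    writer = csv.writer(out)
    writer.writerow(["label", "name"])  # header row, usable with LOAD CSV WITH HEADERS
    rows = 0
    for statement in statements:
        match = NODE_PATTERN.search(statement)
        if match:
            writer.writerow(match.groups())
            rows += 1
    return rows


# Small in-memory demonstration
buffer = io.StringIO()
written = nodes_to_csv(['CREATE(n:PERSON { name: "Homer Simpson" });'], buffer)
print(written, buffer.getvalue())
```

The resulting CSV could then be loaded with a statement along the lines of `LOAD CSV WITH HEADERS FROM 'file:///people.csv' AS row CREATE (:PERSON {name: row.name})`, where people.csv is a placeholder file name. Plain Cypher does not let you take a label from a CSV column, so writing one CSV file per label is probably the simplest approach.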

I ended up ingesting it using cypher-shell. It's still slow, but at least it does finish. Using it requires first starting a Neo4J console and then, in a second command line, running:

type <filepath>\data.cql | bin\cypher-shell.bat -a localhost -u <user> -p <password> --fail-at-end

This works on Windows 10, although it does take a while.

When running a query outside of a transaction, neo4j automatically starts and commits a separate transaction for every query. You can speed things up by starting a transaction at the beginning, then committing and starting a new transaction every few thousand queries (memory use goes up with transaction size, so that is the limiting factor on how large the transactions can be).

Example queries.cypher (with transactions of size 3):

:begin
CREATE (n:PERSON { name: "Homer Simpson" });
CREATE (n:PERSON { name: "Marge Simpson" });
CREATE (n:PERSON { name: "Abe Simpson" });
:commit
:begin
CREATE (n:PERSON { name: "Bart Simpson" });
CREATE (n:PERSON { name: "Lisa Simpson" });
CREATE (n:PERSON { name: "Maggie Simpson" });
:commit

And then run cypher-shell < queries.cypher as usual.
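
For a file with a million-plus statements you would not add the :begin and :commit lines by hand. Below is a minimal Python sketch of one way to generate them. It assumes the input has exactly one statement per line, which may not hold for your file. The function names wrap_in_transactions and rewrite_file, and the default batch size of 1000, are made up for this example.

```python
from typing import Iterable, Iterator


def wrap_in_transactions(statements: Iterable[str], batch_size: int = 1000) -> Iterator[str]:
    """Yield cypher-shell input lines, wrapping every `batch_size`
    statements in an explicit :begin / :commit pair."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    count = 0
    for statement in statements:
        statement = statement.strip()
        if not statement:
            continue  # skip blank lines
        if count % batch_size == 0:
            yield ":begin"
        # make sure every statement is terminated with a semicolon
        yield statement if statement.endswith(";") else statement + ";"
        count += 1
        if count % batch_size == 0:
            yield ":commit"
    if count % batch_size != 0:
        yield ":commit"  # close the final, partially filled transaction


def rewrite_file(src_path: str, dst_path: str, batch_size: int = 1000) -> None:
    """Stream src_path into dst_path line by line.

    ASSUMPTION: one Cypher statement per line in the source file.
    """
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in wrap_in_transactions(src, batch_size):
            dst.write(line + "\n")
```

Calling rewrite_file("data.cql", "queries.cypher", 5000) (both file names are placeholders) would produce a file that can be piped into cypher-shell as shown above. Because the file is streamed rather than read whole, the script's own memory use should stay small; the batch size then mostly trades commit overhead against transaction memory on the database side.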

