Neo4j Insertion taking more time
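We have about 50,000 nodes and 80,000,000 edges.
We are trying to insert this data into Neo4j (embedded graph database) using Java, but it is taking a very long time (hours).
We would like to know whether we are doing something wrong in the insertion. We are using auto-indexing for the nodes. The complete implementation is given below.
Please let me know what is going wrong and what changes the following code needs.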
public static void main(String[] args)
{
    // TODO Auto-generated method stub
    nodeGraph obj = new nodeGraph();
    obj.createDB();
    System.out.println("Graph Database Initialised");
    obj.parseNodesCsv();
    System.out.println("Creating relationships in process....");
    obj.parseEdgesCsv();
    obj.shutDown();
}

public void createDB() {
    graphDb = new GraphDatabaseFactory().newEmbeddedDatabaseBuilder( DB_PATH ).
        setConfig( GraphDatabaseSettings.node_keys_indexable, "id,name" ).
        setConfig( GraphDatabaseSettings.relationship_keys_indexable, "rel" ).
        setConfig( GraphDatabaseSettings.node_auto_indexing, "true" ).
        setConfig( GraphDatabaseSettings.relationship_auto_indexing, "true" ).
        newGraphDatabase();
    registerShutdownHook(graphDb);
    // Get the Node AutoIndexer, set "id" and "name" as auto
    // indexed.
    AutoIndexer<Node> nodeAutoIndexer = graphDb.index().getNodeAutoIndexer();
    nodeAutoIndexer.startAutoIndexingProperty( "id" );
    nodeAutoIndexer.startAutoIndexingProperty( "name" );
    // Get the Relationship AutoIndexer
    //AutoIndexer<Relationship> relAutoIndexer = graphDb.index().getRelationshipAutoIndexer();
    //relAutoIndexer.startAutoIndexingProperty( "relProp1" );
    // None of the AutoIndexers are enabled so far. Do that now
    nodeAutoIndexer.setEnabled( true );
    //relAutoIndexer.setEnabled( true );
}

public void parseNodesCsv(){
    try
    {
        CSVReader reader= new CSVReader(new FileReader("/home/sandy/Desktop/workspacesh/importToNeo4j/nodesNeo.csv"),' ','"');
        String rows[]=null;
        while ((rows=reader.readNext())!=null)
        {
            createNode(rows);
            System.out.println(rows[0]);
        }
        reader.close();
    }
    catch (FileNotFoundException e)
    {
        // TODO Auto-generated catch block
        System.err.println("Error: cannot find datasource.");
        e.printStackTrace();
    }
    catch (IOException e)
    {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
}

public void parseEdgesCsv(){
    try
    {
        CSVReader reader= new CSVReader(new FileReader("/home/sandy/Desktop/workspacesh/importToNeo4j/edgesNeo.csv"),',','"');
        String rows[]=null;
        while ((rows=reader.readNext())!=null)
        {
            createRelationshipsUsingIndexes(rows);
        }
        reader.close();
    }
    catch (FileNotFoundException e)
    {
        // TODO Auto-generated catch block
        System.err.println("Error: cannot find datasource.");
        e.printStackTrace();
    }
    catch (IOException e)
    {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
}

public void createNode(String[] rows){
    Transaction tx = graphDb.beginTx();
    try
    {
        firstNode = graphDb.createNode(DynamicLabel.label( rows[2] ));
        firstNode.setProperty("id",rows[0] );
        firstNode.setProperty("name",rows[1] );
        System.out.println(firstNode.getProperty("id"));
        tx.success();
    }
    finally
    {
        tx.finish();
    }
}

public void createRelationshipsUsingIndexes(String rows[]){
    Transaction tx = graphDb.beginTx();
    try
    {
        ReadableIndex<Node> autoNodeIndex = graphDb.index().getNodeAutoIndexer().getAutoIndex();
        // node1 and node2 both had auto indexed properties, get them
        firstNode=autoNodeIndex.get( "id", rows[0] ).getSingle();
        secondNode=autoNodeIndex.get( "id", rows[1] ).getSingle();
        relationship = firstNode.createRelationshipTo( secondNode, RelTypes.CO_OCCURRED );
        relationship.setProperty( "frequency", rows[2] );
        relationship.setProperty( "generatability_score", rows[3] );
        tx.success();
    }
    finally
    {
        tx.finish();
    }
}

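What memory configuration (heap) are you using for the import? Which operating system are you running (Linux, I assume), and which Neo4j version are you using?
I recommend upgrading to the latest stable version, Neo4j 2.0.3.
There are a few problems with your import:
Wrap the FileReader in a BufferedReader for better CSV reading performance.
It would make more sense to use my batch importer for the fast initial import.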
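
To illustrate the BufferedReader suggestion above, here is a minimal sketch that uses only the JDK. The class name BufferedCsvSketch, the helper readRows, the 1 MiB buffer size and the naive split-based parsing (no quote handling) are all assumptions made for this example; they are not taken from the question's code or from any CSV library.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class BufferedCsvSketch {

    // Buffer size is an assumption for illustration; tune it for your data.
    private static final int BUFFER_SIZE = 1 << 20; // 1 MiB

    // Reads all non-empty lines through a BufferedReader and splits each one
    // on the given delimiter. This is a naive split: quoted fields that
    // contain the delimiter are NOT handled.
    static List<String[]> readRows(Reader source, String delimiter) {
        List<String[]> rows = new ArrayList<>();
        Pattern splitter = Pattern.compile(Pattern.quote(delimiter));
        try (BufferedReader in = new BufferedReader(source, BUFFER_SIZE)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.isEmpty()) {
                    continue; // skip blank lines
                }
                // limit -1 keeps trailing empty columns
                rows.add(splitter.split(line, -1));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }

    public static void main(String[] args) {
        // For a real file, pass new FileReader(path) instead of a StringReader.
        Reader sample = new StringReader("1,2,10,0.5\n2,3,4,0.25\n");
        List<String[]> rows = readRows(sample, ",");
        System.out.println(rows.size() + " rows read");
    }
}
```

In the question's code, the same idea would presumably amount to passing new BufferedReader(new FileReader(path)) to the CSVReader constructor instead of the bare FileReader, assuming that constructor accepts any Reader, as its use with a FileReader suggests.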