Creating a graph with the Neo4j graph database takes too long

Question

I use the following code to create a graph with the Neo4j graph database:

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.HashMap;
import java.util.Map;

import org.neo4j.graphdb.RelationshipType;
import org.neo4j.graphdb.index.IndexHits;
import org.neo4j.helpers.collection.MapUtil;
import org.neo4j.index.lucene.unsafe.batchinsert.LuceneBatchInserterIndexProvider;
import org.neo4j.unsafe.batchinsert.BatchInserter;
import org.neo4j.unsafe.batchinsert.BatchInserterIndex;
import org.neo4j.unsafe.batchinsert.BatchInserterIndexProvider;
import org.neo4j.unsafe.batchinsert.BatchInserters;


public class Neo4jMassiveInsertion {

    private BatchInserter inserter = null;
    private BatchInserterIndexProvider indexProvider = null;
    private BatchInserterIndex nodes = null;

    private static enum RelTypes implements RelationshipType {
        SIMILAR
    }

    public static void main(String args[]) {
        Neo4jMassiveInsertion test = new Neo4jMassiveInsertion();
        test.startup("data/neo4j");
        test.createGraph("data/enronEdges.txt");
        test.shutdown();
    }

    /**
     * Start neo4j database and configure for massive insertion
     * @param neo4jDBDir
     */
    public void startup(String neo4jDBDir) {
        System.out.println("The Neo4j database is now starting . . . .");
        Map<String, String> config = new HashMap<String, String>();
        inserter = BatchInserters.inserter(neo4jDBDir, config);
        indexProvider = new LuceneBatchInserterIndexProvider(inserter);
        nodes = indexProvider.nodeIndex("nodes", MapUtil.stringMap("type", "exact"));
    }

    public void shutdown() {
        System.out.println("The Neo4j database is now shutting down . . . .");
        if(inserter != null) {
            indexProvider.shutdown();
            inserter.shutdown();
            indexProvider = null;
            inserter = null;
        }
    }

    public void createGraph(String datasetDir) {
        System.out.println("Creating the Neo4j database . . . .");
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(datasetDir)))) {
            String line;
            int lineCounter = 1;
            Map<String, Object> properties;
            IndexHits<Long> cache;
            long srcNode, dstNode;
            while((line = reader.readLine()) != null) {
                // Skip the first four lines of the input file
                if(lineCounter > 4) {
                    // Each edge line holds a source and a target node id separated by a tab
                    String[] parts = line.split("\t");
                    // Look up the source node in the Lucene index; create and index it if it is new
                    cache = nodes.get("nodeId", parts[0]);
                    if(cache.hasNext()) {
                        srcNode = cache.next();
                    }
                    else {
                        properties = MapUtil.map("nodeId", parts[0]);
                        srcNode = inserter.createNode(properties);
                        nodes.add(srcNode, properties);
                        // flush() makes the new entry visible to later get() calls
                        nodes.flush();
                    }
                    // Same lookup-or-create step for the target node
                    cache = nodes.get("nodeId", parts[1]);
                    if(cache.hasNext()) {
                        dstNode = cache.next();
                    }
                    else {
                        properties = MapUtil.map("nodeId", parts[1]);
                        dstNode = inserter.createNode(properties);
                        nodes.add(dstNode, properties);
                        nodes.flush();
                    }
                    inserter.createRelationship(srcNode, dstNode, RelTypes.SIMILAR, null);
                }
                lineCounter++;
            }
        }
        catch (IOException e) {
            e.printStackTrace();
        }
    }
}

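Compared with other graph database technologies (Titan, OrientDB), it takes far too long, so I am probably doing something wrong. Is there a way to improve the program?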

I'm using Neo4j 1.9.5 on a machine with a 2.3 GHz CPU (i5), 4 GB of RAM and a 320 GB disk, running Mac OS X Mavericks (10.9). My heap size is 2 GB.

Answer 1

Usually I can import around 1M nodes and 200k relationships per second on a MacBook.

Flushing and lookups

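Please don't flush and look up on every insert; that completely kills performance. Keep a HashMap from the nodeId in your data to the Neo4j node id, and only write to Lucene during the import.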

(If you are concerned about memory usage, you can also use something like GNU Trove.)
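The HashMap-based lookup described above can be sketched as follows. This is a minimal, stdlib-only sketch that assumes all node ids in the dataset fit in a plain `HashMap<String, Long>`; the `NodeIdCacheSketch` class and its `NodeFactory` interface are hypothetical stand-ins so the code runs without Neo4j. In the question's `createGraph`, the factory would wrap `inserter.createNode(properties)` followed by `nodes.add(id, properties)`, with no `nodes.get(...)` or `nodes.flush()` per line, so Lucene is only written to, never searched, during the import.

```java
import java.util.HashMap;
import java.util.Map;

public class NodeIdCacheSketch {
    /** Hypothetical stand-in for inserter.createNode(...): creates a node and returns its id. */
    public interface NodeFactory {
        long create(String externalId);
    }

    // Maps the id used in the input file to the id of the node created for it
    private final Map<String, Long> idCache = new HashMap<>();
    private final NodeFactory factory;

    public NodeIdCacheSketch(NodeFactory factory) {
        this.factory = factory;
    }

    /** Returns the node id for externalId, creating the node only the first time it is seen. */
    public long getOrCreate(String externalId) {
        Long id = idCache.get(externalId);
        if (id == null) {
            id = factory.create(externalId);
            idCache.put(externalId, id);
        }
        return id;
    }

    public int size() {
        return idCache.size();
    }

    public static void main(String[] args) {
        // Fake factory handing out consecutive ids, standing in for the batch inserter
        long[] next = {0};
        NodeIdCacheSketch cache = new NodeIdCacheSketch(ext -> next[0]++);

        String[] lines = {"0\t1", "1\t2", "0\t2"};
        StringBuilder edges = new StringBuilder();
        for (String line : lines) {
            String[] parts = line.split("\t");
            long src = cache.getOrCreate(parts[0]);
            long dst = cache.getOrCreate(parts[1]);
            // Real code would call inserter.createRelationship(src, dst, ...) here
            edges.append(src).append("->").append(dst).append(' ');
        }
        System.out.println(cache.size() + " nodes: " + edges.toString().trim());
    }
}
```

Running `main` prints `3 nodes: 0->1 1->2 0->2`: each id is created once and every later occurrence is served from the map instead of the index.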

Memory

Memory mapping

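You are also using very little RAM (depending on the size of the dataset, I usually use a heap of between 4 and 60 GB), and you haven't set any configuration at all.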

Please try a sensible configuration like the one below; depending on the amount of data, I would raise these numbers.

cache_type=none
use_memory_mapped_buffers=true
neostore.nodestore.db.mapped_memory=200M
neostore.relationshipstore.db.mapped_memory=1000M
neostore.propertystore.db.mapped_memory=250M
neostore.propertystore.db.strings.mapped_memory=250M
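One way to apply these settings in the question's `startup` method is to fill the `Map<String, String>` that is already passed to `BatchInserters.inserter(neo4jDBDir, config)` instead of leaving it empty. A minimal sketch, assuming the Neo4j 1.9 setting names listed above; the `BatchInsertConfigSketch` class and its `batchInsertConfig` helper are hypothetical, and the sizes are just the starting values from the answer, to be raised for larger datasets.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BatchInsertConfigSketch {
    /** Builds the settings above as the Map<String, String> the batch inserter is given. */
    public static Map<String, String> batchInsertConfig() {
        Map<String, String> config = new LinkedHashMap<>();
        config.put("cache_type", "none");
        config.put("use_memory_mapped_buffers", "true");
        config.put("neostore.nodestore.db.mapped_memory", "200M");
        config.put("neostore.relationshipstore.db.mapped_memory", "1000M");
        config.put("neostore.propertystore.db.mapped_memory", "250M");
        config.put("neostore.propertystore.db.strings.mapped_memory", "250M");
        return config;
    }

    public static void main(String[] args) {
        Map<String, String> config = batchInsertConfig();
        // In startup(), this map would replace the empty HashMap:
        //   inserter = BatchInserters.inserter(neo4jDBDir, config);
        for (Map.Entry<String, String> e : config.entrySet()) {
            System.out.println(e.getKey() + "=" + e.getValue());
        }
    }
}
```

A `LinkedHashMap` is used only so the printed settings keep the order shown above; a plain `HashMap` works just as well for the inserter.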

Heap

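And make sure to give it enough heap. Your disk may also not be the fastest. Try increasing your heap to at least 3GB. Also, make sure you have the latest JDK; 1.7.._b25 had a memory allocation problem (it only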

Accepted answer, 2014-01-28 11:54:27
