
Titan graph database too slow with 100,000+ vertices despite indices: how to optimize it?

Here is the indexing code:

    g = TitanFactory.build().set("storage.backend", "cassandra")
            .set("storage.hostname", "127.0.0.1").open();

    TitanManagement mgmt = g.getManagementSystem();

    PropertyKey db_local_name = mgmt.makePropertyKey("db_local_name")
            .dataType(String.class).make();
    mgmt.buildIndex("byDb_local_name", Vertex.class).addKey(db_local_name)
            .buildCompositeIndex();

    PropertyKey db_schema = mgmt.makePropertyKey("db_schema")
            .dataType(String.class).make();
    mgmt.buildIndex("byDb_schema", Vertex.class).addKey(db_schema)
            .buildCompositeIndex();

    PropertyKey db_column = mgmt.makePropertyKey("db_column")
            .dataType(String.class).make();
    mgmt.buildIndex("byDb_column", Vertex.class).addKey(db_column)
            .buildCompositeIndex();

    PropertyKey type = mgmt.makePropertyKey("type").dataType(String.class)
            .make();
    mgmt.buildIndex("byType", Vertex.class).addKey(type)
            .buildCompositeIndex();

    PropertyKey value = mgmt.makePropertyKey("value")
            .dataType(Object.class).make();
    mgmt.buildIndex("byValue", Vertex.class).addKey(value)
            .buildCompositeIndex();

    PropertyKey index = mgmt.makePropertyKey("index")
            .dataType(Integer.class).make();
    mgmt.buildIndex("byIndex", Vertex.class).addKey(index)
            .buildCompositeIndex();

    mgmt.commit();

Here is the code that searches for vertices and then adds a new vertex with three edges, running on a 3 GHz PC with 2 GB of RAM. It processes only 830 vertices in 3 hours, and I have 100,000 rows of data, so it is far too slow. The code is as follows:

    for (Object[] rowObj : list) {
        // TXN_ID
        Iterator<Vertex> iter = g.query()
                .has("db_local_name", "Report Name 1")
                .has("db_schema", "MPS").has("db_column", "txn_id")
                .has("value", rowObj[0]).vertices().iterator();
        if (iter.hasNext()) {
            vertex1 = iter.next();
            logger.debug("vertex1=" + vertex1.getId() + ","
                    + vertex1.getProperty("db_local_name") + ","
                    + vertex1.getProperty("db_schema") + ","
                    + vertex1.getProperty("db_column") + ","
                    + vertex1.getProperty("type") + ","
                    + vertex1.getProperty("index") + ","
                    + vertex1.getProperty("value"));
        }
        // TXN_TYPE
        iter = g.query().has("db_local_name", "Report Name 1")
                .has("db_schema", "MPS").has("db_column", "txn_type")
                .has("value", rowObj[1]).vertices().iterator();
        if (iter.hasNext()) {
            vertex2 = iter.next();
            logger.debug("vertex2=" + vertex2.getId() + ","
                    + vertex2.getProperty("db_local_name") + ","
                    + vertex2.getProperty("db_schema") + ","
                    + vertex2.getProperty("db_column") + ","
                    + vertex2.getProperty("type") + ","
                    + vertex2.getProperty("index") + ","
                    + vertex2.getProperty("value"));
        }
        // WALLET_ID
        iter = g.query().has("db_local_name", "Report Name 1")
                .has("db_schema", "MPS").has("db_column", "wallet_id")
                .has("value", rowObj[2]).vertices().iterator();
        if (iter.hasNext()) {
            vertex3 = iter.next();
            logger.debug("vertex3=" + vertex3.getId() + ","
                    + vertex3.getProperty("db_local_name") + ","
                    + vertex3.getProperty("db_schema") + ","
                    + vertex3.getProperty("db_column") + ","
                    + vertex3.getProperty("type") + ","
                    + vertex3.getProperty("index") + ","
                    + vertex3.getProperty("value"));
        }

        vertex4 = g.addVertex(null);
        vertex4.setProperty("db_local_name", "Report Name 1");
        vertex4.setProperty("db_schema", "MPS");
        vertex4.setProperty("db_column", "amount");
        vertex4.setProperty("type", "indivisual_0");
        vertex4.setProperty("value", rowObj[3].toString());
        vertex4.setProperty("index", i);

        vertex1.addEdge("data", vertex4);
        logger.debug("vertex1 added");
        vertex2.addEdge("data", vertex4);
        logger.debug("vertex2 added");
        vertex3.addEdge("data", vertex4);
        logger.debug("vertex3 added");
        i++;
        g.commit();
    }

Is there any way to optimize this code?

For completeness, this question was answered on the Aurelius Graphs mailing list:

https://groups.google.com/forum/#!topic/aureliusgraphs/XKT6aokRfFI

Basically:

  1. Build and use a real multi-key composite index: `mgmt.buildIndex("by_local_name_schema_value", Vertex.class).addKey(db_local_name).addKey(db_schema).addKey(value).buildCompositeIndex();`
  2. Don't call `g.commit()` after every loop iteration; instead do something like this: `if (++i % 10000 == 0) g.commit()`
  3. Turn on `storage.batch-loading` if you haven't already.
  4. If all you can give Cassandra is 2 GB of RAM, consider using BerkeleyDB instead. Cassandra wants a minimum of 4 GB of RAM and would probably like "more".
  5. I don't know the nature of your data, but could you pre-sort it and use BatchGraph, as described in the "Powers of Ten - Part I" blog post and on the wiki? Using BatchGraph would free you from maintaining the transaction batching described in point 2 above.
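The batched-commit idea in point 2 can be sketched as follows. This is a minimal stand-in, not the real loader: `BatchingLoader`, `loadRow`, and the stubbed `commit()` are hypothetical placeholders for the loop body and `g.commit()` above, so only the counter arithmetic is shown; note the extra flush after the loop for a final partial batch.

```java
// Sketch of tip 2: commit once per batch instead of once per row.
// commit() stands in for g.commit(); loadRow() stands in for the
// vertex lookups, addVertex, and addEdge calls in the question.
public class BatchingLoader {
    static final int BATCH_SIZE = 10000;
    private int commits = 0;

    private void commit() { commits++; }      // would be g.commit()
    private void loadRow(int row) { /* vertex work would go here */ }

    // Loads rowCount rows and returns how many commits were issued.
    public int load(int rowCount) {
        int i = 0;
        for (int row = 0; row < rowCount; row++) {
            loadRow(row);
            if (++i % BATCH_SIZE == 0) commit();  // commit every 10,000 rows
        }
        if (i % BATCH_SIZE != 0) commit();        // flush the partial last batch
        return commits;
    }
}
```

With 100,000 rows this issues 10 commits instead of 100,000, which removes most of the per-row transaction overhead.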
