Titan DB: performance problem iterating over thousands of vertices in Java code

Question

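I'm using Titan DB (version 1.0.0) with a Cassandra storage backend. My database is large (millions of vertices and edges), and I use Elasticsearch for indexing. The index does its job well: I get thousands (~40,000) of vertices back as the answer to a query relatively easily and quickly. The performance problem appears when I then iterate over those vertices and read the basic data stored in their properties. That takes about 1 minute!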

Using Java 8 parallel streams improved performance significantly, but not enough (10 seconds instead of 1 minute).

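Suppose I have thousands of vertices, each with a location property and a timestamp. I only want the vertices whose location (a geometry) lies within the query area, and from them I want to collect the distinct timestamps.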
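To make the intended result concrete, here is a minimal plain-Java sketch of what the query computes, without Titan. It assumes a simple latitude/longitude box as the query area; the `Place` and `Box` classes are hypothetical stand-ins for a vertex and for `cLocation`, not Titan types.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class RegionTimestamps {
    // Hypothetical stand-in for a vertex with a location and a timestamp property.
    static final class Place {
        final double lat, lon;
        final String timestamp;
        Place(double lat, double lon, String timestamp) {
            this.lat = lat;
            this.lon = lon;
            this.timestamp = timestamp;
        }
    }

    // Hypothetical query area: an axis-aligned latitude/longitude box.
    static final class Box {
        final double minLat, minLon, maxLat, maxLon;
        Box(double minLat, double minLon, double maxLat, double maxLon) {
            this.minLat = minLat;
            this.minLon = minLon;
            this.maxLat = maxLat;
            this.maxLon = maxLon;
        }
        boolean contains(Place p) {
            return p.lat >= minLat && p.lat <= maxLat
                && p.lon >= minLon && p.lon <= maxLon;
        }
    }

    // Keep only places inside the area, then collect their distinct timestamps.
    static Set<String> distinctTimestamps(List<Place> places, Box area) {
        return places.stream()
                .filter(area::contains)
                .map(p -> p.timestamp)
                .collect(Collectors.toSet()); // a Set already removes duplicates
    }

    public static void main(String[] args) {
        List<Place> places = Arrays.asList(
                new Place(37.9, 23.7, "2017-01-01"),
                new Place(38.0, 23.8, "2017-01-01"),
                new Place(40.6, 22.9, "2017-01-02")); // outside the box
        Box area = new Box(37.0, 23.0, 39.0, 24.0);
        System.out.println(distinctTimestamps(places, area)); // prints [2017-01-01]
    }
}
```

In the real setup the filtering step is done by the Elasticsearch-backed index; the slow part described below is the per-vertex property read in the `map` step.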

Here is the relevant part of the Java code, using Java 8 parallel streams:

TitanTransaction tt = titanWraper.getNewTransaction();
PropertyKey timestampKey = tt.getPropertyKey(TIME_STAMP);
TitanGraphQuery graphQuery = tt.query().has(LOCATION, Geo.WITHIN, cLocation);
Spliterator<TitanVertex> locationsSpl = graphQuery.vertices().spliterator();

// this pipeline takes about 10 sec to iterate over 40000 vertices
Set<String> locationTimestamps = StreamSupport.stream(locationsSpl, true)
        .map(locVertex -> {
            // map each location vertex to its timestamp String
            String timestamp = locVertex.valueOrNull(timestampKey);
            return timestamp;
        })
        .distinct()
        .collect(Collectors.toSet());
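A side note on the pipeline above: `Collectors.toSet()` already discards duplicates, so the `.distinct()` stage does extra hashing work without changing the result. The plain-Java check below illustrates this on strings; `DistinctCheck` and its methods are hypothetical names used only for the demonstration. Removing `.distinct()` will not fix the main slowdown, but it is one less stateful stage in a parallel stream.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class DistinctCheck {
    // Collect with an explicit distinct() stage, as in the question's code.
    static Set<String> withDistinct(List<String> values) {
        return values.parallelStream().distinct().collect(Collectors.toSet());
    }

    // Collect directly into a Set, which deduplicates on its own.
    static Set<String> withoutDistinct(List<String> values) {
        return values.parallelStream().collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        List<String> ts = Arrays.asList("t1", "t2", "t1", "t3", "t2");
        System.out.println(withDistinct(ts).equals(withoutDistinct(ts))); // prints true
        System.out.println(withoutDistinct(ts).size());                   // prints 3
    }
}
```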

The same code using standard Java iteration:

TitanTransaction tt = titanWraper.getNewTransaction();
PropertyKey timestampKey = tt.getPropertyKey(TIME_STAMP);
TitanGraphQuery graphQuery = tt.query().has(LOCATION, Geo.WITHIN, cLocation);
Set<String> locationTimestamps = new HashSet<>();
// this loop takes about 45 sec to iterate over 40000 vertices
for (TitanVertex locVertex : (Iterable<TitanVertex>) graphQuery.vertices()) {
    String timestamp = locVertex.valueOrNull(timestampKey);
    locationTimestamps.add(timestamp);
}

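This performance is disappointing. Worse, real queries will return around a million vertices. I'm trying to understand what causes this. I would expect iterating over the vertices to take less than 1 second.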
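One plausible cause, offered as an assumption to verify rather than a confirmed diagnosis: if each `valueOrNull` call triggers its own backend read, the loop pays one Cassandra round trip per vertex. The numbers fit that pattern: 45 s / 40,000 vertices ≈ 1.1 ms per vertex sequentially, and 10 s / 40,000 ≈ 0.25 ms once a parallel stream overlaps several waits. The sketch below models only the round-trip count of per-vertex reads versus batched reads; `FakeBackend`, `fetchOne`, and `fetchMany` are hypothetical and do not represent Titan's storage API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class RoundTripModel {
    // Hypothetical backend: every method call counts as one network round trip.
    static final class FakeBackend {
        private final Map<Integer, String> timestamps = new HashMap<>();
        int roundTrips = 0;

        FakeBackend(int vertexCount) {
            for (int id = 0; id < vertexCount; id++) {
                timestamps.put(id, "t" + (id % 10)); // only 10 distinct timestamps
            }
        }

        // One round trip per vertex (lazy, per-element property read).
        String fetchOne(int id) {
            roundTrips++;
            return timestamps.get(id);
        }

        // One round trip for a whole batch of vertices.
        Map<Integer, String> fetchMany(List<Integer> ids) {
            roundTrips++;
            Map<Integer, String> result = new HashMap<>();
            for (int id : ids) result.put(id, timestamps.get(id));
            return result;
        }
    }

    static Set<String> perVertex(FakeBackend db, List<Integer> ids) {
        Set<String> out = new HashSet<>();
        for (int id : ids) out.add(db.fetchOne(id));
        return out;
    }

    static Set<String> batched(FakeBackend db, List<Integer> ids, int batchSize) {
        Set<String> out = new HashSet<>();
        for (int from = 0; from < ids.size(); from += batchSize) {
            List<Integer> chunk = ids.subList(from, Math.min(from + batchSize, ids.size()));
            out.addAll(db.fetchMany(chunk).values());
        }
        return out;
    }

    public static void main(String[] args) {
        int n = 40_000;
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < n; i++) ids.add(i);

        FakeBackend a = new FakeBackend(n);
        FakeBackend b = new FakeBackend(n);
        Set<String> r1 = perVertex(a, ids);
        Set<String> r2 = batched(b, ids, 1_000);

        System.out.println(r1.equals(r2)); // prints true
        System.out.println(a.roundTrips);  // prints 40000
        System.out.println(b.roundTrips);  // prints 40
    }
}
```

Under the same assumption, a million vertices at ~1.1 ms each would take roughly 19 minutes sequentially, so reducing the number of round trips matters far more than adding threads.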

Answer 1

The same query, written as a Gremlin traversal instead of a graph query, performs better and is shorter:

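TitanTransaction tt = graph.newTransaction();
Set<String> locationTimestamps = tt.traversal().V()
    .has(LOCATION, Geo.geoWithin(cLocation)) // geo predicate; P.within(...) would test equality, not containment
    .<String>values(TIME_STAMP)              // explicit type witness so toSet() yields Set<String>
    .dedup()
    .toSet();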

Accepted answer, score 0, 2017-01-09 12:57:04