ArangoDB performance

Question

I am exploring the use of ArangoDB as a graph engine for a project I am working on that needs shortest path analysis.

My collections look like this:

a route network of ~3.5M edges in an edge collection (_to/_from)
a vertex collection of ~2.7M vertices (geo index on [lat,lng])
a trips collection with start/end locations (not mapped to nodes)

The first task is to snap the origin and destination coordinates of the trips to vertices on the network. I am using the following query to do that:

FOR t IN trips
    let snappedFrom = (
        FOR x IN nodes
          SORT GEO_DISTANCE([t.Orig_Long, t.Orig_Lat], [x.lng, x.lat]) ASC
          LIMIT 1
          RETURN x._id
        )[0]
    let snappedTo = (
        FOR x IN nodes
          SORT GEO_DISTANCE([t.Dest_Long, t.Dest_Lat], [x.lng, x.lat]) ASC
          LIMIT 1
          RETURN x._id
        )[0]
    UPDATE t._key WITH {snappedFrom,snappedTo} IN trips

This is taking around 3.5 hours, and I want to reduce that significantly if possible.

I am running on an AWS instance with 32GB of RAM and 8 cores. I notice that when running this query, it only uses a single core, which is killing me.

I am curious about setting up ArangoDB for pure performance. My use case is really using the DB as a calculator. In fact, it will likely be part of a CI/CD workflow when done. I don't need any safeguards in there, there won't be any parallel user requests, and if the data is bad, I just blow it away and start again.

I am using a standard install with Docker:

docker run -it --name=adb --rm -p 8528:8528 -v arangodb:/data -d -v /var/run/docker.sock:/var/run/docker.sock arangodb/arangodb-starter --starter.address=<$IP> --starter.mode=single

I am going to run into the same issue when I run shortest_path on all trips too; that will take forever on a single core.

Any help with the config, a better query, or even a better AWS setup would be truly appreciated.

Answer 1

Add geo-spatial indexes on the Orig and Dest fields; that will enable the server to optimize and speed up the sub-queries.

To speed up processing further, run the main query in batches: processing many smaller batches is faster than running over all documents at once.
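The batching suggestion could be driven from a small client script. The following is only a sketch under stated assumptions: `run_query` is a hypothetical callable that sends an AQL string and bind variables to the server (for example a thin wrapper around a driver call), `trip_keys` is a list of `_key` values fetched beforehand, and the `batch_size` and `workers` defaults are arbitrary. Running several batches at once is an addition not made in the answer, and whether it actually uses more cores depends on the server.

```python
from concurrent.futures import ThreadPoolExecutor

# AQL for one batch: the snapping logic from the question,
# restricted to the trips whose keys are passed in @keys.
SNAP_BATCH_AQL = """
FOR t IN trips
    FILTER t._key IN @keys
    LET snappedFrom = (
        FOR x IN nodes
            SORT GEO_DISTANCE([t.Orig_Long, t.Orig_Lat], [x.lng, x.lat]) ASC
            LIMIT 1
            RETURN x._id
    )[0]
    LET snappedTo = (
        FOR x IN nodes
            SORT GEO_DISTANCE([t.Dest_Long, t.Dest_Lat], [x.lng, x.lat]) ASC
            LIMIT 1
            RETURN x._id
    )[0]
    UPDATE t WITH {snappedFrom: snappedFrom, snappedTo: snappedTo} IN trips
"""


def chunked(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [items[i:i + size] for i in range(0, len(items), size)]


def snap_in_batches(run_query, trip_keys, batch_size=1000, workers=1):
    """Run the snapping query once per batch of trip keys.

    run_query(query, bind_vars) is a hypothetical callable supplied by the
    caller; it is assumed to execute the AQL on the server.
    Returns the number of batches submitted.
    """
    batches = chunked(list(trip_keys), batch_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() forces evaluation so that any exception raised by a
        # batch surfaces here instead of being silently dropped.
        list(pool.map(lambda keys: run_query(SNAP_BATCH_AQL, {"keys": keys}),
                      batches))
    return len(batches)
```

With `workers=1` the batches run one after another, which matches the answer as written; raising `workers` submits several batches concurrently, which may or may not help with the single-core behaviour described in the question.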

Answer 1
0 2020-07-31 05:28:49