
ArangoDB Performance

I am exploring the use of ArangoDB as a graph engine for a project I am working on that needs shortest-path analysis.

My collections look like this:

  • a route network of ~3.5M edges in an edge collection (_from/_to)
  • a vertex collection of ~2.7M vertices (geo index on [lat, lng])
  • a trips collection with start/end locations (not mapped to nodes)

The first task is to snap the origin and destination coordinates of the trips to vertices on the network. I am using the following query to do that:

FOR t IN trips
    // nearest network node to the trip origin
    LET snappedFrom = (
        FOR x IN nodes
            SORT GEO_DISTANCE([t.Orig_Long, t.Orig_Lat], [x.lng, x.lat]) ASC
            LIMIT 1
            RETURN x._id
    )[0]
    // nearest network node to the trip destination
    LET snappedTo = (
        FOR x IN nodes
            SORT GEO_DISTANCE([t.Dest_Long, t.Dest_Lat], [x.lng, x.lat]) ASC
            LIMIT 1
            RETURN x._id
    )[0]
    // store both node ids on the trip document
    UPDATE t._key WITH {snappedFrom, snappedTo} IN trips

This is taking around 3.5 hours, and I want to reduce that significantly if possible.

I am running on an AWS instance with 32GB of RAM and 8 cores. I notice that when running this query, it is only using a single core, which is killing me.

I am curious about setting up ArangoDB for pure performance. My use case is really using the DB as a calculator. In fact, it will likely be part of a CI/CD workflow when done. I don't need any safeguards in there, there won't be any parallel user requests, and if the data is bad, I just blow it away and start again.

I am using a standard install with Docker:

docker run -it --name=adb --rm -p 8528:8528 -v arangodb:/data -d -v /var/run/docker.sock:/var/run/docker.sock arangodb/arangodb-starter --starter.address=<$IP> --starter.mode=single

I am going to run into the same issue when I run shortest_path on all trips too; that will take forever on a single core.

Any help with the config, a better query, or even better AWS setups would be truly appreciated.

Add geo-spatial indexes on the Orig and Dest fields; that will enable the server to optimize and speed up the sub-queries.
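As a rough sketch of what such index definitions could look like, the snippet below builds request bodies for ArangoDB's HTTP index API (POST /_api/index?collection=trips). The field names Orig_Lat, Orig_Long, Dest_Lat and Dest_Long are taken from the query in the question; the helper name geo_index_body is hypothetical, and nothing here contacts a server. Whether the optimizer actually uses these indexes for the sub-queries is worth confirming with the query's explain output.

```python
import json


def geo_index_body(lat_field, lng_field):
    """Build the JSON body for a two-attribute geo index (latitude first)."""
    return {"type": "geo", "fields": [lat_field, lng_field]}


# One index definition per coordinate pair stored on the trips documents.
# Field names are assumed from the question's query.
index_bodies = [
    geo_index_body("Orig_Lat", "Orig_Long"),
    geo_index_body("Dest_Lat", "Dest_Long"),
]

for body in index_bodies:
    # Each body would be sent as: POST /_api/index?collection=trips
    print(json.dumps(body))
```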

To speed up processing further, run the main query in batches: processing many smaller batches is faster than running over all documents at once.
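A minimal sketch of the batching idea, under some assumptions: the trip keys are already available as a list, and run_query is a hypothetical callable that sends one AQL string plus bind variables to the server (nothing here connects to ArangoDB). The batch size of 1000 and the 4 workers are arbitrary starting points, not tuned values. Submitting batches from several threads is an addition to the advice above; since the question observes that a single query keeps only one core busy, concurrent batches may help use the remaining cores.

```python
from concurrent.futures import ThreadPoolExecutor

# Same snapping logic as the question's query, restricted to the keys in @keys.
BATCH_QUERY = """
FOR t IN trips
    FILTER t._key IN @keys
    LET snappedFrom = (
        FOR x IN nodes
            SORT GEO_DISTANCE([t.Orig_Long, t.Orig_Lat], [x.lng, x.lat]) ASC
            LIMIT 1
            RETURN x._id
    )[0]
    LET snappedTo = (
        FOR x IN nodes
            SORT GEO_DISTANCE([t.Dest_Long, t.Dest_Lat], [x.lng, x.lat]) ASC
            LIMIT 1
            RETURN x._id
    )[0]
    UPDATE t._key WITH {snappedFrom, snappedTo} IN trips
"""


def chunked(items, size):
    """Split a list into consecutive slices of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [items[i:i + size] for i in range(0, len(items), size)]


def run_in_batches(keys, run_query, batch_size=1000, max_workers=4):
    """Run BATCH_QUERY once per batch of keys; return the number of batches.

    `run_query(query, bind_vars)` is a hypothetical hook supplied by the caller.
    """
    batches = chunked(list(keys), batch_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(run_query, BATCH_QUERY, {"keys": batch})
            for batch in batches
        ]
        for future in futures:
            future.result()  # re-raise any error from a failed batch
    return len(batches)


if __name__ == "__main__":
    sizes = []

    def fake_run_query(query, bind_vars):
        # Stand-in for a real driver call; only records the batch size.
        sizes.append(len(bind_vars["keys"]))

    total = run_in_batches([str(i) for i in range(2500)], fake_run_query)
    print(total, sorted(sizes))
```

With the python-arango driver, run_query could plausibly be something like lambda q, b: db.aql.execute(q, bind_vars=b), where db is a database handle created elsewhere; treat that wiring as an assumption to verify against the driver version in use.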
