ArangoDB Performance
I am exploring ArangoDB as a graph engine for a project that needs shortest-path analysis.
My collections look like this:
The first task is to snap the origin and destination coordinates of the trips to vertices on the network. I am using the following query to do that:
FOR t IN trips
    LET snappedFrom = (
        FOR x IN nodes
            SORT GEO_DISTANCE([t.Orig_Long, t.Orig_Lat], [x.lng, x.lat]) ASC
            LIMIT 1
            RETURN x._id
    )[0]
    LET snappedTo = (
        FOR x IN nodes
            SORT GEO_DISTANCE([t.Dest_Long, t.Dest_Lat], [x.lng, x.lat]) ASC
            LIMIT 1
            RETURN x._id
    )[0]
    UPDATE t._key WITH {snappedFrom, snappedTo} IN trips
This takes around 3.5 hours, and I want to reduce that significantly if possible.
I am running on an AWS instance with 32GB of RAM and 8 cores. I notice that when running this query, it uses only a single core, which is killing me.
I am curious about setting up ArangoDB for pure performance. My use case is really using the DB as a calculator. In fact, it will likely be part of a CI/CD workflow when done. I don't need any safeguards in there, there won't be any parallel user requests, and if the data is bad, I just blow it away and start again.
I am using a standard install with Docker:
docker run -it --name=adb --rm -p 8528:8528 -v arangodb:/data -d -v /var/run/docker.sock:/var/run/docker.sock arangodb/arangodb-starter --starter.address=<$IP> --starter.mode=single
I am going to run into the same issue when I run shortest_path on all trips too; that will take forever on a single core.
Any help with the config, a better query, or even a better AWS setup would be truly appreciated.
Add geo-spatial indexes on the Orig and Dest fields; that will enable the server to optimize and speed up the sub-queries.
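As a rough illustration of creating such an index, the sketch below builds (but does not send) a request to ArangoDB's HTTP index endpoint using only the Python standard library. The base URL, port 8529, the `_system` database, and the helper name `build_geo_index_request` are assumptions made for the example. It targets `nodes` with `lat`/`lng` on the assumption that the index the sub-queries can use is the one on the collection they scan; the same call could be pointed at `trips` with `Orig_Lat`/`Orig_Long` or `Dest_Lat`/`Dest_Long`.

```python
import json
import urllib.parse
import urllib.request


def build_geo_index_request(base_url, database, collection, lat_field, lng_field):
    """Build (but do not send) the HTTP request that creates a geo index.

    Hypothetical helper for illustration; it is not part of any ArangoDB driver.
    """
    query = urllib.parse.urlencode({"collection": collection})
    url = f"{base_url}/_db/{database}/_api/index?{query}"
    # A two-field geo index lists the latitude attribute first, then longitude.
    body = {"type": "geo", "fields": [lat_field, lng_field]}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumed server address and database; adjust to the actual deployment.
req = build_geo_index_request("http://localhost:8529", "_system", "nodes", "lat", "lng")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Sending it would be a matter of `urllib.request.urlopen(req)` once the appropriate authentication header has been added. Whether the optimizer actually picks the index up for a `GEO_DISTANCE` sort over `[x.lng, x.lat]` is worth confirming by running explain on the query.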
To speed up processing further, run the main query in batches; processing many smaller batches is faster than running over all documents at once.
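One way the batching could look from a client script is sketched below. It assumes the trip `_key` values have already been fetched, splits them into chunks, and runs the snapping query once per chunk with the keys passed as a bind variable. The names `chunked` and `run_in_batches`, the batch size, and the `execute(query, bind_vars)` callable are all hypothetical; `execute` stands in for whatever driver call is used. Submitting batches from several worker threads is an addition of this sketch, not something the answer states, on the assumption that the server can run independent queries on separate cores.

```python
from concurrent.futures import ThreadPoolExecutor

# Same snapping logic as the question's query, restricted to one batch of keys.
SNAP_BATCH_AQL = """
FOR t IN trips
    FILTER t._key IN @keys
    LET snappedFrom = (
        FOR x IN nodes
            SORT GEO_DISTANCE([t.Orig_Long, t.Orig_Lat], [x.lng, x.lat]) ASC
            LIMIT 1
            RETURN x._id
    )[0]
    LET snappedTo = (
        FOR x IN nodes
            SORT GEO_DISTANCE([t.Dest_Long, t.Dest_Lat], [x.lng, x.lat]) ASC
            LIMIT 1
            RETURN x._id
    )[0]
    UPDATE t WITH {snappedFrom, snappedTo} IN trips
"""


def chunked(items, size):
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def run_in_batches(keys, execute, batch_size=1000, workers=1):
    """Run the snapping query once per batch of keys; return the batch count.

    `execute(query, bind_vars)` is a caller-supplied (hypothetical) function
    that sends one AQL query to the server.
    """
    batches = list(chunked(keys, batch_size))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() forces completion and re-raises any exception from a worker.
        list(pool.map(lambda batch: execute(SNAP_BATCH_AQL, {"keys": batch}), batches))
    return len(batches)


# Dry run with a stub in place of a real database call.
calls = []
n_batches = run_in_batches(
    [str(i) for i in range(2500)],
    lambda query, bind_vars: calls.append(len(bind_vars["keys"])),
)
print(n_batches, calls)
```

Because each batch touches a disjoint set of trips, the batches should not contend for the same documents; the batch size and worker count are tuning knobs to measure rather than fixed recommendations.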