AWS Neptune schema optimization - billions of nodes and edges

Question

I am creating an AWS Neptune graph that will eventually have billions of nodes and edges. With this kind of data volume, I was wondering whether there are any best practices for designing the schema to optimize queries. One thing I am particularly curious about is whether there is a major performance difference between querying by property and querying by ID:

g.V().has('application', 'applicationId', 'application_123')...

vs.

g.V('application_123')...

I would assume that starting a query from an ID in a graph with billions of nodes and edges would be substantially faster. Does anyone have experience with this? If that is the case, I could give my nodes IDs that are known at query time, so that I can always query by ID. For instance, application nodes would have IDs like application_123 and phone nodes would have IDs like phone_1234567890, where (123) 456 7890 is the phone number. Would this improve query performance? Is there anything else I can do to improve query performance on a graph with billions of nodes and edges?
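The ID scheme described in the question can be sketched as a pair of helpers. This is a minimal sketch assuming the `<label>_<natural key>` convention from the examples above and that phone numbers are normalized to digits only; `application_id` and `phone_id` are hypothetical names, not part of Neptune or Gremlin.

```python
import re

# Hypothetical helpers: the "<label>_<natural key>" convention follows the
# question's examples (application_123, phone_1234567890). These names are
# illustrative only and are not part of any Neptune or Gremlin API.


def application_id(app_number: int) -> str:
    """Build a deterministic vertex ID for an application node."""
    return f"application_{app_number}"


def phone_id(raw_phone: str) -> str:
    """Build a deterministic vertex ID for a phone node.

    Every non-digit character is stripped, so "(123) 456 7890" and
    "123-456-7890" map to the same ID (one vertex per phone number).
    """
    digits = re.sub(r"\D", "", raw_phone)
    if not digits:
        raise ValueError(f"no digits in phone number: {raw_phone!r}")
    return f"phone_{digits}"


if __name__ == "__main__":
    print(application_id(123))          # application_123
    print(phone_id("(123) 456 7890"))   # phone_1234567890
```

The resulting string can then be passed directly to a traversal, e.g. `g.V(phone_id("(123) 456 7890"))` in gremlinpython. Keeping the label as a prefix also prevents collisions between node types that happen to share a natural key, which matters because vertex IDs must be unique across the whole graph.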

Answer 1

In general, when using Amazon Neptune with Gremlin, if you are able to provide your own (meaningful to your domain) IDs for vertices, that will be the most efficient way to look up a specific vertex. Each vertex ID has to be unique, so as long as you are able to meet that constraint in a meaningful way for your application, that is a sound approach to take. Looking up properties is still efficient, as it's backed by an index, but using an ID is the most efficient way to find a vertex or set of vertices.
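To make the two lookup paths concrete, here is a toy in-memory model in Python. It is an illustrative assumption, not a description of how Neptune actually stores or indexes data: the ID lookup goes straight to the vertex, while the property lookup first consults a secondary index and then resolves each matching ID. `ToyVertexStore` and its methods are hypothetical names.

```python
from collections import defaultdict


class ToyVertexStore:
    """Toy model (NOT Neptune's storage engine) of two lookup paths:
    a direct map keyed by vertex ID, and a secondary index on
    (label, property key, property value)."""

    def __init__(self):
        self._by_id = {}                      # vertex ID -> (label, properties)
        self._prop_index = defaultdict(set)   # (label, key, value) -> {vertex IDs}

    def add_vertex(self, vertex_id, label, **props):
        # Vertex IDs must be unique, mirroring the constraint in the answer.
        if vertex_id in self._by_id:
            raise ValueError(f"vertex ID already exists: {vertex_id}")
        self._by_id[vertex_id] = (label, dict(props))
        for key, value in props.items():
            self._prop_index[(label, key, value)].add(vertex_id)

    def get_by_id(self, vertex_id):
        # Analogous to g.V(id): a single lookup straight to the vertex.
        return self._by_id.get(vertex_id)

    def find_by_property(self, label, key, value):
        # Analogous to g.V().has(label, key, value): look up matching IDs in
        # the index, then resolve each ID to its vertex (two steps).
        ids = self._prop_index.get((label, key, value), set())
        return [self._by_id[v] for v in sorted(ids)]


if __name__ == "__main__":
    store = ToyVertexStore()
    store.add_vertex("application_123", "application",
                     applicationId="application_123")
    print(store.get_by_id("application_123"))
    print(store.find_by_property("application", "applicationId",
                                 "application_123"))
```

Note that the property lookup returns a list, because nothing forces a property value to be unique, whereas the ID lookup returns at most one vertex.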

It is hard to give much generic advice about data modeling, because the right model depends largely on the access patterns you need to optimize for, and those access patterns should in turn drive the choice of data model.

1 2021-12-11 17:49:44
