Is there a better way to model this access pattern than using two global secondary indexes (GSIs)?

Question

I'm trying to figure out the data model and access patterns for an app that keeps track of animal movements between different fields (pastures). There are movement records that look like this:

PK                 FROM            TO          DATE
------------------------------------------------------
ANIMAL#001       FIELD#A       FIELD#B       January 3
ANIMAL#001       FIELD#Q       FIELD#R       September 19
ANIMAL#002       FIELD#A       FIELD#B       January 3
ANIMAL#003       FIELD#C       FIELD#D       March 15
ANIMAL#005       FIELD#F       FIELD#A       April 22

For a specific field, e.g. FIELD#A, I'd like to know all the movements into and out of that field, the date of each movement, and the number of animals. The results should look like:

DATE        FROM        TO          NUMBER_ANIMALS
--------------------------------------------------
January 3   FIELD#A     FIELD#B         2
April 22    FIELD#F     FIELD#A         1

Possible solutions and attempts:

1. A GSI with PK=FROM, SK=TO. If I query the GSI with PK=FIELD#A, this only gives one half of the picture, that is, movements from FIELD#A. I can't obtain movements to FIELD#A.
2. A composite attribute like FIELD#A#FIELD#B used as the PK in a GSI. Runs into the same problem as attempt 1.
3. Two GSIs. GSI1 has PK=FROM and GSI2 has PK=TO. I can query GSI1 with PK=FIELD#A and do some post-processing (groupby, count) to get part of the result. I can then query GSI2 with PK=FIELD#A and post-process, getting the rest of the result. This looks like it will work but requires two GSIs and two queries. I can't overload one GSI since both columns in use are from the same item.
4. Some combination of scanning the entire table and filtering the results, which I'd rather avoid since there might be 50,000+ items in the entire table.
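
For reference, the two-GSI attempt and its post-processing step can be sketched in plain Python. This is only an in-memory illustration, not DynamoDB code: `query_index` is a hypothetical stand-in for a Query against GSI1 (PK=FROM) or GSI2 (PK=TO), and the records are the sample rows from the table above.

```python
from collections import Counter

# Sample movement records from the question (one item per animal movement).
MOVEMENTS = [
    {"PK": "ANIMAL#001", "FROM": "FIELD#A", "TO": "FIELD#B", "DATE": "January 3"},
    {"PK": "ANIMAL#001", "FROM": "FIELD#Q", "TO": "FIELD#R", "DATE": "September 19"},
    {"PK": "ANIMAL#002", "FROM": "FIELD#A", "TO": "FIELD#B", "DATE": "January 3"},
    {"PK": "ANIMAL#003", "FROM": "FIELD#C", "TO": "FIELD#D", "DATE": "March 15"},
    {"PK": "ANIMAL#005", "FROM": "FIELD#F", "TO": "FIELD#A", "DATE": "April 22"},
]


def query_index(items, key_attr, key_value):
    """Hypothetical stand-in for a Query on a GSI partitioned by key_attr."""
    return [item for item in items if item[key_attr] == key_value]


def movements_for_field(items, field):
    """Merge both index lookups, then group by (date, from, to) and count."""
    outgoing = query_index(items, "FROM", field)  # plays the role of GSI1
    incoming = query_index(items, "TO", field)    # plays the role of GSI2
    counts = Counter(
        (item["DATE"], item["FROM"], item["TO"]) for item in outgoing + incoming
    )
    return [
        {"DATE": date, "FROM": src, "TO": dst, "NUMBER_ANIMALS": n}
        for (date, src, dst), n in counts.items()
    ]


for row in movements_for_field(MOVEMENTS, "FIELD#A"):
    print(row)
```

Against a real table, each `query_index` call would be a separate Query on its own index, which is the two-queries cost mentioned in the attempt.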

I can see how to do it with two GSIs, but what's the most efficient way?

Answer 1

I could imagine a slightly different table structure (ANIMALID being the partition key and FIELDID being the sort key):

ANIMALID | FIELDID | FROM_TO | ...
——————————————————————————————————
ANIMAL#1 | FIELD#A | FROM    | ...
ANIMAL#1 | FIELD#B | TO      | ...
ANIMAL#2 | FIELD#C | FROM    | ...
ANIMAL#2 | FIELD#A | TO      | ...

And a GSI with the following structure:

FIELDID | ANIMALID | ...

Then you can query the GSI just by FIELDID and aggregate the results.
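
A rough in-memory sketch of this idea in Python, under some assumptions: every movement is written as two items (one per field), and each item also carries `DATE` and `OTHER_FIELD` attributes. Those two attribute names are hypothetical; the answer leaves the remaining columns unspecified. The dictionary built by `build_gsi` only mimics a GSI with FIELDID as its partition key.

```python
from collections import Counter, defaultdict


def split_movement(animal_id, from_field, to_field, date):
    """Write one movement as two items, one for each field involved."""
    return [
        {"ANIMALID": animal_id, "FIELDID": from_field, "FROM_TO": "FROM",
         "OTHER_FIELD": to_field, "DATE": date},   # OTHER_FIELD/DATE: assumed
        {"ANIMALID": animal_id, "FIELDID": to_field, "FROM_TO": "TO",
         "OTHER_FIELD": from_field, "DATE": date},
    ]


def build_gsi(items):
    """Group items by FIELDID, mimicking a GSI partitioned on FIELDID."""
    index = defaultdict(list)
    for item in items:
        index[item["FIELDID"]].append(item)
    return index


def movements_for_field(gsi, field):
    """Single lookup by FIELDID, then group by (date, from, to) and count."""
    counts = Counter()
    for item in gsi.get(field, []):
        if item["FROM_TO"] == "FROM":
            key = (item["DATE"], field, item["OTHER_FIELD"])
        else:
            key = (item["DATE"], item["OTHER_FIELD"], field)
        counts[key] += 1
    return [
        {"DATE": date, "FROM": src, "TO": dst, "NUMBER_ANIMALS": n}
        for (date, src, dst), n in counts.items()
    ]


# Sample movements taken from the question.
items = []
for movement in [
    ("ANIMAL#001", "FIELD#A", "FIELD#B", "January 3"),
    ("ANIMAL#001", "FIELD#Q", "FIELD#R", "September 19"),
    ("ANIMAL#002", "FIELD#A", "FIELD#B", "January 3"),
    ("ANIMAL#003", "FIELD#C", "FIELD#D", "March 15"),
    ("ANIMAL#005", "FIELD#F", "FIELD#A", "April 22"),
]:
    items.extend(split_movement(*movement))

gsi = build_gsi(items)
for row in movements_for_field(gsi, "FIELD#A"):
    print(row)
```

One caveat with the base table as drawn: if ANIMALID and FIELDID alone form the primary key, a second visit by the same animal to the same field would overwrite the earlier item, so the sort key would probably need something like the date appended. That is an extension beyond what the answer states.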

Answer 1: accepted, 2020-07-06 21:36:01
