Is there a better way to model this access pattern than to use two global secondary indexes (GSIs)?
I'm trying to figure out the data model and access patterns for an app that keeps track of animal movements between different fields (pastures). There are movement records that look like this:
PK           FROM      TO        DATE
------------------------------------------------------
ANIMAL#001   FIELD#A   FIELD#B   January 3
ANIMAL#001   FIELD#Q   FIELD#R   September 19
ANIMAL#002   FIELD#A   FIELD#B   January 3
ANIMAL#003   FIELD#C   FIELD#D   March 15
ANIMAL#005   FIELD#F   FIELD#A   April 22
For a specific field, e.g. FIELD#A, I'd like to know all the movements into and out of that field, the date of each movement, and the number of animals. The results should look like:
DATE        FROM      TO        NUMBER_ANIMALS
--------------------------------------------------
January 3   FIELD#A   FIELD#B   2
April 22    FIELD#F   FIELD#A   1
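To pin down the access pattern, the desired result can be written as a brute-force computation over the sample records: keep every movement whose FROM or TO is the given field, then count animals per (DATE, FROM, TO). This is a plain-Python specification only, with a hypothetical `Movement` tuple standing in for a table item; run against the real table it would amount to a full table scan.

```python
from collections import Counter
from typing import NamedTuple


class Movement(NamedTuple):
    """One movement record, mirroring the PK / FROM / TO / DATE columns above."""
    animal: str
    from_field: str
    to_field: str
    date: str


MOVEMENTS = [
    Movement("ANIMAL#001", "FIELD#A", "FIELD#B", "January 3"),
    Movement("ANIMAL#001", "FIELD#Q", "FIELD#R", "September 19"),
    Movement("ANIMAL#002", "FIELD#A", "FIELD#B", "January 3"),
    Movement("ANIMAL#003", "FIELD#C", "FIELD#D", "March 15"),
    Movement("ANIMAL#005", "FIELD#F", "FIELD#A", "April 22"),
]


def movements_for_field(records, field):
    """Count animals per (DATE, FROM, TO) for movements into or out of `field`."""
    return Counter(
        (m.date, m.from_field, m.to_field)
        for m in records
        if field in (m.from_field, m.to_field)
    )


# Print the result in the same column order as the desired output table.
for (date, src, dst), n in movements_for_field(MOVEMENTS, "FIELD#A").items():
    print(date, src, dst, n)
```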
Possible solutions and attempts:
1. A GSI with PK=FROM, SK=TO. If I query the GSI with PK=FIELD#A, this only gives one half of the picture, that is, movements from FIELD#A. I can't obtain movements to FIELD#A.
2. A composite attribute like FIELD#A#FIELD#B used as the PK in a GSI. This runs into the same problem as attempt 1.
3. Two GSIs: GSI1 has PK=FROM and GSI2 has PK=TO. I can query GSI1 with PK=FIELD#A and do some post-processing (group by, count) to get part of the result. I can then query GSI2 with PK=FIELD#A and post-process, getting the rest of the result. This looks like it will work, but it requires two GSIs and two queries. I can't overload one GSI, since both columns in use are from the same item.
4. Some combination of scanning the entire table and filtering the results, which I'd rather avoid since there might be 50,000+ items in the table.
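The two-GSI approach can be sketched in plain Python. Each GSI is simulated as a dictionary from its partition-key value to the items in that partition, and `query` stands in for one DynamoDB Query on a single index partition; in a real app these would be two calls such as boto3's `Table.query(IndexName=..., KeyConditionExpression=...)`, with pagination handled. The names `build_index`, `query`, `GSI1`, `GSI2`, and `movements_for_field` are hypothetical.

```python
from collections import Counter, defaultdict

# Sample movement items from the question.
ITEMS = [
    {"PK": "ANIMAL#001", "FROM": "FIELD#A", "TO": "FIELD#B", "DATE": "January 3"},
    {"PK": "ANIMAL#001", "FROM": "FIELD#Q", "TO": "FIELD#R", "DATE": "September 19"},
    {"PK": "ANIMAL#002", "FROM": "FIELD#A", "TO": "FIELD#B", "DATE": "January 3"},
    {"PK": "ANIMAL#003", "FROM": "FIELD#C", "TO": "FIELD#D", "DATE": "March 15"},
    {"PK": "ANIMAL#005", "FROM": "FIELD#F", "TO": "FIELD#A", "DATE": "April 22"},
]


def build_index(items, partition_attr):
    """Simulate a GSI: group items by the attribute used as its partition key."""
    index = defaultdict(list)
    for item in items:
        index[item[partition_attr]].append(item)
    return index


def query(index, pk):
    """Stand-in for one DynamoDB Query against a single GSI partition."""
    return index.get(pk, [])


GSI1 = build_index(ITEMS, "FROM")  # GSI1: PK=FROM
GSI2 = build_index(ITEMS, "TO")    # GSI2: PK=TO


def movements_for_field(field):
    """Two queries (out of / into the field), then group by (DATE, FROM, TO) and count."""
    outgoing = query(GSI1, field)
    # Skip any FROM == TO item, which the first query already returned.
    incoming = [item for item in query(GSI2, field) if item["FROM"] != field]
    counts = Counter()
    for item in outgoing + incoming:
        counts[(item["DATE"], item["FROM"], item["TO"])] += 1
    return counts


print(movements_for_field("FIELD#A"))
```

The post-processing is the same for both queries, so the only cost of this design is the second index and the second round trip.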
I can see how to do it with two GSIs, but what's the most efficient way?
I could imagine a slightly different table structure (ANIMALID being the partition key and FIELDID being the sort key):
ANIMALID | FIELDID | FROM_TO | ...
----------------------------------
ANIMAL#1 | FIELD#A | FROM    | ...
ANIMAL#1 | FIELD#B | TO      | ...
ANIMAL#2 | FIELD#C | FROM    | ...
ANIMAL#2 | FIELD#A | TO      | ...
And a GSI with the following structure:
FIELDID | ANIMALID | ...
Then you can query the GSI just by FIELDID and aggregate the results.
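The answer's layout can be sketched the same way: each movement is written as two items, one under each field it touches, and a single query on the FIELDID GSI returns both directions. This is an illustration under assumptions: `DATE` and `OTHER_FIELD` are hypothetical attributes standing in for the `...` columns (without the other field, a single-field query could not rebuild the FROM/TO pair), the helper names are made up, and the GSI is simulated in memory rather than queried through DynamoDB.

```python
from collections import Counter, defaultdict

# (ANIMALID, FROM, TO, DATE) sample movements from the question.
MOVEMENTS = [
    ("ANIMAL#001", "FIELD#A", "FIELD#B", "January 3"),
    ("ANIMAL#001", "FIELD#Q", "FIELD#R", "September 19"),
    ("ANIMAL#002", "FIELD#A", "FIELD#B", "January 3"),
    ("ANIMAL#003", "FIELD#C", "FIELD#D", "March 15"),
    ("ANIMAL#005", "FIELD#F", "FIELD#A", "April 22"),
]


def to_items(movements):
    """Write each movement as two items, one per field it touches.

    DATE and OTHER_FIELD are hypothetical attributes standing in for "...".
    """
    items = []
    for animal, src, dst, date in movements:
        items.append({"ANIMALID": animal, "FIELDID": src, "FROM_TO": "FROM",
                      "OTHER_FIELD": dst, "DATE": date})
        items.append({"ANIMALID": animal, "FIELDID": dst, "FROM_TO": "TO",
                      "OTHER_FIELD": src, "DATE": date})
    return items


def build_gsi(items):
    """Simulate the GSI whose partition key is FIELDID (sort key ANIMALID)."""
    gsi = defaultdict(list)
    for item in items:
        gsi[item["FIELDID"]].append(item)
    return gsi


def movements_for_field(gsi, field):
    """One query on the FIELDID GSI, then aggregate by (DATE, FROM, TO)."""
    counts = Counter()
    for item in gsi.get(field, []):
        if item["FROM_TO"] == "FROM":
            key = (item["DATE"], field, item["OTHER_FIELD"])
        else:
            key = (item["DATE"], item["OTHER_FIELD"], field)
        counts[key] += 1
    return counts


GSI = build_gsi(to_items(MOVEMENTS))
print(movements_for_field(GSI, "FIELD#A"))
```

One caveat with this key design: because ANIMALID plus FIELDID is the base table's primary key, an animal that passes through the same field more than once (for example, moving into FIELD#B and later out of it) would produce two items with the same key, and the second write would overwrite the first. Adding the date to the sort key is one possible way around that.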