简体   繁体   中英

Fetching parent and child item in single query in DynamoDB

I have the following one-to-many relationship:

Account 1--* User

The Account contains global account-level information, which is mutable.

The User contains user-level information, which is also mutable.

When the user signs-in, they need both Account and User information. (I only know the UserId at this point).

I ideally want to design the schema such that a single query is necessary. However, I cannot determine how to do this without duplicating the Account into each User and thus requiring some background Lambda job to propagate changes to Account attributes across all User objects -- which, for the record, seems like more resource usage (and code to maintain) than simply normalizing the data and having 2 queries on each sign-in: fetch user, then fetch account (using an FK inside the user object that identifies the account).


Is it possible to design a schema that allows one query to fetch both and doesn't require a non-transactional background job to propagate updates? (Transactional batch updates are out of the question, since there's >25 users.) And if not, is the 2-query idea the best / an acceptable method?


I'll focus on one angle in your question - the 2-query idea. In many cases it is indeed an acceptable method, better than the alternatives. In fact in many NoSQL uses, every user-visible request results in significantly more than two database requests. In fact, it is often stated that this is the reason why NoSQL systems care about low tail latencies (ie, even 99th percentile latencies should be low).

You didn't say why you wanted to avoid the 2-query solution. The 2-query implementation you presented has two downsides:

  1. It is more costly: you need to do two queries instead of one, costing (when the reads are shorter than 4 KB) double than a single read.
  2. Latency doubles if you need to do the first query, and only then can do the second query.

There may be tricks you can use to solve both problems, depending on more details of your use case:

For the latency: You didn't say what is a "user id" in your application. If it is some sort of unique numeric identifier, maybe it can be set up such that the account id can be determined from the user id directly, without a table lookup (eg, the first bits of the user id are the account id). If this is the case, you can start both lookups at the same time, and not double the latency. The cost will still be double, but not the latency.

For the cost: If there is a large number of users per account (you said there are more than 25 - I don't know if it's much more or not), it may be useful to cache the Account data, so that not every user lookup will need to read the Account data again - it might often be cached. If Account information rarely changes and consistency of it is not a big deal (I don't know if it is...), you can also get by with doing an "eventual consistency" read for the Account information - which costs half of the regular "consistent" read.

I think the following scheme will be useful for.

  • You will store both account and user records inthe same table
  • You want to get both account metadata and linked users in a single query
PK: account   SK: recordId

=== Account record ===
account: 123512321   recordId: METADATA   attributes: name, environment, ownerId...

=== User record ===
account: 123512321   recordId: USERID#34543543   attributes: name, email, phone...

With this denormalization of the data, you can retrieve both account metadata and related users in a single query. You can also change the account metadata without a need to apply any change to related users.

BONUS : you can also link other types of assets to the account record

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM