
Google Data Engineering Exam Sample Question

I'm not sure about this question and would like some clarification.

You are using Cloud Pub/Sub to stream inventory updates from many point-of-sale (POS) terminals into BigQuery. Each update event has the following information: product identifier “prodSku”, change increment “quantityDelta”, POS identification “termId”, and “messageId” which is created for each push attempt from the terminal. During a network outage, you discovered that duplicated messages were sent, causing the inventory system to over-count the changes. You determine that the terminal application has design problems and may send the same event more than once during push retries. You want to ensure that the inventory update is accurate. What should you do?

A. Inspect the “publishTime” of each message. Make sure that messages whose “publishTime” values match rows in the BigQuery table are discarded.

B. Inspect the “messageId” of each message. Make sure that any messages whose “messageId” values match corresponding rows in the BigQuery table are discarded.

C. Instead of specifying a change increment for “quantityDelta”, always use the derived inventory value after the increment has been applied. Name the new attribute “adjustedQuantity”.

D. Add another attribute orderId to the message payload to mark the unique check-out order across all terminals. Make sure that messages whose “orderId” and “prodSku” values match corresponding rows in the BigQuery table are discarded.

I chose B because I assumed that “messageId” is unique in some way. However, the answer is D, and I'm not sure why. Can anyone clarify that for me? Thanks.

For this option:

B. Inspect the “messageId” of each message. Make sure that any messages whose “messageId” values match corresponding rows in the BigQuery table are discarded.

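The question says “messageId” is "created for each push attempt from the terminal". This means messageId is nothing more than an attribute generated at push time; it does not identify the underlying event. During an outage, the terminal may retry the same event until it reaches the server, with a new messageId every time, so the duplicates all carry different messageId values and comparing them against the table finds no match. Also, if someone buys the same item twice (say, because they forgot to buy two the first time), that is an entirely different order and a legitimate second update, yet nothing in the current payload distinguishes it from a retry, which can lead to false entries.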
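To make this concrete, here is a minimal in-memory sketch in Python. It does not use the Pub/Sub or BigQuery APIs; the `push_attempt` helper and the use of `uuid4` to model a per-attempt ID are assumptions for illustration only.

```python
import uuid

def push_attempt(event):
    """Hypothetical terminal push: a fresh messageId is created per attempt."""
    return {**event, "messageId": str(uuid.uuid4())}

def dedupe_by_message_id(messages):
    """Keep only the first message seen for each messageId (option B)."""
    seen, kept = set(), []
    for msg in messages:
        if msg["messageId"] not in seen:
            seen.add(msg["messageId"])
            kept.append(msg)
    return kept

# One sale of one unit, pushed three times because of retries.
sale = {"prodSku": "SKU-1", "quantityDelta": -1, "termId": "T-7"}
received = [push_attempt(sale) for _ in range(3)]

kept = dedupe_by_message_id(received)
total_delta = sum(m["quantityDelta"] for m in kept)
print(len(kept), total_delta)  # all three retries survive, so the delta is counted three times
```

Because every retry gets its own messageId, the filter never sees a repeat and the inventory is still over-counted.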

For Option D,

D. Add another attribute orderId to the message payload to mark the unique check-out order across all terminals. Make sure that messages whose “orderId” and “prodSku” values match corresponding rows in the BigQuery table are discarded.

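orderId provides greater control because, unlike messageId, it is not tied to a push attempt; it identifies the check-out order itself. So if someone buys the same product in two different orders, the two updates have different orderId values and both are kept. If a push retry happens, the duplicate carries the same orderId and prodSku as a row already in the table, so it can be detected and discarded.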
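A minimal in-memory sketch of option D in Python, assuming the terminal adds an `orderId` to each payload. The sample values and the `dedupe_by_order` helper are hypothetical, and the `seen` set stands in for the lookup against rows already in the BigQuery table.

```python
def dedupe_by_order(messages):
    """Keep the first message for each (orderId, prodSku) pair (option D)."""
    seen, kept = set(), []
    for msg in messages:
        key = (msg["orderId"], msg["prodSku"])
        if key not in seen:
            seen.add(key)
            kept.append(msg)
    return kept

received = [
    # First push of order O-100.
    {"orderId": "O-100", "prodSku": "SKU-1", "quantityDelta": -1, "messageId": "m1"},
    # Retry of the same order: new messageId, same orderId and prodSku.
    {"orderId": "O-100", "prodSku": "SKU-1", "quantityDelta": -1, "messageId": "m2"},
    # A genuinely new order for the same product.
    {"orderId": "O-101", "prodSku": "SKU-1", "quantityDelta": -1, "messageId": "m3"},
]

kept = dedupe_by_order(received)
net_delta = sum(m["quantityDelta"] for m in kept)
print([m["messageId"] for m in kept], net_delta)  # ['m1', 'm3'] -2
```

The retry is dropped while the second, legitimate order is preserved, which is the behavior the question asks for.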
