Retrospective update of data mart records

I have a data mart that holds several billion event records in a BigQuery database. Each record has a unique event Id and includes one property (let's call it property “x”) that is set at creation with a provisional value.

This value might need to be updated at a later stage. Over the 20 to 90 days after creation, various data mining tasks are run that might come up with a new value for property “x”.

What is the best way of making this type of update?

Two ideas I had:

1) Move property “x” out of the event record, add a new dimension, and add a many-to-many join table between the event record and the dimension. That way I would only need to update the join table. My data engineers are worried that this would impact query/reporting performance.

2) Add a new “date created” field to the event table, and change the retrieval key to be the combination of event Id plus the most recent date created. That would allow me to update property “x” by writing a new record that has the same event Id, the new property “x” value, and a more recent date created value.
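To make the second idea concrete, here is a minimal sketch of the "append a newer version, read the latest" pattern. It uses Python's built-in sqlite3 as a local stand-in for BigQuery, and the table and column names (events, event_id, date_created, x) are assumptions for illustration, not the real schema.

```python
import sqlite3


def latest_values(rows):
    """Return {event_id: x}, taking x from the newest version of each event.

    rows is an iterable of (event_id, date_created, x) tuples, where
    date_created is an ISO-8601 string so that string order equals date order.
    """
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: one row per version of an event.
    conn.execute("CREATE TABLE events (event_id TEXT, date_created TEXT, x TEXT)")
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)

    # Keep only the row whose date_created is the maximum for its event_id.
    query = """
        SELECT e.event_id, e.x
        FROM events AS e
        JOIN (
            SELECT event_id, MAX(date_created) AS latest
            FROM events
            GROUP BY event_id
        ) AS m
          ON e.event_id = m.event_id AND e.date_created = m.latest
    """
    result = dict(conn.execute(query).fetchall())
    conn.close()
    return result


rows = [
    ("e1", "2024-01-01", "provisional-a"),
    ("e2", "2024-01-01", "provisional-b"),
    # A later data mining run "updates" e1 by appending a newer version.
    ("e1", "2024-02-15", "revised-a"),
]
print(latest_values(rows))
```

Nothing is ever updated in place here, but every read has to resolve the latest version, and two versions of one event with the same date created would be ambiguous, so a timestamp with enough resolution (or a version number) is probably needed in practice.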

Thoughts?

If updating “x” is going to be a once- or twice-a-year activity, I would suggest going with the second idea. But if it is something you need to do on a regular basis, the first idea is the better approach.
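For comparison, the first idea might look roughly like the sketch below, again using Python's sqlite3 as a local stand-in for BigQuery. The names events, x_dim, event_x, and the helper functions are hypothetical; the point is only that an update touches the small join table while the event rows stay unchanged.

```python
import sqlite3


def build_demo():
    """Create a tiny in-memory model: events, a dimension for x, and a join table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE events  (event_id TEXT PRIMARY KEY, payload TEXT);
        CREATE TABLE x_dim   (x_id INTEGER PRIMARY KEY, x_value TEXT);
        CREATE TABLE event_x (event_id TEXT, x_id INTEGER);

        INSERT INTO events  VALUES ('e1', 'click'), ('e2', 'view');
        INSERT INTO x_dim   VALUES (1, 'provisional'), (2, 'revised');
        INSERT INTO event_x VALUES ('e1', 1), ('e2', 1);
        """
    )
    return conn


def reassign_x(conn, event_id, new_x_id):
    """Point an event at a different x value by updating only the join table."""
    conn.execute(
        "UPDATE event_x SET x_id = ? WHERE event_id = ?", (new_x_id, event_id)
    )


def x_by_event(conn):
    """Resolve the current x value per event through the join table."""
    query = """
        SELECT e.event_id, d.x_value
        FROM events AS e
        JOIN event_x AS b ON b.event_id = e.event_id
        JOIN x_dim   AS d ON d.x_id = b.x_id
    """
    return dict(conn.execute(query).fetchall())


conn = build_demo()
reassign_x(conn, "e1", 2)  # a data mining task revises x for event e1
print(x_by_event(conn))
```

The two extra joins in x_by_event are the reporting cost the data engineers are worried about; whether that matters at several billion rows would have to be measured on the real tables.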
