Loading data from a file in GCS to GCP Firestore
I have written a script which loops through each record in the file and writes it to the Firestore collection.
Firestore schema: {COLLECTION.DOCUMENT.SUBCOLLECTION.DOCUMENT.SUBCOLLECTION}
'{"KEY":"1234","DATE":"2022-10-10","SUB_COLLECTION":{"KEY":1234,"SUB_DOC":{"KEY1" : :"VAL1"}}'
'{"KEY":"1235","DATE":"2022-10-10","SUB_COLLECTION":{"KEY":1235,"SUB_DOC":{"KEY1" : :"VAL1"}}'
'{"KEY":"1236","DATE":"2022-10-10","SUB_COLLECTION":{"KEY":1236,"SUB_DOC":{"KEY1" : :"VAL1"}}'
...
import json
from google.cloud import firestore

# `filename` is the google.cloud.storage blob for the file in GCS
read_file = filename.download_as_string()
fire_client = firestore.Client(project=PROJECT)

# One JSON record per line; the trailing split element is empty, so skip it
records = read_file.decode("UTF-8").split('\n')
for line in records[:-1]:
    record = json.loads(line)
    doc_ref = fire_client.collection('STATIC_COLLECTION_NAME').document(record['KEY'])
    doc_ref.set({"KEY": int(record['KEY']), "DATE": record['DATE']})
    sub_ref = doc_ref.collection('STATIC_SUB_COLLECTION_NAME').document('STATIC_SUB_DOC_NAME')
    sub_ref.set(record['SUB_COLLECTION'])
However, this job takes hours to finish a 100 MB file. Is there a way to issue multiple writes at a time, for example batching X records from the file and writing them to X documents and sub-collections in Firestore? I am looking for a way to make this more efficient instead of looping over millions of records one write at a time; my current script eventually failed with "503 The datastore operation timed out, or the data was temporarily unavailable."
You'll want to use a BulkWriter to accumulate and send writes to Firestore.
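As a minimal sketch (assuming the same PROJECT, read_file, and STATIC_* names from the question), the loop can hand its writes to the Python client's BulkWriter, which accumulates them and sends them in rate-limited batches in the background:

import json
from google.cloud import firestore

fire_client = firestore.Client(project=PROJECT)
bulk_writer = fire_client.bulk_writer()  # accumulates writes and sends them in batches

for line in read_file.decode("UTF-8").split('\n'):
    if not line:
        continue  # skip the empty trailing line
    record = json.loads(line)
    doc_ref = fire_client.collection('STATIC_COLLECTION_NAME').document(record['KEY'])
    bulk_writer.set(doc_ref, {"KEY": int(record['KEY']), "DATE": record['DATE']})
    sub_ref = doc_ref.collection('STATIC_SUB_COLLECTION_NAME').document('STATIC_SUB_DOC_NAME')
    bulk_writer.set(sub_ref, record['SUB_COLLECTION'])

bulk_writer.close()  # flush queued writes and wait for them to complete

Unlike doc_ref.set(), the bulk_writer.set() calls return immediately rather than blocking on one round trip per document; close() blocks until everything queued has been sent.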