
Loading data from a file in GCS to GCP firestore

I have written a script which loops through each record in the file and writes it to the Firestore collection.

Firestore Schema {COLLECTION.DOCUMENT.SUBCOLLECTION.DOCUMENT.SUBCOLLECTION}

     '{"KEY":"1234","DATE":"2022-10-10","SUB_COLLECTION":{"KEY":1234,"SUB_DOC":{"KEY1" : :"VAL1"}}'
     '{"KEY":"1235","DATE":"2022-10-10","SUB_COLLECTION":{"KEY":1235,"SUB_DOC":{"KEY1" : :"VAL1"}}'
     '{"KEY":"1236","DATE":"2022-10-10","SUB_COLLECTION":{"KEY":1236,"SUB_DOC":{"KEY1" : :"VAL1"}}'
...

The file is read in the line below:

read_file = filename.download_as_string()
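
For context, that line is presumably the tail end of a GCS blob download along these lines (the bucket and blob names here are illustrative, not from the original post):

    from google.cloud import storage

    storage_client = storage.Client(project=PROJECT)
    bucket = storage_client.bucket('MY_BUCKET')        # illustrative bucket name
    filename = bucket.blob('path/to/records.json')     # "filename" is actually a Blob
    read_file = filename.download_as_string()          # file contents as bytes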

The contents are then decoded, converted to a list of strings, and written record by record:

    import json
    from google.cloud import firestore

    fire_client = firestore.Client(project=PROJECT)
    # Decode the bytes and split into one JSON string per line
    dict_str = read_file.decode("UTF-8").split('\n')
    # The last element is empty (trailing newline), so stop one short
    for line in dict_str[:-1]:
        record = json.loads(line)
        doc_ref = fire_client.collection('STATIC_COLLECTION_NAME').document(record['KEY'])
        doc_ref.set({"KEY": int(record['KEY']), "DATE": record['DATE']})
        sub_ref = doc_ref.collection('STATIC_SUB_COLLECTION_NAME').document('STATIC_SUB_DOC_NAME')
        sub_ref.set(record['SUB_COLLECTION'])

However, this job takes hours to complete for a 100 MB file. Is there a way to issue multiple writes at a time, for example batch-processing X records from the file and writing them to X documents and sub-collections in Firestore? I am looking for a way to make this more efficient instead of looping over millions of records one at a time; my current script eventually failed with 503 The datastore operation timed out, or the data was temporarily unavailable.
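
As a hedged sketch of that batching idea, the Python client's WriteBatch (via fire_client.batch()) commits up to 500 writes per call; each record above produces two writes, so a chunk size of 250 records stays under the limit. The chunking loop and the BATCH_RECORDS constant are illustrative, not from the original post:

    import json
    from google.cloud import firestore

    BATCH_RECORDS = 250  # 2 writes per record; Firestore allows 500 writes per batch

    fire_client = firestore.Client(project=PROJECT)
    lines = read_file.decode("UTF-8").split('\n')[:-1]

    for start in range(0, len(lines), BATCH_RECORDS):
        batch = fire_client.batch()
        for line in lines[start:start + BATCH_RECORDS]:
            record = json.loads(line)
            doc_ref = fire_client.collection('STATIC_COLLECTION_NAME').document(record['KEY'])
            batch.set(doc_ref, {"KEY": int(record['KEY']), "DATE": record['DATE']})
            sub_ref = doc_ref.collection('STATIC_SUB_COLLECTION_NAME').document('STATIC_SUB_DOC_NAME')
            batch.set(sub_ref, record['SUB_COLLECTION'])
        batch.commit()  # one round trip for the whole chunk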

You'll want to use the bulk_writer to accumulate & send writes to Firestore.
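
A minimal sketch of the question's loop rewritten that way, assuming the google-cloud-firestore BulkWriter API (Client.bulk_writer() with BulkWriter.set() and BulkWriter.close()); BulkWriter queues writes and sends them in parallel batches, handling throttling and retries internally:

    import json
    from google.cloud import firestore

    fire_client = firestore.Client(project=PROJECT)
    bulk_writer = fire_client.bulk_writer()  # accumulates writes, sends them in batches

    for line in read_file.decode("UTF-8").split('\n')[:-1]:
        record = json.loads(line)
        doc_ref = fire_client.collection('STATIC_COLLECTION_NAME').document(record['KEY'])
        bulk_writer.set(doc_ref, {"KEY": int(record['KEY']), "DATE": record['DATE']})
        sub_ref = doc_ref.collection('STATIC_SUB_COLLECTION_NAME').document('STATIC_SUB_DOC_NAME')
        bulk_writer.set(sub_ref, record['SUB_COLLECTION'])

    bulk_writer.close()  # flushes any queued writes and blocks until all complete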

