Firestore - Recursively copy a document and all its subcollections/documents

Question

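We use Google's Firestore for embedded machine configuration data. Because this data controls a configurable page flow and many other things, it is split across many subcollections. Each machine has its own top-level document in this system. However, adding a machine to the fleet takes a long time, because we have to copy all of this data by hand across multiple documents. Does anyone know how to recursively copy a Firestore document, all of its subcollections, their documents, their subcollections, and so on in Python? You would have a reference to the top-level document as well as the name of the new top-level document.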

Answer 1

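The question asks for Python, but in my case I needed to do a recursive deep copy of Firestore documents/collections in NodeJS (TypeScript), using a document as the starting point of the recursion.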

(This solution is based on the Python script by @cristi.)

Function definition

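import {
  CollectionReference,
  DocumentReference,
  DocumentSnapshot,
  QueryDocumentSnapshot,
  WriteBatch,
  getFirestore,
} from 'firebase-admin/firestore';

// Firestore client used to create WriteBatch instances below.
// Assumes the firebase-admin app has already been initialized elsewhere.
const firebaseFirestore = getFirestore();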

interface FirestoreCopyRecursiveContext {
  batchSize: number;
  /**
   * Wrapped Firestore WriteBatch. In firebase-admin@11.0.1, you can't continue
   * using the WriteBatch object after you call WriteBatch.commit().
   * 
   * Hence, we need to replace "used up" WriteBatch instances with new ones.
   * We also need to reset the count after committing, and because we
   * want all recursive invocations to share the same count + WriteBatch instance,
   * we pass this data via object reference.
   */
  writeBatch: {
    writeBatch: WriteBatch,
    /** Num of items in current batch. Reset to 0 when `commitBatch` commits.  */
    count: number;
  };
  /**
   * Function that commits the batch if it reached the limit or is forced to.
   * The WriteBatch instance is automatically replaced with fresh one
   * if commit did happen.
   */
  commitBatch: (force?: boolean) => Promise<void>;
  /** Callback to insert custom logic / write operations when we encounter a document */
  onDocument?: (
    sourceDoc: QueryDocumentSnapshot | DocumentSnapshot,
    targetDocRef: DocumentReference,
    context: FirestoreCopyRecursiveContext
  ) => unknown;
  /** Callback to insert custom logic / write operations when we encounter a collection */
  onCollection?: (
    sourceColl: CollectionReference,
    targetColl: CollectionReference,
    context: FirestoreCopyRecursiveContext
  ) => unknown;
  logger?: Console['info'];
}

type FirestoreCopyRecursiveOptions = Partial<Omit<FirestoreCopyRecursiveContext, 'commitBatch'>>;

/**
 * Copy all data from one document to another, including
 * all subcollections and documents within them, etc.
 */
export const firestoreCopyDocRecursive = async (
  /** Source Firestore Document Snapshot, descendants of which we want to copy */
  sourceDoc: QueryDocumentSnapshot | DocumentSnapshot,
  /** Destination Firestore Document Ref */
  targetDocRef: DocumentReference,
  options?: FirestoreCopyRecursiveOptions,
) => {
  const batchSize = options?.batchSize ?? 500;
  const writeBatchRef = options?.writeBatch || { writeBatch: firebaseFirestore.batch(), count: 0 };
  const onDocument = options?.onDocument;
  const onCollection = options?.onCollection;
  const logger = options?.logger || console.info;

  const commitBatch = async (force?: boolean) => {
    // Commit batch only if size limit hit or forced
    if (writeBatchRef.count < batchSize && !force) return;

    logger(`Committing ${writeBatchRef.count} batched operations...`);
    await writeBatchRef.writeBatch.commit();
    // Once we commit the batched data, we have to create another WriteBatch,
    // otherwise we get error:
    // "Cannot modify a WriteBatch that has been committed."
    // See https://dev.to/wceolin/cannot-modify-a-writebatch-that-has-been-committed-265f
    writeBatchRef.writeBatch = firebaseFirestore.batch();
    writeBatchRef.count = 0;
  };

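  // Shared with every recursive call so they all use the same batch, count, and logger
  const context: FirestoreCopyRecursiveContext = {
    batchSize,
    writeBatch: writeBatchRef,
    onDocument,
    onCollection,
    commitBatch,
    logger,
  };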

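  // Copy the contents of the current doc. data() is undefined when the
  // document does not exist (it may still have subcollections), so skip the write then.
  const sourceDocData = sourceDoc.data();
  if (sourceDocData) {
    writeBatchRef.writeBatch.set(targetDocRef, sourceDocData, { merge: false });
    writeBatchRef.count += 1;
    await commitBatch();
  }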

  // Allow to make additional changes to the target document from
  // outside the func after copy command is enqueued / committed.
  await onDocument?.(sourceDoc, targetDocRef, context);
  // And try to commit in case user updated the count but forgot to commit
  await commitBatch();

  // Check for subcollections and docs within them
  for (const sourceSubcoll of await sourceDoc.ref.listCollections()) {
    const targetSubcoll = targetDocRef.collection(sourceSubcoll.id);

    // Allow to make additional changes to the target collection from
    // outside the func after copy command is enqueued / committed.
    await onCollection?.(sourceSubcoll, targetSubcoll, context);
    // And try to commit in case user updated the count but forgot to commit
    await commitBatch();

    for (const sourceSubcollDoc of (await sourceSubcoll.get()).docs) {
      const targetSubcollDocRef = targetSubcoll.doc(sourceSubcollDoc.id);
      await firestoreCopyDocRecursive(sourceSubcollDoc, targetSubcollDocRef, context);
    }
  }

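  // Commit all remaining operations. Nested (recursive) calls also reach this
  // line, so every level force-commits whatever is pending at that point.
  return commitBatch(true);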
};

How to use it

const sourceDocRef = getYourFaveFirestoreDocRef(x);
const sourceDoc = await sourceDocRef.get();
const targetDocRef = getYourFaveFirestoreDocRef(y);

// Copy firestore resources
await firestoreCopyDocRecursive(sourceDoc, targetDocRef, {
  logger,
  // Note: In my case some docs had their doc ID also copied as a field.
  //       Because the copied documents get a new doc ID, we need to update
  //       those fields too.
  onDocument: async (sourceDoc, targetDocRef, context) => {
    const someDocPattern = /^nameOfCollection\/[^/]+?$/;
    const subcollDocPattern = /^nameOfCollection\/[^/]+?\/nameOfSubcoll\/[^/]+?$/;

    // Update the field that holds the document ID
    if (targetDocRef.path.match(someDocPattern)) {
      const docId = targetDocRef.id;
      context.writeBatch.writeBatch.set(targetDocRef, { docId }, { merge: true });
      context.writeBatch.count += 1;
      await context.commitBatch();
      return;
    }

    // In a subcollection, I had to update multiple ID fields
    if (targetDocRef.path.match(subcollDocPattern)) {
      const docId = targetDocRef.parent.parent?.id;
      const subcolDocId = targetDocRef.id;
      context.writeBatch.writeBatch.set(targetDocRef, { docId, subcolDocId }, { merge: true });
      context.writeBatch.count += 1;
      await context.commitBatch();
      return;
    }
  },
});
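
Since the question asks for Python, the path-matching idea behind the `onDocument` callback above could be sketched roughly as follows. This is only an illustration: the function name `id_fields_for` is hypothetical, and the collection names are the same placeholders used in the usage example.

```python
import re
from typing import Dict

# Placeholder collection names, mirroring the usage example above.
DOC_PATTERN = re.compile(r"^nameOfCollection/[^/]+$")
SUBCOLL_DOC_PATTERN = re.compile(r"^nameOfCollection/([^/]+)/nameOfSubcoll/[^/]+$")


def id_fields_for(target_path: str) -> Dict[str, str]:
    """Return the ID fields to merge into the copied document at target_path."""
    doc_id = target_path.rsplit("/", 1)[-1]
    if DOC_PATTERN.match(target_path):
        # Top-level document: only its own ID is stored as a field.
        return {"docId": doc_id}
    nested = SUBCOLL_DOC_PATTERN.match(target_path)
    if nested:
        # Subcollection document: store the parent document ID and its own ID.
        return {"docId": nested.group(1), "subcolDocId": doc_id}
    # Any other path: nothing to patch.
    return {}
```

With google-cloud-firestore, the returned dictionary (when non-empty) could then be queued with `batch.set(target_doc_ref, fields, merge=True)` right after the copy of that document.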

Answer 2

You can use something like this to recursively read from one collection and write to another:

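import logging
from typing import Iterable

from google.cloud import firestore

log = logging.getLogger(__name__)

# Number of writes queued in the current batch (this answer commits every 500).
batch_nr = 0


def read_recursive(
    source: Iterable[firestore.DocumentReference],
    target: firestore.CollectionReference,
    batch: firestore.WriteBatch,
) -> None:
    global batch_nr

    for source_doc_ref in source:
        document_data = source_doc_ref.get().to_dict()
        target_doc_ref = target.document(source_doc_ref.id)
        if batch_nr == 500:
            log.info("committing %s batched operations...", batch_nr)
            batch.commit()
            batch_nr = 0
        # to_dict() returns None for a missing parent document that only
        # exists as a path to subcollections; there is nothing to write then.
        if document_data is not None:
            batch.set(
                reference=target_doc_ref,
                document_data=document_data,
                merge=False,
            )
            batch_nr += 1
        for source_coll_ref in source_doc_ref.collections():
            target_coll_ref = target_doc_ref.collection(source_coll_ref.id)
            read_recursive(
                source=source_coll_ref.list_documents(),
                target=target_coll_ref,
                batch=batch,
            )


# Assumes application default credentials are configured for the client.
db_client = firestore.Client()
batch = db_client.batch()
read_recursive(
    # list_documents() yields the DocumentReference objects the function iterates over
    source=db_client.collection("src_collection_name").list_documents(),
    target=db_client.collection("target_collection_name"),
    batch=batch,
)
batch.commit()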

The writes are batched, which saves a lot of time (in my case it finished in half the time it took with individual set calls).
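
The question starts from a top-level document reference, not a collection, so here is a minimal sketch of the same batching idea rooted at a single document. It is an illustration under assumptions, not a tested production script: `copy_document_recursive` and `BATCH_LIMIT` are hypothetical names, and the arguments are only assumed to behave like google-cloud-firestore's `Client` and `DocumentReference` (`get()`, `collections()`, `collection()`, `list_documents()`, `batch()`).

```python
from typing import Any, Dict

BATCH_LIMIT = 500  # same per-batch threshold the answer's code uses


def copy_document_recursive(client: Any, source_doc: Any, target_doc: Any) -> int:
    """Copy source_doc and everything below it to target_doc.

    Returns the number of documents written. The arguments are assumed to
    behave like google-cloud-firestore's Client and DocumentReference.
    """
    state: Dict[str, Any] = {"batch": client.batch(), "pending": 0, "written": 0}

    def flush() -> None:
        # Commit queued writes, if any, and start over with a fresh batch.
        if state["pending"]:
            state["batch"].commit()
            state["batch"] = client.batch()
            state["pending"] = 0

    def walk(src: Any, dst: Any) -> None:
        data = src.get().to_dict()
        # A parent can be missing and still have subcollections below it.
        if data is not None:
            state["batch"].set(dst, data)
            state["pending"] += 1
            state["written"] += 1
            if state["pending"] >= BATCH_LIMIT:
                flush()
        for sub in src.collections():
            target_sub = dst.collection(sub.id)
            for child in sub.list_documents():
                walk(child, target_sub.document(child.id))

    walk(source_doc, target_doc)
    flush()  # commit whatever is left after the traversal
    return state["written"]
```

Assuming a collection called `machines` (a made-up name), a call could look like `copy_document_recursive(db_client, db_client.collection("machines").document("machine-a"), db_client.collection("machines").document("machine-b"))`. Keeping the counter and the batch in one shared `state` dictionary avoids the module-level `global` while still letting every level of the recursion see the same pending count.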

2 answers

Answer 1
2 2022-10-04 12:52:51

Answer 2
0 2021-06-03 10:22:02
