What's the most efficient and scalable way to handle filtering and sorting of a large collection in Cloud Firestore?

Question

I have a large collection where each document is of a substantial size, so the collection cannot be embedded into a single document. For my feature, I need to sort the collection using orderBy and filter it using "array-contains" and "==", where the sort and filter parameters are provided by the user. I was wondering whether there is an efficient way to do this by caching documents that have already been fetched by previous queries. Does Firebase do any caching itself and already optimize what I'm trying to do, or is there any custom caching/optimization I can do?

This is what my implementation looks like right now. It works for now, but it's not very scalable because it creates a new realtime listener each time any of the filter/sort state changes. Is there any way I can improve this to minimize the total number of documents read?

useEffect(() => {
    if (!sort || !status || !search) return;

    const unsubscribe = firebase.firestore().collection("users")
        .where("searchTerms", "array-contains", search)
        .where("status", "==", status)
        .orderBy(sort)
        .onSnapshot((snapshot) => {
            // update state hook with snapshot data...
        });
    
    return () => unsubscribe();
}, [sort, status, search]);

Thank you for any and all help!

Answer 1

Typically, you will want to use a search indexer to enable functionality like this.

The Firebase documentation recommends using Algolia for full-text search. Create a Cloud Function that indexes the data you want to search on, along with the Firestore document ID. On the frontend, use the Algolia API to get search results, then fetch the whole document from Firestore when you need to display it.
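
To make the flow concrete, here is a minimal sketch of the indexing idea. It deliberately does not call the Algolia or Cloud Functions SDKs: InMemoryIndex is a hypothetical stand-in for a hosted search index, and toIndexRecord, syncToIndex and searchDocIds are illustrative names, not part of any library. The searchTerms and status fields come from the question; everything else is an assumption.

```javascript
// Hypothetical in-memory stand-in for a hosted search index (e.g. Algolia).
// The method names are assumptions chosen for this sketch.
class InMemoryIndex {
  constructor() {
    this.records = new Map(); // objectID -> slim record
  }
  saveObject(record) {
    this.records.set(record.objectID, record);
  }
  deleteObject(objectID) {
    this.records.delete(objectID);
  }
  // Naive case-insensitive substring match over searchTerms.
  search(query) {
    const q = query.toLowerCase();
    const hits = [];
    for (const record of this.records.values()) {
      if (record.searchTerms.some((t) => t.toLowerCase().includes(q))) {
        hits.push(record);
      }
    }
    return hits;
  }
}

// Keep only the searchable fields plus the Firestore document ID.
function toIndexRecord(docId, data) {
  return {
    objectID: docId,
    status: data.status,
    searchTerms: data.searchTerms || [],
  };
}

// What a Cloud Function triggered on document writes might do.
// `after` is the document data after the write, or null if it was deleted.
function syncToIndex(index, docId, after) {
  if (after === null) {
    index.deleteObject(docId);
  } else {
    index.saveObject(toIndexRecord(docId, after));
  }
}

// Frontend side: search the index and return only document IDs.
// The full documents are then fetched from Firestore for the hits you display.
function searchDocIds(index, query) {
  return index.search(query).map((hit) => hit.objectID);
}
```

With this split, the large documents are only read from Firestore for the results that are actually shown, while filtering happens against the slim index records.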

Answer 2

An alternative to pagination for "big data" that still supports complex queries is to bake the data the client needs into a simplified search collection.

The search collection holds only the key information that represents each source document, combining that essential data into a dedicated collection of all the results. Each document can hold up to 1 MiB of data, which can equate to roughly 10k-200k entries depending on the size of your data. It does take some time to set up, but it has been effective for handling long-lived data within Firebase without additional third-party solutions.
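
As a rough check on the 10k-200k figure, the sketch below estimates how many string entries fit in one document. It assumes, per my reading of the Firestore storage size rules, that a string costs its UTF-8 byte length plus 1 and that the document limit is 1,048,576 bytes; maxEntries and overheadBytes are hypothetical names, and the estimate ignores the document name and field name overhead unless you pass it in.

```javascript
const MAX_DOC_BYTES = 1048576; // 1 MiB document size limit

// Estimate how many string entries of a given average size fit in one document.
// Each string is assumed to cost its UTF-8 byte length + 1.
function maxEntries(avgEntryBytes, overheadBytes = 0) {
  return Math.floor((MAX_DOC_BYTES - overheadBytes) / (avgEntryBytes + 1));
}

console.log(maxEntries(100)); // 10381  (about 10k entries of ~100 bytes)
console.log(maxEntries(4));   // 209715 (about 200k entries of ~4 bytes)
```

So the number of entries per search document depends almost entirely on how compactly each entry is encoded.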

The key takeaways are as follows:

This is ideal for data that doesn't update too frequently; multiple changes at once can hit the limit of roughly one write per second per document.
All search documents contain two properties: a counter that tracks the current number of entries and an array of strings that represent your essential data.
Each source document needs to store the document ID of its search document for future updates.
On update, you look up the search document ID and use the arrayUnion and arrayRemove methods, preferably within a transaction, and update the source document.
Optionally, you can use the new Firestore data bundles feature to bundle this collection with your app.
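
The takeaways above can be sketched with plain in-memory objects. This is not Firestore code: the two Maps stand in for the search collection and the source collection, unionInto and removeFrom only mimic the arrayUnion and arrayRemove semantics, and the "id|name|status" entry encoding, the name field, the search-N IDs and the maxPerDoc cap are all assumptions made for illustration. In a real app each function body would run inside a transaction.

```javascript
// Mimic arrayUnion: add the value only if it is not already present.
function unionInto(arr, value) {
  return arr.includes(value) ? arr : [...arr, value];
}

// Mimic arrayRemove: remove every instance of the value.
function removeFrom(arr, value) {
  return arr.filter((v) => v !== value);
}

// searchDocs: Map of search document ID -> { count, entries }
// sources:    Map of source document ID -> { entry, searchDocId }
function addEntry(searchDocs, sources, sourceId, entry, maxPerDoc) {
  // Pick the first search document with room, or create a new one.
  let targetId = null;
  for (const [id, doc] of searchDocs) {
    if (doc.count < maxPerDoc) {
      targetId = id;
      break;
    }
  }
  if (targetId === null) {
    targetId = `search-${searchDocs.size}`;
    searchDocs.set(targetId, { count: 0, entries: [] });
  }
  const doc = searchDocs.get(targetId);
  doc.entries = unionInto(doc.entries, entry);
  doc.count = doc.entries.length;
  // The source document remembers which search document holds its entry.
  sources.set(sourceId, { entry, searchDocId: targetId });
}

function updateEntry(searchDocs, sources, sourceId, newEntry) {
  const source = sources.get(sourceId);
  const doc = searchDocs.get(source.searchDocId);
  doc.entries = unionInto(removeFrom(doc.entries, source.entry), newEntry);
  doc.count = doc.entries.length;
  source.entry = newEntry;
}

// Client side: filter and sort locally over the downloaded search documents.
function queryEntries(searchDocs, status) {
  const rows = [];
  for (const doc of searchDocs.values()) {
    for (const entry of doc.entries) {
      const [id, name, entryStatus] = entry.split("|");
      if (entryStatus === status) rows.push({ id, name, status: entryStatus });
    }
  }
  return rows.sort((a, b) => a.name.localeCompare(b.name));
}
```

Because the client downloads a handful of search documents instead of every source document, changing the filter or sort order costs no additional reads; only the rows the user opens need a fetch of the full source document.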

solution1
3 2021-05-04 03:01:36

solution2
1 2021-05-04 05:10:28