I am using the Firestore BigQuery extension to stream data to Google BigQuery. The data is stored in JSON format, so I think the best practice is to generate schema views with the schema-views script.
When I run my BI tool on those views to aggregate and filter data, the resulting BigQuery queries perform very poorly.
Is there a better way to do this? I was thinking of using materialized views, but the schema-views script already builds views on top of other views, and you can't do that with materialized views. I think I am missing something in my setup, because I am only talking about collections with a few thousand records in them.
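Since a materialized view cannot sit on top of another view, one possible alternative is to persist the result of the final schema view into a regular table on a schedule and point the BI tool at that table. The following is only a sketch, assuming a scheduled query is acceptable and that slightly stale data is fine. The table name `transaction_schema_latest_snapshot` is hypothetical; the view name is the one from the example further down.

```sql
-- Hypothetical snapshot table; intended to run as a scheduled query (for example hourly).
-- Each run replaces the table with the current output of the schema view.
CREATE OR REPLACE TABLE `swipedrinks-app.transaction.transaction_schema_latest_snapshot` AS
SELECT
  *
FROM
  `swipedrinks-app.transaction.transaction_schema_transaction_schema_latest`;
```

The BI tool would then read stored rows from the snapshot table instead of re-evaluating the window functions over the changelog on every query, at the cost of data being only as fresh as the last scheduled run.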
EDIT: A real example from my production environment.
Raw data: the table transaction_raw_changelog, which comes directly from the Firestore extension.
Generating the schema views creates two views:
-- Retrieves the latest document change events for all live documents.
-- timestamp: The Firestore timestamp at which the event took place.
-- operation: One of INSERT, UPDATE, DELETE, IMPORT.
-- event_id: The id of the event that triggered the cloud function mirrored the event.
-- data: A raw JSON payload of the current state of the document.
-- document_id: The document id as defined in the Firestore database
SELECT
  document_name,
  document_id,
  timestamp,
  event_id,
  operation,
  data
FROM
  (
    SELECT
      document_name,
      document_id,
      FIRST_VALUE(timestamp) OVER(
        PARTITION BY document_name
        ORDER BY
          timestamp DESC
      ) AS timestamp,
      FIRST_VALUE(event_id) OVER(
        PARTITION BY document_name
        ORDER BY
          timestamp DESC
      ) AS event_id,
      FIRST_VALUE(operation) OVER(
        PARTITION BY document_name
        ORDER BY
          timestamp DESC
      ) AS operation,
      FIRST_VALUE(data) OVER(
        PARTITION BY document_name
        ORDER BY
          timestamp DESC
      ) AS data,
      FIRST_VALUE(operation) OVER(
        PARTITION BY document_name
        ORDER BY
          timestamp DESC
      ) = "DELETE" AS is_deleted
    FROM
      `swipedrinks-app.transaction.transaction_raw_changelog`
    ORDER BY
      document_name,
      timestamp DESC
  )
WHERE
  NOT is_deleted
GROUP BY
  document_name,
  document_id,
  timestamp,
  event_id,
  operation,
  data
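For comparison, the same "latest row per live document" logic can be written with a single window function instead of five `FIRST_VALUE` windows plus a `GROUP BY`. This is a sketch, not what the extension generates; it assumes that `document_id` is constant per `document_name` and that ties on `timestamp` within a document do not matter. The column alias `row_num` is made up for illustration.

```sql
SELECT
  document_name,
  document_id,
  timestamp,
  event_id,
  operation,
  data
FROM
  (
    SELECT
      *,
      -- Number each document's change events from newest to oldest.
      ROW_NUMBER() OVER(
        PARTITION BY document_name
        ORDER BY
          timestamp DESC
      ) AS row_num
    FROM
      `swipedrinks-app.transaction.transaction_raw_changelog`
  )
WHERE
  -- Keep only the newest event, and drop documents whose newest event is a delete.
  row_num = 1
  AND operation != "DELETE"
```

Whether this is actually faster would have to be checked against the query plan; it mainly removes the repeated windows, the inner `ORDER BY`, and the deduplicating `GROUP BY`.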
-- Given a user-defined schema over a raw JSON changelog, returns the
-- schema elements of the latest set of live documents in the collection.
-- timestamp: The Firestore timestamp at which the event took place.
-- operation: One of INSERT, UPDATE, DELETE, IMPORT.
-- event_id: The event that wrote this row.
-- <schema-fields>: This can be one, many, or no typed-columns
-- corresponding to fields defined in the schema.
SELECT
  *
EXCEPT
  (orderitem)
FROM
  (
    SELECT
      document_name,
      document_id,
      timestamp,
      operation,
      amount,
      bartenderId,
      eventStandId,
      event_id,
      paymentMethod,
      type,
      orderitem,
      toUserId
    FROM
      (
        SELECT
          document_name,
          document_id,
          FIRST_VALUE(timestamp) OVER(
            PARTITION BY document_name
            ORDER BY
              timestamp DESC
          ) AS timestamp,
          FIRST_VALUE(operation) OVER(
            PARTITION BY document_name
            ORDER BY
              timestamp DESC
          ) AS operation,
          FIRST_VALUE(operation) OVER(
            PARTITION BY document_name
            ORDER BY
              timestamp DESC
          ) = "DELETE" AS is_deleted,
          `swipedrinks-app.transaction.firestoreNumber`(
            FIRST_VALUE(JSON_EXTRACT_SCALAR(data, '$.amount')) OVER(
              PARTITION BY document_name
              ORDER BY
                timestamp DESC
            )
          ) AS amount,
          FIRST_VALUE(JSON_EXTRACT_SCALAR(data, '$.bartenderId')) OVER(
            PARTITION BY document_name
            ORDER BY
              timestamp DESC
          ) AS bartenderId,
          FIRST_VALUE(JSON_EXTRACT_SCALAR(data, '$.eventStandId')) OVER(
            PARTITION BY document_name
            ORDER BY
              timestamp DESC
          ) AS eventStandId,
          FIRST_VALUE(JSON_EXTRACT_SCALAR(data, '$.event_id')) OVER(
            PARTITION BY document_name
            ORDER BY
              timestamp DESC
          ) AS event_id,
          `swipedrinks-app.transaction.firestoreNumber`(
            FIRST_VALUE(JSON_EXTRACT_SCALAR(data, '$.paymentMethod')) OVER(
              PARTITION BY document_name
              ORDER BY
                timestamp DESC
            )
          ) AS paymentMethod,
          FIRST_VALUE(JSON_EXTRACT_SCALAR(data, '$.type')) OVER(
            PARTITION BY document_name
            ORDER BY
              timestamp DESC
          ) AS type,
          `swipedrinks-app.transaction.firestoreArray`(
            FIRST_VALUE(JSON_EXTRACT(data, '$.order')) OVER(
              PARTITION BY document_name
              ORDER BY
                timestamp DESC
            )
          ) AS orderitem,
          FIRST_VALUE(JSON_EXTRACT_SCALAR(data, '$.toUserId')) OVER(
            PARTITION BY document_name
            ORDER BY
              timestamp DESC
          ) AS toUserId
        FROM
          `swipedrinks-app.transaction.transaction_raw_latest`
      )
    WHERE
      NOT is_deleted
  ) transaction_raw_latest
  LEFT JOIN UNNEST(transaction_raw_latest.orderitem) AS orderitem_member WITH OFFSET orderitem_index
GROUP BY
  document_name,
  document_id,
  timestamp,
  operation,
  amount,
  bartenderId,
  eventStandId,
  event_id,
  paymentMethod,
  type,
  toUserId,
  orderitem_index,
  orderitem_member
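If `transaction_raw_latest` already returns exactly one row per live document (which is what the comments on the first view describe, assuming that view is the one read here), the window functions in the schema view should not change the result, and the fields could be read from `data` directly. The following is a sketch under that assumption, not what the script generates. It reuses the `firestoreNumber` function from the view above and leaves out the `order` array, so each transaction stays on a single row.

```sql
SELECT
  document_name,
  document_id,
  timestamp,
  operation,
  -- Same typed extraction as in the generated view, without the FIRST_VALUE windows.
  `swipedrinks-app.transaction.firestoreNumber`(
    JSON_EXTRACT_SCALAR(data, '$.amount')
  ) AS amount,
  JSON_EXTRACT_SCALAR(data, '$.bartenderId') AS bartenderId,
  JSON_EXTRACT_SCALAR(data, '$.eventStandId') AS eventStandId,
  JSON_EXTRACT_SCALAR(data, '$.event_id') AS event_id,
  `swipedrinks-app.transaction.firestoreNumber`(
    JSON_EXTRACT_SCALAR(data, '$.paymentMethod')
  ) AS paymentMethod,
  JSON_EXTRACT_SCALAR(data, '$.type') AS type,
  JSON_EXTRACT_SCALAR(data, '$.toUserId') AS toUserId
FROM
  `swipedrinks-app.transaction.transaction_raw_latest`
```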
The view transaction_schema_transaction_schema_latest is the one that is easy to query: it holds my latest data, with one column per document property.
I'd like to query the sum of the transaction amounts per event_id:
SELECT sum(amount) FROM `swipedrinks-app.transaction.transaction_schema_transaction_schema_latest`
This query takes around 12 s, and I have 196,697 rows in this view.
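The query above sums over all rows, whereas the per-event aggregation described would presumably need a `GROUP BY`, as in the sketch below. One caveat: the schema view joins each transaction to its order items, so a transaction with several items appears to occupy several rows, and a plain `SUM(amount)` would then count its amount more than once. The inner query collapses the rows back to one per document first. This assumes `amount` is a per-transaction value, which is not confirmed here; the alias `total_amount` is made up for illustration.

```sql
SELECT
  event_id,
  SUM(amount) AS total_amount
FROM
  (
    -- One row per transaction: the view repeats a transaction for each order item.
    SELECT
      document_name,
      ANY_VALUE(event_id) AS event_id,
      ANY_VALUE(amount) AS amount
    FROM
      `swipedrinks-app.transaction.transaction_schema_transaction_schema_latest`
    GROUP BY
      document_name
  )
GROUP BY
  event_id
```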