Can I write a custom query in the Google BigQuery Connector for AWS Glue?

I'm creating a Glue ETL job that transfers data from BigQuery to S3, similar to this example but with my own dataset.
Note: I use BigQuery Connector for AWS Glue v0.22.0-2 (link).

The data in BigQuery is already partitioned by date, and I would like each Glue job run to fetch a specific date only (WHERE date = ...) and write the result to a single CSV output file. But I can't find where to insert the custom WHERE clause.

In the BigQuery source node configuration, these are the only options available:

[screenshot of the BigQuery source node configuration options]

Also, the generated script uses create_dynamic_frame.from_options, which does not accept a custom query (per the documentation).

# Script generated for node Google BigQuery Connector 0.22.0 for AWS Glue 3.0
GoogleBigQueryConnector0220forAWSGlue30_node1 = (
    glueContext.create_dynamic_frame.from_options(
        connection_type="marketplace.spark",
        connection_options={
            "parentProject": args["BQ_PROJECT"],
            "table": args["BQ_TABLE"],
            "connectionName": args["BQ_CONNECTION_NAME"],
        },
        transformation_ctx="GoogleBigQueryConnector0220forAWSGlue30_node1",
    )
)

So, is there any way I can write a custom query? Or is there any alternative method?

According to this AWS sample project, you can pass a filter in the connection options:

  • filter – Passes the condition to select the rows to convert. If the table is partitioned, the selection is pushed down and only the rows in the specified partition are transferred to AWS Glue. In all other cases, all data is scanned and the filter is applied in AWS Glue Spark processing, but it still helps limit the amount of memory used in total.

Example usage in a script:

# Script generated for node Google BigQuery Connector 0.22.0 for AWS Glue 3.0
GoogleBigQueryConnector0220forAWSGlue30_node1 = (
    glueContext.create_dynamic_frame.from_options(
        connection_type="marketplace.spark",
        connection_options={
            "parentProject": "...",
            "table": "...",
            "connectionName": "...",
            "filter": "date = 'yyyy-mm-dd'",  # put the partition condition here
        },
        transformation_ctx="GoogleBigQueryConnector0220forAWSGlue30_node1",
    )
)
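
To fetch a different date on each job run, the filter string can be built from a job argument instead of being hard-coded. The following is a minimal sketch, not something taken from the connector's documentation. The helper name build_bq_connection_options, the RUN_DATE job argument, and output_path are hypothetical, and the partition column is assumed to be called date, as in the question.

```python
from datetime import date


def build_bq_connection_options(parent_project, table, connection_name,
                                run_date, date_column="date"):
    """Build connection_options with a filter pinned to one partition date.

    Hypothetical helper, not part of the connector or of AWS Glue.
    """
    # date.fromisoformat raises ValueError on malformed input, so a bad
    # job argument fails before any BigQuery read is attempted.
    parsed = date.fromisoformat(run_date)
    return {
        "parentProject": parent_project,
        "table": table,
        "connectionName": connection_name,
        "filter": f"{date_column} = '{parsed.isoformat()}'",
    }


# Inside the Glue job (this part only runs in a Glue environment), it
# might be used like this, assuming a RUN_DATE job argument is defined:
#
#   args = getResolvedOptions(
#       sys.argv,
#       ["BQ_PROJECT", "BQ_TABLE", "BQ_CONNECTION_NAME", "RUN_DATE"],
#   )
#   dyf = glueContext.create_dynamic_frame.from_options(
#       connection_type="marketplace.spark",
#       connection_options=build_bq_connection_options(
#           args["BQ_PROJECT"], args["BQ_TABLE"],
#           args["BQ_CONNECTION_NAME"], args["RUN_DATE"],
#       ),
#       transformation_ctx="GoogleBigQueryConnector0220forAWSGlue30_node1",
#   )
#   # One Spark partition yields one part file in the output directory.
#   dyf.toDF().coalesce(1).write.option("header", "true").csv(output_path)
```

Coalescing to one partition is what produces a single CSV part file per run. It forces all rows through one task, so it is only reasonable when a single day's data is small enough for that.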
