简体   繁体   中英

Prevent SQL Injection in BigQuery with Python for table name

I have an Airflow DAG which takes an argument from the user for table .

I then use this value in an SQL statement and execute it in BigQuery. I'm worried about exposing myself to SQL Injection.

Here is the code:

sql = f"""
        CREATE OR REPLACE TABLE {PROJECT}.{dataset}.{table} PARTITION BY DATE(start_time) as (
            //OTHER CODE
        )
        """

client = bigquery.Client()
query_job = client.query(sql)

Both dataset and table get passed through via airflow but I'm worried someone could pass through something like: random_table; truncate other_tbl; -- random_table; truncate other_tbl; -- random_table; truncate other_tbl; -- as the table argument.

My fear is that the above will create a table called random_table and then truncate an existing table.

Is there a safer way to process these passed through arguments?

I've looked into parameterized queries in BigQuery but these don't work for table names.

You will have to create a table name validator. I think you can safely validate by using just backticks --> ` at the start and at the end of your table name string. It's not a 100% solution but it worked for some of my test scenarios I try. It should work like this:

# validate should look for ` at the beginning and end of your tablename
table_name = validate(f"`{project}.{dataset}.{table}`")

sql = f"""
        CREATE OR REPLACE TABLE {table_name} PARTITION BY DATE(start_time) as (
            //OTHER CODE
        )
        """
...

Note : I suggest you to check the following post on medium site to check about bigquery sql injection .

I checked the official documentation about Running parameterized queries , and sadly it only covers the parameterization of variables not tables or other string part of your query.

As a final note, I recommend to open a feature request for BigQuery for this particular scenario.

You should probably look into sanitization/validation of user input in general. This is done before passing the input to the BQ query.

With Python, you could look for malicious strings in the user input - like truncate in your example - or use a regex to filter input that for instance contains -- . Those are just some quick examples. I recommend you do more research on that topic; you will also find quite a few questions on that topic on SE.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM