Prevent SQL Injection in BigQuery with Python for table name

I have an Airflow DAG which takes a table argument from the user.

I then use this value in an SQL statement and execute it in BigQuery. I'm worried about exposing myself to SQL injection.

Here is the code:

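from google.cloud import bigquery

# PROJECT, dataset and table are defined elsewhere; dataset and table come from the DAG arguments
sql = f"""
    CREATE OR REPLACE TABLE {PROJECT}.{dataset}.{table} PARTITION BY DATE(start_time) AS (
        -- OTHER CODE
    )
"""

client = bigquery.Client()
query_job = client.query(sql)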

Both dataset and table get passed through via Airflow, but I'm worried someone could pass something like random_table; truncate other_tbl; -- as the table argument.

My fear is that the above will create a table called random_table and then truncate an existing table.

Is there a safer way to process these passed-through arguments?

I've looked into parameterized queries in BigQuery, but these don't work for table names.

You will have to create a table name validator. I think you can validate reasonably safely by requiring a backtick (`) at the start and at the end of your table name string. It's not a 100% solution, but it worked for the test scenarios I tried. It should work like this:

# validate should look for ` at the beginning and end of your table name
table_name = validate(f"`{project}.{dataset}.{table}`")

sql = f"""
    CREATE OR REPLACE TABLE {table_name} PARTITION BY DATE(start_time) AS (
        -- OTHER CODE
    )
"""
...
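
The validate function above is left undefined, so here is a minimal sketch of what it might look like. The pattern is an assumption: it is deliberately stricter than BigQuery's real naming rules (ASCII letters, digits and underscores only for the dataset and table parts), and it rejects anything containing a backtick, semicolon, space or comment marker inside the quoted path.

```python
import re

# Assumption: a strict allow-list, narrower than what BigQuery itself accepts.
# Shape: `project.dataset.table`, wrapped in exactly one pair of backticks.
_TABLE_PATH = re.compile(
    r"`[a-z][a-z0-9-]{4,28}[a-z0-9]"  # project id: lowercase letters, digits, hyphens
    r"\.[A-Za-z0-9_]{1,1024}"         # dataset name
    r"\.[A-Za-z0-9_]{1,1024}`"        # table name
)


def validate(table_name: str) -> str:
    """Return table_name unchanged if it matches the allow-list, else raise ValueError."""
    # fullmatch anchors both ends, so trailing text such as "; truncate ..." is rejected.
    if not _TABLE_PATH.fullmatch(table_name):
        raise ValueError(f"invalid table path: {table_name!r}")
    return table_name
```

With this sketch, validate("`my-project.analytics.events`") returns the string unchanged, while the payload from the question raises ValueError because of the semicolons and spaces. Loosen the pattern only as far as your own naming scheme requires.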

Note: I suggest you read the following post on Medium about BigQuery SQL injection.

I checked the official documentation on running parameterized queries, and sadly it only covers parameterizing values, not table names or other structural parts of your query.

As a final note, I recommend opening a feature request with BigQuery for this particular scenario.

You should probably look into sanitization/validation of user input in general. This is done before passing the input to the BQ query.

With Python, you could look for malicious strings in the user input (like truncate in your example) or use a regex to reject input that contains, for instance, --. Those are just some quick examples. I recommend you do more research on that topic; you will also find quite a few questions about it on SE.
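
As a rough illustration of that idea, here is a small deny-list check. The function name looks_malicious and the token list are hypothetical choices for this sketch, not an exhaustive set of dangerous input.

```python
import re

# Hypothetical deny-list: comment markers, statement separators, backticks,
# and a handful of destructive keywords matched as whole words.
_SUSPICIOUS = re.compile(
    r"--|;|`|/\*|\b(?:truncate|drop|delete|insert|update)\b",
    re.IGNORECASE,
)


def looks_malicious(user_input: str) -> bool:
    """Return True if the input contains any token from the deny-list."""
    return _SUSPICIOUS.search(user_input) is not None


if looks_malicious("random_table; truncate other_tbl; --"):
    print("rejected")
```

Deny-lists like this are easy to get around with a token you did not think of, so it is generally safer to accept only input that matches a known-good pattern and reject everything else.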


 