Select different dataset when testing | Separate test from production

This question is partly about how to test external dependencies (aka integration tests) and partly about how to implement this in Python for SQL, specifically with BigQuery. So answers that only address 'this is how you should do integration tests' are very welcome.

In my project I have two different datasets:

'project_1.production.table_1'
'project_1.development.table_1'

When running my tests I would like to call the development environment. But how do I separate it properly from my production code? I don't want to clutter my production code with test (set-up) code.

Production code looks like:

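from google.cloud import bigquery
from pandas import DataFrame

# Method of a class that holds a bigquery.Client as self.client.
def find_data(self, variable_x: str) -> DataFrame:
    query = '''
    SELECT *
    FROM `project_1.production.table_1`
    WHERE foo = @variable_x
    '''

    # The parameter name must match the @variable_x placeholder in the query.
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter(
                name='variable_x', type_="STRING", value=variable_x
            )
        ]
    )

    df = self.client.query(
        query=query, job_config=job_config).to_dataframe()
    return df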

Solution 1: Environment variables for the dataset

The python-dotenv module can be used to differentiate production from development, as I do for some parts of my code. The problem is that BigQuery does not allow the dataset to be parameterized (to prevent SQL injection, I think). See the "Running parameterized queries" docs.

From the docs:

Parameters cannot be used as substitutes for identifiers, column names, table names, or other parts of the query.

So passing the dataset name from an environment variable as a query parameter is not possible.

Solution 2: Environment variable for flow control

I could add an if PRODUCTION == True check and select the dataset accordingly. However, this results in test/debug code in my production code. I would like to avoid that as much as possible.

from os import getenv

from dotenv import load_dotenv

def find_data(variable_x: str) -> DataFrame:
    load_dotenv()
    # getenv returns a string (or None), so compare against a string.
    production = getenv("PRODUCTION")
    if production == "True":
        table = "project_1.production.table_1"
    else:
        table = "project_1.development.table_1"
    # ... execute the query on `table` ...
    job_config = ...  # snip
    df = ...  # snip
    return df

Solution 3: Mimic the function in test code

Make a copy of the production code and set up the test code so that the development dataset is called.

This leads to duplication of code (one copy in production code and one in test code). If the implementation of the function changes over time, the two copies will drift apart. So I think this solution is not 'Embracing Change'.

Solution 4: Skip testing this function

Perhaps this function does not need to be called at all in my test code. I could just take a snippet of the result of this query and use it as a 'data injection' into the tests that depend on this result. However, I would then need to adjust my architecture a bit.

The above solutions don't satisfy me completely. Is there another way to solve this issue, or is one of the above solutions acceptable?
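A minimal sketch of what such a 'data injection' could look like, assuming the code that depends on the query result accepts the fetching function as an argument. The names summarize_foo and fake_find_data, the column bar, and the use of plain dicts instead of a DataFrame are all hypothetical and only serve the illustration.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]


def summarize_foo(variable_x: str, fetch: Callable[[str], List[Row]]) -> int:
    """Hypothetical downstream logic: count the rows returned for variable_x.

    In production, `fetch` would be the real find_data; in tests it is a fake.
    """
    rows = fetch(variable_x)
    return len(rows)


def fake_find_data(variable_x: str) -> List[Row]:
    """Test double standing in for find_data: filters a canned result snippet."""
    canned = [
        {"foo": "a", "bar": "1"},
        {"foo": "a", "bar": "2"},
        {"foo": "b", "bar": "3"},
    ]
    return [row for row in canned if row["foo"] == variable_x]


# In a test, inject the fake instead of hitting BigQuery.
print(summarize_foo("a", fake_find_data))
```

With this shape the dependent logic can be unit-tested without any BigQuery access, while find_data itself would still need a separate integration test against the development dataset.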

It looks like string formatting (sometimes referred to as string interpolation) might be enough to get you where you want. You could replace the first part of your function with the following code:

query = '''
    SELECT *
    FROM `{table}`
    WHERE foo = @variable_x
    '''.format(table=getenv("DATA_TABLE"))

This works because the query is just a string, and you can do whatever you want with it before you pass it on to the BigQuery library. str.format lets us replace values inside a string, which is exactly what we need.
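As a self-contained sketch of this approach, the query construction can be pulled into a small helper that reads the table name from the environment. The helper name build_query is hypothetical, and falling back to the production table when DATA_TABLE is unset is an assumption, not something the original code does.

```python
import os

QUERY_TEMPLATE = '''
    SELECT *
    FROM `{table}`
    WHERE foo = @variable_x
'''


def build_query(default_table: str = "project_1.production.table_1") -> str:
    """Return the query text with the table name taken from DATA_TABLE.

    Assumption: when DATA_TABLE is not set, the production table is used.
    The @variable_x placeholder is left untouched for BigQuery to bind.
    """
    table = os.getenv("DATA_TABLE", default_table)
    return QUERY_TEMPLATE.format(table=table)


# A test run would set DATA_TABLE (for example via a .env file) before calling.
os.environ["DATA_TABLE"] = "project_1.development.table_1"
print(build_query())
```

The production function body stays the same in both environments; only the environment variable differs between a test run and a production run.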

Important security note: manipulating SQL queries as plain strings (as we are doing here) is in general a bad security practice, but since you control the environment variables of the application, it should be safe in this particular case.
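If you want an extra safeguard on top of that, one option is to check the value against a fixed allowlist before it is formatted into the query. This is only a sketch: the helper name resolve_table and the contents of ALLOWED_TABLES are assumptions based on the two tables mentioned in the question.

```python
import os

# Assumed allowlist: the two tables named in the question.
ALLOWED_TABLES = {
    "project_1.production.table_1",
    "project_1.development.table_1",
}


def resolve_table(env_var: str = "DATA_TABLE") -> str:
    """Read the table name from the environment and reject unknown values."""
    table = os.getenv(env_var)
    # An unset variable yields None, which is not in the allowlist either.
    if table not in ALLOWED_TABLES:
        raise ValueError(f"{env_var}={table!r} is not an allowed table")
    return table


os.environ["DATA_TABLE"] = "project_1.development.table_1"
print(resolve_table())
```

Anything outside the allowlist, including an unset variable, raises a ValueError before a query string is ever built.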
