
How can I pass a Python parameter in config.py to .sql file?

I am using Python Snowflake connector to extract data from tables in Snowflake. Here is my file structure:

sql
   a.sql
   b.sql
   c.sql
configurations.py
data_extract.py
main.py

Here the sql folder contains all my SQL queries in .sql files. I keep these files separate because each is hundreds of lines long and would look messy inside Python files. configurations.py contains the datetime parameters I want to change every time I run the code. It looks like this:

START_TIME = '2018-10-01 00:00:00'
END_TIME = '2019-04-01 00:00:00'

I want to add these parameters into the .sql files. For example, a.sql includes the following content:

DECLARE
  @START_PICKUP_DATE DATE,
  @END_PICKUP_DATE DATE,

SET
  @START_PICKUP_DATE = '2018-10-01'

SET
  @END_PICKUP_DATE = '2019-04-01'

select supplier_confirmation_id, pickup_datetime, dropoff_datetime, pickup_station_distance
from SANDBOX.ZQIAN.V_PDL
where pickup_datetime >= START_PICKUP_DATE and pickup_datetime < END_PICKUP_DATE
      and supplier_confirmation_id is not null;

I use a.sql in my python code in the following way:

def executeSQLScriptsFromFile(filepath):
    # snowflake credentials, replace SECRET with your own
    ctx = snowflake.connector.connect(
        user='S_ANALYTICS_USER',
        account=SECRET_A,
        region='us-east-1',
        warehouse=SECRET_B,
        database=SECRET_C,
        role=SECRET_D,
        password=SECRET_E)

    with open(filepath, 'r') as fd:
        query = fd.read()

    cs = ctx.cursor()
    try:
        cur = cs.execute(query)
        df = pd.DataFrame.from_records(iter(cur), columns=[x[0] for x in cur.description])
    finally:
        cs.close()
        ctx.close()

    return df

def extract_data():
    # Pass the path components separately; in 'sql\a.sql' the '\a' is parsed as a bell character.
    a_sqlpath = os.path.join(os.getcwd(), 'sql', 'a.sql')
    a_df = executeSQLScriptsFromFile(a_sqlpath)
    return a_df

The problem is that I want START_PICKUP_DATE and END_PICKUP_DATE in a.sql to stay in sync with START_TIME and END_TIME in configurations.py, so that I only need to change START_TIME and END_TIME in configurations.py to extract data for a different time frame using a.sql in Snowflake.

I've been looking for solutions online for quite a long time, but still not able to find a good solution that is specific to my problem. Many thanks to anyone who can provide a hint!

To accomplish this, I would take your .sql files and extract the queries into triple-quoted python strings with format specifiers for your variables. Then import the queries into your main script just like you import your configuration:

sql_queries.py:

sql_a = """
DECLARE
  @START_PICKUP_DATE DATE,
  @END_PICKUP_DATE DATE,

SET
  @START_PICKUP_DATE = '{START_TIME}'

SET
  @END_PICKUP_DATE = '{END_TIME}'

select supplier_confirmation_id, pickup_datetime, dropoff_datetime, pickup_station_distance
from SANDBOX.ZQIAN.V_PDL
where pickup_datetime >= START_PICKUP_DATE and pickup_datetime < END_PICKUP_DATE
  and supplier_confirmation_id is not null;
"""

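main.py:

import configurations
from sql_queries import sql_a

# The placeholders are named, so the values must be passed as keyword arguments.
print(sql_a.format(START_TIME=configurations.START_TIME, END_TIME=configurations.END_TIME))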
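If you would rather leave the queries in their .sql files, the same str.format idea can be applied to the file contents at load time. A minimal sketch, assuming the .sql file uses {START_TIME} and {END_TIME} placeholders and contains no other curly braces; render_sql is a hypothetical helper name, not part of any library:

```python
from pathlib import Path


def render_sql(filepath, **values):
    """Read a .sql template and fill its {NAME} placeholders.

    Hypothetical helper: values are inserted as plain text, so the
    template itself must supply any quotes around them.
    """
    template = Path(filepath).read_text()
    # str.format raises KeyError if the template names a placeholder
    # that was not supplied as a keyword argument.
    return template.format(**values)
```

For example, render_sql('sql/a.sql', START_TIME=configurations.START_TIME, END_TIME=configurations.END_TIME) would return the query text, which can then be passed to cs.execute. Because the values are pasted into the SQL as text, this only suits trusted values such as your own config file.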

You should be able to parameterize the sql statements so that instead of declaring in the SQL file you can just make it a parameter passed during execution.

select supplier_confirmation_id, pickup_datetime, dropoff_datetime, pickup_station_distance
from SANDBOX.ZQIAN.V_PDL
where pickup_datetime >= %(START_PICKUP_DATE)s and pickup_datetime < %(END_PICKUP_DATE)s and supplier_confirmation_id is not null;

Then when calling the function, just send the parameters START_PICKUP_DATE and END_PICKUP_DATE as parameters to the execute statement. One way to do this is to do a mapping from the parameter name to the value of the parameter. (In this example I'm assuming you have a function that will get the parameter value).

cur = cs.execute(query, {'START_PICKUP_DATE':get_value_from_config('start_pickup'), 'END_PICKUP_DATE':get_value_from_config('end_pickup')})

Or you can pass them by position, in which case the placeholders in the query must be written as %s rather than %(name)s:

cur = cs.execute(query, [get_value_from_config('start_pickup'), get_value_from_config('end_pickup')])

Which in essence becomes

cur = cs.execute(query, ['2018-10-01 00:00:00','2019-04-01 00:00:00'])
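Putting the pieces together, the named-parameter version might look roughly like the sketch below. It assumes the .sql file uses the %(START_PICKUP_DATE)s and %(END_PICKUP_DATE)s placeholders shown above; build_params and execute_sql_file are hypothetical helper names, and the two constants stand in for the values you would import from configurations.py:

```python
from pathlib import Path

# Stand-ins for the values defined in configurations.py (taken from the question).
START_TIME = '2018-10-01 00:00:00'
END_TIME = '2019-04-01 00:00:00'


def build_params(start_time, end_time):
    """Map the placeholder names used in the .sql file to config values."""
    return {'START_PICKUP_DATE': start_time, 'END_PICKUP_DATE': end_time}


def execute_sql_file(cursor, filepath, params):
    """Read a .sql file and run it with bound parameters.

    `cursor` is assumed to be a DB-API cursor that accepts pyformat
    placeholders, such as one created from a Snowflake connection.
    """
    query = Path(filepath).read_text()
    # The driver substitutes the values; the SQL text itself is not edited.
    return cursor.execute(query, params)
```

Inside executeSQLScriptsFromFile you could then call execute_sql_file(cs, filepath, build_params(START_TIME, END_TIME)) in place of cs.execute(query), so that changing configurations.py is the only step needed to extract a different time frame.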
