简体   繁体   中英

read_sql query returns an empty dataframe after I pass parameters as a dict in python pandas

I am trying to parameterize some parts of a SQL Query using the below dictionary:

query_params = dict(
        {'target':'status',
         'date_from':'201712',
         'date_to':'201805',
         'drform_target':'NPA'
      })

sql_data_sample = str("""select *
                                 from table_name
                                     where dt = %(date_to)s
                                     and %(target)s in (%(drform_target)s)

                        ----------------------------------------------------
                        union all
                        ----------------------------------------------------

                        (select *,
                                 from table_name
                                     where dt  = %(date_from)s
                                     and %(target)s in ('ACT')
                                     order by random() limit 50000);""")

df_data_sample = pd.read_sql(sql_data_sample,con = cnxn,params = query_params)

However this returns a dataframe with no records at all. I am not sure what the error is since no error is being thrown.

df_data_sample.shape
Out[7]: (0, 1211)

The final PostgreSql query would be:

select *
        from table_name
            where dt = '201805'
            and status in ('NPA')

----------------------------------------------------
union all
----------------------------------------------------
(select *
        from table_name
            where dt  = '201712'
            and status in ('ACT')
            order by random() limit 50000);-- This part of random() is only for running it on my local and not on server.

Below is a small sample of data for replication. The original data has more than a million records and 1211 columns

service_change_3m   service_change_6m   dt  grp_m2          status
0                   -2                  201805  $50-$75     NPA
0                    0                  201805  < $25       NPA
0                   -1                  201805  $175-$200   ACT
0                    0                  201712  $150-$175   ACT
0                    0                  201712  $125-$150   ACT
-1                   1                  201805  $50-$75     NPA

Can someone please help me with this?

UPDATE: Based on suggestion by @shmee.. I am finally using :

target = 'status'
query_params = dict(
        {
         'date_from':'201712',
         'date_to':'201805',
         'drform_target':'NPA'
      })

sql_data_sample = str("""select *
                                 from table_name
                                     where dt = %(date_to)s
                                     and {0} in (%(drform_target)s)

                        ----------------------------------------------------
                        union all
                        ----------------------------------------------------

                        (select *,
                                 from table_name
                                     where dt  = %(date_from)s
                                     and {0} in ('ACT')
                                     order by random() limit 50000);""").format(target)

df_data_sample = pd.read_sql(sql_data_sample,con = cnxn,params = query_params)

Yes, I am quite confident that your issue results from trying to set column names in your query via parameter binding ( and %(target)s in ('ACT') ) as mentioned in the comments.

This results in your query restricting the result set to records where 'status' in ('ACT') (ie Is the string 'status' an element of a list containing only the string 'ACT'?). This is, of course, false, hence no record gets selected and you get an empty result.

This should work as expected:

import psycopg2.sql

col_name = 'status'
table_name = 'public.churn_data'
query_params = {'date_from':'201712',
                'date_to':'201805',
                'drform_target':'NPA'
               }

sql_data_sample = """select * 
                     from {0} 
                     where dt = %(date_to)s 
                     and {1} in (%(drform_target)s)
                     ----------------------------------------------------
                     union all
                     ----------------------------------------------------
                     (select * 
                      from {0} 
                      where dt  = %(date_from)s 
                      and {1} in ('ACT') 
                      order by random() limit 50000);"""

sql_data_sample = sql.SQL(sql_data_sample).format(sql.Identifier(table_name), 
                                                  sql.Identifier(col_name))

df_data_sample = pd.read_sql(sql_data_sample,con = cnxn,params = query_params)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM