read_sql query returns an empty DataFrame after I pass parameters as a dict in Python pandas

I am trying to parameterize some parts of a SQL query using the dictionary below:

query_params = dict(
        {'target':'status',
         'date_from':'201712',
         'date_to':'201805',
         'drform_target':'NPA'
      })

sql_data_sample = str("""select *
                                 from table_name
                                     where dt = %(date_to)s
                                     and %(target)s in (%(drform_target)s)

                        ----------------------------------------------------
                        union all
                        ----------------------------------------------------

                        (select *
                                 from table_name
                                     where dt  = %(date_from)s
                                     and %(target)s in ('ACT')
                                     order by random() limit 50000);""")

df_data_sample = pd.read_sql(sql_data_sample,con = cnxn,params = query_params)

However, this returns a DataFrame with no records at all. I am not sure what is wrong, since no error is thrown.

df_data_sample.shape
Out[7]: (0, 1211)

The final PostgreSQL query should be:

select *
        from table_name
            where dt = '201805'
            and status in ('NPA')

----------------------------------------------------
union all
----------------------------------------------------
(select *
        from table_name
            where dt  = '201712'
            and status in ('ACT')
            order by random() limit 50000);-- This part of random() is only for running it on my local and not on server.

Below is a small sample of the data for reproduction. The original data has more than a million records and 1211 columns.

service_change_3m   service_change_6m   dt  grp_m2          status
0                   -2                  201805  $50-$75     NPA
0                    0                  201805  < $25       NPA
0                   -1                  201805  $175-$200   ACT
0                    0                  201712  $150-$175   ACT
0                    0                  201712  $125-$150   ACT
-1                   1                  201805  $50-$75     NPA

Can someone please help me with this?

UPDATE: Based on the suggestion by @shmee, I am now using:

target = 'status'
query_params = dict(
        {
         'date_from':'201712',
         'date_to':'201805',
         'drform_target':'NPA'
      })

sql_data_sample = str("""select *
                                 from table_name
                                     where dt = %(date_to)s
                                     and {0} in (%(drform_target)s)

                        ----------------------------------------------------
                        union all
                        ----------------------------------------------------

                        (select *
                                 from table_name
                                     where dt  = %(date_from)s
                                     and {0} in ('ACT')
                                     order by random() limit 50000);""").format(target)

df_data_sample = pd.read_sql(sql_data_sample,con = cnxn,params = query_params)

Yes, I am quite confident that your issue results from trying to set column names in your query via parameter binding ( and %(target)s in ('ACT') ), as mentioned in the comments.

This results in your query restricting the result set to records where 'status' in ('ACT') holds (i.e. is the string 'status' an element of a list containing only the string 'ACT'?). This is, of course, false, hence no record gets selected and you get an empty result.
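The effect can be reproduced without a PostgreSQL server. The sketch below uses the standard-library sqlite3 module as a stand-in, so the placeholder style is :name rather than psycopg2's %(name)s, and the table churn with its three rows is made up for illustration. The point it shows should carry over: a bound parameter is always treated as a value, never as a column name.

```python
import sqlite3

# In-memory stand-in database; table name and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("create table churn (dt text, status text)")
conn.executemany(
    "insert into churn values (?, ?)",
    [("201805", "NPA"), ("201805", "ACT"), ("201712", "ACT")],
)

# Column name passed as a bound parameter: the driver sends the string
# 'status' as a value, so the filter becomes 'status' in ('NPA'), which is false.
bound_rows = conn.execute(
    "select * from churn where :target in ('NPA')",
    {"target": "status"},
).fetchall()

# Column name placed in the SQL text as a quoted identifier, value still bound.
col = "status"
interpolated_rows = conn.execute(
    f'select * from churn where "{col}" in (:val)',
    {"val": "NPA"},
).fetchall()

print(len(bound_rows), len(interpolated_rows))
```

The first query matches nothing, while the second returns the single NPA row, which mirrors the empty DataFrame in the question.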

This should work as expected:

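import pandas as pd
from psycopg2 import sql

col_name = 'status'
# Schema and table are kept separate so that each part is quoted on its own
# (multi-part sql.Identifier needs psycopg2 >= 2.8).
table_name = ('public', 'churn_data')
query_params = {'date_from':'201712',
                'date_to':'201805',
                'drform_target':'NPA'
               }

# {0} and {1} are identifier slots; the %(name)s placeholders stay bound values.
sql_data_sample = """select * 
                     from {0} 
                     where dt = %(date_to)s 
                     and {1} in (%(drform_target)s)
                     ----------------------------------------------------
                     union all
                     ----------------------------------------------------
                     (select * 
                      from {0} 
                      where dt  = %(date_from)s 
                      and {1} in ('ACT') 
                      order by random() limit 50000);"""

sql_data_sample = sql.SQL(sql_data_sample).format(sql.Identifier(*table_name), 
                                                  sql.Identifier(col_name))

# Render the composed query to a plain string first; pandas may reject
# non-string queries on a plain DBAPI connection.
df_data_sample = pd.read_sql(sql_data_sample.as_string(cnxn),con = cnxn,params = query_params)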
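
If you stay with plain str.format for the column name, as in the update to the question, it may be worth checking the name against a known set of columns before it goes into the SQL text, since format does no quoting or escaping. A minimal sketch follows; ALLOWED_COLUMNS and quote_identifier are hypothetical names introduced here, not part of pandas or psycopg2.

```python
# Hypothetical allowlist of columns that may be injected into the query text.
ALLOWED_COLUMNS = {"status", "dt", "grp_m2"}


def quote_identifier(name, allowed=ALLOWED_COLUMNS):
    """Return name as a double-quoted SQL identifier, or raise if not allowed."""
    if name not in allowed:
        raise ValueError(f"unexpected column name: {name!r}")
    # Double any embedded quote characters, as standard SQL quoting requires.
    return '"' + name.replace('"', '""') + '"'


template = (
    "select * from table_name "
    "where dt = %(date_to)s and {0} in (%(drform_target)s)"
)
# Only the identifier is formatted in; the values remain bound parameters.
query = template.format(quote_identifier("status"))
print(query)
```

The resulting string can be passed to pd.read_sql together with params = query_params, just like the formatted query in the update.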
