Filtering SQL query based on parameters with more than one value

I am trying to build a SQL that needs to be filtered by two parameters (2 columns), and the second column needs to match multiple values.

Given below is the SQL I have built so far (thanks to Martijn Pieters for the help):

import psycopg2
import pandas as pd
import datetime

# Connecting to db

con = psycopg2.connect(db_details)
cur = con.cursor()
cur.execute("select * from sales limit 10")
rows = cur.fetchall()

params = {'earliest': datetime.datetime.today() - datetime.timedelta(days=7),
      'store_name': 'store_1', 'store_2'}

df = pd.read_sql("""
     select store_name,count(*) from sales 
     where created_at >= %(earliest)s
     and store_name = %(store_name)s""",
 params=params, con=con)

The query above has one date parameter, which is used in the WHERE clause. I want to add one more parameter, store_name, so that rows match either one of two values.

I would like to know how to add this additional parameter to the existing query.

I tried to create the parameter (similar to the date filter) and pass it to the existing query, but I get a syntax error when I give it two values:

    'store_name': 'store_1', 'store_2'}
                                      ^
SyntaxError: invalid syntax

pointing to the params field.

You have two problems:

  • You used invalid Python syntax. The comma in a dictionary separates key-value pairs, so the 'store_2' string is parsed as the start of another key-value pair that is missing its : value part. If you want a value that holds more than one string, you have to use a tuple or a list, where the explicit (...) or [...] delimiters keep the values separate from the key: value, key: value notation:

     params = {
         'earliest': datetime.datetime.today() - datetime.timedelta(days=7),
         'store_name': ('store_1', 'store_2'),  # tuple with two values
     }

  • Generally speaking, SQL parameters only work with single values. The store_name parameter can only be given a single value, not a sequence of values. That's because SQL parameters are a bridge between the SQL query and the dynamic values used in that query, with each parameter acting as a placeholder for one individual dynamic value.

    That said, the psycopg2 library specifically supports tuples. This is an exception; most Python database libraries do not.

    Next, if you want to filter rows that match either 'store_1' or 'store_2', the correct SQL is either two store_name = ... tests joined with OR and wrapped in parentheses (to keep that part separate from the date test, which is connected to the store name test with AND), or store_name IN ('store_1', 'store_2'). An IN test compares a column against the multiple values listed in the (...) parentheses.
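For a database library without tuple support, a common workaround is to generate one placeholder per value and pass the values individually. Here is a minimal sketch of that idea, assuming the standard-library sqlite3 module instead of psycopg2 so that it runs without a PostgreSQL server. The in_placeholders helper, the table contents and the fixed dates are made up for illustration. It also runs the OR form and the IN form side by side to show that they select the same rows.

```python
import sqlite3


def in_placeholders(values):
    """Return a '(?, ?, ...)' group with one '?' per value (hypothetical helper)."""
    if not values:
        raise ValueError("an IN (...) test needs at least one value")
    return "(" + ", ".join("?" for _ in values) + ")"


# Illustrative in-memory table; the rows and dates are invented.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (store_name TEXT, created_at TEXT)")
con.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [
        ("store_1", "2024-01-05"),
        ("store_1", "2024-01-06"),
        ("store_2", "2024-01-06"),
        ("store_3", "2024-01-06"),  # wrong store, filtered out
        ("store_2", "2023-12-01"),  # too old, filtered out by the date test
    ],
)

earliest = "2024-01-01"
stores = ("store_1", "store_2")

# Only the placeholders are built into the SQL text; the values
# themselves are still passed separately as parameters.
in_query = (
    "SELECT store_name, count(*) FROM sales "
    "WHERE created_at >= ? AND store_name IN " + in_placeholders(stores) + " "
    "GROUP BY store_name ORDER BY store_name"
)
in_rows = con.execute(in_query, (earliest, *stores)).fetchall()

# The equivalent OR form, with parentheses around the store tests.
or_query = (
    "SELECT store_name, count(*) FROM sales "
    "WHERE created_at >= ? AND (store_name = ? OR store_name = ?) "
    "GROUP BY store_name ORDER BY store_name"
)
or_rows = con.execute(or_query, (earliest, *stores)).fetchall()

con.close()
```

Note that sqlite3 uses ? placeholders, whereas psycopg2 uses %s or %(name)s; the placeholder text would need to change accordingly for another library.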

Given that you are using psycopg2 here, you can get away with the store_name key referencing a tuple value, but you do need to use IN for your query:

params = {
    'earliest': datetime.datetime.today() - datetime.timedelta(days=7),
    'store_name': ('store_1', 'store_2')
}

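# GROUP BY is needed because store_name is selected next to the count(*) aggregate
df = pd.read_sql("""
     SELECT store_name, count(*) FROM sales
     WHERE created_at >= %(earliest)s
     AND store_name IN %(store_name)s
     GROUP BY store_name""",
     params=params, con=con)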

On a separate note: the pd.read_sql() documentation explicitly states that only sqlite3 is supported when using a DBAPI connection:

If a DBAPI2 object, only sqlite3 is supported.

You are using such an object. Most Python database adapters are DBAPI2 libraries; DBAPI2 is the Python standard (PEP 249) for such libraries.
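To make "DBAPI2 library" a little more concrete: the standard asks each module to expose a few attributes and the same connection and cursor interface. The sketch below uses the standard-library sqlite3 module only because it needs no server; the variable names are illustrative. psycopg2 exposes the same attributes, with a paramstyle of 'pyformat', which is why its queries use %(name)s placeholders.

```python
import sqlite3

# Module-level attributes required by the DB-API 2.0 standard.
api_level = sqlite3.apilevel      # the DB-API version the module implements
param_style = sqlite3.paramstyle  # the placeholder syntax the module expects

# The connection -> cursor -> execute -> fetch sequence has the same shape
# in every DB-API 2.0 library; only the placeholder syntax differs.
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("SELECT ? + ?", (1, 2))
row = cur.fetchone()
con.close()

print(api_level, param_style, row)
```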

You should really use a SQLAlchemy connection string instead. Your code happens to work because you never attempt to write any data back to the database and the psycopg2 connection and cursor objects are largely compatible with the sqlite3 library versions, but you could run into problems down the road.

I don't see why this would not work:

params = {'earliest': datetime.datetime.today() - datetime.timedelta(days=7),
          'store_name': '<put what you want here>'}

df = pd.read_sql("""
         select store_name, count(*) from sales
         where created_at >= %(earliest)s
         and store_name = %(store_name)s
         group by store_name""",
     params=params, con=con)

Because you want two stores, this is a bit more complex.

This should work:

# A comma-joined string would be bound as one single value ('store_1,store_2'),
# so pass a list instead; psycopg2 adapts a Python list to a PostgreSQL array.
params = {'earliest': datetime.datetime.today() - datetime.timedelta(days=7),
          'store_names': ['store_1', 'store_2']}

df = pd.read_sql("""
         select store_name, count(*) from sales
         where created_at >= %(earliest)s
         and store_name = any(%(store_names)s)
         group by store_name""",
     params=params, con=con)
