
Python - Generating SQL WHERE/IN clause from string List

I am given a Python list of arbitrary length containing arbitrary strings. In particular, the strings can contain embedded single and/or double quotes. I have no control over the input, so I have to take what I am given.

For example:

    valueList = [ "hello'world", 'foo"bar', 'my\'name"is', "see\'you\"soon" ]

    Python shell:
        >>> valueList = [ "hello'world", 'foo"bar', 'my\'name"is', "see\'you\"soon" ]
        >>>
        >>> valueList
        ["hello'world", 'foo"bar', 'my\'name"is', 'see\'you"soon']
        >>>
        >>> valueList[0]
        "hello'world"
        >>>
        >>> valueList[1]
        'foo"bar'
        >>>
        >>> valueList[2]
        'my\'name"is'
        >>>
        >>> valueList[3]
        'see\'you"soon'

From this, I need to generate an SQL string such as:

    "SELECT * FROM myTable as mt
        WHERE mt."colName" IN ("hello'world", 'foo"bar', 'my\'name"is', 'see\'you"soon')

Any solution has to work with both SQLite and Postgres.

I have tried to generate the (...) portion of the clause using Python join but that just ends up making one big string with all single quotes escaped. For example:

    Python shell:
        >>> values = "','".join(valueList)
        >>> values
        'hello\'world\',\'foo"bar\',\'my\'name"is\',\'see\'you"soon'

        >>> values = "'" + "','".join(valueList) + "'"
        >>> values
        '\'hello\'world\',\'foo"bar\',\'my\'name"is\',\'see\'you"soon\''
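
A side note on the output above, offered as a small sketch rather than a fix: the backslashes appear to come from the shell echoing the string's repr, not from the string itself. The variable names below are illustrative only.

```python
value_list = ["hello'world", 'foo"bar', 'my\'name"is', "see\'you\"soon"]

# Same join as in the shell session above.
joined = "'" + "','".join(value_list) + "'"

# The interactive shell echoes repr(joined). Because the text contains both
# quote characters, repr() wraps it in single quotes and escapes each
# embedded single quote with a backslash.
shown_by_shell = repr(joined)

# The string itself contains no backslashes at all.
backslashes_in_text = joined.count("\\")
backslashes_in_repr = shown_by_shell.count("\\")
```

Calling `print(joined)` instead of evaluating `joined` at the prompt shows the text without the backslashes. The embedded single quotes still need handling before that text is valid SQL.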

Additional info: The code that I inherited uses SQLAlchemy and Pandas.

        import pandas as pd
        ...cut...cut...cut...
        my_df = pd.read_sql(sql, my_conn)

I do NOT want to use Pandas to do the filtering. In fact, my assigned task is to REMOVE the existing Pandas filtering and replace it with SQL with explicit WHERE/IN filters for speed.

For example, replace this:

    my_df = pd.read_sql("SELECT * FROM myTable", my_conn)  # can return tens of thousands of rows
    my_df = my_df[my_df.loc[:, 'colName'].isin(myList)]  # ends up with a handful of rows

with this:

    my_df = pd.read_sql("""SELECT * FROM myTable as mt WHERE mt."colName" IN ("hello'world", 'foo"bar', ...)""", my_conn)

SQL injection protection is a plus, but at this point I'll be happy with any solution that works.

Well, the SQL specification defines a string literal as text delimited by single quotes, and to include a single quote inside a string literal you have to double it. You can consult the syntax documentation of SQLite and PostgreSQL to see that both comply with that specification. Based on that, here's my attempt:

value_list = [ "hello'world", 'foo"bar', """my'name"is""", """see'you"soon""" ]
value_list_escaped = [f"""'{x.replace("'", "''")}'""" for x in value_list]
query_template = "SELECT * FROM myTable as mt WHERE mt.colName IN ({})"
query = query_template.format(", ".join(value_list_escaped))
print(query)

Is that what you wanted?
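
One way to check the quote-doubling approach is sketched below, using Python's built-in `sqlite3` module and an in-memory table. The table contents, the extra "not wanted" row, and the helper name `quote_sql_literal` are assumptions made for illustration; they are not part of the answer above.

```python
import sqlite3

def quote_sql_literal(value: str) -> str:
    """Wrap a string in single quotes, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"

value_list = ["hello'world", 'foo"bar', """my'name"is""", """see'you"soon"""]

# Throwaway in-memory database holding the wanted values plus one extra row.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE myTable ("colName" TEXT)')
conn.executemany(
    'INSERT INTO myTable ("colName") VALUES (?)',
    [(v,) for v in value_list + ["not wanted"]],
)

# Build the IN (...) list from escaped literals and run the query.
in_clause = ", ".join(quote_sql_literal(v) for v in value_list)
query = f'SELECT mt."colName" FROM myTable AS mt WHERE mt."colName" IN ({in_clause})'
matched = sorted(row[0] for row in conn.execute(query))
conn.close()
```

If the escaping is right, `matched` holds exactly the four input strings and not the extra row. On PostgreSQL the same doubling rule applies as long as `standard_conforming_strings` is on, which has been the default since version 9.1; with it off, backslashes in the values would need separate handling. Bound parameters avoid the escaping question entirely.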

Here are code fragments from a functioning solution to my problem.

This function is very specific to my problem but demonstrates the parameter injection technique. It also demonstrates how to handle SQLite parameter injection vs Postgres parameter injection.

def whereInjection(valueList, sqlDict):
    # sqlDict starts with just a "paramCount" key set to an initial value (typically 0 but could be any number).
    # As this function generates parameter injection strings, it adds a key/value pair for each parameter
    # in the form {"p_#": value}, where # counts up from the current "paramCount" and value is the associated parameter value.
    #
    # The end result for a valueList containing ["aaa", "bbb", "ccc'ddd", 'eee"fff'] will be:
    #   injectStr (the return value) = "(:p_0,:p_1,:p_2,:p_3)"
    #       Note: For Postgres, it has to be "(%(p_0)s,%(p_1)s, etc.)"
    #   sqlDict = {
    #       "paramCount": 4,
    #       "p_0": "aaa",
    #       "p_1": "bbb",
    #       "p_2": "ccc'ddd",
    #       "p_3": 'eee"fff'
    #   }
    localDebugPrintingEnabled = False

    # take into account whether the item values are presented as a list, tuple, set, single int, single string, etc.
    if isinstance(valueList, list):
        vList = valueList
    elif isinstance(valueList, (tuple, set)):
        vList = list(valueList)
    elif isinstance(valueList, (int, str)):
        vList = [valueList]  # wrap a single scalar value
    else:
        vList = valueList  # unexpected type; it must still support len() and indexing below

    sz = len(vList)
    pc = sqlDict["paramCount"]
    # NOTE: db_type is not defined in this fragment; it is assumed to be a module-level variable set elsewhere.
    if db_type == 'SQLite':
        injectStr = "(" + ",".join(":p_" + str(i + pc) for i in range(sz)) + ")"
    else:  # assume Postgres
        injectStr = "(" + ",".join("%(p_" + str(i + pc) + ")s" for i in range(sz)) + ")"
    valueDict = {('p_' + str(i + pc)): vList[i] for i in range(0, sz)}

    sqlDict.update(valueDict) # add the valueDict just generated
    sqlDict["paramCount"] += sz # update paramCount for all parameters just added

    return injectStr

The invoking code would look like this. This assumes that you know how to create an engine connection to your DB.

sqlDict = {"paramCount": 0}  # start with only the counter, set to 0
# itemList is the caller's list of values to match
sql = """SELECT * FROM myTable as mt WHERE mt."aColName" IN {0}""".format(whereInjection(itemList, sqlDict))
my_df = pd.read_sql(sql, engine_connection, params=sqlDict)  # does the actual parameter injection
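
For anyone who wants to try the placeholder technique without a pandas or SQLAlchemy setup, here is a minimal self-contained sketch using only the standard-library `sqlite3` module. The helper `build_in_params`, its `style` argument, and the in-memory table are hypothetical names introduced for this example, not part of the code above; the `pyformat` branch only builds the `%(name)s` text that a driver such as psycopg2 expects and is not executed here.

```python
import sqlite3

def build_in_params(values, prefix="p_", start=0, style="named"):
    """Return (placeholder_text, params_dict) for use in an IN clause.

    style="named"    -> (:p_0, :p_1, ...)        as accepted by sqlite3
    style="pyformat" -> (%(p_0)s, %(p_1)s, ...)  as accepted by psycopg2
    """
    values = list(values)
    if not values:
        raise ValueError("an IN clause needs at least one value")
    names = [f"{prefix}{start + i}" for i in range(len(values))]
    if style == "named":
        marks = [f":{name}" for name in names]
    elif style == "pyformat":
        marks = [f"%({name})s" for name in names]
    else:
        raise ValueError(f"unknown placeholder style: {style!r}")
    return "(" + ", ".join(marks) + ")", dict(zip(names, values))

value_list = ["hello'world", 'foo"bar', 'my\'name"is', "see'you\"soon"]

# Throwaway in-memory table with the wanted values plus one extra row.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE myTable ("aColName" TEXT)')
conn.executemany(
    'INSERT INTO myTable ("aColName") VALUES (?)',
    [(v,) for v in value_list + ["other"]],
)

# Only placeholder names go into the SQL text; the values travel separately.
placeholders, params = build_in_params(value_list)
sql = f'SELECT mt."aColName" FROM myTable AS mt WHERE mt."aColName" IN {placeholders}'
found = [row[0] for row in conn.execute(sql, params)]
conn.close()
```

Because the values never become part of the SQL text, embedded quotes need no escaping, and this is also what provides the SQL injection protection the question asked about.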
