
Python - Generating SQL WHERE/IN clause from string List

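I am given a Python list of arbitrary length containing arbitrary strings. In particular, the strings can contain embedded single and/or double quotes. I have no control over the input, so I have to take what I am given.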

For example:

    valueList = [ "hello'world", 'foo"bar', 'my\'name"is', "see\'you\"soon" ]

    Python shell:
        >>> valueList = [ "hello'world", 'foo"bar', 'my\'name"is', "see\'you\"soon" ]
        >>>
        >>> valueList
        ["hello'world", 'foo"bar', 'my\'name"is', 'see\'you"soon']
        >>>
        >>> valueList[0]
        "hello'world"
        >>>
        >>> valueList[1]
        'foo"bar'
        >>>
        >>> valueList[2]
        'my\'name"is'
        >>>
        >>> valueList[3]
        'see\'you"soon'

From this, I need to generate an SQL string like:

    "SELECT * FROM myTable as mt
        WHERE mt."colName" IN ("hello'world", 'foo"bar', 'my\'name"is', 'see\'you"soon')

Any solution has to work with both SQLite and Postgres.

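I have tried to generate the (...) portion of the clause using Python join, but that just ends up producing one big string with all the single quotes escaped. For example: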

    Python shell:
        >>> values = "','".join(valueList)
        >>> values
        'hello\'world\',\'foo"bar\',\'my\'name"is\',\'see\'you"soon'

        >>> values = "'" + "','".join(valueList) + "'"
        >>> values
        '\'hello\'world\',\'foo"bar\',\'my\'name"is\',\'see\'you"soon\''
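
Note that the backslashes in the output above appear because the interactive shell echoes the `repr()` of the string; they do not seem to be stored in the string itself. A small check, reusing the `valueList` from the example (the other variable names are only for illustration):

```python
valueList = ["hello'world", 'foo"bar', 'my\'name"is', "see\'you\"soon"]

values = "'" + "','".join(valueList) + "'"

# repr() is what the interactive shell echoes: it adds escapes for display only.
shown_by_shell = repr(values)

# print() shows the real characters of the string.
print(values)

# No backslash is stored in the data itself ...
has_backslash = "\\" in values
# ... the backslashes only exist in the repr() form.
has_backslash_in_repr = "\\" in shown_by_shell
print(has_backslash, has_backslash_in_repr)
```

So the join itself is not adding escapes; the remaining problem is that an unescaped single quote inside a value ends the SQL string literal early.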

Additional info: The code that I inherited uses SQLAlchemy and Pandas.

        import pandas as pd
        ...cut...cut...cut...
        my_df = pd.read_sql(sql, my_conn);

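I don't want to use Pandas to do the filtering. In fact, my assigned task is to remove the existing Pandas filtering and replace it with SQL that has explicit WHERE/IN filters, for speed.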

For example, replace this:

    my_df = pd.read_sql("SELECT * FROM myTable", my_conn) <==== can return 10's of thousands of rows
    my_df = my_df[my_df.loc[:, 'colName'].isin(myList)] <==== ends up with a handful of rows

With this:

    my_df = pd.read_sql("SELECT * FROM myTable as mt WHERE mt."colName" IN ("hello'world", 'foo"bar', ...)", my_conn)

SQL injection protection is a plus, but at this point I will be happy with any solution that works.

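Well, the SQL specification defines a string literal as being delimited by single quotes, and to include a single quote within a string literal you have to double it (you can consult the SQLite and Postgres language specifications to see that they comply with that specification). Based on that, here is my attempt: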

value_list = [ "hello'world", 'foo"bar', """my'name"is""", """see'you"soon""" ]
value_list_escaped = [f"""'{x.replace("'", "''")}'""" for x in value_list]
query_template = "SELECT * FROM myTable as mt WHERE mt.colName IN ({})"
query = query_template.format(", ".join(value_list_escaped))
print(query)

Is that what you want?
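
As a rough check of the quote-doubling approach, the sketch below runs the generated query against an in-memory SQLite database using the standard-library `sqlite3` module. The sample rows and the `quote_literal` helper name are assumptions made for illustration; `myTable` and `colName` follow the question. It does not exercise Postgres, and when the input is untrusted, letting the driver bind parameters is generally the safer route.

```python
import sqlite3


def quote_literal(value):
    # Hypothetical helper: wrap in single quotes and double any embedded single quote.
    return "'" + value.replace("'", "''") + "'"


value_list = ["hello'world", 'foo"bar', """my'name"is""", """see'you"soon"""]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (colName TEXT)")
# Assumed sample data: the four wanted values plus one row that should not match.
conn.executemany(
    "INSERT INTO myTable (colName) VALUES (?)",
    [(v,) for v in value_list + ["not wanted"]],
)

# Build the IN (...) list from escaped literals, as in the answer above.
in_list = ", ".join(quote_literal(v) for v in value_list)
query = f"SELECT colName FROM myTable AS mt WHERE mt.colName IN ({in_list})"

rows = [row[0] for row in conn.execute(query)]
print(query)
print(rows)
conn.close()
```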

The following is a code snippet of a working solution to my problem.

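This function is very specific to my problem, but it demonstrates the parameter injection technique. It also shows how to handle SQLite parameter injection versus Postgres parameter injection.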

def whereInjection(valueList, sqlDict):
    # sqlDict starts with just a "paramCount" key set to an initial value (typically 0 but could be any number).
    # As this function generates parameter injection strings, it generates a key/value pair for each parameter
    # in the form {"p_#": value} where # is the current "paramCount" and value is the value of the associated parameter.
    #
    # The end result for a valueList containing ["aaa", "bbb", "ccc'ddd", 'eee"fff'] will be:
    #   injStr = "(:p_0, :p_1, :p_2, :p_3)"
    #       Note: For Postgres, it has to be "(%(p_0)s, %(p_1)s, etc.)"
    #   sqlDict = {
    #       "paramCount": 4,
    #       "p_0": "aaa",
    #       "p_1": "bbb",
    #       "p_2": "ccc'ddd",
    #       "p_3": 'eee"fff'
    #   }
    localDebugPrintingEnabled = False

    # take into account whether the item values are presented as a list, tuple, set, single int, single string, etc.
    if isinstance(valueList, list):
        vList = valueList
    elif isinstance(valueList, tuple):
        vList = list(valueList)
    elif isinstance(valueList, set):
        vList = list(valueList)
    elif isinstance(valueList, int) or isinstance(valueList, str):
        vList = [valueList]
    else:
        vList = valueList # unexpected type...

    sz = len(vList)
    pc = sqlDict["paramCount"]
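    # NOTE: db_type is not defined in this function; it is assumed to be a module-level variable set elsewhere.
    if db_type == 'SQLite':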
        injectStr = "(" + ",".join((":p_" + str(i + pc)) for i in range(0, sz)) + ")"
    else: # assume Postgres
        injectStr = "(" + ",".join(("%(p_" + str(i + pc) + ")s") for i in range(0, sz)) + ")"
    valueDict = {('p_' + str(i + pc)): vList[i] for i in range(0, sz)}

    sqlDict.update(valueDict) # add the valueDict just generated
    sqlDict["paramCount"] += sz # update paramCount for all parameters just added

    return injectStr

The calling code looks like this. This assumes you know how to create an engine connection to your database.

sqlDict = {"paramCount": 0} # start with empty dictionary and starting count of 0
sql = """SELECT * FROM myTable as mt WHERE mt."aColName" IN {0}""".format(whereInjection(itemList, sqlDict))
my_df = pd.read_sql(sql, engine_connection, params=sqlDict) # does the actual parameter injection
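
For a self-contained illustration of the same idea, here is a minimal sketch that builds named placeholders and lets the driver bind the values, run against an in-memory SQLite database with the standard-library `sqlite3` module. The helper name `build_in_clause`, its `style` argument, and the sample table contents are assumptions for this example and are not part of the code above. The `pyformat` branch only produces the `%(name)s` text form used for Postgres; it is not executed here.

```python
import sqlite3


def build_in_clause(values, prefix="p_", style="named"):
    # Hypothetical helper: returns ("(:p_0, :p_1, ...)", {"p_0": v0, ...}).
    # style="named" gives :name placeholders (sqlite3);
    # style="pyformat" gives %(name)s placeholders (Postgres drivers such as psycopg2).
    values = list(values)
    if not values:
        # An empty IN () list is not portable SQL, so refuse it up front.
        raise ValueError("values must not be empty")
    names = [f"{prefix}{i}" for i in range(len(values))]
    if style == "named":
        marks = ", ".join(f":{n}" for n in names)
    else:
        marks = ", ".join(f"%({n})s" for n in names)
    return f"({marks})", dict(zip(names, values))


item_list = ["aaa", "bbb", "ccc'ddd", 'eee"fff']

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE myTable ("aColName" TEXT)')
conn.executemany(
    'INSERT INTO myTable ("aColName") VALUES (?)',
    [(v,) for v in item_list + ["zzz"]],  # "zzz" should be filtered out
)

clause, params = build_in_clause(item_list)
sql = f'SELECT "aColName" FROM myTable AS mt WHERE mt."aColName" IN {clause}'

# The driver binds the values, so embedded quotes need no manual escaping.
rows = [row[0] for row in conn.execute(sql, params)]
print(sql)
print(rows)
conn.close()
```

With pandas, the same `sql` and `params` pair can be handed to `pd.read_sql(sql, connection, params=params)`, as in the calling code above.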
