Python - Generating SQL WHERE/IN clause from string List
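I am given a Python list of arbitrary length containing arbitrary strings. In particular, the strings may contain embedded single and/or double quotes. I have no control over the input, so I have to take what I am given.
For example: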
valueList = [ "hello'world", 'foo"bar', 'my\'name"is', "see\'you\"soon" ]
Python shell:
>>> valueList = [ "hello'world", 'foo"bar', 'my\'name"is', "see\'you\"soon" ]
>>>
>>> valueList
["hello'world", 'foo"bar', 'my\'name"is', 'see\'you"soon']
>>>
>>> valueList[0]
"hello'world"
>>>
>>> valueList[1]
'foo"bar'
>>>
>>> valueList[2]
'my\'name"is'
>>>
>>> valueList[3]
'see\'you"soon'
From this, I need to generate a SQL string such as:
"SELECT * FROM myTable as mt
WHERE mt."colName" IN ("hello'world", 'foo"bar', 'my\'name"is', 'see\'you"soon')
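Any solution must work with both SQLite and Postgres.
I tried using Python's join to generate the (...) part of the clause, but that just ends up producing one big string with all the single quotes escaped. For example: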
Python shell:
>>> values = "','".join(valueList)
>>> values
'hello\'world\',\'foo"bar\',\'my\'name"is\',\'see\'you"soon'
>>> values = "'" + "','".join(valueList) + "'"
>>> values
'\'hello\'world\',\'foo"bar\',\'my\'name"is\',\'see\'you"soon\''
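Note that the backslashes in the shell output above are added by repr() when the interpreter echoes a string; they are not stored in the string itself. A minimal sketch to check this, reusing the list from the question (the names joined and has_backslash are only illustrative):

```python
value_list = ["hello'world", 'foo"bar', 'my\'name"is', "see\'you\"soon"]

joined = "'" + "','".join(value_list) + "'"

# repr() inserts backslashes for display only; print() shows the real characters.
print(repr(joined))
print(joined)

# The string itself contains no backslash characters at all.
has_backslash = "\\" in joined
print(has_backslash)
```

So the join is not corrupting the data. The remaining problem is that the embedded single quotes are not valid inside a SQL string literal as they stand.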
Additional info: The code I inherited uses SQLAlchemy and Pandas.
import pandas as pd
...cut...cut...cut...
my_df = pd.read_sql(sql, my_conn);
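I don't want to use Pandas to do the filtering. In fact, my assigned task is to remove the existing Pandas filtering and replace it with SQL that has explicit WHERE/IN filters, for speed.
For example, replace this: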
my_df = pd.read_sql("SELECT * FROM myTable", my_conn) <==== can return 10's of thousands of rows
my_df = my_df[my_df.loc[:, 'colName'].isin(myList)] <==== ends up with a handful of rows
With this:
my_df = pd.read_sql("SELECT * FROM myTable as mt WHERE mt."colName" IN ("hello'world", 'foo"bar', ...)", my_conn)
SQL injection protection is a plus, but at this point I'll be happy with any solution that works.
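Well, the SQL specification defines a string literal as delimited by single quotes, and to include a single quote inside a string literal you have to double it (you can consult the syntax specs for SQLite and Postgres to see that they comply with that specification). Based on that, here is my attempt: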
value_list = [ "hello'world", 'foo"bar', """my'name"is""", """see'you"soon""" ]
value_list_escaped = [f"""'{x.replace("'", "''")}'""" for x in value_list]
query_template = "SELECT * FROM myTable as mt WHERE mt.colName IN ({})"
query = query_template.format(", ".join(value_list_escaped))
print(query)
Is that what you want?
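One way to sanity-check the quote-doubling approach is to run the generated query against an in-memory SQLite database. The sketch below is only that: the helper name quote_literal and the table contents are assumptions made for illustration, and it has been checked against SQLite only, not Postgres.

```python
import sqlite3

value_list = ["hello'world", 'foo"bar', """my'name"is""", """see'you"soon"""]

def quote_literal(value):
    # Standard SQL string literal: wrap in single quotes, double embedded ones.
    return "'" + value.replace("'", "''") + "'"

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE myTable ("colName" TEXT)')
# Insert the wanted values plus one extra row that must be filtered out.
conn.executemany("INSERT INTO myTable VALUES (?)",
                 [(v,) for v in value_list + ["other"]])

in_list = ", ".join(quote_literal(v) for v in value_list)
query = f'SELECT "colName" FROM myTable AS mt WHERE mt."colName" IN ({in_list})'

rows = [r[0] for r in conn.execute(query)]
conn.close()
print(rows)
```

Building literals by hand still leaves the escaping to your own code, so bound parameters are generally the safer choice when the driver supports them.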
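Below is a code snippet of the solution that worked for my problem.
This function is very specific to my problem, but it demonstrates the parameter injection technique. It also demonstrates how to handle SQLite parameter injection versus Postgres parameter injection.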
def whereInjection(valueList, sqlDict):
    # sqlDict starts with just a "paramCount" key set to an initial value (typically 0 but could be any number).
    # As this function generates parameter injection strings, it generates a key/value pair for each parameter
    # in the form {"p_#": value} where # is the current "paramCount" and value is the value of the associated parameter.
    #
    # The end result for a valueList containing ["aaa", "bbb", "ccc'ddd", 'eee"fff'] will be:
    #   injStr = "(:p_0,:p_1,:p_2,:p_3)"
    #   Note: For Postgres, it has to be "(%(p_0)s,%(p_1)s, etc.)"
    #   sqlDict = {
    #       "paramCount": 4,
    #       "p_0": "aaa",
    #       "p_1": "bbb",
    #       "p_2": "ccc'ddd",
    #       "p_3": 'eee"fff'
    #   }
    #
    # NOTE: db_type is not defined in this snippet; it is assumed to be a module-level
    # variable. 'SQLite' selects the :name style, anything else is treated as Postgres.
    localDebugPrintingEnabled = False

    # take into account whether the item values are presented as a list, tuple, set, single int, single string, etc.
    if isinstance(valueList, list):
        vList = valueList
    elif isinstance(valueList, (tuple, set)):
        vList = list(valueList)
    elif isinstance(valueList, (int, str)):
        vList = [valueList]
    else:
        vList = valueList  # unexpected type...

    sz = len(vList)
    pc = sqlDict["paramCount"]
    if db_type == 'SQLite':
        injectStr = "(" + ",".join((":p_" + str(i + pc)) for i in range(0, sz)) + ")"
    else:  # assume Postgres
        injectStr = "(" + ",".join(("%(p_" + str(i + pc) + ")s") for i in range(0, sz)) + ")"

    valueDict = {('p_' + str(i + pc)): vList[i] for i in range(0, sz)}
    sqlDict.update(valueDict)    # add the valueDict just generated
    sqlDict["paramCount"] += sz  # update paramCount for all parameters just added
    return injectStr
The calling code looks like this. It assumes you know how to create an engine connection to the database.
sqlDict = {"paramCount": 0} # start with empty dictionary and starting count of 0
sql = """SELECT * FROM myTable as mt WHERE mt."aColName" IN {0}""".format(whereInjection(itemList, sqlDict));
my_df = pd.read_sql(sql, engine_connection, params=sqlDict); # does the actual parameter injection
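For readers who want to try the named-parameter idea without a SQLAlchemy engine or Pandas, here is a minimal, self-contained sketch using only the standard-library sqlite3 module. It is not the function above: the helper named_placeholders, the table contents, and the variable names are all assumptions made for illustration, and it covers only SQLite's :name placeholder style.

```python
import sqlite3

def named_placeholders(values, params, prefix="p_"):
    """Return '(:p_0, :p_1, ...)' and record each value in params."""
    start = len(params)  # continue numbering after any existing entries
    names = [f"{prefix}{start + i}" for i in range(len(values))]
    params.update(zip(names, values))
    return "(" + ", ".join(":" + n for n in names) + ")"

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE myTable ("aColName" TEXT)')
items = ["aaa", "bbb", "ccc'ddd", 'eee"fff']
conn.executemany("INSERT INTO myTable VALUES (?)",
                 [(v,) for v in items + ["zzz"]])

params = {}
wanted = ["aaa", "ccc'ddd", 'eee"fff']
clause = named_placeholders(wanted, params)
sql = f'SELECT "aColName" FROM myTable AS mt WHERE mt."aColName" IN {clause}'

# The driver binds the values, so embedded quotes need no manual escaping.
rows = [r[0] for r in conn.execute(sql, params)]
conn.close()
print(clause)
print(rows)
```

Only the placeholder names are formatted into the SQL text; the values travel separately in params. Note that an empty list would produce IN (), which not every database accepts, so callers may want to handle that case before building the query.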