How to extract the column names from a SQL query
I have extracted just the column fields from a query like this:
query_split = [query[query.find("select") + len("select"): query.find("from")]]
I get a string like this:
query_split = [' service,count(*) as count,round(sum(mrp),2) as sale ']
I want to get a list that looks like this:
[' service','count(*) as count','round(sum(mrp),2) as sale']
This is because I want to get the list of column names:
['service','count','sale']
I have tried other methods, such as:
from csv import reader  # assumption: reader is csv.reader, which matches the output shown

for file in reader(query_split):
    print(file)
This gives me the output:
[' service', 'count(*) as count', 'round(sum(mrp)', '2) as sale ']
When I ran a test case whose query uses an operation like round(sum(mrp),2), the function below failed at that point:
def get_column_name(query):
    """
    Extracts the column name from a sql query
    :param query: str
    :return: column_name
        list: Column names which that query will fetch
    """
    column_name = []
    query_split = query[query.find("select") + len("select"): query.find("from")]
    for i in query_split.split(','):
        if "as" in i:
            column_name.append(i.split('as')[-1])
        else:
            column_name.append(i.split(' ')[-1])
    return column_name
Your problem is that the SQL at play here features nested constructs.
The cleanest solution is most likely a SQL parser that understands the MySQL dialect. Arguably, this is most easily done with ANTLR; you can find a MySQL grammar here and a quick guide here if you are curious.
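If a full parser feels like overkill, the nesting can also be handled by tracking parenthesis depth by hand. This is a minimal sketch, not the ANTLR approach: the function name split_top_level is made up for illustration, and it assumes the select list contains no commas or parentheses inside quoted string literals.

```python
def split_top_level(select_list):
    """Split a SELECT list on commas that are not inside parentheses.

    Hypothetical helper for illustration; it ignores quoted string literals.
    """
    parts, current, depth = [], [], 0
    for ch in select_list:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            # A comma at depth 0 separates two columns.
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return parts


columns = split_top_level(" service,count(*) as count,round(sum(mrp),2) as sale ")
print(columns)
# ['service', 'count(*) as count', 'round(sum(mrp),2) as sale']
```

The comma inside round(sum(mrp),2) is seen at depth 1, so it stays part of its column expression.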
To approach this with regex, we need to account for balanced parentheses with a recursive regex in a match pattern like this:
[^,]+(\((?>[^()]++|(?1))*+\))[^,]+|([^(),]+(?:,|$))
Explanation:
[^,]+(\((?>[^()]++|(?1))*+\))[^,]+
is the recursive regex that matches pairs of () and everything in between (including commas), surrounded by a negated character class that matches everything but a comma.
([^(),]+(?:,|$))
matches regular columns.
Sample Code:
import regex as re
regex = r"[^,]+(\((?>[^()]++|(?1))*+\))[^,]+|([^(),]+(?:,|$))"
test_str = "service,count(*) as count,round(sum(mrp),2) as sale,count(*) as count2,round(sum(mrp),2) as sale2"
matches = re.finditer(regex, test_str, re.MULTILINE)
result = [match.group() for match in matches]
Outputs:
['service,', 'count(*) as count', 'round(sum(mrp),2) as sale', 'count(*) as count2', 'round(sum(mrp),2) as sale2']
Since we are using PCRE regex features, you will need to install Python's alternative regex package to run the code. Good luck.
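To get from fragments like the ones in the output above to the bare column names the question asks for, one possible follow-up step is to keep whatever comes after a standalone "as". This is only a sketch using the standard re module; column_alias is a hypothetical helper name, and it assumes aliases are always introduced with an explicit AS keyword surrounded by whitespace.

```python
import re


def column_alias(fragment):
    """Return the alias after a standalone AS keyword, else the last token.

    Hypothetical helper for illustration; assumes explicit 'AS' aliases.
    """
    text = fragment.strip().rstrip(",").strip()
    # Whitespace is required on both sides of "as", so names such as
    # "sale" that merely contain the letters are not split.
    pieces = re.split(r"\s+as\s+", text, flags=re.IGNORECASE)
    if len(pieces) > 1:
        return pieces[-1].strip()
    return text.split()[-1]


fragments = ['service,', 'count(*) as count', 'round(sum(mrp),2) as sale']
names = [column_alias(f) for f in fragments]
print(names)
# ['service', 'count', 'sale']
```

Matching "as" only when it is surrounded by whitespace avoids the substring check used in the question's function, which also fires on names that merely contain those two letters.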