How to extract column names from a SQL query

Question

I have extracted just the column fields from a query like this:

query_split = [query[query.find("select") + len("select"): query.find("from")]]

I get a string like this:

 query_split = [' service,count(*) as count,round(sum(mrp),2) as sale ']

I want to get a list that looks like this:

[' service','count(*) as count','round(sum(mrp),2) as sale']

This is because I ultimately want the list of column names:

['service','count','sale']

I have tried other methods, such as:

from csv import reader
for row in reader(query_split):
    print(row)

This gives me the output:

[' service', 'count(*) as count', 'round(sum(mrp)', '2) as sale ']

When I ran a test case whose query uses an operation like round(sum(mrp),2), the function below failed at that point:

def get_column_name(query):
    """
    Extracts the column name from a sql query
    :param query: str
    :return: column_name
    list: Column names which that query will fetch
    """
    column_name=[]
    query_split = query[query.find("select") + len("select"): query.find("from")]
    for i in query_split.split(','):

        if "as" in i:
            column_name.append(i.split('as')[-1])
        else:
            column_name.append(i.split(' ')[-1])
    return column_name

Answer 1

Your problem is that the SQL at play here features nested constructs.

Probably the cleanest solution is to use a SQL parser that understands the MySQL dialect. Arguably, this is most easily done with ANTLR, for which a MySQL grammar and quick-start guides are available if you are curious.

To approach this with regex, we need to account for balanced parentheses by using a recursive regex in a match pattern like this:

[^,]+(\((?>[^()]++|(?1))*+\))[^,]+|([^(),]+(?:,|$))

Explanation:

[^,]+(\((?>[^()]++|(?1))*+\))[^,]+ is the recursive part: it matches balanced pairs of () and everything in between (including commas), surrounded by negated character classes that match everything but a comma.
([^(),]+(?:,|$)) matches regular columns.


Sample Code:

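# Requires the third-party "regex" package; the built-in re module has no recursion.
import regex as re
regex = r"[^,]+(\((?>[^()]++|(?1))*+\))[^,]+|([^(),]+(?:,|$))"
test_str = "service,count(*) as count,round(sum(mrp),2) as sale,count(*) as count2,round(sum(mrp),2) as sale2"
# re.MULTILINE lets $ match at the end of each line
matches = re.finditer(regex, test_str, re.MULTILINE)
result = [match.group() for match in matches]
print(result)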

Outputs:

['service,', 'count(*) as count', 'round(sum(mrp),2) as sale', 'count(*) as count2', 'round(sum(mrp),2) as sale2']
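The question ultimately asks for the bare column names. A minimal sketch of that last step, using only the standard re module: the helper name column_name is hypothetical, and the sketch assumes every alias is introduced by a standalone AS keyword. Splitting on whitespace-delimited "as" also avoids the false hits that a plain "as" in i substring test gets on identifiers such as "base".

```python
import re

# Fragments copied from the output above.
fragments = ['service,', 'count(*) as count', 'round(sum(mrp),2) as sale',
             'count(*) as count2', 'round(sum(mrp),2) as sale2']

def column_name(fragment):
    """Return the alias after a standalone AS keyword, else the last word.

    Hypothetical helper: assumes a non-empty fragment.
    """
    # Drop surrounding whitespace and the trailing comma some matches keep.
    text = fragment.strip().rstrip(',').strip()
    # Requiring whitespace on both sides keeps "base" or "last" intact.
    parts = re.split(r'\s+as\s+', text, flags=re.IGNORECASE)
    return parts[-1].split()[-1]

names = [column_name(f) for f in fragments]
print(names)  # ['service', 'count', 'sale', 'count2', 'sale2']
```

Columns written without AS fall back to their last whitespace-separated word, so an expression with no alias comes back as the expression itself.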

Since we are using PCRE-style regex features (recursion, atomic groups, possessive quantifiers), you will need to install Python's alternative regex package to run the code. Good luck.
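If installing an extra package is not an option, the same split can be approximated without any regex by tracking parenthesis depth. This is a different technique from the recursive pattern above, shown only as a sketch: split_columns is a hypothetical name, and the code assumes parentheses are balanced and that no commas or parentheses appear inside string literals.

```python
def split_columns(select_list):
    """Split a SELECT list on commas that are not inside parentheses.

    Hypothetical helper: assumes balanced parentheses and no commas or
    parentheses inside quoted string literals.
    """
    columns, current, depth = [], [], 0
    for ch in select_list:
        if ch == '(':
            depth += 1
        elif ch == ')':
            depth -= 1
        if ch == ',' and depth == 0:
            # Top-level comma: close the current column.
            columns.append(''.join(current).strip())
            current = []
        else:
            current.append(ch)
    tail = ''.join(current).strip()
    if tail:
        columns.append(tail)
    return columns

print(split_columns(' service,count(*) as count,round(sum(mrp),2) as sale '))
# ['service', 'count(*) as count', 'round(sum(mrp),2) as sale']
```

A comma counts as a separator only when the depth counter is zero, which is why the comma inside round(sum(mrp),2) stays inside its column.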
