How to extract the column names from a SQL query

I have extracted just the column fields from a query like this:

query_split = [query[query.find("select") + len("select"): query.find("from")]]

I get a string like this:

 query_split = [' service,count(*) as count,round(sum(mrp),2) as sale ']

I want to get a list that looks like this:

[' service','count(*) as count','round(sum(mrp),2) as sale']

This is because I ultimately want the list of column names:

['service','count','sale']

I have tried other methods, such as:

from csv import reader

# csv.reader splits on every comma, including those inside parentheses
for row in reader(query_split):
    print(row)

which gives me the output:

[' service', 'count(*) as count', 'round(sum(mrp)', '2) as sale ']

When I ran a test case whose query uses an operation like round(sum(mrp),2), the function below failed at that point:

def get_column_name(query):
    """
    Extracts the column name from a sql query
    :param query: str
    :return: column_name
    list: Column names which that query will fetch
    """
    column_name=[]
    query_split = query[query.find("select") + len("select"): query.find("from")]
    for i in query_split.split(','):

        if "as" in i:
            column_name.append(i.split('as')[-1])
        else:
            column_name.append(i.split(' ')[-1])
    return column_name

Your problem is that the SQL here contains nested constructs: the comma inside round(sum(mrp),2) is not a column separator, so a plain split on commas cuts that expression in two.
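
To make the nesting issue concrete, here is a minimal sketch of a split that tracks parenthesis depth and only breaks on commas at depth zero. It is not the regex approach described further down, just a plain-Python alternative for illustration. The name split_top_level is hypothetical, and the sketch assumes the select list has no commas or parentheses inside quoted string literals.

```python
def split_top_level(select_list):
    """Split a select list on commas that are not inside parentheses.

    Hypothetical helper for illustration only; it does not handle commas
    or parentheses that appear inside quoted string literals.
    """
    parts = []
    current = []
    depth = 0
    for ch in select_list:
        if ch == '(':
            depth += 1
        elif ch == ')':
            depth -= 1
        if ch == ',' and depth == 0:
            # A top-level comma ends the current column expression.
            parts.append(''.join(current).strip())
            current = []
        else:
            current.append(ch)
    # Whatever remains after the last top-level comma is the final expression.
    tail = ''.join(current).strip()
    if tail:
        parts.append(tail)
    return parts


print(split_top_level(" service,count(*) as count,round(sum(mrp),2) as sale "))
# ['service', 'count(*) as count', 'round(sum(mrp),2) as sale']
```

The comma in round(sum(mrp),2) is seen at depth 1, so it stays part of the expression instead of starting a new column.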

Probably the cleanest solution is to use a SQL parser that understands the MySQL dialect. Arguably, this is most easily done with ANTLR; a MySQL grammar and a quick guide are available for it if you are curious.

To approach this with a regex, we need to account for balanced parentheses by using a recursive pattern like this:

[^,]+(\((?>[^()]++|(?1))*+\))[^,]+|([^(),]+(?:,|$))

Explanation:

  • [^,]+(\((?>[^()]++|(?1))*+\))[^,]+ is the recursive part: it matches a balanced pair of parentheses and everything in between (including commas), surrounded by a negated character class that matches anything except a comma.
  • ([^(),]+(?:,|$)) matches regular columns, that is, a run of characters containing no parentheses or commas, followed by a comma or the end of the string.

Sample code:

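import regex as re  # third-party package: pip install regex

# Alternative 1: an expression containing balanced (possibly nested) parentheses.
# Alternative 2: a plain column followed by a comma or the end of the string.
regex = r"[^,]+(\((?>[^()]++|(?1))*+\))[^,]+|([^(),]+(?:,|$))"
test_str = "service,count(*) as count,round(sum(mrp),2) as sale,count(*) as count2,round(sum(mrp),2) as sale2"
matches = re.finditer(regex, test_str, re.MULTILINE)
result = [match.group() for match in matches]
print(result)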

Output:

['service,', 'count(*) as count', 'round(sum(mrp),2) as sale', 'count(*) as count2', 'round(sum(mrp),2) as sale2']

Since this uses PCRE-style regex features (recursion, atomic groups, possessive quantifiers), you will need to install Python's alternative regex package to run the code; the built-in re module does not support recursive patterns. Good luck.
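
To get from the split expressions to the column names the question asks for (['service','count','sale']), a rough sketch along these lines might work. It uses only the built-in re module. The helper name column_name is hypothetical, and the sketch assumes aliases are written with an explicit AS keyword and are plain identifiers (no quotes or backticks); expressions without an alias are returned unchanged.

```python
import re

# Matches a trailing "as <alias>" where "as" is a whole word (case-insensitive).
# Assumption: the alias is a plain identifier, not quoted or backticked.
ALIAS_RE = re.compile(r"\bas\s+(\w+)\s*$", re.IGNORECASE)


def column_name(expression):
    """Return the alias of a select expression, or the expression itself.

    Hypothetical helper for illustration; implicit aliases (without AS)
    are not detected.
    """
    # Drop surrounding whitespace and a trailing comma left over from splitting.
    expression = expression.strip().rstrip(',').strip()
    match = ALIAS_RE.search(expression)
    if match:
        return match.group(1)
    return expression


pieces = ['service,', 'count(*) as count', 'round(sum(mrp),2) as sale']
print([column_name(p) for p in pieces])
# ['service', 'count', 'sale']
```

Matching "as" as a whole word avoids the pitfall of testing "as" in i, which also fires on column names such as base_price that merely contain those letters.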

