How can I extract field names from SQL with Perl?

Question

I have a series of SELECT statements in a text file, and I need to extract the field names from each query. This would be easy if some of the fields didn't use nested functions such as to_char().

The select-list fields can contain several levels of nested parentheses, for example:

ltrim(rtrim(to_char(base_field_name, format))) renamed_field_name,

Given fields like that, or the simple case of a bare base_field_name, what would the regex look like in Perl?

Answer 1

Don't try to write a regex parser (although Perl regexes can handle nested patterns like these); use SQL::Statement::Structure instead.
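The parenthetical point, that Perl regexes can match nested patterns, looks like this in practice: since Perl 5.10 a pattern can recurse into one of its own groups, so it can match balanced parentheses to any depth. This is a minimal sketch, not SQL::Statement and not a real SQL parser. It assumes a single SELECT with a FROM clause, whose string literals and comments contain no commas or parentheses; field_names is a hypothetical helper name, and taking the trailing identifier of each item as the field name is an assumption.

```perl
use strict;
use warnings;
use 5.010;   # possessive quantifiers and (?-1) recursion need Perl 5.10+

# One balanced (...) group; (?-1) recurses into this same group, so nesting
# of any depth, such as ltrim(rtrim(to_char(...))), is matched as one unit.
my $parens = qr/ ( \( (?: [^()]++ | (?-1) )* \) ) /x;

# Hypothetical helper: returns one name per item of the SELECT list.
sub field_names {
    my ($sql) = @_;

    # Select list = text after SELECT up to the first FROM that is not inside
    # parentheses (parenthesised groups are consumed whole, so a FROM inside
    # e.g. extract(year from d) is skipped).
    my ($list) = $sql =~ / \b SELECT \s+ ( (?: $parens | [^()] )*? ) \s+ FROM \b /ix
        or return;

    my @names;
    # Items are separated by commas that are not inside parentheses.
    while ($list =~ / \G \s* ( (?: [^,()]++ | $parens )+ ) (?: , | \z ) /gx) {
        my $item = $1;
        $item =~ s/\s+\z//;    # drop trailing blanks
        # Trailing identifier = alias or bare column; otherwise keep the expression.
        push @names, $item =~ / (\w+) \z /x ? $1 : $item;
    }
    return @names;
}

my $sql = 'SELECT ltrim(rtrim(to_char(base_field_name, format))) renamed_field_name, '
        . 'base_field_name FROM some_table';
print join(', ', field_names($sql)), "\n";   # prints: renamed_field_name, base_field_name
```

The relative reference (?-1) is used instead of (?1) so that $parens keeps working when it is interpolated into larger patterns, where absolute group numbers shift.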

Answer 2

Why not ask the target database itself how it would interpret the queries?

In Perl, you can use DBI to query the prepared representation of an SQL query. Sometimes this is database-specific: some drivers (in Perl's DBD:: namespace) expose their RDBMS's own way of describing statements, analogous to that RDBMS's native C or C++ API.

It can be done generically, however, because DBI puts the names of the result columns in the statement handle attribute NAME. The following, for example, has a good chance of working on any DBI-supported RDBMS:

use strict;
use warnings;
use DBI;

use constant DSN => 'dbi:YouHaveNotToldUs:dbname=we_do_not_know';

my $dbh = DBI->connect(DSN, ..., { RaiseError => 1 });  # "..." = your username and password

my $sth;
while (<>) {
  next unless /^SELECT/i;   # SELECTs only, assume whole query on one line
  chomp;
  my $sql = /\bWHERE\b/i ? "$_ AND 1=0" : "$_ WHERE 1=0"; # XXX ugly!
  eval {
    $sth = $dbh->prepare($sql);  # some drivers don't know column names
    $sth->execute();             # until after a successful execute()
  };
  if ($@) { print $@; next; }  # oops, problem with that one ("print $@, next" would skip the print)
  print join(', ', @{$sth->{NAME}}), "\n";
}

The XXX ugly! bit there tries to append an always-false condition to the SELECT, so that the SQL engine doesn't have to do any real work when you execute(). It's a terribly naive approach (that /\bWHERE\b/i test is no better at identifying an SQL WHERE clause than simple regexes are at parsing out SELECT field names), but it is likely to work.
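A less fragile variant of the XXX line (an assumption, not part of this answer) avoids looking for WHERE at all: wrap each query as a derived table and filter that. Appending AND 1=0 can also misfire, because AND binds tighter than OR: in WHERE x > 1 OR y < 2 AND 1=0, rows with x > 1 are still returned. The helper zero_row_query and the alias zero_rows are hypothetical names, and some databases (for example SQL Server) reject a derived table that contains an ORDER BY or duplicate column names, so treat this as a sketch.

```perl
use strict;
use warnings;

# Hypothetical helper: turn one SELECT into a query that returns no rows but
# still has the same result columns (readable from $sth->{NAME}).
sub zero_row_query {
    my ($select) = @_;
    $select =~ s/;\s*\z//;    # a trailing semicolon is not allowed inside (...)
    return "SELECT * FROM ($select) zero_rows WHERE 1=0";
}

print zero_row_query('SELECT a, b FROM t WHERE x > 1 OR y < 2;'), "\n";
# prints: SELECT * FROM (SELECT a, b FROM t WHERE x > 1 OR y < 2) zero_rows WHERE 1=0
```

In the DBI loop above, the $sql line would then become my $sql = zero_row_query($_);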

Answer 3

For a somewhat related problem at the office, I used:

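my @SqlKeyWordList = qw/select from where .../;  # (1) lower-case SQL keywords
my %IsKeyWord = map { $_ => 1 } @SqlKeyWordList;

my @Candidates = split(/\s+/, $SqlSelectQuery);   # (2)

my %FieldHash;                                    # (3)
for my $Word (@Candidates) {
   next if $Word eq '' or $IsKeyWord{lc $Word};  # skip blanks and keywords
   $FieldHash{$Word}++;
}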

Comments:

(1) @SqlKeyWordList contains all the SQL keywords that can appear in the SQL statement (we use MySQL; there are many SQL dialects, and choosing/building this list is work, see my comments below!). If someone decided to use a keyword as a field name, you will need a regex after all (better: refactor the code).
(2) Split the SQL statement into a list of words. This is the trickiest part and WILL REQUIRE tweaking. For now it uses Perl's notion of whitespace (\s) to split.
Splitting off the field list (select a,b,c) from the "from" portion of the SQL might be advisable here, depending on your SQL statements.
(3) %FieldHash will contain one entry per select field (plus gunk, until you have validated your @SqlKeyWordList and the regex in (2)).

Beware

There is nothing in this code that could not be done in Python.
Your life would be much easier if you can influence how these SQL statements are created (e.g. make sure each field is listed in a comment).
There are so many things that can/will go wrong with this parsing approach that you really should sidestep the issue entirely by changing the process (it saves time in the long run).
This is the regex we use at the office:

my @Candidates = split(/[\s()+,*\/\-=]+/, $SqlSelectQuery);  # \s already covers \n, \r and spaces
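Put together, steps (1) to (3) with the office split pattern look roughly like the sketch below. The short keyword list is an illustrative assumption (a real one needs your dialect's full list), candidates is a hypothetical helper name, and the sample query is built from the question's example. The output shows the "gunk" mentioned in (3): function names, the format argument and the table name survive alongside the real fields.

```perl
use strict;
use warnings;

# (1) Keyword list: an illustrative subset only; extend it for your dialect.
my %IsKeyWord = map { $_ => 1 } qw(select from where as and or);

# (2) Split on whitespace and the punctuation from the office pattern,
#     then drop empty strings and keywords (compared case-insensitively).
sub candidates {
    my ($sql) = @_;
    return grep { $_ ne '' && !$IsKeyWord{lc $_} }
           split /[\s()+,*\/\-=]+/, $sql;
}

# (3) Count each surviving word.
my $SqlSelectQuery = 'select ltrim(rtrim(to_char(base_field_name, format))) '
                   . 'renamed_field_name, other_field from t';
my %FieldHash;
$FieldHash{$_}++ for candidates($SqlSelectQuery);

print join(' ', sort keys %FieldHash), "\n";
# prints: base_field_name format ltrim other_field renamed_field_name rtrim t to_char
```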

Answer 4

How about splitting each line into terms (replacing every parenthesis, comma and space with a newline) and then sorting:

perl -ne's/[(), ]/\n/g; print' < textfile | sort -u

You'll end up with a lot of output like:

fieldname1
formatstring
ltrim
rtrim
to_char

Question

4 answers

Answer 1
11 2010-02-01 02:05:26

Answer 2
2 2010-02-01 02:54:56

Answer 3
1 2010-02-01 05:39:34

Beware

Answer 4
0 2010-02-01 00:47:36