How can I extract field names from SQL with Perl?

I have a series of SELECT statements in a text file, and I need to extract the field names from each query. This would be easy if some of the fields didn't use nested functions such as to_char().

Given select-statement fields that can have several levels of nested parentheses, like:

ltrim(rtrim(to_char(base_field_name, format))) renamed_field_name,

or the simple case of just base_field_name as a field, what would the regex look like in Perl?

Don't try to write a regex-based parser (although Perl regexes can handle nested patterns like this one); use SQL::Statement::Structure instead.
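For reference only, here is a minimal sketch of the kind of nested pattern a Perl regex can handle. It splits a select list on commas that sit outside parentheses, then takes the trailing word of each item as its output name. The helper names split_select_list and output_name are made up for this illustration, and the sketch assumes no string literals or quoted identifiers containing commas or parentheses, which is exactly why a real parser is the safer choice.

```perl
use strict;
use warnings;

# Hypothetical helper: split a select list on top-level commas.
# (?&paren) recurses into the named group, so nested pairs stay balanced.
sub split_select_list {
    my ($list) = @_;
    my @items;
    while ($list =~ /\G \s* ( (?: [^(),]++ | (?<paren> \( (?: [^()]++ | (?&paren) )* \) ) )+ ) (?: , | \z )/xg) {
        (my $item = $1) =~ s/\s+\z//;   # trim trailing whitespace
        push @items, $item;
    }
    return @items;
}

# Hypothetical helper: the output name is the trailing word (an alias or a
# bare column); if the item ends in ")" there is no alias, so return it whole.
sub output_name {
    my ($item) = @_;
    return $item =~ /(\w+)\z/ ? $1 : $item;
}

my $fields = 'ltrim(rtrim(to_char(base_field_name, format))) renamed_field_name, base_field_name';
print output_name($_), "\n" for split_select_list($fields);
```

On the example from the question this prints renamed_field_name and then base_field_name.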

Why not ask the target database itself how it would interpret the queries?

In Perl, one can use the DBI to query the prepared representation of a SQL query. Sometimes this is database-specific: some drivers (under the Perl DBD:: namespace) support their RDBMS's way of describing statements, analogous to the RDBMS's native C or C++ API.

It can be done generically, however, as the DBI will put the names of the result columns in the statement handle attribute NAME. The following, for example, has a good chance of working on any DBI-supported RDBMS:

use strict;
use warnings;
use DBI;

use constant DSN => 'dbi:YouHaveNotToldUs:dbname=we_do_not_know';

my $dbh = DBI->connect(DSN, ..., { RaiseError => 1 });

my $sth;
while (<>) {
  next unless /^SELECT/i;   # SELECTs only, assume whole query on one line
  chomp;
  my $sql = /\bWHERE\b/i ? "$_ AND 1=0" : "$_ WHERE 1=0"; # XXX ugly!
  eval {
    $sth = $dbh->prepare($sql);  # some drivers don't know column names
    $sth->execute();             # until after a successful execute()
  };
  if ($@) { print $@; next }  # oops, problem with that one
  print join(', ', @{$sth->{NAME}}), "\n";
}

The "XXX ugly!" bit there tries to append an always-false condition to the SELECT, so that the SQL engine doesn't have to do any real work when you execute(). It's a terribly naive approach (that /\bWHERE\b/i test identifies a SQL WHERE clause no more correctly than simple regexes parse out SELECT field names), but it is likely to work.
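To see where the naive test holds up and where it does not, here is a small sketch that applies the same ternary to a few queries. The function name add_false_condition and the table and column names are made up for the illustration.

```perl
use strict;
use warnings;

# Same test as in the loop above: append "AND 1=0" if the word WHERE
# appears anywhere in the text, otherwise append "WHERE 1=0".
sub add_false_condition {
    my ($sql) = @_;
    return $sql =~ /\bWHERE\b/i ? "$sql AND 1=0" : "$sql WHERE 1=0";
}

for my $sql (
    'SELECT a FROM t',                       # fine
    'SELECT a FROM t WHERE x = 1',           # fine
    'SELECT a FROM t ORDER BY a',            # WHERE ends up after ORDER BY
    'SELECT a FROM t WHERE x = 1 OR y = 2',  # AND binds tighter than OR
) {
    print add_false_condition($sql), "\n";
}
```

The third query becomes invalid SQL, which the eval in the loop would catch and report. The fourth stays valid but is no longer always false, so the engine may do real work; the column names should still come back.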

For a somewhat related problem at the office I used:

my @SqlKeyWordList = qw/select from where .../;   # (1)

my @Candidates = split(/\s+/, $SqlSelectQuery);   # (2)

my %FieldHash;                                    # (3)
for my $Word (@Candidates) {
   next if grep { lc($Word) eq $_ } @SqlKeyWordList;  # skip SQL keywords
   $FieldHash{$Word}++;                               # count everything else
}

Comments:

  1. @SqlKeyWordList contains all the SQL keywords that could appear in the SQL statement (we use MySQL; there are many SQL dialects, and choosing or building this list is real work, see my notes below). If someone decided to use a keyword as a field name, you will need a regex after all (better to refactor the code).
  2. Split the SQL statement into a list of words. This is the trickiest part and WILL REQUIRE tweaking. For now it uses Perl's notion of "space" (= not in a word) to split.
    Splitting the field list (select a,b,c) from the "from" portion of the SQL might be advisable here, depending on your SQL statements.
  3. %FieldHash will contain one entry per select field (and gunk, until you have validated your @SqlKeyWordList and the regex in (2)).

Beware

  • there is nothing in this code that could not be done in Python.
  • your life would be much easier if you could influence the creation of said SQL statements (e.g. make sure each field is written to a comment).
  • there are so many things that can and will go wrong with this parsing approach that you really should sidestep the issue entirely by changing the process (it saves time in the long run).
  • this is the regex we use at the office:
# split on runs of whitespace (including \n and \r) and the punctuation ( ) + , * / - =
my @Candidates = split(/[\s()+,*\/\-=]+/, $SqlSelectQuery);

How about splitting each line into terms (replacing every parenthesis, comma and space with a newline), then sorting:

perl -ne's/[(), ]/\n/g; print' < textfile | sort -u

You'll end up with a lot of content like:

fieldname1
formatstring
ltrim
rtrim
to_char
