Regex named capture groups in Delphi XE

I have built a match pattern in RegexBuddy that behaves exactly as I expect. But I cannot transfer it to Delphi XE, at least not when using the latest built-in TRegEx or TPerlRegEx.

My real-world code has six capture groups, but I can illustrate the problem with a simpler example. This code shows "3" in the first dialog and then raises an exception (-7 index out of bounds) when executing the second dialog.

var
  Regex: TRegEx;
  M: TMatch;
begin
  Regex := TRegEx.Create('(?P<time>\d{1,2}:\d{1,2})(?P<judge>.{1,3})');
  M := Regex.Match('00:00  X1 90  55KENNY BENNY');
  ShowMessage(IntToStr(M.Groups.Count));
  ShowMessage(M.Groups['time'].Value);
end;

But if I use only one capture group:

Regex := TRegEx.Create('(?P<time>\d{1,2}:\d{1,2})');

The first dialog shows "2" and the second dialog shows the time "00:00", as expected.

It would be rather limiting if only one named capture group were allowed, but that is not the case. If I change the capture group name to, for example, "atime":

var
  Regex: TRegEx;
  M: TMatch;
begin
  Regex := TRegEx.Create('(?P<atime>\d{1,2}:\d{1,2})(?P<judge>.{1,3})');
  M := Regex.Match('00:00  X1 90  55KENNY BENNY');
  ShowMessage(IntToStr(M.Groups.Count));
  ShowMessage(M.Groups['atime'].Value);
end;

I get "3" and "00:00", just as expected. Are there reserved words I cannot use? I don't think so, because in my real example I have tried completely random names. I just cannot figure out what causes this behaviour.

When pcre_get_stringnumber does not find the name, it returns PCRE_ERROR_NOSUBSTRING.

PCRE_ERROR_NOSUBSTRING is defined in RegularExpressionsAPI as PCRE_ERROR_NOSUBSTRING = -7. That is where the -7 in the exception message comes from.
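A minimal sketch of checking for that return value directly, reusing the pattern from the question. The helper name ReportGroup is hypothetical, and declaring the compiled pattern as an untyped Pointer is an assumption made for brevity; only pcre_compile, pcre_get_stringnumber and PCRE_ERROR_NOSUBSTRING come from RegularExpressionsAPI.

```pascal
program CheckGroupLookup;

{$APPTYPE CONSOLE}

uses
  SysUtils, RegularExpressionsAPI;

// Hypothetical helper: prints the group number for Name, or a readable
// message when PCRE reports that no such named group exists.
procedure ReportGroup(Code: Pointer; Name: PAnsiChar);
var
  Group: Integer;
begin
  Group := pcre_get_stringnumber(Code, Name);
  if Group = PCRE_ERROR_NOSUBSTRING then
    WriteLn(Name, ': no such named group (PCRE_ERROR_NOSUBSTRING)')
  else
    WriteLn(Name, ': group ', Group);
end;

var
  Code: Pointer;          // assumption: untyped pointer for the compiled pattern
  Error: PAnsiChar;
  ErrorOffset: Integer;
begin
  Code := pcre_compile('(?P<time>\d{1,2}:\d{1,2})(?P<judge>.{1,3})', 0,
    @Error, @ErrorOffset, nil);
  if Code <> nil then
  begin
    ReportGroup(Code, 'time');
    ReportGroup(Code, 'judge');
  end;
  ReadLn;
end.
```

Testing the result against PCRE_ERROR_NOSUBSTRING before using it as an index makes a failed lookup visible as such, rather than as an out-of-bounds index of -7.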

Some testing shows that pcre_get_stringnumber returns PCRE_ERROR_NOSUBSTRING for every name whose first letter is in the range k to z, and that this range depends on the first letter of judge. Changing judge to something else changes the range.

As I see it, there are at least two bugs involved here: one in pcre_get_stringnumber, and one in TGroupCollection.GetItem, which should raise a proper exception instead of SRegExIndexOutOfBounds.
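Until the lookup by name works, one possible workaround is to address the groups by number instead. This is only a sketch: it assumes the Delphi XE unit name RegularExpressions (later versions use System.RegularExpressions) and that numeric indexing is unaffected by the bug, which is consistent with Groups.Count reporting 3 in the question.

```pascal
program GroupByIndex;

{$APPTYPE CONSOLE}

uses
  SysUtils, RegularExpressions;

var
  Regex: TRegEx;
  M: TMatch;
begin
  Regex := TRegEx.Create('(?P<time>\d{1,2}:\d{1,2})(?P<judge>.{1,3})');
  M := Regex.Match('00:00  X1 90  55KENNY BENNY');
  if M.Success then
  begin
    // Groups[0] is the whole match; capture groups, named or not,
    // are numbered from 1 in the order of their opening parentheses.
    WriteLn(M.Groups[1].Value); // the group named "time"
    WriteLn(M.Groups[2].Value); // the group named "judge"
  end;
  ReadLn;
end.
```

The names can stay in the pattern for readability; only the lookup switches to numbers, so the code can go back to names once the bug is fixed.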

The bug seems to be in the RegularExpressionsAPI unit that wraps the PCRE library, or in the PCRE OBJ files that it links. If I run this code:

program Project1;

{$APPTYPE CONSOLE}

uses
  SysUtils, RegularExpressionsAPI;

var
  myregexp: Pointer;
  Error: PAnsiChar;
  ErrorOffset: Integer;
  Offsets: array[0..300] of Integer;
  OffsetCount, Group: Integer;

begin
  try
    myregexp := pcre_compile('(?P<time>\d{1,2}:\d{1,2})(?P<judge>.{1,3})', 0, @error, @erroroffset, nil);
    if (myregexp <> nil) then begin
      offsetcount := pcre_exec(myregexp, nil, '00:00  X1 90  55KENNY BENNY', Length('00:00  X1 90  55KENNY BENNY'), 0, 0, @offsets[0], High(Offsets));
      if (offsetcount > 0) then begin
        Group := pcre_get_stringnumber(myregexp, 'time');
        WriteLn(Group);
        Group := pcre_get_stringnumber(myregexp, 'judge');
        WriteLn(Group);
      end;
    end;
  except
    on E: Exception do
      Writeln(E.ClassName, ': ', E.Message);
  end;
  ReadLn;
end.

It prints -7 and 2 instead of 1 and 2.

If I remove RegularExpressionsAPI from the uses clause and add the pcre unit from my TPerlRegEx component, then it correctly prints 1 and 2.

The RegularExpressionsAPI unit in Delphi XE is based on my pcre unit, and the RegularExpressionsCore unit is based on my PerlRegEx unit. Embarcadero did make some changes to both units. They also compiled their own OBJ files from the PCRE library, and those are the files that RegularExpressionsAPI links.

I have reported this bug as QC 92497.

I have also created a separate report, QC 92498, to request that TGroupCollection.GetItem raise a more sensible exception when a named group that does not exist is requested. (This code is in the RegularExpressions unit, which is based on code written by Vincent Parrett, not by me.)
