Regex named capture groups in Delphi XE
I have built a match pattern in RegexBuddy which behaves exactly as I expect, but I cannot transfer it to Delphi XE, at least not when using the latest built-in TRegEx or TPerlRegEx.
My real-world code has 6 capture groups, but I can illustrate the problem with a simpler example. This code shows "3" in the first dialog and then raises an exception (-7 index out of bounds) when executing the second dialog.
var
  Regex: TRegEx;
  M: TMatch;
begin
  Regex := TRegEx.Create('(?P<time>\d{1,2}:\d{1,2})(?P<judge>.{1,3})');
  M := Regex.Match('00:00 X1 90 55KENNY BENNY');
  ShowMessage(IntToStr(M.Groups.Count));
  ShowMessage(M.Groups['time'].Value);
end;
But if I use only one capture group:
Regex := TRegEx.Create('(?P<time>\d{1,2}:\d{1,2})');
The first dialog shows "2" and the second dialog shows the time "00:00", as expected.
It would be rather limiting if only one named capture group were allowed, but that's not the case. If I change the capture group name to, for example, "atime":
var
  Regex: TRegEx;
  M: TMatch;
begin
  Regex := TRegEx.Create('(?P<atime>\d{1,2}:\d{1,2})(?P<judge>.{1,3})');
  M := Regex.Match('00:00 X1 90 55KENNY BENNY');
  ShowMessage(IntToStr(M.Groups.Count));
  ShowMessage(M.Groups['atime'].Value);
end;
I get "3" and "00:00", just as expected. Are there reserved words I cannot use? I don't think so, because in my real example I've tried completely random names. I just cannot figure out what causes this behaviour.
When pcre_get_stringnumber does not find the name, it returns PCRE_ERROR_NOSUBSTRING. PCRE_ERROR_NOSUBSTRING is defined in RegularExpressionsAPI as PCRE_ERROR_NOSUBSTRING = -7.
Some testing shows that pcre_get_stringnumber returns PCRE_ERROR_NOSUBSTRING for every name whose first letter is in the range k to z, and that this range depends on the first letter of judge. Changing judge to something else changes the range.
As I see it, there are at least two bugs involved here: one in pcre_get_stringnumber, and one in TGroupCollection.GetItem, which needs to raise a proper exception instead of SRegExIndexOutOfBounds.
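Since the failure described above happens in the name lookup, one possible way around it is to address the groups by number instead of by name. The following is only a sketch under that assumption: it reuses the pattern and input from the question, the program name is made up for illustration, and it has not been checked against every Delphi XE update.

```pascal
program NumberedGroupsSketch; // hypothetical name, for illustration only
{$APPTYPE CONSOLE}
uses
  SysUtils, RegularExpressions;
var
  Regex: TRegEx;
  M: TMatch;
begin
  Regex := TRegEx.Create('(?P<time>\d{1,2}:\d{1,2})(?P<judge>.{1,3})');
  M := Regex.Match('00:00 X1 90 55KENNY BENNY');
  if M.Success then
  begin
    // Index 0 is the whole match; 1 and 2 are the groups in pattern order,
    // so no name-to-number lookup is involved.
    WriteLn(M.Groups[1].Value); // the "time" group
    WriteLn(M.Groups[2].Value); // the "judge" group
  end;
  ReadLn;
end.
```

The names can stay in the pattern for readability; only the lookup changes. The drawback is that the indexes must be updated by hand whenever groups are added or reordered.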
The bug seems to be in the RegularExpressionsAPI unit that wraps the PCRE library, or in the PCRE OBJ files that it links. If I run this code:
program Project1;
{$APPTYPE CONSOLE}
uses
  SysUtils, RegularExpressionsAPI;
var
  myregexp: Pointer;
  Error: PAnsiChar;
  ErrorOffset: Integer;
  Offsets: array[0..300] of Integer;
  OffsetCount, Group: Integer;
begin
  try
    myregexp := pcre_compile('(?P<time>\d{1,2}:\d{1,2})(?P<judge>.{1,3})', 0, @error, @erroroffset, nil);
    if (myregexp <> nil) then begin
      offsetcount := pcre_exec(myregexp, nil, '00:00 X1 90 55KENNY BENNY', Length('00:00 X1 90 55KENNY BENNY'), 0, 0, @offsets[0], High(Offsets));
      if (offsetcount > 0) then begin
        Group := pcre_get_stringnumber(myregexp, 'time');
        WriteLn(Group);
        Group := pcre_get_stringnumber(myregexp, 'judge');
        WriteLn(Group);
      end;
    end;
  except
    on E: Exception do
      Writeln(E.ClassName, ': ', E.Message);
  end;
  ReadLn;
end.
It prints -7 and 2 instead of 1 and 2.
If I remove RegularExpressionsAPI from the uses clause and add the pcre unit from my TPerlRegEx component, then it correctly prints 1 and 2.
The RegularExpressionsAPI unit in Delphi XE is based on my pcre unit, and the RegularExpressionsCore unit is based on my PerlRegEx unit. Embarcadero did make some changes to both units. They also compiled their own OBJ files from the PCRE library, which are linked by RegularExpressionsAPI.
I have reported this bug as QC 92497.
I have also created a separate report, QC 92498, to request that TGroupCollection.GetItem raise a more sensible exception when requesting a named group that does not exist. (This code is in the RegularExpressions unit, which is based on code written by Vincent Parrett, not myself.)
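Until the exception is improved, calling code can guard a named lookup itself. The sketch below wraps the lookup in a small helper; TryGetGroupValue and the program name are hypothetical and not part of the RTL. It catches the base Exception class on the assumption that this is acceptable for the caller, since the exact exception class raised may differ between Delphi versions.

```pascal
program SafeGroupLookupSketch; // hypothetical name, for illustration only
{$APPTYPE CONSOLE}
uses
  SysUtils, RegularExpressions;

// Hypothetical helper: returns False instead of raising when the
// named group cannot be resolved.
function TryGetGroupValue(const M: TMatch; const Name: string;
  out Value: string): Boolean;
begin
  Result := False;
  Value := '';
  if not M.Success then
    Exit;
  try
    Value := M.Groups[Name].Value;
    Result := True;
  except
    // Deliberately broad: any failure in the lookup is reported as "not found".
    on Exception do
      Result := False;
  end;
end;

var
  Regex: TRegEx;
  M: TMatch;
  S: string;
begin
  Regex := TRegEx.Create('(?P<time>\d{1,2}:\d{1,2})(?P<judge>.{1,3})');
  M := Regex.Match('00:00 X1 90 55KENNY BENNY');
  if TryGetGroupValue(M, 'time', S) then
    WriteLn(S)
  else
    WriteLn('Named group "time" could not be resolved');
  ReadLn;
end.
```

Swallowing every exception hides unrelated errors as well, so this is better suited to diagnosing the problem than to production code.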