Regular expression for matching date in Perl
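I want to match dates that have the following format:
2010-08-27 02:11:36
i.e. yyyy-mm-dd hh:mm:ss.
For now I'm not really concerned with whether a date is actually valid, only that it is in the correct format.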
The possible formats that should match are (for this example):
2010
2010-08
2010-08-27
2010-08-27 02
2010-08-27 02:11
2010-08-27 02:11:36
What is a concise regular expression for this in Perl?
So far I have this (which works, by the way):
/\d{4}(-\d{2}(-\d{2}( \d{2}(:\d{2}(:\d{2})?)?)?)?)?/
Can it be improved performance-wise?
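A quick way to confirm that the pattern above accepts every listed format is a small harness like the following. It is only a sketch: the loop and the `$consumed` variable are illustrative and not part of the question. Wrapping the compiled pattern in one extra capture group makes the first returned element the whole matched text, so you can see how much of each string was consumed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The pattern from the question, unchanged, compiled once with qr//.
my $re = qr/\d{4}(-\d{2}(-\d{2}( \d{2}(:\d{2}(:\d{2})?)?)?)?)?/;

# The six formats listed in the question.
my @examples = (
    '2010',
    '2010-08',
    '2010-08-27',
    '2010-08-27 02',
    '2010-08-27 02:11',
    '2010-08-27 02:11:36',
);

for my $s (@examples) {
    # The outer (...) captures the whole match, so $consumed is the matched text.
    my ($consumed) = $s =~ /($re)/;
    if (defined $consumed) {
        print "match: '$s' (consumed '$consumed')\n";
    }
    else {
        print "no match: '$s'\n";
    }
}
```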
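Judging by the lack of a capture around the year, I assume you only care whether the date matches.
I tried a few patterns related to your question, and the one that gave a 10% to 15% improvement over yours was disabling capturing, i.e.,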
/\d{4}(?:-\d{2}(?:-\d{2}(?: \d{2}(?::\d{2}(?::\d{2})?)?)?)?)?/
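From the perlre documentation:
(?:pattern)
(?imsx-imsx:pattern)
This is for clustering, not capturing; it groups subexpressions like "()", but doesn't make backreferences as "()" does. So
    @fields = split(/\b(?:a|b|c)\b/)
is like
    @fields = split(/\b(a|b|c)\b/)
but doesn't spit out extra fields. It's also cheaper not to capture characters if you don't need to.
Any letters between ? and : act as flags modifiers as with (?imsx-imsx). For example,
    /(?s-i:more.*than).*million/i
is equivalent to the more verbose
    /(?:(?s-i)more.*than).*million/i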
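To see the "doesn't spit out extra fields" point concretely, here is a small sketch comparing the two `split` calls from the quote. The input string `'x a y b z'` is made up for illustration; it puts "a" and "b" between spaces so that `\b...\b` finds them.

```perl
use strict;
use warnings;

# Made-up input: "a" and "b" are whole words, so \b(a|b|c)\b matches them.
my $str = 'x a y b z';

my @nocap = split /\b(?:a|b|c)\b/, $str;   # separators are discarded
my @cap   = split /\b(a|b|c)\b/,   $str;   # captured separators come back as extra fields

print scalar(@nocap), " fields: [", join('][', @nocap), "]\n";
print scalar(@cap),   " fields: [", join('][', @cap),   "]\n";
```

The capturing version returns the matched "a" and "b" between the real fields, which is exactly the overhead a pure match test does not need.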
Benchmark output:
              Rate      U   U/NC CH/NC/A CH/NC/A/U     CH  CH/NC   null
U          31811/s     --   -32%    -58%      -59%   -61%   -66%   -93%
U/NC       46849/s    47%     --    -38%      -39%   -42%   -50%   -90%
CH/NC/A    76119/s   139%    62%      --       -1%    -6%   -18%   -84%
CH/NC/A/U  76663/s   141%    64%      1%        --    -6%   -17%   -84%
CH         81147/s   155%    73%      7%        6%     --   -13%   -83%
CH/NC      92789/s   192%    98%     22%       21%    14%     --   -81%
null      481882/s  1415%   929%    533%      529%   494%   419%     --
Code:
#! /usr/bin/perl
use warnings;
use strict;
use Benchmark qw/ :all /;
sub option_chain {
    local($_) = @_;
    /\d{4}(-\d{2}(-\d{2}( \d{2}(:\d{2}(:\d{2})?)?)?)?)?/
}

sub option_chain_nocap {
    local($_) = @_;
    /\d{4}(?:-\d{2}(?:-\d{2}(?: \d{2}(?::\d{2}(?::\d{2})?)?)?)?)?/
}

sub option_chain_nocap_anchored {
    local($_) = @_;
    /\A\d{4}(?:-\d{2}(?:-\d{2}(?: \d{2}(?::\d{2}(?::\d{2})?)?)?)?)?\z/
}

sub option_chain_anchored_unrolled {
    local($_) = @_;
    /\A\d\d\d\d(-\d\d(-\d\d( \d\d(:\d\d(:\d\d)?)?)?)?)?\z/
}

sub simple_split {
    local($_) = @_;
    split /[ :-]/;
}
# Under /x a bare space in the pattern is ignored, so the space between
# the date and the time is written as [ ] in the two subs below.
sub unrolled {
    local($_) = @_;
    grep defined($_), /\A (\d\d\d\d)-(\d\d)-(\d\d)[ ](\d\d):(\d\d):(\d\d) \z
                      |\A (\d\d\d\d)-(\d\d)-(\d\d)[ ](\d\d):(\d\d) \z
                      |\A (\d\d\d\d)-(\d\d)-(\d\d)[ ](\d\d) \z
                      |\A (\d\d\d\d)-(\d\d)-(\d\d) \z
                      |\A (\d\d\d\d)-(\d\d) \z
                      |\A (\d\d\d\d) \z
                      /x;
}

sub unrolled_nocap {
    local($_) = @_;
    grep defined($_), /\A \d\d\d\d-\d\d-\d\d[ ]\d\d:\d\d:\d\d \z
                      |\A \d\d\d\d-\d\d-\d\d[ ]\d\d:\d\d \z
                      |\A \d\d\d\d-\d\d-\d\d[ ]\d\d \z
                      |\A \d\d\d\d-\d\d-\d\d \z
                      |\A \d\d\d\d-\d\d \z
                      |\A \d\d\d\d \z
                      /x;
}
sub id { $_[0] }

my @examples = (
    "xyz",
    "2010",
    "2010-08",
    "2010-08-27",
    "2010-08-27 02",
    "2010-08-27 02:11",
    "2010-08-27 02:11:36",
);

cmpthese -1 => {
    "CH"        => sub { option_chain $_ for @examples },
    "CH/NC"     => sub { option_chain_nocap $_ for @examples },
    "CH/NC/A"   => sub { option_chain_nocap_anchored $_ for @examples },
    "CH/NC/A/U" => sub { option_chain_anchored_unrolled $_ for @examples },
    "U"         => sub { unrolled $_ for @examples },
    "U/NC"      => sub { unrolled_nocap $_ for @examples },
    "null"      => sub { id $_ for @examples },
};
Apart from the missing anchors, your regex is fine (unless you want to match 2008 in "abc200890"?). Assuming you want to match the whole string:
/^\d{4}(?:-\d{2}(?:-\d{2}(?: \d{2}(?::\d{2}(?::\d{2})?)?)?)?)?\z/
If you don't actually need the captured substrings, which I'm guessing is the case, you should use (?:...) instead.
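A small sketch of what the anchors change, using the non-capturing pattern with and without `^ ... \z`. The three test strings are illustrative; "abc200890" is the example from the answer above.

```perl
use strict;
use warnings;

my $unanchored = qr/\d{4}(?:-\d{2}(?:-\d{2}(?: \d{2}(?::\d{2}(?::\d{2})?)?)?)?)?/;
my $anchored   = qr/^\d{4}(?:-\d{2}(?:-\d{2}(?: \d{2}(?::\d{2}(?::\d{2})?)?)?)?)?\z/;

for my $s ('abc200890', '2010-08-27 02:11:36', '2010-08-27 02:1') {
    # The unanchored pattern succeeds if any substring looks like a date prefix;
    # the anchored one requires the entire string to fit the format.
    printf "%-24s unanchored: %-3s anchored: %s\n",
        "'$s'",
        ($s =~ $unanchored ? 'yes' : 'no'),
        ($s =~ $anchored   ? 'yes' : 'no');
}
```

Both "abc200890" and the truncated "2010-08-27 02:1" pass the unanchored test (it finds "2008" and "2010-08-27 02" inside them), but only the well-formed timestamp passes the anchored one.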
I would use the split function:
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

my @dates = (
    '2010',
    '2010-08',
    '2010-08-27',
    '2010-08-27 02',
    '2010-08-27 02:11',
    '2010-08-27 02:11:36',
);

for (@dates) {
    my @list = split /[ :-]/;
    print Dumper(\@list);
}
Output:
$VAR1 = [
          '2010'
        ];
$VAR1 = [
          '2010',
          '08'
        ];
$VAR1 = [
          '2010',
          '08',
          '27'
        ];
$VAR1 = [
          '2010',
          '08',
          '27',
          '02'
        ];
$VAR1 = [
          '2010',
          '08',
          '27',
          '02',
          '11'
        ];
$VAR1 = [
          '2010',
          '08',
          '27',
          '02',
          '11',
          '36'
        ];
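Note that `split` on its own does not check the format: a string like "2010:08:27" or "abc" splits without complaint. If the format also needs validating, one option is to test the string against the anchored pattern shown earlier and only then split it. A sketch, where `parse_stamp` is a hypothetical helper name:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the fields of a "yyyy[-mm[-dd[ hh[:mm[:ss]]]]]"
# string, or an empty list if the string does not have that shape.
sub parse_stamp {
    my ($s) = @_;
    return unless $s =~ /\A\d{4}(?:-\d{2}(?:-\d{2}(?: \d{2}(?::\d{2}(?::\d{2})?)?)?)?)?\z/;
    return split /[ :-]/, $s;
}

for my $s ('2010-08-27 02:11', '2010:08:27', 'abc') {
    my @fields = parse_stamp($s);
    print "'$s' => ", (@fields ? join(',', @fields) : 'rejected'), "\n";
}
```

Call it in list context as shown; the regex does the validation and `split` only does the extraction.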
This matches all of the above (and other things too, see the comments!), and is probably easier to read:
/(\d{4})(-\d{2})?(\w{1}\d{2})?(:\d{2})?/
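To make the "matches other things too" caveat concrete, here is a small check of what this pattern actually consumes when it succeeds. The two input strings are illustrative.

```perl
use strict;
use warnings;

# The pattern from the answer above, unchanged.
my $re = qr/(\d{4})(-\d{2})?(\w{1}\d{2})?(:\d{2})?/;

for my $s ('2010-08-27 02:11:36', 'abc1234xyz') {
    # The outer (...) captures the whole match, so $whole is the consumed text.
    my ($whole) = $s =~ /($re)/;
    print "'$s' consumed '", (defined $whole ? $whole : ''), "'\n";
}
```

Because the pattern is unanchored and everything after the four digits is optional, a successful match only says that four consecutive digits appear somewhere in the string. `\w` also does not match "-" or a space, so on a full timestamp it stops after "2010-08".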
If you want it faster, stay away from regular expressions and take a look at XS modules instead: Date::Calc is a good one.