Regex not matching data and dates
I have an SQL Select dump in which many lines look like this:
07/11/2011 16:48:08,07/11/2011 16:48:08,'YD','MANUAL',0,1,'text','text','text','text',,,,'text',,,0,0,
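I want to do two things to each line:
- Replace all dates with the sysdate function. A date may also come without a time (e.g. 07/11/2011).
- Replace all empty values with the string null.
Here is my attempt: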
$_ =~ s/,(,|\n)/,null$1/g; # Replace no data by "null"
$_ =~ s/\d{2}\/\d{2}\/d{4}.*?,/sysdate,/g; # Replace dates by "sysdate"
But this turns the string into:
07/11/2011 16:48:08,07/11/2011 16:48:08,'YD','MANUAL',0,1,'text','text','text','text',null,,null,'text',null,,0,0,null
whereas I expected:
sysdate,sysdate,'YD','MANUAL',0,1,'text','text','text','text',null,null,null,'text',null,null,0,0,null
I don't understand why the dates are not matched, and why some of the ,, pairs are not replaced by null.
Any insight is welcome, thanks.
You can do it like this:
$ cat perlregex.pl
use warnings;
use strict;
my $row = "07/11/2011 16:48:08,07/11/2011 16:48:08,'YD','MANUAL',0,1,'text','text','text','text',,,,'text',,,0,0,\n";
print( "$row\n" );
while ( $row =~ /,([,\n])/ ) { $row =~ s/,([,\n])/,null$1/; }
print( "$row\n" );
$row =~ s/\d{2}\/\d{2}\/\d{4}.*?,/sysdate,/g;
print( "$row\n" );
The result is:
$ ./perlregex.pl
07/11/2011 16:48:08,07/11/2011 16:48:08,'YD','MANUAL',0,1,'text','text','text','text',,,,'text',,,0,0,
07/11/2011 16:48:08,07/11/2011 16:48:08,'YD','MANUAL',0,1,'text','text','text','text',null,null,null,'text',null,null,0,0,null
sysdate,sysdate,'YD','MANUAL',0,1,'text','text','text','text',null,null,null,'text',null,null,0,0,null
This can certainly be optimized, but it gets the point across.
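The loop matters because a single s///g pass consumes the delimiter that follows each empty field, so the next empty field no longer has a leading comma left to match. Here is a minimal sketch of the difference; the helper names fill_consuming and fill_looping are made up for illustration and are not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: one global pass. Each match eats the delimiter
# after the empty field, so a directly following empty field is skipped.
sub fill_consuming {
    my ($row) = @_;
    $row =~ s/,(,|\n)/,null$1/g;
    return $row;
}

# Hypothetical helper: repeat a single substitution until nothing is
# left to fill, which is what the while loop in the answer does.
sub fill_looping {
    my ($row) = @_;
    1 while $row =~ s/,([,\n])/,null$1/;
    return $row;
}

print fill_consuming("a,,,b,\n");   # prints a,null,,b,null
print fill_looping("a,,,b,\n");     # prints a,null,null,b,null
```

The first line of output reproduces the null,,null pattern from the question; the second fills every gap.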
\d{2}\/\d{2}\/d{4}.*?,
does not work because the last d is not escaped: d{4} matches the literal letter d four times, not four digits.
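A quick way to see the effect of the missing backslash is to test both patterns against one date field. This is only a sketch; the variable names are chosen for illustration.

```perl
use strict;
use warnings;

my $date = "07/11/2011 16:48:08,";

# Unescaped: d{4} means the literal letter "d" four times, so no date matches.
my $broken = $date =~ /\d{2}\/\d{2}\/d{4}.*?,/ ? 1 : 0;

# Escaped: \d{4} means four digits.
my $fixed = $date =~ /\d{2}\/\d{2}\/\d{4}.*?,/ ? 1 : 0;

# The broken pattern only matches text that really contains "dddd".
my $literal = "07/11/dddd," =~ /\d{2}\/\d{2}\/d{4}.*?,/ ? 1 : 0;

print "broken=$broken fixed=$fixed literal=$literal\n";
# prints broken=0 fixed=1 literal=1
```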
If an empty field can sit on either side of a comma, or at the beginning/end of the string, you can do it in two steps:
Step 1
s/(?:^|(?<=,))(?=,|\n)/null/g
Expanded:
s/
(?: ^ # Beginning of line, i.e. nothing behind us
| (?<=,) # Or, a comma behind us
)
# we are HERE!, this is the place between characters
(?= , # A comma in front of us
| \n # Or, a newline in front of us
)
/null/xg
# The above regex does not consume, it just inserts 'null', leaving the
# same search position (after the insertion, but before the comma).
# If you want to consume a comma, it would be done this way:
s/(?:^|(?<=,))(,|\n)/null$1/xg
# Now the search position is after the 'null,'
Step 2
s/(?:^|(?<=,))\d{2}\/\d{2}\/\d{4}.*?(?=,|\n)/sysdate/g
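To check the two steps together, they can be wrapped in a small function and run on a shortened row. This is a sketch: the name two_step and the sample row are illustrative assumptions, and the row is assumed to end in a newline as in the dump.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the two substitutions above.
sub two_step {
    my ($row) = @_;
    # Step 1: insert 'null' at every empty position (nothing is consumed).
    $row =~ s/(?:^|(?<=,))(?=,|\n)/null/g;
    # Step 2: replace a whole date field, with or without a time part.
    $row =~ s/(?:^|(?<=,))\d{2}\/\d{2}\/\d{4}.*?(?=,|\n)/sysdate/g;
    return $row;
}

print two_step("07/11/2011 16:48:08,07/11/2011,'YD',,,0,\n");
# prints sysdate,sysdate,'YD',null,null,0,null
```

Because both patterns only look at the delimiters instead of consuming them, adjacent empty fields are all filled in a single pass.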
Alternatively, you can combine them into a single regex using the eval modifier (/e):
$row =~ s/(?:^|(?<=,))(\d{2}\/\d{2}\/\d{4}.*?|)(?=,|\n)/ length $1 ? 'sysdate' : 'null'/eg;
Broken down, it looks like this:
s{
(?: ^ | (?<=,) ) # begin of line or comma behind us
( # Capture group 1
\d{2}/\d{2}/\d{4}.*? # date format and optional non-newline chars
| # Or, nothing at all
) # End capture group 1
(?= , | \n ) # comma or newline in front of us
}{
length $1 ? 'sysdate' : 'null'
}xeg
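Applied to the sample row from the question, the combined substitution can be sketched as follows. The function name convert_row is an assumption made for this example.

```perl
use strict;
use warnings;

# Hypothetical wrapper: group 1 holds either a date field or the empty
# string, and the /e replacement picks the text based on its length.
sub convert_row {
    my ($row) = @_;
    $row =~ s{(?:^|(?<=,))(\d{2}/\d{2}/\d{4}.*?|)(?=,|\n)}
             { length $1 ? 'sysdate' : 'null' }eg;
    return $row;
}

my $row = "07/11/2011 16:48:08,07/11/2011 16:48:08,'YD','MANUAL',0,1,"
        . "'text','text','text','text',,,,'text',,,0,0,\n";

print convert_row($row);
# prints sysdate,sysdate,'YD','MANUAL',0,1,'text','text','text','text',null,null,null,'text',null,null,0,0,null
```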
If there may be padding with non-newline whitespace, it can be written as:
$row =~ s/(?:^|(?<=,))(?:([^\S\n]*\d{2}\/\d{2}\/\d{4}.*?)|[^\S\n]*)(?=,|\n)/ defined $1 ? 'sysdate' : 'null'/eg;
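A short sketch of the padded variant on a row with spaces and a tab inside otherwise empty fields; convert_padded and the sample row are illustrative assumptions.

```perl
use strict;
use warnings;

# Hypothetical wrapper: [^\S\n] matches any whitespace except a newline.
# Group 1 is only defined when the date alternative matched.
sub convert_padded {
    my ($row) = @_;
    $row =~ s/(?:^|(?<=,))(?:([^\S\n]*\d{2}\/\d{2}\/\d{4}.*?)|[^\S\n]*)(?=,|\n)/ defined $1 ? 'sysdate' : 'null'/eg;
    return $row;
}

print convert_padded("  07/11/2011 , ,'x',\t,0,\n");
# prints sysdate,null,'x',null,0,null
```

Note that the padding around a date or an empty field is replaced along with it, so it does not survive in the output.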
You want to replace something. Lookarounds are usually a better choice for that:
$subject =~ s/(?<=,)(?=,|$)/null/g;
Explanation:
"
(?<= # Assert that the regex below can be matched, with the match ending at this position (positive lookbehind)
, # Match the character “,” literally
)
(?= # Assert that the regex below can be matched, starting at this position (positive lookahead)
# Match either the regular expression below (attempting the next alternative only if this one fails)
, # Match the character “,” literally
| # Or match regular expression number 2 below (the entire group fails if this one fails to match)
\$ # Assert position at the end of the string (or before the line break at the end of the string, if any)
)
"
You want to replace the dates:
$subject =~ s!\d{2}/\d{2}/\d{4}.*?(?=,)!sysdate!g;
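This is almost the same as your original regex. Just replace the trailing comma with a lookahead (and note the \d{4}, which was missing its backslash in yours). If you don't want to replace it, don't match it.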
# \d{2}/\d{2}/\d{4}.*?(?=,)
#
# Match a single digit 0..9 «\d{2}»
# Exactly 2 times «{2}»
# Match the character “/” literally «/»
# Match a single digit 0..9 «\d{2}»
# Exactly 2 times «{2}»
# Match the character “/” literally «/»
# Match a single digit 0..9 «\d{4}»
# Exactly 4 times «{4}»
# Match any single character that is not a line break character «.*?»
# Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
# Assert that the regex below can be matched, starting at this position (positive lookahead) «(?=,)»
# Match the character “,” literally «,»
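Both substitutions from this answer can be tried together on a shortened row. This is only a sketch; lookaround_convert and the sample data are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the two lookaround-based substitutions.
sub lookaround_convert {
    my ($subject) = @_;
    # Insert 'null' after a comma that is followed by a comma or the end.
    $subject =~ s/(?<=,)(?=,|$)/null/g;
    # Replace each date (and optional time) up to, but excluding, the comma.
    $subject =~ s!\d{2}/\d{2}/\d{4}.*?(?=,)!sysdate!g;
    return $subject;
}

print lookaround_convert("07/11/2011 16:48:08,07/11/2011,'YD',,,0,\n");
# prints sysdate,sysdate,'YD',null,null,0,null
```

As written, the first pattern requires a comma behind the position, so an empty first field would stay empty, and the second requires a comma ahead, so a date in the very last field would not be replaced.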
Maybe .*? is too greedy; try (with the last d escaped as \d):
$_ =~ s/\d{2}\/\d{2}\/\d{4}[^,]+,/sysdate,/g;
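One thing to watch with [^,]+ is that it needs at least one character after the year, so a date without a time part is skipped even once the \d{4} escape is in place. The sketch below compares it with [^,]*; the helper names with_plus and with_star are made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: [^,]+ requires at least one non-comma character
# after the year, so a bare date such as 07/11/2011 does not match.
sub with_plus {
    my ($row) = @_;
    $row =~ s/\d{2}\/\d{2}\/\d{4}[^,]+,/sysdate,/g;
    return $row;
}

# Hypothetical helper: [^,]* also accepts a date with no time part.
sub with_star {
    my ($row) = @_;
    $row =~ s/\d{2}\/\d{2}\/\d{4}[^,]*,/sysdate,/g;
    return $row;
}

print with_plus("07/11/2011 16:48:08,07/11/2011,'YD'\n");
# prints sysdate,07/11/2011,'YD'
print with_star("07/11/2011 16:48:08,07/11/2011,'YD'\n");
# prints sysdate,sysdate,'YD'
```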