How can I parse a phone number in Perl?

I am trying to grab any digits in front of a known phone line number, if they exist (in Perl). There will be no dashes, only digits.

For example, say I know the line number will always be 8675309. 8675309 may or may not have leading digits; if it does, I want to capture them. There is no real limit on the number of leading digits.

$input          $digits       $number
'8675309'       ''            '8675309'
'8008675309'    '800'         '8675309'
'18888675309'   '1888'        '8675309'
'18675309'      '1'           '8675309'
'86753091'      not a match

/8675309$/ will match, but how can I capture the leading digits in the same regex?

Some regexes work better backwards than forwards. So sometimes it is useful to use a sexeger (a regex applied to the reversed string) rather than a regex.

my $pn = '18008675309';

# Match against the reversed string, then reverse the capture back.
my $got;
if (reverse($pn) =~ /^9035768(\d*)/) {
    $got = reverse $1;
}

The regex is cleaner and avoids a lot of backtracking, at the cost of some flummery with reversing the input and the captured values.

The backtracking gain is smaller in this case than it would be if you had a general phone number extraction regex:

Regex:   /^(\d*)\d{7}$/
Sexeger: /^\d{7}(\d*)/

There is a whole class of problems where this technique is useful. For more info, see the sexeger post on Perlmonks.
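As a rough sketch of the general form above, the reversed match can be wrapped in a small helper. The name split_trailing7 is made up for this illustration. It assumes the input should contain digits only and that the line number is always the last seven digits.

```perl
use strict;
use warnings;

# Hypothetical helper: split a digit string into (leading digits, last 7 digits)
# by matching against the reversed string.
sub split_trailing7 {
    my ($input) = @_;
    my $rev = reverse $input;                  # scalar context, so the string is reversed
    return unless $rev =~ /^(\d{7})(\d*)$/;    # fails on non-digits or fewer than 7 digits
    my ($number, $digits) = ($1, $2);
    # Both captures come out reversed, so flip them back.
    return (scalar reverse($digits), scalar reverse($number));
}

my ($digits, $number) = split_trailing7('18008675309');
print "$digits $number\n";    # prints "1800 8675309"
```

The explicit scalar is there because reverse in list context reverses the order of its arguments instead of the characters of a string.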

my($digits,$number);
if ($input =~ /^(\d*)(8675309)$/) {
  ($digits,$number) = ($1,$2);
}

The * quantifier is greedy, but that means it matches as much as possible while still allowing a match. So initially, yes, \d* tries to gobble up all the digits in $input, but it reluctantly gives up what it has matched, character by character, until the whole pattern matches successfully.
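To see the effect of that backtracking, here is a small sketch that runs the anchored regex over the sample inputs from the question's table. The sub name split_known is hypothetical and exists only to make the result easy to check.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the regex above: returns (digits, number),
# or an empty list when the input does not end in the known line number.
sub split_known {
    my ($input) = @_;
    return unless $input =~ /^(\d*)(8675309)$/;
    return ($1, $2);
}

# Sample inputs taken from the question's table.
for my $input (qw(8675309 8008675309 18888675309 18675309 86753091)) {
    if (my ($digits, $number) = split_known($input)) {
        print "'$input' => digits '$digits', number '$number'\n";
    }
    else {
        print "'$input' => not a match\n";
    }
}
```

The first four inputs match with '', '800', '1888' and '1' as the leading digits. '86753091' fails because the known number is not at the end of the string.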

Another approach is to chop off the tail:

(my $digits = $input) =~ s/8675309$//;

You could do the same without using a regular expression:

my $digits = $input;
substr($digits, -7) = "";

The above, at least with perl 5.10.1, could even be condensed to

substr(my $digits = $input, -7) = "";
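One thing to keep in mind with the substr versions is that they remove the last seven characters whatever they are, so '86753091' would yield '8' rather than being rejected. A minimal sketch of a guarded variant follows. The sub name leading_digits is hypothetical, and the removal uses the four-argument form of substr. Like the originals, it does not check that the leading part consists only of digits.

```perl
use strict;
use warnings;

# Hypothetical helper: return the digits in front of the known line number,
# or undef when the input does not end with it.
sub leading_digits {
    my ($input) = @_;
    my $known = '8675309';
    my $len   = length $known;

    return if length($input) < $len;                  # too short to contain it
    return unless substr($input, -$len) eq $known;    # tail must be the known number

    my $digits = $input;
    substr($digits, -$len, $len, '');                 # 4-arg substr: replace the tail with ''
    return $digits;
}

for my $input (qw(8008675309 8675309 86753091)) {
    my $digits = leading_digits($input);
    print defined($digits) ? "'$input' => '$digits'\n" : "'$input' => not a match\n";
}
```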

The regex special variables $` and $& are another way of grabbing those pieces of information. They hold the text preceding the match and the match itself, respectively.

   if ( /8675309$/ )
      {
      printf( "%s,%s,%s\n", $_, $`, $& );
      }
   else
      {
      printf( "%s,Not a match\n", $_ );
      }
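On Perl 5.10 and later, the same information is also available through ${^PREMATCH} and ${^MATCH} when the regex carries the /p modifier. That avoids touching $` and $&, which on older perls slowed down every regex in the program once they appeared anywhere in the code. A small sketch along the lines of the code above, with a hypothetical sub name describe and sample inputs taken from the question:

```perl
use strict;
use warnings;
use 5.010;    # ${^PREMATCH} and ${^MATCH} need Perl 5.10 or later

# Hypothetical helper that builds the same comma-separated line as the
# printf calls above, but from the /p variables.
sub describe {
    my ($input) = @_;
    if ($input =~ /8675309$/p) {
        return sprintf "%s,%s,%s", $input, ${^PREMATCH}, ${^MATCH};
    }
    return sprintf "%s,Not a match", $input;
}

print describe($_), "\n" for qw(8675309 8008675309 86753091);
```

As in the original snippet, the pattern is not anchored at the start, so whatever precedes the number ends up in the prematch, digits or not.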

There's a Perl package that deals with at least UK and US phone numbers.

It's called Number::Phone and the code is somewhere on the cpan.org site.

How about /(\d)?(8675309)/ ?

UPDATE:

Whoops, that should have been /(\d*)(8675309)/

I might not understand the problem. Why is there a difference between the first and fourth examples:

 '8675309' '' '8675309' ... '8675309' '1' '8675309' 

If all you want is to separate the last seven digits from everything else, you could have said it that way rather than provide confusing examples. A regex for that would be:

/(\d*)(\d{7,7})$/

If you weren't just providing a hypothetical number, and really are only looking for lines with '8675309' (seems strange), replace the '\d{7,7}' with '8675309'.
