Matching a phone number at the end of a string with `regex`, and returning both parts
I have a bunch of lines like the following:
Name1 Surname1 +44 (020) 1234 5678
Name2 Name2 Surname2 +39 (051) 12.34.56
Surname3, Name3 - (555) 123-456-789
Surname4, Name4 Name4 123 - 456.78.90
and I would like to identify and return the names and the numbers that they contain. For instance, I would like to return:
Line: Name1 Surname1 +44 (020) 1234 5678
Name: Name1 Surname1
Number: +44 (020) 1234 5678

Line: Name2 Name2 Surname2 +39 (051) 12.34.56
Name: Name2 Name2 Surname2
Number: +39 (051) 12.34.56

Line: Surname3, Name3 - (555) 123-456-789
Name: Surname3, Name3 -
Number: (555) 123-456-789

Line: Surname4, Name4 Name4 123 - 456.78.90
Name: Surname4, Name4 Name4
Number: 123 - 456.78.90
I'm using Java regex and, so far, I have come up with the following pattern:
\A(.*)\s+(\+?\s*\d+([.-\s]*(\d+|\(\d+\)))+)\z
If line is any of the lines above, the code to match the pattern is:
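import java.util.regex.Matcher;
import java.util.regex.Pattern;

Pattern pattern = Pattern.compile("^(.*)\\s+(\\+?\\s*\\d+([.-\\s]*(\\d+|\\(\\d+\\)))+)$");
// A Matcher is obtained via matcher(), and the groups are read from it
Matcher matcher = pattern.matcher(line);
if (matcher.find()) {
    System.out.println("Name: " + matcher.group(1));
    System.out.println("Number: " + matcher.group(2));
}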
Unfortunately, on any line (Name1 Surname1 +44 (020) 1234 5678, for instance) it returns the following:
Name: Name1 Surname1 +44 (020) 1234
Number: 5678
I think that the reason for this result is the regex being too greedy, but I don't understand how to modify its behaviour.
Can anyone please correct the pattern and explain the solution to me in simple terms? I read a few tutorials without understanding what to do.
Thanks in advance!
The simplest pattern I can think of right now would be:
^(.*?)\s*((?:\+|\()[-\d(). ]*)
The first group lazily captures everything up to the spaces preceding a + or a (. The second group then captures everything after that (digits, hyphens, parentheses, dots or spaces).
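As a rough illustration, the pattern above could be applied from Java along these lines. This is only a sketch: the class name `PhoneSplit` and the helper `split` are made up for the example and are not part of the original answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PhoneSplit {
    // Pattern from the answer: a lazy name group, optional whitespace,
    // then a number group that must start with '+' or '('.
    private static final Pattern NAME_AND_NUMBER =
            Pattern.compile("^(.*?)\\s*((?:\\+|\\()[-\\d(). ]*)");

    // Hypothetical helper: returns {name, number}, or null if there is no match.
    public static String[] split(String line) {
        Matcher matcher = NAME_AND_NUMBER.matcher(line);
        if (!matcher.find()) {
            return null;
        }
        return new String[] { matcher.group(1), matcher.group(2) };
    }

    public static void main(String[] args) {
        String[] lines = {
            "Name1 Surname1 +44 (020) 1234 5678",
            "Name2 Name2 Surname2 +39 (051) 12.34.56",
            "Surname3, Name3 - (555) 123-456-789",
            "Surname4, Name4 Name4 123 - 456.78.90"
        };
        for (String line : lines) {
            String[] parts = split(line);
            if (parts == null) {
                System.out.println("No match: " + line);
            } else {
                System.out.println("Name: " + parts[0]);
                System.out.println("Number: " + parts[1]);
            }
        }
    }
}
```

The first three sample lines split into the requested name and number. The fourth line has no + or ( in front of its number, so this pattern does not match it; covering that case would need an additional alternative for a number that starts with a bare digit.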