
Extracting a substring in Java using regex. How to include space characters

I am extracting a substring in Java using the following code. The input is (000) 000-0000111 and the output should be (000) 000-0000:

String text = "(000) 000-0000111";

 String pattern2 = "[(][0-9][0-9][0-9][)]\\s[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]";
 Pattern pattern_22 = Pattern.compile(pattern2);
Matcher matcher_22 = pattern.matcher(text);

if(matcher_22.find()) {

    return matcher_22.group(0);
}
else
    return "NONE";

The output comes back as "NONE", but it should be (000) 000-0000. Where am I going wrong? Could someone please point it out? TIA.

From your comment:

No, my goal is: if the input is (xxx) xxx-xxxxyyy, then the output should be (xxx) xxx-xxxx, where x is any digit.

If you want to remove the last three digits, you can use:

phone = phone.replaceAll("\\d{3}$", "");
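As a rough illustration of what that call does, here is a small self-contained sketch. The class name `TrailingDigits` and the method `stripLastThree` are made up for this example. Note that the pattern removes the last three digits whenever the string ends in three digits, so it also shortens a number that had no extra digits to begin with.

```java
// Hypothetical helper class; the names are illustrative, not from the original post.
class TrailingDigits {
    // Removes three digits anchored at the end of the input, if present.
    static String stripLastThree(String phone) {
        return phone.replaceAll("\\d{3}$", "");
    }

    public static void main(String[] args) {
        // Input with three extra digits: they are removed.
        System.out.println(stripLastThree("(000) 000-0000111")); // prints (000) 000-0000
        // Input that was already well formed: three digits are removed anyway.
        System.out.println(stripLastThree("(000) 000-0000"));    // prints (000) 000-0
    }
}
```

Because the replacement is unconditional, this approach only fits inputs that are known to carry exactly three surplus digits.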



I don't want to replace it, I want to extract it. There is a huge difference!

In that case, Pattern with Matcher is a good tool. Your regex can be \\(\\d{3}\\)\\s\\d{3}-\\d{4}:

String text = "(000) 000-0000111";

String regex = "\\(\\d{3}\\)\\s\\d{3}-\\d{4}";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(text);
if (matcher.find()) {
    return matcher.group();
} else {
    return "NONE";
}
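The snippet above uses return, so it has to live inside a method. A minimal self-contained sketch of that, assuming a made-up class `PhoneExtractor` and method `extractPhone` (neither name comes from the question), could look like this:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper class; the names are illustrative only.
class PhoneExtractor {
    // Compiled once and reused; Pattern instances are immutable and thread-safe.
    private static final Pattern PHONE =
            Pattern.compile("\\(\\d{3}\\)\\s\\d{3}-\\d{4}");

    // Returns the first "(ddd) ddd-dddd" substring found in text, or "NONE".
    static String extractPhone(String text) {
        Matcher matcher = PHONE.matcher(text);
        return matcher.find() ? matcher.group() : "NONE";
    }

    public static void main(String[] args) {
        System.out.println(extractPhone("(000) 000-0000111")); // prints (000) 000-0000
        System.out.println(extractPhone("no phone here"));     // prints NONE
    }
}
```

find() looks for the pattern anywhere in the input, which is why the trailing 111 does not prevent a match; matches() would require the whole input to fit the pattern and would fail here.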

Just replace your pattern string with:

[^1-9]

The code in your question doesn't compile: it calls pattern.matcher(text), but the compiled Pattern is named pattern_22. You need to modify it as follows:

String text = "(000) 000-0000111";

String pattern2 = "[(][0-9][0-9][0-9][)]\\s[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]";
Pattern pattern_22 = Pattern.compile(pattern2);
Matcher matcher_22 = pattern_22.matcher(text);

if (matcher_22.find()) {
    return matcher_22.group(0);
} else {
    return "NONE";
}

This will output (000) 000-0000 as you request in your question.

That said, your current regex is quite verbose. You can use something much simpler, such as:

\(\d{3}\) \d{3}\-\d{4}

Note that in Java you have to escape backslashes inside string literals, so in your Java code it becomes:

\\(\\d{3}\\) \\d{3}\\-\\d{4}

Here \\d matches a digit (0-9), and {n} means exactly n repetitions of the preceding element, so \\d{3} matches three consecutive digits.
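To check that the shorter pattern behaves like the verbose one on this input, a quick sketch such as the following can help. The class `QuantifierDemo` and the helper `firstMatch` are invented for this example. One small difference to keep in mind: the verbose pattern uses \\s (any whitespace character), while the short one uses a literal space.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class QuantifierDemo {
    // The verbose pattern from the question and the compact one from this answer.
    static final String VERBOSE =
            "[(][0-9][0-9][0-9][)]\\s[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]";
    static final String COMPACT = "\\(\\d{3}\\) \\d{3}\\-\\d{4}";

    // Returns the first match of regex in text, or null if there is none.
    static String firstMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String text = "(000) 000-0000111";
        System.out.println(firstMatch(VERBOSE, text)); // prints (000) 000-0000
        System.out.println(firstMatch(COMPACT, text)); // prints (000) 000-0000
        // {3} needs exactly three digits in a row, so two digits are not enough.
        System.out.println(firstMatch("\\d{3}", "12")); // prints null
    }
}
```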

If you also want to know what was excluded from your input (as mentioned in a comment, you want to extract the trailing characters), you can use capture groups in your regex, for example:

(\(\d{3}\) \d{3}\-\d{4})(.*)

Here the first capture group, (\\(\\d{3}\\) \\d{3}\\-\\d{4}), gives you the part you want as output, and the second, (.*), gives you everything that came after it. As before, the backslashes have to be escaped when the pattern is written as a Java string.

With

pattern2 = "(\\(\\d{3}\\) \\d{3}\\-\\d{4})(.*)";

You would have

String valid = matcher_22.group(1);
String rest = matcher_22.group(2);

where valid is (000) 000-0000 and rest is 111.
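Put together, the capture-group version might look like the sketch below. The class `PhoneSplitter` and the method `split` are hypothetical names chosen for this example; returning null when nothing matches is likewise just one possible convention.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative only.
class PhoneSplitter {
    // Group 1: the phone number. Group 2: whatever follows it on the same line.
    private static final Pattern PHONE_AND_REST =
            Pattern.compile("(\\(\\d{3}\\) \\d{3}\\-\\d{4})(.*)");

    // Returns {valid, rest}, or null if the input contains no phone number.
    static String[] split(String text) {
        Matcher m = PHONE_AND_REST.matcher(text);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] parts = split("(000) 000-0000111");
        System.out.println(parts[0]); // prints (000) 000-0000
        System.out.println(parts[1]); // prints 111
    }
}
```

group(0) would still return the whole match, that is, the number together with the trailing characters.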
