
Using Regular Expressions to Extract Specific Values in Java

I have several strings in the rough form:

String s = "Rendering content from websiteNAme using user agent userAgentNameWithSpaces ; for user username ; at time someTime";

I want to extract the values for websiteName, userAgentNameWithSpaces, username and someTime. I have tried the following code.

private static final Pattern USER_NAME_PATTERN = Pattern.compile("for user.*;");
final Matcher matcher = USER_NAME_PATTERN.matcher(line);
final Optional<String> userName = matcher.find() ? Optional.of(matcher.group()) : Optional.empty();

It returns the whole match, "for user username ;", so afterwards I have to strip the "for user" prefix (and the trailing ";") to get the user name. Is there a regex that returns just the username directly?

I think you want to use lookaheads and lookbehinds:

String s = "Rendering content from websiteNAme using user agent userAgentNameWithSpaces ; for user username ; at time someTime";
Pattern USER_NAME_PATTERN = Pattern.compile("(?<=for user).*?(?=;)");
final Matcher matcher = USER_NAME_PATTERN.matcher(s);
if (matcher.find()) {
    System.out.println(matcher.group(0).trim());
}

Output:

username
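
If the trim() call bothers you, one option is to move the surrounding spaces into the lookarounds themselves. The following is only a sketch: the class and method names (LookaroundDemo, userName) are made up for illustration, and it assumes the value is always followed by " ;" as in the sample line.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original post.
class LookaroundDemo {
    // The lookbehind requires "for user " (including the space) right before
    // the match; the lookahead requires " ;" right after it.
    private static final Pattern USER_NAME =
            Pattern.compile("(?<=for user )[^;]*?(?= ;)");

    static Optional<String> userName(String line) {
        Matcher matcher = USER_NAME.matcher(line);
        // Check find() first: group() throws IllegalStateException without a match.
        return matcher.find() ? Optional.of(matcher.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        String s = "Rendering content from websiteNAme using user agent "
                + "userAgentNameWithSpaces ; for user username ; at time someTime";
        System.out.println(userName(s).orElse("<no match>"));
    }
}
```

Because neither lookaround is part of the match, group() already holds just the value and no trimming is needed.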

You can use regex groups:

Pattern pattern = Pattern.compile("for user (\\w+)");
Matcher matcher = pattern.matcher(s);
if (matcher.find()) {
    System.out.println(matcher.group(1));
}

The pair of parentheses ( and ) forms a capturing group, which the matcher returns through its group method (it is the first opening parenthesis, so it is group 1).

\w means a "word character" (letters, digits and _ ) and + means "one or more occurrences". So \w+ basically means "a word" (assuming your username contains only these characters). PS: note that I had to escape the \ inside the Java string literal, so the expression is written as \\w+ .

The output of this code is:

username
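
To make the escaping remark about \w concrete, here is a small sketch (the class name EscapeDemo and the helper isSingleWord are only illustrative). It shows that the Java literal with two backslashes is a three-character regex by the time the regex engine sees it:

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original post.
class EscapeDemo {
    // The source literal "\\w+" holds exactly three characters: \ w +
    static final String WORD = "\\w+";

    // True when the whole input consists of one or more word characters.
    static boolean isSingleWord(String s) {
        return Pattern.matches(WORD, s);
    }

    public static void main(String[] args) {
        System.out.println(WORD);                       // the regex as the engine sees it
        System.out.println(WORD.length());              // number of characters in the regex
        System.out.println(isSingleWord("user_42"));    // letters, digits and _ only
        System.out.println(isSingleWord("user name"));  // contains a space
    }
}
```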


If you want to match all the values (websiteName, userAgentNameWithSpaces and so on), you could do the following:

Pattern pattern = Pattern.compile("Rendering content from (.*) using user agent (.*) ; for user (.*) ; at time (.*)");
Matcher matcher = pattern.matcher(s);
if (matcher.find()) {
    System.out.println(matcher.group(1));
    System.out.println(matcher.group(2));
    System.out.println(matcher.group(3));
    System.out.println(matcher.group(4));
}

The output will be:

websiteNAme
userAgentNameWithSpaces
username
someTime

Note that if userAgentNameWithSpaces contains spaces, \w+ won't work (because \w doesn't match spaces), whereas .* does work in this case.

But you can also use [\w ]+ . The brackets [] mean "any one of the characters inside", so [\w ] means "a word character or a space" (note that there's a space between w and ] ). So the code would be (testing with a user agent that contains spaces):

String s = "Rendering content from websiteNAme using user agent userAgent Name WithSpaces ; for user username ; at time someTime";
Pattern pattern = Pattern.compile("Rendering content from (.*) using user agent ([\\w ]+) ; for user (.*) ; at time (.*)");
Matcher matcher = pattern.matcher(s);
if (matcher.find()) {
    System.out.println(matcher.group(1));
    System.out.println(matcher.group(2));
    System.out.println(matcher.group(3));
    System.out.println(matcher.group(4));
}

And the output will be:

websiteNAme
userAgent Name WithSpaces
username
someTime
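
As a variation on the four-group patterns above, you could also give the groups names so you don't have to count parentheses. This is just a sketch under a few assumptions: the group names (site, agent, user, time) and the class RenderLineParser are my own invention, and I switched to lazy quantifiers and matches() so the whole line has to fit the format.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser, not part of the original post.
class RenderLineParser {
    // Named groups (Java 7+). The names are illustrative only.
    private static final Pattern LINE = Pattern.compile(
            "Rendering content from (?<site>.*?) using user agent (?<agent>.*?)"
            + " ; for user (?<user>.*?) ; at time (?<time>.*)");

    // Returns {site, agent, user, time}, or null if the line has another format.
    static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] {
            m.group("site"), m.group("agent"), m.group("user"), m.group("time")
        };
    }

    public static void main(String[] args) {
        String s = "Rendering content from websiteNAme using user agent "
                + "userAgent Name WithSpaces ; for user username ; at time someTime";
        for (String value : parse(s)) {
            System.out.println(value);
        }
    }
}
```

The lazy .*? makes each group stop at the first occurrence of the literal text that follows it, which matters only if a value could itself contain something like " ; for user ".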

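Note: check that the match succeeded before calling matcher.group(n) ; calling it before a successful find() or matches() throws an IllegalStateException . The method matcher.groupCount() returns the number of capturing groups in the pattern, not how many of them actually matched. If you call matcher.group(n) with an n larger than that count you'll get an IndexOutOfBoundsException , while a group that exists in the pattern but did not take part in the match returns null .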
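
Here is a small sketch of how those methods behave in practice. In the Java API, groupCount() reports the number of capturing groups in the pattern, and a group that did not take part in the match comes back as null. The pattern with an optional "at time" part and the names GroupCheckDemo and describe are assumptions made up for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original post.
class GroupCheckDemo {
    // Group 2 sits inside an optional non-capturing group, so it may not participate.
    private static final Pattern P =
            Pattern.compile("for user (\\w+)(?: ; at time (\\w+))?");

    // Formats "user/time/groupCount", or "no match" when find() fails.
    static String describe(String line) {
        Matcher m = P.matcher(line);
        if (!m.find()) {
            return "no match"; // calling group() here would throw IllegalStateException
        }
        String time = m.group(2); // null when the optional part is absent
        return m.group(1) + "/" + (time == null ? "<none>" : time) + "/" + m.groupCount();
    }

    public static void main(String[] args) {
        System.out.println(describe("for user alice ; at time noon"));
        System.out.println(describe("for user bob"));
        System.out.println(describe("nothing relevant"));
    }
}
```

Note that groupCount() stays the same for both matching lines, so a null check on the individual group is what tells you whether it was actually matched.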

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 