Using Regular Expressions to Extract Specific Values in Java

I have several strings of roughly this form:

String s = "Rendering content from websiteNAme using user agent userAgentNameWithSpaces ; for user username ; at time someTime";

I want to extract the values for websiteName, userAgentNameWithSpaces, username and someTime. I have tried the following code.

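// Needs java.util.Optional, java.util.regex.Matcher and java.util.regex.Pattern
private static final Pattern USER_NAME_PATTERN = Pattern.compile("for user.*;");
static Optional<String> findUserName(String line) {
    final Matcher matcher = USER_NAME_PATTERN.matcher(line);
    return matcher.find() ? Optional.of(matcher.group()) : Optional.empty();
}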

It returns the whole match, "for user username ;", so afterwards I have to strip the "for user" prefix (and the trailing ";") to get the user name. Is there a regex that returns just the username directly?

I think you want to use lookaheads and lookbehinds:

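String s = "Rendering content from websiteNAme using user agent userAgentNameWithSpaces ; for user username ; at time someTime";
// (?<=for user) and (?=;) check the surrounding text without making it part of the match
Pattern USER_NAME_PATTERN = Pattern.compile("(?<=for user).*?(?=;)");
final Matcher matcher = USER_NAME_PATTERN.matcher(s);
if (matcher.find()) {
    // The match still includes the spaces around the name, so trim them
    System.out.println(matcher.group(0).trim());
}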

Output:

username
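
To tie this back to the Optional-returning code in the question, the lookaround pattern can be wrapped in a small helper. This is only a sketch: the class name LookaroundDemo and the method extractUserName are made up for illustration, and putting the spaces inside the lookbehind and lookahead is a small variation that makes trim() unnecessary.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaroundDemo {
    // (?<=for user ) : the match must be preceded by "for user "
    // .*?            : as few characters as possible
    // (?= ;)         : the match must be followed by " ;"
    private static final Pattern USER_NAME_PATTERN =
            Pattern.compile("(?<=for user ).*?(?= ;)");

    // Hypothetical helper: the user name, or empty if the line has no such part.
    static Optional<String> extractUserName(String line) {
        Matcher matcher = USER_NAME_PATTERN.matcher(line);
        return matcher.find() ? Optional.of(matcher.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        String s = "Rendering content from websiteNAme using user agent userAgentNameWithSpaces ; for user username ; at time someTime";
        System.out.println(extractUserName(s).orElse("<no match>"));
        System.out.println(extractUserName("no user here").isPresent());
    }
}
```

Because lookarounds only assert context, matcher.group() is already the bare user name and nothing has to be stripped afterwards.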

You can use regex groups:

Pattern pattern = Pattern.compile("for user (\\w+)");
Matcher matcher = pattern.matcher(s);
if (matcher.find()) {
    System.out.println(matcher.group(1));
}

The pair of parentheses ( and ) forms a group whose contents can be obtained from the matcher with the group method (as it's the first pair of parentheses, it's group 1).

\w means a "word character" (letters, digits and _) and + means "one or more occurrences". So \w+ basically means "a word" (assuming your username contains only these characters). PS: in a Java string literal the backslash itself has to be escaped, so the pattern is written as "\\w+".

The output of this code is:

username
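
To make the "word characters only" caveat concrete, here is a small sketch of where \w+ stops. The class name WordCharDemo, the method firstUserName and the sample user names are invented for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordCharDemo {
    private static final Pattern PATTERN = Pattern.compile("for user (\\w+)");

    // Returns group 1, or null when "for user <word>" does not occur in the line.
    static String firstUserName(String line) {
        Matcher matcher = PATTERN.matcher(line);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        // Letters, digits and underscores are all word characters
        System.out.println(firstUserName("for user john_doe42 ;")); // john_doe42
        // '.' is not a word character, so the group ends before it
        System.out.println(firstUserName("for user john.doe ;"));   // john
        // A space is not a word character either
        System.out.println(firstUserName("for user john doe ;"));   // john
    }
}
```

If user names can contain dots, spaces or other punctuation, the group needs a wider character class or a delimiter-based pattern instead.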


If you want to match all the values (websiteName, userAgentNameWithSpaces and so on), you could do the following:

Pattern pattern = Pattern.compile("Rendering content from (.*) using user agent (.*) ; for user (.*) ; at time (.*)");
Matcher matcher = pattern.matcher(s);
if (matcher.find()) {
    System.out.println(matcher.group(1));
    System.out.println(matcher.group(2));
    System.out.println(matcher.group(3));
    System.out.println(matcher.group(4));
}

The output will be:

websiteNAme
userAgentNameWithSpaces
username
someTime

Note that if userAgentNameWithSpaces contains spaces, \w+ won't work for that group (because \w doesn't match spaces), whereas .* will.


But you can also use [\w ]+. The brackets [] mean "any of the characters inside", so [\w ] means "a word character or a space" (note the space between w and ]). So the code would be (testing with a user agent that contains spaces):

String s = "Rendering content from websiteNAme using user agent userAgent Name WithSpaces ; for user username ; at time someTime";
Pattern pattern = Pattern.compile("Rendering content from (.*) using user agent ([\\w ]+) ; for user (.*) ; at time (.*)");
Matcher matcher = pattern.matcher(s);
if (matcher.find()) {
    System.out.println(matcher.group(1));
    System.out.println(matcher.group(2));
    System.out.println(matcher.group(3));
    System.out.println(matcher.group(4));
}

And the output will be:

websiteNAme
userAgent Name WithSpaces
username
someTime
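
With four groups, the numeric indexes get easy to mix up. As a variation on the code above (not something the original answer uses), Java regexes also support named groups, written (?<name>...) and read with matcher.group("name"). The sketch below assumes the same line format; the class name NamedGroupDemo, the method parse and the group names site, agent, user and time are chosen here for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroupDemo {
    private static final Pattern PATTERN = Pattern.compile(
            "Rendering content from (?<site>.*) using user agent (?<agent>[\\w ]+) ; "
            + "for user (?<user>.*) ; at time (?<time>.*)");

    // Returns the four fields keyed by group name, or an empty map if the
    // whole line does not have the expected shape.
    static Map<String, String> parse(String line) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher matcher = PATTERN.matcher(line);
        if (matcher.matches()) {
            fields.put("site", matcher.group("site"));
            fields.put("agent", matcher.group("agent"));
            fields.put("user", matcher.group("user"));
            fields.put("time", matcher.group("time"));
        }
        return fields;
    }

    public static void main(String[] args) {
        String s = "Rendering content from websiteNAme using user agent userAgent Name WithSpaces ; for user username ; at time someTime";
        System.out.println(parse(s));
    }
}
```

Here matches() is used instead of find(), so the pattern has to cover the entire line; that choice is an assumption about the input and can be swapped back to find() if the lines carry extra text around the message.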

Note: call matcher.group(n) only after find() or matches() has returned true; otherwise it throws an IllegalStateException. The method matcher.groupCount() returns the number of capturing groups in the pattern (not how many of them took part in the match), and calling matcher.group(n) with n greater than that throws an IndexOutOfBoundsException. A group that exists in the pattern but did not take part in the match returns null.
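
The behaviour of groupCount() and group(n) is easy to check with a small experiment. The sketch below uses a made-up pattern whose second group is optional; the class name GroupCheckDemo and the method describe are hypothetical, and the "at time" part is treated as optional purely for the demonstration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupCheckDemo {
    // Group 1 is required; group 2 sits inside an optional non-capturing group.
    private static final Pattern PATTERN =
            Pattern.compile("for user (\\w+)(?: ; at time (\\w+))?");

    static String describe(String line) {
        Matcher matcher = PATTERN.matcher(line);
        if (!matcher.find()) {
            return "no match"; // calling group() here would throw IllegalStateException
        }
        String time = matcher.group(2); // null when the optional part is absent
        return matcher.group(1) + "/" + (time == null ? "<no time>" : time);
    }

    public static void main(String[] args) {
        // groupCount() depends only on the pattern, not on the input
        System.out.println(PATTERN.matcher("anything").groupCount());
        System.out.println(describe("for user username ; at time someTime"));
        System.out.println(describe("for user username"));
        System.out.println(describe("unrelated line"));

        Matcher m = PATTERN.matcher("for user username");
        m.find();
        try {
            m.group(3); // the pattern has only two capturing groups
        } catch (IndexOutOfBoundsException e) {
            System.out.println("no group 3");
        }
    }
}
```

So a null check on the result of group(n) is what tells you whether an optional group matched, while groupCount() only guards against asking for a group the pattern does not define.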
