Using Regular Expressions to Extract Specific Values in Java
I have several strings of roughly this form:
String s = "Rendering content from websiteNAme using user agent userAgentNameWithSpaces ; for user username ; at time someTime";
I want to extract the values of websiteName, userAgentNameWithSpaces, username and someTime. I tried the following code:
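private static final Pattern USER_NAME_PATTERN = Pattern.compile("for user.*;");
// Hypothetical wrapper method so the fragment compiles; "group" is the index of the group to return.
private static Optional<String> extract(final String line, final int group) {
    final Matcher matcher = USER_NAME_PATTERN.matcher(line);
    return matcher.find() ? Optional.of(matcher.group(group)) : Optional.empty();
}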
It returns the whole match, "for user username ;", so afterwards I have to strip out the "for user" text to get the user name. Is there a regex that gets the username directly?
I think you want lookahead and lookbehind assertions, which require the surrounding text to be present without including it in the match:
String s = "Rendering content from websiteNAme using user agent userAgentNameWithSpaces ; for user username ; at time someTime";
Pattern USER_NAME_PATTERN = Pattern.compile("(?<=for user).*?(?=;)");
final Matcher matcher = USER_NAME_PATTERN.matcher(s);
// group(0) throws IllegalStateException if nothing matched, so check find() first
if (matcher.find()) {
    System.out.println(matcher.group(0).trim());
}
Output:
username
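The lookaround idea can also be wrapped in a helper that returns an Optional, which seems to be what the snippet in the question was aiming for. This is only a minimal sketch: the class name LookaroundExtractor and the method extractUserName are made up for illustration, and it assumes the username is always preceded by "for user " and followed by " ;". Putting those spaces inside the lookarounds means no trim() is needed.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaroundExtractor {
    // Lookbehind requires "for user " before the match, lookahead requires " ;" after it.
    // Neither is included in the matched text. (Assumed delimiters, taken from the sample string.)
    private static final Pattern USER_NAME_PATTERN =
            Pattern.compile("(?<=for user ).*?(?= ;)");

    // Hypothetical helper: returns the username, or empty if the line has no such section.
    static Optional<String> extractUserName(String line) {
        Matcher matcher = USER_NAME_PATTERN.matcher(line);
        return matcher.find() ? Optional.of(matcher.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        String s = "Rendering content from websiteNAme using user agent "
                + "userAgentNameWithSpaces ; for user username ; at time someTime";
        // prints: username
        System.out.println(extractUserName(s).orElse("<no match>"));
    }
}
```

Returning Optional.empty() for a non-matching line avoids the IllegalStateException that group() throws when no match was found.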
You can use regex groups:
Pattern pattern = Pattern.compile("for user (\\w+)");
Matcher matcher = pattern.matcher(s);
if (matcher.find()) {
    System.out.println(matcher.group(1));
}
The pair of parentheses ( and ) forms a group whose content can be retrieved with the matcher's group method (it is the first pair of parentheses, so it is group 1).
\w means "a word character" (letters, digits and _) and + means "one or more occurrences", so \w+ basically means "a word" (assuming your username contains only these characters).
PS: note that the backslash has to be escaped inside a Java string literal, so the expression \w+ is written as \\w+ in the code.
The output of this code is:
username
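As a variation on the numbered group above, Java (since version 7) also lets you name a group with (?<name>...) and read it back with group("name"), which can be easier to follow than counting parentheses. The sketch below is only an illustration: the class NamedGroupExample, the method extractUser and the group name "user" are my own choices, and it keeps the same assumption that the username consists of word characters only.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroupExample {
    // Same expression as above, but the group is named "user" instead of being referred to as group 1.
    private static final Pattern USER_PATTERN = Pattern.compile("for user (?<user>\\w+)");

    // Hypothetical helper: returns the captured username, or empty if the pattern is not found.
    static Optional<String> extractUser(String line) {
        Matcher matcher = USER_PATTERN.matcher(line);
        return matcher.find() ? Optional.of(matcher.group("user")) : Optional.empty();
    }

    public static void main(String[] args) {
        String s = "Rendering content from websiteNAme using user agent "
                + "userAgentNameWithSpaces ; for user username ; at time someTime";
        // prints: username
        System.out.println(extractUser(s).orElse("<no match>"));
    }
}
```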
If you want to match all the values (websiteName, userAgentNameWithSpaces and so on), you could do the following:
Pattern pattern = Pattern.compile("Rendering content from (.*) using user agent (.*) ; for user (.*) ; at time (.*)");
Matcher matcher = pattern.matcher(s);
if (matcher.find()) {
    System.out.println(matcher.group(1));
    System.out.println(matcher.group(2));
    System.out.println(matcher.group(3));
    System.out.println(matcher.group(4));
}
The output will be:
websiteNAme
userAgentNameWithSpaces
username
someTime
Note that if userAgentNameWithSpaces contains spaces, \w+ won't work for that field (because \w doesn't match spaces), whereas .* will, because . matches any character except line terminators.
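To make the difference concrete, here is a small sketch that runs both variants against a line whose user agent contains spaces. The class WordVersusDotExample and the helper userAgent are hypothetical names; the two expressions are the full pattern from above, once with \w+ and once with .* for the user agent group.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordVersusDotExample {
    static final String WITH_WORD =
            "Rendering content from (.*) using user agent (\\w+) ; for user (.*) ; at time (.*)";
    static final String WITH_DOT =
            "Rendering content from (.*) using user agent (.*) ; for user (.*) ; at time (.*)";

    // Hypothetical helper: returns group 2 (the user agent) if the regex is found in the line.
    static Optional<String> userAgent(String regex, String line) {
        Matcher matcher = Pattern.compile(regex).matcher(line);
        return matcher.find() ? Optional.of(matcher.group(2)) : Optional.empty();
    }

    public static void main(String[] args) {
        String s = "Rendering content from websiteNAme using user agent "
                + "userAgent Name WithSpaces ; for user username ; at time someTime";
        // \w+ stops at the first space, so the rest of the pattern cannot match: prints false
        System.out.println(userAgent(WITH_WORD, s).isPresent());
        // .* also accepts spaces: prints userAgent Name WithSpaces
        System.out.println(userAgent(WITH_DOT, s).orElse("<no match>"));
    }
}
```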
You can also use [\w ]+. The brackets [] mean "any one of the characters listed inside", so [\w ] means "a word character or a space" (note the space between w and ]). As before, the backslash is doubled in the Java string. So the code would be (testing with a user agent name that contains spaces):
String s = "Rendering content from websiteNAme using user agent userAgent Name WithSpaces ; for user username ; at time someTime";
Pattern pattern = Pattern.compile("Rendering content from (.*) using user agent ([\\w ]+) ; for user (.*) ; at time (.*)");
Matcher matcher = pattern.matcher(s);
if (matcher.find()) {
    System.out.println(matcher.group(1));
    System.out.println(matcher.group(2));
    System.out.println(matcher.group(3));
    System.out.println(matcher.group(4));
}
And the output will be:
websiteNAme
userAgent Name WithSpaces
username
someTime
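If you need all four values in several places, the code above could be collected into one helper. This is a rough sketch and not the only way to do it: the class RenderLineParser, the method parse and the map keys are invented for illustration. It also makes two choices that differ from the answer above: it uses matches() so the whole line must fit the format, and lazy quantifiers (.*?) so each field stops at the first following delimiter.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RenderLineParser {
    // Compiled once and reused; the delimiters are taken from the sample string.
    private static final Pattern LINE_PATTERN = Pattern.compile(
            "Rendering content from (.*?) using user agent (.*?) ; for user (.*?) ; at time (.*)");

    // Hypothetical helper: returns the four fields in order, or empty if the line has another format.
    static Optional<Map<String, String>> parse(String line) {
        Matcher matcher = LINE_PATTERN.matcher(line);
        if (!matcher.matches()) {
            return Optional.empty();
        }
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("websiteName", matcher.group(1));
        fields.put("userAgent", matcher.group(2));
        fields.put("username", matcher.group(3));
        fields.put("time", matcher.group(4));
        return Optional.of(fields);
    }

    public static void main(String[] args) {
        String s = "Rendering content from websiteNAme using user agent "
                + "userAgent Name WithSpaces ; for user username ; at time someTime";
        // Prints one "key = value" line per field, in insertion order.
        parse(s).ifPresent(fields ->
                fields.forEach((key, value) -> System.out.println(key + " = " + value)));
    }
}
```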
Note: matcher.groupCount() returns the number of capturing groups in the pattern (four in the example above), regardless of whether a match was found. You can use it to check that n is a valid group index before calling matcher.group(n): if the pattern has no group n, you'll get an IndexOutOfBoundsException. To know whether the input actually matched, check the result of matcher.find(); calling matcher.group(n) without a successful match throws an IllegalStateException.
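A short sketch of how such a check might look. The class GroupCountExample and the helper groupIfPresent are hypothetical, and the pattern has an extra optional group added purely to show that a group can exist in the pattern and still not take part in a match, in which case group(n) returns null.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupCountExample {
    // Hypothetical helper: call only after a successful find() or matches().
    // Returns empty if n is not a group of the pattern, or if that group did not take part in the match.
    static Optional<String> groupIfPresent(Matcher matcher, int n) {
        if (n < 0 || n > matcher.groupCount()) {
            return Optional.empty(); // group(n) would throw IndexOutOfBoundsException here
        }
        return Optional.ofNullable(matcher.group(n)); // null means the group was skipped
    }

    public static void main(String[] args) {
        // Two capturing groups; the second one is optional.
        Matcher matcher = Pattern.compile("for user (\\w+)( ;)?").matcher("for user bob");
        // groupCount() depends only on the pattern: prints 2 even before find() is called
        System.out.println(matcher.groupCount());
        if (matcher.find()) {
            System.out.println(groupIfPresent(matcher, 1)); // prints Optional[bob]
            System.out.println(groupIfPresent(matcher, 2)); // prints Optional.empty
            System.out.println(groupIfPresent(matcher, 3)); // prints Optional.empty
        }
    }
}
```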