I need help getting a regex expression correct
I am trying to write a regular expression that finds multiple occurrences of my pattern on a single line. Note: I've been using regex for about an hour... =/
For example:
<a href="G2532" id="1">back</a> <a href="G2564" id="2">next</a>
Should match twice:
1) <a href="G2532" id="1">back</a>
2) <a href="G2564" id="2">next</a>
I think the answer lies in properly understanding greedy vs. reluctant vs. possessive quantifiers, but I can't seem to get it to work...
I think I am close. The regex I have created so far is:
(<a href=").*(" id="1">).*(</a>)
But the regex matcher returns one match, the entire string...
I have a (compilable) Java regex test harness in the code below. Here are my recent (futile) attempts using that program; the output should be pretty intuitive.
Enter your regex: (<a href=").*(" id="1">).*(</a>)
Enter input string to search: <a href="G2532" id="1">back</a> <a href="G2564" id="2">next</a>
I found the text "<a href="G2532" id="1">back</a> <a href="G2564" id="2">next</a>" starting at index 0 and ending at index 63.
Enter your regex: (<a href=").*(" id="1">).*(</a>)?
Enter input string to search: <a href="G2532" id="1">back</a> <a href="G2564" id="2">next</a>
I found the text "<a href="G2532" id="1">back</a> <a href="G2564" id="2">next</a>" starting at index 0 and ending at index 63.
Enter your regex: (<a href=").*(" id="1">).*(</a>)+
Enter input string to search: <a href="G2532" id="1">back</a> <a href="G2564" id="2">next</a>
I found the text "<a href="G2532" id="1">back</a> <a href="G2564" id="2">next</a>" starting at index 0 and ending at index 63.
Enter your regex: (<a href=").*(" id="1">).*(</a>)?
Enter input string to search: <a href="G2532" id="1">back</a> <a href="G2564" id="2">next</a>
I found the text "<a href="G2532" id="1">back</a> <a href="G2564" id="2">next</a>" starting at index 0 and ending at index 63.
Enter your regex: ((<a href=").*(" id="1">).*(</a>))?
Enter input string to search: <a href="G2532" id="1">back</a> <a href="G2564" id="2">next</a>
I found the text "<a href="G2532" id="1">back</a> <a href="G2564" id="2">next</a>" starting at index 0 and ending at index 63.
I found the text "" starting at index 63 and ending at index 63.
Enter your regex: ((<a href=").*(" id="1">).*(</a>))+?
Enter input string to search: <a href="G2532" id="1">back</a> <a href="G2564" id="2">next</a>
I found the text "<a href="G2532" id="1">back</a> <a href="G2564" id="2">next</a>" starting at index 0 and ending at index 63.
Enter your regex: (((<a href=").*(" id="1">).*(</a>))+?)
Enter input string to search: <a href="G2532" id="1">back</a> <a href="G2564" id="2">next</a>
I found the text "<a href="G2532" id="1">back</a> <a href="G2564" id="2">next</a>" starting at index 0 and ending at index 63.
Here's the Java:
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTestHarness {
    public static void main(String[] args) {
        // Create the reader once; re-creating it on every pass can drop buffered input.
        BufferedReader reader = new BufferedReader(new InputStreamReader(System.in));
        try {
            while (true) {
                System.out.print("\nEnter your regex: ");
                String regex = reader.readLine();
                if (regex == null) {
                    break; // end of input
                }
                Pattern pattern = Pattern.compile(regex);
                System.out.print("Enter input string to search: ");
                String input = reader.readLine();
                if (input == null) {
                    break; // end of input
                }
                Matcher matcher = pattern.matcher(input);
                boolean found = false;
                // find() advances to the next non-overlapping match on each call.
                while (matcher.find()) {
                    System.out.println("I found the text \"" + matcher.group() + "\" starting at " +
                            "index " + matcher.start() + " and ending at index " + matcher.end() + ".");
                    found = true;
                }
                if (!found) {
                    System.out.println("No match found.");
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
            System.exit(-1);
        }
    }
}
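Try this:
<a href=".*?" id=".*?">.*?</a>
Adding a ? after each .* makes the quantifier reluctant (non-greedy), so it stops at the first place where the rest of the pattern can match instead of the last. The id value is matched with .*? as well: a pattern that hard-codes id="1" can only ever match the first link.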
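To see the difference between greedy and reluctant quantifiers on the sample line from the question, here is a minimal sketch. The class name ReluctantDemo and the helper findAll are made up for illustration, and the id value is written as .*? (an assumption that goes beyond a literal id="1") so that both links can match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReluctantDemo {
    // Hypothetical helper: collects every non-overlapping match of regex in input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "<a href=\"G2532\" id=\"1\">back</a> <a href=\"G2564\" id=\"2\">next</a>";

        // Greedy: each .* grabs as much as it can, so the match runs to the last </a>.
        String greedy = "<a href=\".*\" id=\".*\">.*</a>";
        // Reluctant: each .*? grabs as little as it can, so each link matches on its own.
        String reluctant = "<a href=\".*?\" id=\".*?\">.*?</a>";

        System.out.println("greedy matches:    " + findAll(greedy, input).size());    // 1
        System.out.println("reluctant matches: " + findAll(reluctant, input).size()); // 2
    }
}
```

The greedy version finds one match spanning the whole line, while the reluctant version finds the two links separately.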
But when in doubt, you can use this trick:
<a href="[^"]*" id="[^"]*">[^<]*</a>
[^"]* means any number of characters that aren't a double quote (used here for both the href and the id value, so every link matches, not just the one with id="1").
[^<]* means any number of characters that aren't a left angle bracket.
This way you don't have to worry about greedy vs. non-greedy at all.
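The negated-class approach also combines well with capture groups if you want the pieces of each link rather than the whole tag. The following is only a sketch under a few assumptions: the class name LinkExtractor, the method extract, and the " | " output format are invented for illustration, and the id is matched with [^"]* rather than a literal 1 so that every link on the line is found.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkExtractor {
    // Negated character classes cannot run past the closing quote or the next tag.
    // Groups: 1 = href value, 2 = id value, 3 = link text.
    private static final Pattern LINK =
            Pattern.compile("<a href=\"([^\"]*)\" id=\"([^\"]*)\">([^<]*)</a>");

    // Hypothetical helper: returns one "href | id | text" entry per link found.
    static List<String> extract(String input) {
        List<String> links = new ArrayList<>();
        Matcher m = LINK.matcher(input);
        while (m.find()) {
            links.add(m.group(1) + " | " + m.group(2) + " | " + m.group(3));
        }
        return links;
    }

    public static void main(String[] args) {
        String input = "<a href=\"G2532\" id=\"1\">back</a> <a href=\"G2564\" id=\"2\">next</a>";
        for (String link : extract(input)) {
            System.out.println(link);
        }
        // Prints:
        // G2532 | 1 | back
        // G2564 | 2 | next
    }
}
```

Because each class excludes the character that ends the field, the match cannot spill over into the next link, which is why no reluctant quantifier is needed.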