
I need help getting my regex correct

I am trying to get a regular expression to find multiple occurrences of my pattern on a line. Note: I've been using regex for about an hour... =/

For example:

<a href="G2532" id="1">back</a> <a href="G2564" id="2">next</a>

Should match twice:

1) <a href="G2532" id="1">back</a>
2) <a href="G2564" id="2">next</a>

I think the answer lies in properly understanding greedy vs reluctant vs possessive quantifiers, but I can't seem to get it to work...

I think I am close. The regex I have so far is:

(<a href=").*(" id="1">).*(</a>)

But the regex matcher returns 1 match, the entire string...

I have a (compilable) Java regex test harness in the code below. Here are my recent (futile) attempts using that program; the output should be pretty intuitive.

Enter your regex: (<a href=").*(" id="1">).*(</a>)
Enter input string to search: <a href="G2532" id="1">back</a> <a href="G2564" id="2">next</a>
I found the text "<a href="G2532" id="1">back</a> <a href="G2564" id="2">next</a>" starting at index 0 and ending at index 63.

Enter your regex: (<a href=").*(" id="1">).*(</a>)?
Enter input string to search: <a href="G2532" id="1">back</a> <a href="G2564" id="2">next</a>
I found the text "<a href="G2532" id="1">back</a> <a href="G2564" id="2">next</a>" starting at index 0 and ending at index 63.

Enter your regex: (<a href=").*(" id="1">).*(</a>)+
Enter input string to search: <a href="G2532" id="1">back</a> <a href="G2564" id="2">next</a>
I found the text "<a href="G2532" id="1">back</a> <a href="G2564" id="2">next</a>" starting at index 0 and ending at index 63.

Enter your regex: (<a href=").*(" id="1">).*(</a>)?
Enter input string to search: <a href="G2532" id="1">back</a> <a href="G2564" id="2">next</a>
I found the text "<a href="G2532" id="1">back</a> <a href="G2564" id="2">next</a>" starting at index 0 and ending at index 63.

Enter your regex: ((<a href=").*(" id="1">).*(</a>))?
Enter input string to search: <a href="G2532" id="1">back</a> <a href="G2564" id="2">next</a>
I found the text "<a href="G2532" id="1">back</a> <a href="G2564" id="2">next</a>" starting at index 0 and ending at index 63.
I found the text "" starting at index 63 and ending at index 63.

Enter your regex: ((<a href=").*(" id="1">).*(</a>))+?
Enter input string to search: <a href="G2532" id="1">back</a> <a href="G2564" id="2">next</a>
I found the text "<a href="G2532" id="1">back</a> <a href="G2564" id="2">next</a>" starting at index 0 and ending at index 63.

Enter your regex: (((<a href=").*(" id="1">).*(</a>))+?)
Enter input string to search: <a href="G2532" id="1">back</a> <a href="G2564" id="2">next</a>
I found the text "<a href="G2532" id="1">back</a> <a href="G2564" id="2">next</a>" starting at index 0 and ending at index 63.

Here's the Java:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class RegexTestHarness {

    public static void main(String[] args){
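        try{
            // Create the reader once: wrapping System.in in a new BufferedReader on
            // every iteration can discard input the previous reader already buffered.
            BufferedReader reader = new BufferedReader(new InputStreamReader(System.in));

            while (true) {

                System.out.print("\nEnter your regex: ");
                String regex = reader.readLine();
                if (regex == null) break;   // end of input

                System.out.print("Enter input string to search: ");
                String input = reader.readLine();
                if (input == null) break;   // end of input

                Matcher matcher = Pattern.compile(regex).matcher(input);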
                boolean found = false;
                while (matcher.find()) {
                    System.out.println("I found the text \"" + matcher.group() + "\" starting at " +
                       "index " + matcher.start() + " and ending at index " + matcher.end() + ".");
                    found = true;
                }
                if(!found){
                    System.out.println("No match found.");
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
            System.exit(-1);
        }

    }
}

Try this:

<a href=".*?" id="1">.*?</a>

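I've dropped the capture groups and made the quantifiers non-greedy (reluctant) by adding a ? after each .* so that each one stops at the first place the rest of the pattern can match, rather than the last. Note that id="1" is still hard-coded, so this only matches the first link in your example; to get both matches, generalize the id as well, e.g. id=".*?".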
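
As a rough sketch of the difference, the snippet below runs the greedy and reluctant versions against the sample line from the question. The class name GreedyVsReluctant and the helper findAll are made up for illustration, and the third pattern assumes the id is always a quoted attribute that follows href.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsReluctant {

    // Hypothetical helper: collects the text of every match of regex in input.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "<a href=\"G2532\" id=\"1\">back</a> <a href=\"G2564\" id=\"2\">next</a>";

        // Greedy: the second .* runs on to the last </a>, so one match spans the whole line.
        System.out.println(findAll("<a href=\".*\" id=\"1\">.*</a>", input));

        // Reluctant: each .*? stops as early as it can, but id="1" limits it to the first link.
        System.out.println(findAll("<a href=\".*?\" id=\"1\">.*?</a>", input));

        // Reluctant with the id generalized (an assumption about the data): one match per link.
        System.out.println(findAll("<a href=\".*?\" id=\".*?\">.*?</a>", input));
    }
}
```

The first call should report a single match covering the whole line, the second only the "back" link, and the third both links separately.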

But when in doubt, you can use this trick:

<a href="[^"]*" id="1">[^<]*</a>

[^"]* means any number of characters that aren't a double quote
[^<]* means any number of characters that aren't a left angle bracket

Because each negated class cannot run past its delimiter, you don't have to worry about greedy vs non-greedy at all.
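
Here is a minimal sketch of the negated-class approach that also pulls out the individual pieces with capture groups. The class name NegatedClassLinks and the helper extractLinks are hypothetical, and the pattern assumes that href always comes before id and that the link text contains no nested tags.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NegatedClassLinks {

    // Assumption: href precedes id, and the link text has no nested tags.
    // [^"]* cannot cross a closing quote and [^<]* cannot cross a tag start,
    // so the default greedy quantifiers are safe here.
    private static final Pattern LINK =
            Pattern.compile("<a href=\"([^\"]*)\" id=\"([^\"]*)\">([^<]*)</a>");

    // Hypothetical helper: returns {href, id, text} for every link found on the line.
    static List<String[]> extractLinks(String line) {
        List<String[]> links = new ArrayList<>();
        Matcher m = LINK.matcher(line);
        while (m.find()) {
            links.add(new String[] { m.group(1), m.group(2), m.group(3) });
        }
        return links;
    }

    public static void main(String[] args) {
        String line = "<a href=\"G2532\" id=\"1\">back</a> <a href=\"G2564\" id=\"2\">next</a>";
        for (String[] link : extractLinks(line)) {
            System.out.println("href=" + link[0] + " id=" + link[1] + " text=" + link[2]);
        }
    }
}
```

If the markup gets any less regular than this (attributes in a different order, nested tags), an HTML parser is likely a better fit than a regex.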
