
Java - Match first string via multiline regex

I'm currently building a Java program to automate weekly recurring sports class bookings instead of booking them manually.

To achieve this, I load the list of classes for a specific date via an HTTP GET request and want to parse the required class id (foo/bar/class-id) from the response.

A shortened response looks like this:

<div>
    <div class="row">
            Olympic Weightlifting <br>

            <a data-url="foo/bar/2099159">
                Book
            </a>
    </div>
    <div class="row">
            Fitness <br>

            <a data-url="foo/bar/2098939">
                Book
            </a>
    </div>
</div>

So far, the two regexes in the snippet below are the closest I could get, but both match the last (second) class id instead of the first one following the word "Weightlifting".

    String str = "<div>\n" +
            "\t<div class=\"row\">\n" +
            "\t\t\tOlympic Weightlifting <br>\n" +
            "\n" +
            "\t\t\t<a data-url=\"foo/bar/2099159\">\n" +
            "\t\t\t\tBook\n" +
            "\t\t\t</a>\n" +
            "\t</div>\n" +
            "\t<div class=\"row\">\n" +
            "\t\t\tFitness <br>\n" +
            "\n" +
            "\t\t\t<a data-url=\"foo/bar/2098939\">\n" +
            "\t\t\t\tBook\n" +
            "\t\t\t</a>\n" +
            "\t</div>\n" +
            "</div>";


    // regex 1: DOTALL flag, so '.' also matches line terminators
    Pattern p = Pattern.compile("Weightlifting.*foo/bar/(.*?)\"", Pattern.DOTALL);
    // regex 2: no flag; [\s\S] matches any character, including newlines
    // Pattern p = Pattern.compile("Weightlifting[\\s\\S]*foo/bar/(.*?)\"");
    Matcher m = p.matcher(str);

    if (m.find()) {
        System.out.println(m.group(1).trim());
    }

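Your regex is greedy; you need to make it lazy. A greedy .* consumes as much input as possible and then backtracks, so it ends up at the last foo/bar/ in the response. Adding ? after the quantifier makes it expand only as far as needed, so it stops at the first foo/bar/ after "Weightlifting".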

 "Weightlifting.*?foo/bar/(.*?)\""
                 |
                 ^ change this part
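
To see the difference side by side, here is a minimal sketch that runs the greedy and the lazy variant against a condensed copy of the response from the question. The class name LazyMatchDemo and the helper firstGroup are made up for this illustration and are not part of the original code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LazyMatchDemo {

    // Condensed copy of the response from the question (the <a> element is on one line).
    static final String HTML =
            "<div>\n"
            + "\t<div class=\"row\">\n"
            + "\t\t\tOlympic Weightlifting <br>\n"
            + "\n"
            + "\t\t\t<a data-url=\"foo/bar/2099159\">Book</a>\n"
            + "\t</div>\n"
            + "\t<div class=\"row\">\n"
            + "\t\t\tFitness <br>\n"
            + "\n"
            + "\t\t\t<a data-url=\"foo/bar/2098939\">Book</a>\n"
            + "\t</div>\n"
            + "</div>";

    // Hypothetical helper: returns capture group 1 of the first match, or null if nothing matches.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex, Pattern.DOTALL).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Greedy .* runs to the end of the input and backtracks to the LAST foo/bar/.
        System.out.println(firstGroup("Weightlifting.*foo/bar/(.*?)\"", HTML));  // prints 2098939
        // Lazy .*? grows one character at a time and stops at the FIRST foo/bar/.
        System.out.println(firstGroup("Weightlifting.*?foo/bar/(.*?)\"", HTML)); // prints 2099159
    }
}
```

The only difference between the two calls is the single ? after .* , which is the change marked in the diagram above.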

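Another pattern you can use is this:

(?<=data-url=")[^\/]+\/[^\/]+\/(\d+)
  • (?<=data-url=") - positive lookbehind; requires the match to be directly preceded by data-url=" .
  • [^\/]+\/[^\/]+\/ - matches the text up to and including the second / ( here foo/bar/ ).
  • (\d+) - matches one or more digits ( the id you want to capture ).

Note that this pattern does not look for "Weightlifting"; it matches every data-url in the response, so the first find() returns the first class listed. In a Java string literal the backslashes must be doubled ( \\d+ ) and the quote escaped ( \" ).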
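
As a rough sketch of how the lookbehind pattern could be used from Java, the snippet below collects every id in document order. The class name DataUrlIdDemo, the helper allIds and the shortened input string are assumptions made for this example; the forward slashes are left unescaped because / has no special meaning in java.util.regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DataUrlIdDemo {

    // The answer's pattern written as a Java string literal (quote escaped, \d doubled).
    static final Pattern DATA_URL_ID = Pattern.compile("(?<=data-url=\")[^/]+/[^/]+/(\\d+)");

    // Hypothetical helper: returns the captured id of every data-url, in document order.
    static List<String> allIds(String html) {
        List<String> ids = new ArrayList<>();
        Matcher m = DATA_URL_ID.matcher(html);
        while (m.find()) {
            ids.add(m.group(1));
        }
        return ids;
    }

    public static void main(String[] args) {
        // Shortened stand-in for the real response.
        String html = "Olympic Weightlifting <br>\n<a data-url=\"foo/bar/2099159\">Book</a>\n"
                + "Fitness <br>\n<a data-url=\"foo/bar/2098939\">Book</a>";
        System.out.println(allIds(html)); // prints [2099159, 2098939]
    }
}
```

Because the pattern is not tied to a class name, picking the Weightlifting id this way relies on knowing its position in the list; the lazy "Weightlifting.*?..." pattern is the safer choice when the order of classes can change.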

