I'm currently building a Java program that automates my weekly recurring sports class bookings, so I no longer have to book them manually.
To do this, I load the list of classes for a specific date with an HTTP GET request, and I want to parse the required class ID (the class-id part of foo/bar/class-id) from the response.
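For context, the loading step could look roughly like the sketch below, using the JDK's built-in java.net.http client (Java 11+). The base URL and the `date` query parameter are hypothetical, since the question does not name the real booking endpoint, and `ClassListClient`, `requestFor` and `fetchClassList` are illustrative names only.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.LocalDate;

final class ClassListClient {
    private final HttpClient client = HttpClient.newHttpClient();

    // Hypothetical endpoint and query parameter: the real booking URL is not given in the question.
    static HttpRequest requestFor(String baseUrl, LocalDate date) {
        // LocalDate.toString() yields ISO format, e.g. 2024-01-15
        return HttpRequest.newBuilder(URI.create(baseUrl + "?date=" + date))
                .GET()
                .build();
    }

    // Sends the GET request and returns the response body (the HTML class list).
    String fetchClassList(String baseUrl, LocalDate date) throws IOException, InterruptedException {
        HttpResponse<String> response =
                client.send(requestFor(baseUrl, date), HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

The returned body is the string that the regexes below are run against.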
A shortened response looks like this:
<div>
<div class="row">
Olympic Weightlifting <br>
<a data-url="foo/bar/2099159">
Book
</a>
</div>
<div class="row">
Fitness <br>
<a data-url="foo/bar/2098939">
Book
</a>
</div>
</div>
So far, the two regexes in the snippet below are the closest I have gotten, but both match the last (second) class ID instead of the first one following the word "Weightlifting".
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String str = "<div>\n" +
"\t<div class=\"row\">\n" +
"\t\t\tOlympic Weightlifting <br>\n" +
"\n" +
"\t\t\t<a data-url=\"foo/bar/2099159\">\n" +
"\t\t\t\tBook\n" +
"\t\t\t</a>\n" +
"\t</div>\n" +
"\t<div class=\"row\">\n" +
"\t\t\tFitness <br>\n" +
"\n" +
"\t\t\t<a data-url=\"foo/bar/2098939\">\n" +
"\t\t\t\tBook\n" +
"\t\t\t</a>\n" +
"\t</div>\n" +
"</div>";
// regex 1: DOTALL flag, so . also matches line breaks
Pattern p = Pattern.compile("Weightlifting.*foo/bar/(.*?)\"", Pattern.DOTALL);
// regex 2: [\s\S] matches any character, including line breaks, without needing a flag
// Pattern p = Pattern.compile("Weightlifting[\\s\\S]*foo/bar/(.*?)\"");
Matcher m = p.matcher(str);
if (m.find()) {
    System.out.println(m.group(1).trim());
}
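Your regex is greedy; you need to make it lazy. The .* after "Weightlifting" first consumes everything up to the end of the input and then backtracks only as far as the last foo/bar/, which is why you get the second class. A lazy .*? expands one character at a time and stops at the first foo/bar/ instead: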
"Weightlifting.*?foo/bar/(.*?)\""
(the change is the ? added after the first .*, right after "Weightlifting")
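A minimal, self-contained sketch comparing the greedy and lazy versions on a trimmed copy of the sample response, both compiled with Pattern.DOTALL as in the question. The class name `LazyMatchDemo` and the helper `firstGroup` are illustrative, not from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LazyMatchDemo {
    // Trimmed copy of the sample response from the question.
    static final String HTML =
            "<div class=\"row\">\n"
          + "\tOlympic Weightlifting <br>\n"
          + "\t<a data-url=\"foo/bar/2099159\">Book</a>\n"
          + "</div>\n"
          + "<div class=\"row\">\n"
          + "\tFitness <br>\n"
          + "\t<a data-url=\"foo/bar/2098939\">Book</a>\n"
          + "</div>";

    // Returns group 1 of the first match, or null if the pattern does not match.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex, Pattern.DOTALL).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Greedy: .* runs to the end of the input, then backtracks to the LAST foo/bar/.
        System.out.println(firstGroup("Weightlifting.*foo/bar/(.*?)\"", HTML));  // 2098939
        // Lazy: .*? grows one character at a time and stops at the FIRST foo/bar/.
        System.out.println(firstGroup("Weightlifting.*?foo/bar/(.*?)\"", HTML)); // 2099159
    }
}
```

The only difference between the two calls is the single ? after the first .*.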
Another pattern you can use is this one:
(?<=data-url=")[^\/]+\/[^\/]+\/(\d+)
(?<=data-url=")
- positive lookbehind: checks that the match is preceded by data-url="
[^\/]+\/[^\/]+\/
- matches the text up to and including the second /
(\d+)
- matches one or more digits (the ID you want to capture)
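A sketch of the lookbehind pattern used from Java, with the backslashes doubled for the string literal. Note that this pattern does not look at the class name at all: it matches every data-url, so a single find() returns the Weightlifting ID only because that row comes first in the response. Looping over all matches, as below, returns the IDs in document order. `DataUrlIds` and `allIds` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DataUrlIds {
    // The lookbehind pattern from the answer, escaped for a Java string literal.
    static final Pattern ID = Pattern.compile("(?<=data-url=\")[^\\/]+\\/[^\\/]+\\/(\\d+)");

    // Collects every captured ID in the order it appears in the HTML.
    static List<String> allIds(String html) {
        List<String> ids = new ArrayList<>();
        Matcher m = ID.matcher(html);
        while (m.find()) {
            ids.add(m.group(1));
        }
        return ids;
    }

    public static void main(String[] args) {
        String html = "<a data-url=\"foo/bar/2099159\">Book</a>\n"
                    + "<a data-url=\"foo/bar/2098939\">Book</a>";
        System.out.println(allIds(html)); // [2099159, 2098939]
    }
}
```

If you need the ID for a particular class regardless of its position, the lazy "Weightlifting.*?foo/bar/..." pattern is the one that actually anchors on the class name.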