
How to extract image url from a string?

I am trying to extract an image URL from a string using Pattern and Matcher with a regular expression. When I debug the code, both matcher.matches() and matcher.find() return false. The image URL, the regular expression, and my code are below.

Pattern pattern_name;
Matcher matcher_name;

String regex = "(http(s?):/)(/[^/]+)+\" + \"\\.(?:jpg|gif|png)";
String url = "http://www.medivision360.com/pharma/pages/articleImg/thumbnail/thumb3756d839adc5da3.jpg";

pattern_name = Pattern.compile(regex);
matcher_name = pattern_name.matcher(url);

matcher_name.matches();
matcher_name.find();

You've escaped the double quotes in the string concatenation,
so the regex engine sees this: (http(s?):/)(/[^/]+)+" + "\.(?:jpg|gif|png)
after Java parses the string literal.

You can un-escape the quotes: "(http(s?):/)(/[^/]+)+" + "\\.(?:jpg|gif|png)"
or just join the two parts together: "(http(s?):/)(/[^/]+)+\\.(?:jpg|gif|png)"
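To see the difference for yourself, print both strings before compiling them. The following is only a minimal sketch; the class name EscapedQuoteDemo and the constant names BROKEN and FIXED are made up for illustration and do not come from the question.

```java
public class EscapedQuoteDemo {
    // Literal from the question: the \" sequences keep the quotes INSIDE one string,
    // so the text " + " becomes part of the regex.
    static final String BROKEN = "(http(s?):/)(/[^/]+)+\" + \"\\.(?:jpg|gif|png)";

    // Intended version: two separate literals joined by a real + operator.
    static final String FIXED = "(http(s?):/)(/[^/]+)+" + "\\.(?:jpg|gif|png)";

    public static void main(String[] args) {
        String url = "http://www.medivision360.com/pharma/pages/articleImg/thumbnail/thumb3756d839adc5da3.jpg";

        // Print what the regex engine actually receives.
        System.out.println(BROKEN);
        System.out.println(FIXED);

        // String#matches requires the whole input to match the pattern.
        System.out.println(url.matches(BROKEN));
        System.out.println(url.matches(FIXED));
    }
}
```

Printing the pattern string is a quick way to check any regex that is built from escaped Java literals.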

If the expression is always at the end of the string, I would suggest:

([^/?]+)(?=/?(?:$|\?))
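A rough sketch of how that pattern could be used from Java, assuming the goal is to pull out the last path segment (the file name). The class LastSegmentDemo, the helper lastSegment, and the example.com URL are hypothetical names chosen for illustration. Note that the backslash before the question mark has to be doubled inside a Java string literal.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LastSegmentDemo {
    // Same pattern as above, written as a Java string literal.
    static final Pattern LAST_SEGMENT = Pattern.compile("([^/?]+)(?=/?(?:$|\\?))");

    // Hypothetical helper: returns the last path segment, or null if there is none.
    static String lastSegment(String url) {
        Matcher m = LAST_SEGMENT.matcher(url);
        // find() is needed here because the pattern covers only part of the URL.
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(lastSegment(
            "http://www.medivision360.com/pharma/pages/articleImg/thumbnail/thumb3756d839adc5da3.jpg"));
        // Made-up URL with a query string, to show that the lookahead stops at '?'.
        System.out.println(lastSegment("http://example.com/a/b.png?size=2"));
    }
}
```

This does not check that the segment is an image; it only isolates whatever comes last in the path.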

You seem to have an issue with the regex: the \" + \" looks like it came from some code that was mistaken for part of the regex. That subpattern requires a double quote, one or more spaces, then one more space, and another double quote to appear right before the extension. It matches something like http://www.medivision360.com/pharma/pages/articleImg/thumbnail/thumb3756d839adc5da3"  ".jpg (with at least two spaces between the quotes).

Also, the two capture groups at the beginning are redundant; you do not need them.

Use

String regex = "https?:/(?:/[^/]+)+\\.(?:jpg|gif|png)";

Java demo:

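import java.util.regex.Matcher;
import java.util.regex.Pattern;

String rx = "https?:/(?:/[^/]+)+\\.(?:jpg|gif|png)";
String url = "http://www.medivision360.com/pharma/pages/articleImg/thumbnail/thumb3756d839adc5da3.jpg";
Pattern pat = Pattern.compile(rx);
Matcher matcher = pat.matcher(url);
// matches() succeeds only if the whole string is an image URL
if (matcher.matches()) {
    System.out.println(matcher.group()); // prints the full URL
}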

Note that Matcher#matches() requires a full string match, while Matcher#find() will find a partial match, a match inside a larger string.
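Since the question title asks about extracting a URL from inside a string, here is a small sketch of the find() case. The surrounding sentence, the class name FindVsMatchesDemo, and the helper firstImageUrl are assumptions made up for illustration; only the regex and the URL come from this thread.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindVsMatchesDemo {
    static final Pattern IMAGE_URL =
        Pattern.compile("https?:/(?:/[^/]+)+\\.(?:jpg|gif|png)");

    // Hypothetical helper: returns the first image URL found anywhere in text, or null.
    static String firstImageUrl(String text) {
        Matcher m = IMAGE_URL.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Made-up surrounding text; the URL is the one from the question.
        String text = "Thumbnail: http://www.medivision360.com/pharma/pages/articleImg/thumbnail/thumb3756d839adc5da3.jpg (click to enlarge)";

        // matches() fails because the text contains more than just the URL.
        System.out.println(IMAGE_URL.matcher(text).matches());

        // find() locates the URL inside the larger string.
        System.out.println(firstImageUrl(text));
    }
}
```

One caveat: [^/]+ also matches spaces, so if the surrounding text contains further slashes or another ".jpg"-style extension after the URL, the match may run past the intended end. Using [^/\s]+ instead would be one way to tighten it.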

