
How can I extract all substrings matching a regular expression?

I want to extract the values of all src attributes from the string below. How can I do that?

<p>Test&nbsp;
<img alt="70" width="70" height="50" src="/adminpanel/userfiles/image/1.jpg" />
Test 
<img alt="70" width="70" height="50" src="/adminpanel/userfiles/image/2.jpg" />
</p>

Here you go:

String data = "<p>Test&nbsp;\n" +
    "<img alt=\"70\" width=\"70\" height=\"50\" src=\"/adminpanel/userfiles/image/1.jpg\" />\n" +
    "Test \n" +
    "<img alt=\"70\" width=\"70\" height=\"50\" src=\"/adminpanel/userfiles/image/2.jpg\" />\n" +
    "</p>";
Pattern p0 = Pattern.compile("src=\"([^\"]+)\"");
Matcher m = p0.matcher(data);
while (m.find())
{
  System.out.printf("found: %s%n", m.group(1));
}

Most regex flavors have a shorthand for grabbing all matches, like Ruby's scan method or .NET's Matches(), but in Java (at least before Java 9 added Matcher.results()) you have to spell out the find() loop yourself.
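On Java 9 or later, Matcher.results() returns a stream of MatchResult objects, so the loop can be shortened. The following is only a sketch under that assumption; the class name SrcResultsDemo and the method name allSrcValues are made up for illustration, and the pattern is the same one used in the answer above.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class SrcResultsDemo {
    // Same pattern as above: group 1 is everything between the quotes.
    private static final Pattern SRC = Pattern.compile("src=\"([^\"]+)\"");

    // Hypothetical helper: collects group 1 of every match (requires Java 9+).
    static List<String> allSrcValues(String html) {
        return SRC.matcher(html)
                  .results()               // Stream<MatchResult>, one per match
                  .map(r -> r.group(1))    // keep only the captured value
                  .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String html = "<img src=\"/a/1.jpg\" /> text <img src=\"/a/2.jpg\" />";
        System.out.println(allSrcValues(html)); // prints [/a/1.jpg, /a/2.jpg]
    }
}
```

On older Java versions the explicit while (m.find()) loop remains the way to go.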

Idea: split the string at every '"' character, check whether each part ends with the attribute name src= and, if so, store the next part, which is the value of that src attribute.

// needs: import java.util.ArrayList; import java.util.List;
String[] parts = thisString.split("\"");  // split at each " character
List<String> srcAttributes = new ArrayList<String>();
boolean nextIsSrcAttrib = false;
for (String part : parts) {
  if (part.trim().endsWith("src=")) {
    // the part that follows is the value of a src attribute
    nextIsSrcAttrib = true;
  } else if (nextIsSrcAttrib) {
    srcAttributes.add(part);
    nextIsSrcAttrib = false;
  }
}

Better idea: feed it into a proper HTML parser and extract the src attribute values from all img elements. But the above should work as an easy solution, especially in non-production code.

Sorry for not coding it (short of time). How about:
1. (Assuming the file size is reasonable) read the entire file into a String.
2. Split the String around "src=\"" (assume the resulting array is called strArr).
3. Loop over the resulting array, skipping strArr[0] (the text before the first src), and store strArr[i].substring(0, strArr[i].indexOf("\"")) in some collection of image sources.

Aviad
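The steps above can be sketched roughly as follows. This is an illustration rather than the answerer's own code: the class name SrcSplitDemo and the method name srcBySplitting are hypothetical, and it assumes every src value is wrapped in double quotes with no space around the equals sign.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class SrcSplitDemo {
    // Hypothetical helper implementing the split-and-substring idea.
    static List<String> srcBySplitting(String html) {
        // String.split takes a regex, so quote the literal delimiter src="
        String[] strArr = html.split(Pattern.quote("src=\""));
        List<String> sources = new ArrayList<>();
        // Element 0 is the text before the first src=", so start at 1.
        for (int i = 1; i < strArr.length; i++) {
            int end = strArr[i].indexOf('"');   // closing quote of the value
            if (end >= 0) {
                sources.add(strArr[i].substring(0, end));
            }
        }
        return sources;
    }

    public static void main(String[] args) {
        String html = "<p><img src=\"/a/1.jpg\" /> x <img src=\"/a/2.jpg\" /></p>";
        System.out.println(srcBySplitting(html)); // prints [/a/1.jpg, /a/2.jpg]
    }
}
```

Cutting at the first closing quote instead of at the full "\" />" sequence means the sketch does not depend on src being the last attribute of the tag.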

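Since you've requested a regex implementation:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test {
    private static String input = "....your html.....";

    public static void main(String[] args) {
        // [^\"]* stops at the closing quote; a greedy .* would run on to the
        // last quote on the line and swallow any attributes that follow src.
        Pattern pattern = Pattern.compile("src=\"([^\"]*)\"");
        Matcher matcher = pattern.matcher(input);
        while (matcher.find()) {
            // group(1) is the attribute value without the surrounding src="..."
            System.out.println(matcher.group(1));
        }
    }
}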

You may have to tweak the regex if your src attributes are not double-quoted.
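One possible tweak, shown here only as a sketch, is to capture the opening quote character and require the same character to close the value via a backreference. The class name SrcQuoteDemo and the method name srcValues are hypothetical; unquoted attribute values are still not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SrcQuoteDemo {
    // Group 1: the opening quote (" or '); group 2: the value;
    // \1 requires the same quote character to close the value.
    private static final Pattern SRC =
            Pattern.compile("src\\s*=\\s*([\"'])(.*?)\\1");

    // Hypothetical helper: returns every src value, single- or double-quoted.
    static List<String> srcValues(String html) {
        List<String> values = new ArrayList<>();
        Matcher m = SRC.matcher(html);
        while (m.find()) {
            values.add(m.group(2));
        }
        return values;
    }

    public static void main(String[] args) {
        String html = "<img src='/a/1.jpg'> <img src = \"/a/2.jpg\">";
        System.out.println(srcValues(html)); // prints [/a/1.jpg, /a/2.jpg]
    }
}
```

Note that this pattern also matches attributes whose names merely end in src, such as data-src, so it may need further tightening for real pages.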
