
Extract addresses and names from a listing in an HTML page

Okay, so I have printed a page's HTML to a string, and I want to grab certain substrings from it. The problem is that the content is different every time I load the page. Example:

 Blah blah blah 1. name address phone number 2. name address phone number blah blah 

There could be anywhere from 1 to 10 listings.
All I'm interested in is grabbing the address and name of each one.

I did try:

public static String removeNonDigits(final String str) {
    if (str == null || str.length() == 0) {
        return "";
    }
    return str.replaceAll("\\D+", "");
}

But to no avail.

You might have to adjust this a little, since I don't know exactly where the whitespace is:

Pattern pattern = Pattern.compile("\\n *(?:[1-9]|10)\\. +(.+?) *\\n *(.+?) *\\n");
Matcher matcher = pattern.matcher(str);
while (matcher.find()) {
  System.out.println("name: " + matcher.group(1));
  System.out.println("address: " + matcher.group(2));
  System.out.println(matcher.group()); // the whole match
}
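
For anyone who wants to try this out, here is a self-contained sketch of the same idea. It assumes the tags have already been stripped and that each listing is laid out as "<number>. name" on one line, with the address on the next line and the phone number after that. The class name ListingExtractor, the method extract, and the sample text are made up for illustration, and the pattern is a slight variation on the one above (it anchors on line starts with (?m)^ and uses \R so that both \n and \r\n line breaks match).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ListingExtractor {

    // Assumed layout: "<1-10>. name" on one line, the address on the next line.
    // (?m) lets ^ and $ match at line boundaries; \R matches any line break.
    private static final Pattern LISTING =
            Pattern.compile("(?m)^ *(?:10|[1-9])\\. +(.+?) *\\R *(.+?) *$");

    // Returns one {name, address} pair per listing found in the text.
    public static List<String[]> extract(String text) {
        List<String[]> result = new ArrayList<>();
        if (text == null || text.isEmpty()) {
            return result;
        }
        Matcher matcher = LISTING.matcher(text);
        while (matcher.find()) {
            result.add(new String[] { matcher.group(1), matcher.group(2) });
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample input, not taken from a real page.
        String text = "Blah blah blah\n"
                + "1. Alice Smith\n12 Main St\n555-0100\n"
                + "2. Bob Jones\n34 Oak Ave\n555-0199\n"
                + "blah blah\n";
        for (String[] listing : extract(text)) {
            System.out.println("name: " + listing[0]);
            System.out.println("address: " + listing[1]);
        }
    }
}
```

If the fields turn out to sit on a single line or are still wrapped in tags, the pattern will need adjusting, or an HTML parser may be a better fit than a regex.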
