How can I get specific text from a webpage?

Question

I've looked for answers to this question on Stack Overflow and Google, but couldn't really find what I was looking for.

When I want to retrieve data from a page, like this one, with this code:

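import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

public class ConsoleSearch {

    public static void main(String[] args) throws IOException {

        URL url = new URL("http://www.stackoverflow.com");
        URLConnection cnt = url.openConnection();
        BufferedReader br = new BufferedReader(new InputStreamReader(cnt.getInputStream()));
        String content;

        // Print the raw response one line at a time
        while ((content = br.readLine()) != null) {
            System.out.println(content);
        }
        br.close();
    }

}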

I obviously get the HTML tags and everything else that comes with them. I can easily filter out the HTML using HtmlCleaner. The challenging part, and where I find myself stuck, is retrieving specific text from all the retrieved data.

For example, if I wanted to retrieve only the text "Nova Scotia" and/or "Europe", how would I do that?

Answer 1

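Pattern p = Pattern.compile("Nova Scotia");
Matcher m = p.matcher(content);
// find() looks for the phrase anywhere in content;
// matches() would be true only if content were exactly "Nova Scotia"
boolean b = m.find();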

Look into the java.util.regex package used above (Pattern and Matcher); it should help you.
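
As a rough sketch of how the regex approach could be applied to the example in the question, the following assumes the page text is already held in a String. The class name PhraseFinder, the method findPhrases, and the sample sentence are made up for illustration. It relies on Matcher.find(), which looks for a phrase anywhere in the input, whereas Matcher.matches() succeeds only when the entire input equals the pattern.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PhraseFinder {

    // Hypothetical helper: returns every occurrence of the given phrases
    // in text, in order of appearance.
    public static List<String> findPhrases(String text, String... phrases) {
        List<String> hits = new ArrayList<>();
        if (phrases.length == 0) {
            return hits; // an empty pattern would match at every position
        }

        // Quote each phrase so regex metacharacters are treated literally,
        // then join the phrases into a single alternation.
        StringBuilder alternation = new StringBuilder();
        for (String phrase : phrases) {
            if (alternation.length() > 0) {
                alternation.append('|');
            }
            alternation.append(Pattern.quote(phrase));
        }

        Matcher m = Pattern.compile(alternation.toString()).matcher(text);
        // find() advances to the next match anywhere in the input.
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        // Sample text standing in for the downloaded page content
        String page = "Offices in Nova Scotia and across Europe.";

        System.out.println(findPhrases(page, "Nova Scotia", "Europe"));
        // prints [Nova Scotia, Europe]

        // matches() requires the whole input to be the phrase, so this is false
        System.out.println(Pattern.compile("Nova Scotia").matcher(page).matches());
    }
}
```

With the code from the question, one option would be to append each line read from the BufferedReader to a StringBuilder and pass the resulting string to findPhrases once the loop has finished.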

solution1
2 ACCEPTED 2013-09-23 08:39:41
