I've looked for answers to this question on Stack Overflow and Google but couldn't find what I was looking for.
When I want to retrieve data from a page such as this one, I use this code:
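import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

public class ConsoleSearch {
    public static void main(String[] args) throws IOException {
        URL url = new URL("http://www.stackoverflow.com");
        URLConnection cnt = url.openConnection();
        // try-with-resources closes the reader (and the underlying stream) even if reading fails;
        // UTF-8 is assumed here instead of the platform default charset
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(cnt.getInputStream(), StandardCharsets.UTF_8))) {
            String content;
            // print the raw HTML one line at a time
            while ((content = br.readLine()) != null) {
                System.out.println(content);
            }
        }
    }
}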
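In the loop above, content only ever holds the current line, and it is null once the loop finishes, so nothing is left to search afterwards. One option is to collect the lines into a single String first. This is a minimal sketch, assuming the page is UTF-8; readAll and PageReader are hypothetical names, and an in-memory stream stands in for cnt.getInputStream() so it runs without a network:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class PageReader {
    // Hypothetical helper: collects every line of the stream into one String
    // (line breaks normalized to '\n'), so the whole page can be searched later.
    static String readAll(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-in for cnt.getInputStream(), so this runs offline.
        InputStream in = new ByteArrayInputStream(
                "<p>Halifax</p>\r\n<p>Nova Scotia</p>".getBytes(StandardCharsets.UTF_8));
        String page = readAll(in);
        System.out.print(page);
        System.out.println(page.contains("Nova Scotia")); // true
    }
}
```

With the real connection you would pass cnt.getInputStream() to readAll and then clean or search the returned String.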
I obviously get the HTML tags and everything else that comes with them. I can easily strip out the HTML using HtmlCleaner.
The challenging part, and where I find myself stuck, is retrieving specific text from all the data I get back.
For example, if I only wanted to retrieve the text "Nova Scotia" and/or "Europe", how would I do that?
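import java.util.regex.Matcher;
import java.util.regex.Pattern;
Pattern p = Pattern.compile("Nova Scotia");
Matcher m = p.matcher(content); // run this inside the read loop: content is null once the loop ends
boolean b = m.find(); // find() searches anywhere in the line; matches() requires the whole line to match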
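Look into the java.util.regex package used above (Pattern and Matcher); it should get you there. Note that matches() succeeds only when the entire input matches the pattern, so to locate a phrase inside a longer line of text, use find() instead.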
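To look for several phrases at once, such as "Nova Scotia" and "Europe", the terms can be combined into one alternation and searched with find(). This is a sketch under a few assumptions: the page text (for example, what is left after HtmlCleaner strips the tags) is already in a String, the terms should match literally (hence Pattern.quote()), and findTerms and TermSearch are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TermSearch {
    // Hypothetical helper: returns each occurrence of any term as "lineNumber: term".
    static List<String> findTerms(String text, List<String> terms) {
        // Pattern.quote() makes each term literal; "|" combines them into one alternation.
        List<String> quoted = new ArrayList<>();
        for (String t : terms) {
            quoted.add(Pattern.quote(t));
        }
        Pattern p = Pattern.compile(String.join("|", quoted));

        List<String> hits = new ArrayList<>();
        String[] lines = text.split("\\R"); // \R matches any line break (\n, \r\n, ...)
        for (int i = 0; i < lines.length; i++) {
            Matcher m = p.matcher(lines[i]);
            // find() returns each successive match anywhere in the line;
            // matches() would succeed only if the whole line were exactly one term.
            while (m.find()) {
                hits.add((i + 1) + ": " + m.group());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Hypothetical sample standing in for the page text after stripping the HTML.
        String page = "Welcome\nFlights to Europe and Nova Scotia\nNothing here\nEurope again\n";
        for (String hit : findTerms(page, List.of("Nova Scotia", "Europe"))) {
            System.out.println(hit);
        }
        // prints:
        // 2: Europe
        // 2: Nova Scotia
        // 4: Europe
    }
}
```

Compiling the pattern once and reusing it for every line avoids recompiling the regex inside the loop; if you only need a yes/no answer, stopping at the first find() is enough.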