I have this small chunk of code that grabs the HTML from a website. I'm only interested in parsing a certain section of that HTML, though, and I need to do it several times. More specifically, I'm making a Pokédex and would like to parse certain descriptions from, say, a Bulbapedia page, for example http://bulbapedia.bulbagarden.net/wiki/Bulbasaur_(Pok%C3%A9mon). How would I make this parser take just the description of Bulbasaur? How would I create a boundary that tells it where to start and stop?
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

public class WebCrawler {
    public static void main(String[] args) {
        try {
            URL url = new URL("http://pokemondb.net/pokedex/bulbasaur");
            URLConnection connection = url.openConnection();
            // try-with-resources closes the reader even if reading fails
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(connection.getInputStream()))) {
                String inputLine;
                // Print the raw HTML one line at a time
                while ((inputLine = in.readLine()) != null) {
                    System.out.println(inputLine);
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
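You can use Jsoup, an HTML parsing library for Java. The following code fetches the Bulbapedia page and prints the paragraphs of the article body, which include the description of Bulbasaur: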
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class Test {
    public static void main(String[] args) throws IOException {
        // Fetch the page and parse it into a DOM
        Document doc = Jsoup
                .connect("http://bulbapedia.bulbagarden.net/wiki/Bulbasaur_(Pok%C3%A9mon)")
                .get();
        // Select every <p> inside the element with id "mw-content-text"
        Elements paragraphs = doc.select("#mw-content-text p");
        for (Element paragraph : paragraphs) {
            // text() returns the paragraph's text without the HTML tags
            System.out.println(paragraph.text());
        }
    }
}
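Here mw-content-text is the id of the page's main content div, so the selector "#mw-content-text p" matches every paragraph inside it.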
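If you would rather not add a library, the start/stop boundary idea from the question can also be sketched with plain string searching. The example below is only an illustration: the class name BoundaryExtractor, the helpers between and stripTags, and the sample HTML string are all made up for this sketch, and the markers "<p>" and "</p>" are an assumption about what surrounds the text you want. This kind of matching breaks easily when the page markup changes, which is the main reason to prefer a real parser such as Jsoup.

```java
import java.util.Optional;

class BoundaryExtractor {

    /**
     * Returns the text between the first occurrence of start and the next
     * occurrence of end after it, or empty if either marker is missing.
     * (Hypothetical helper, not part of any library.)
     */
    static Optional<String> between(String html, String start, String end) {
        int from = html.indexOf(start);
        if (from < 0) {
            return Optional.empty();
        }
        from += start.length();            // skip past the start marker itself
        int to = html.indexOf(end, from);  // look for the end marker after it
        if (to < 0) {
            return Optional.empty();
        }
        return Optional.of(html.substring(from, to));
    }

    /** Crude tag removal for a small fragment; not a general HTML cleaner. */
    static String stripTags(String fragment) {
        return fragment.replaceAll("<[^>]*>", "").trim();
    }

    public static void main(String[] args) {
        // Sample markup standing in for a downloaded page (an assumption)
        String html = "<div id=\"mw-content-text\">"
                + "<p>Bulbasaur is a <b>Grass/Poison</b> type.</p>"
                + "<p>More text.</p></div>";

        String description = between(html, "<p>", "</p>")
                .map(BoundaryExtractor::stripTags)
                .orElse("(not found)");

        System.out.println(description); // prints: Bulbasaur is a Grass/Poison type.
    }
}
```

To use it with the code from the question, you would append each line read from the BufferedReader to a StringBuilder instead of printing it, then pass the resulting string to between.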