So I'm trying to scrape a few pieces of HTML (see below). The HTML has a repeating div (here: class "data"). From each of these divs I'm trying to scrape the name, stat and stat2 values. I start with getElementsByClass, but how do I proceed from there? How do I get the three elements separately?
This is what I have so far, but it takes all the text at once, not the three pieces separately:
html.html
<html>
<div class='data'>
<a href='/url1'>
<div class='name'>name1</div>
<div class='stat'>123</div>
<div class='stat2'>456</div>
</a>
</div>
<div class='data'>
<a href='/url2'>
<div class='name'>name2</div>
<div class='stat'>123.1</div>
<div class='stat2'>456.2</div>
</a>
</div>
</html>
JsoupTesting.java
package JsoupTest;
import java.io.File;
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
public class JsoupTesting {
    public static void main(String[] args) throws IOException {
        File input = new File("html.html"); // path to html.html
        Document doc = Jsoup.parse(input, "UTF-8");
        Elements contents = doc.getElementsByClass("data");
        for (Element content : contents) {
            String text = content.text();
            System.out.println("name: " + text + "\n----");
        }
    }
}
Result:
name: name1 123 456
----
name: name2 123.1 456.2
----
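The result above is expected: `Element.text()` returns the combined, whitespace-normalized text of the element and all of its descendants, so the three inner divs collapse into one string. A minimal sketch that reproduces this on an inline copy of the first "data" block (the class name `TextDemo` and the inline HTML string are illustrative; jsoup must be on the classpath):

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class TextDemo {
    // Inline copy of the first repeating block from html.html
    static final String HTML =
            "<div class='data'>\n"
            + "<a href='/url1'>\n"
            + "<div class='name'>name1</div>\n"
            + "<div class='stat'>123</div>\n"
            + "<div class='stat2'>456</div>\n"
            + "</a>\n"
            + "</div>";

    // text() on the outer div: all descendant text joined by single spaces
    static String combinedText(String html) {
        Document doc = Jsoup.parse(html);
        return doc.getElementsByClass("data").first().text();
    }

    // text() on one inner div: only that element's own value
    static String nameText(String html) {
        Document doc = Jsoup.parse(html);
        Element data = doc.getElementsByClass("data").first();
        return data.getElementsByClass("name").first().text();
    }

    public static void main(String[] args) {
        System.out.println(combinedText(HTML)); // name1 123 456
        System.out.println(nameText(HTML));     // name1
    }
}
```

So the fix is to call text() on each inner element rather than on the outer "data" div.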
I would like something like:
name: name1
stat: 123
stat2: 456
----
name: name2
stat: 123.1
stat2: 456.2
----
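One way to produce exactly this format without naming each class in the code is to loop over the divs inside each link and use each div's class as the label. This is a sketch, assuming every field div carries exactly one class name (name, stat, stat2, ...); `FieldDump` and `format` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class FieldDump {
    // Builds one "class: text" line per field div, plus a "----" separator per block.
    // Assumes each field div has a single class name, used directly as the label.
    static List<String> format(Document doc) {
        List<String> lines = new ArrayList<>();
        for (Element data : doc.select("div.data")) {
            // "a > div" matches only divs that are direct children of the link
            for (Element field : data.select("a > div")) {
                lines.add(field.className() + ": " + field.text());
            }
            lines.add("----");
        }
        return lines;
    }

    public static void main(String[] args) {
        String html = "<div class='data'><a href='/url1'>"
                + "<div class='name'>name1</div>"
                + "<div class='stat'>123</div>"
                + "<div class='stat2'>456</div>"
                + "</a></div>";
        format(Jsoup.parse(html)).forEach(System.out::println);
    }
}
```

The trade-off is that the labels now come from the HTML, so a renamed or extra class changes the output instead of breaking the code.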
Thanks to BackSlash's comment I got it to work. It wasn't very hard; he just told me what to do :)
package JsoupTest;
import java.io.File;
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
public class JsoupTesting {
    public static void main(String[] args) throws IOException {
        File input = new File("html.html"); // path to html.html
        Document doc = Jsoup.parse(input, "UTF-8");
        Elements contents = doc.getElementsByClass("data");
        for (Element content : contents) {
            // Search only inside this block, so name, stat and stat2 belong to the same row.
            // text() returns the plain text; html() would return the inner markup (e.g. &amp; entities).
            String name = content.getElementsByClass("name").first().text();
            String stat = content.getElementsByClass("stat").first().text();
            String stat2 = content.getElementsByClass("stat2").first().text();
            System.out.println("name: " + name);
            System.out.println("stat: " + stat);
            System.out.println("stat2: " + stat2 + "\n----");
        }
    }
}
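A slightly more defensive variant, sketched under the assumption that some blocks might be missing a field: `getElementsByClass(...).first()` returns null when nothing matches, so chaining another call onto it throws a NullPointerException. The sketch uses CSS selectors via `select` and `Element.selectFirst` (which needs a jsoup version that provides `selectFirst`) and falls back to an empty string. `SafeScrape`, `textOf` and `extract` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class SafeScrape {
    // Text of the first element matching cssQuery inside parent,
    // or "" if there is no such element (instead of throwing an NPE).
    static String textOf(Element parent, String cssQuery) {
        Element match = parent.selectFirst(cssQuery);
        return match == null ? "" : match.text();
    }

    // One {name, stat, stat2} array per div.data block.
    // ".stat" matches the whole class token, so it does not also match "stat2".
    static List<String[]> extract(Document doc) {
        List<String[]> rows = new ArrayList<>();
        for (Element data : doc.select("div.data")) {
            rows.add(new String[] {
                textOf(data, "div.name"),
                textOf(data, "div.stat"),
                textOf(data, "div.stat2")
            });
        }
        return rows;
    }

    public static void main(String[] args) {
        String html = "<div class='data'><a href='/url1'>"
                + "<div class='name'>name1</div>"
                + "<div class='stat'>123</div>"
                + "</a></div>"; // stat2 deliberately missing
        for (String[] row : extract(Jsoup.parse(html))) {
            System.out.println("name: " + row[0]);
            System.out.println("stat: " + row[1]);
            System.out.println("stat2: " + row[2] + "\n----");
        }
    }
}
```

Whether an empty string is the right fallback depends on what the scraped values are used for; skipping the row or logging a warning are equally reasonable choices.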