Html parsing with JSoup

Question

I am trying to parse the html of the following URL:

http://ocw.mit.edu/courses/aeronautics-and-astronautics/16-050-thermal-energy-fall-2002/

to obtain the text of the <p> tag that contains the name of an instructor. The required information is located inside <p> tags, but I am unable to retrieve those tags using JSoup. I have no idea what I am doing wrong: when I save the tag in an Element object (let's call it 'b') and call b.getAllElements(), it doesn't show <p> as one of the elements. Isn't that what the getAllElements() method of JSoup does? If not, could someone please explain the hierarchy that I am obviously missing, since the parser is not able to locate the <p> tag containing the text I need, which in this case is "Prof. Zoltan Spakovszky".

Any help would be greatly appreciated.

public void getHomePageLinks()
{
    String html = "http://ocw.mit.edu/courses/aeronautics-and-astronautics/16-050-thermal-energy-fall-2002/";
    org.jsoup.nodes.Document doc = Jsoup.parse(html);

    Elements bodies = doc.select("body");

    for(Element body : bodies )
    {
        System.out.println(body.getAllElements());
    }

}

The output is:

http://ocw.mit.edu/courses/aeronautics-and-astronautics/16-050-thermal-energy-fall-2002/

Isn't it supposed to print out all the elements within the body tag of the document?

Answer 1

I know nothing about JSoup, but it seems that if you want the instructor's name, you can access it like this:

Element instructor = doc.select("div.chpstaff div p").first();
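Because select() always returns an Elements collection rather than a single Element, one way to get a single element is first(), which returns null when nothing matches. Below is a minimal offline sketch, assuming hypothetical markup that mimics the "div.chpstaff div p" structure; the real page's markup is not shown in this thread and may differ, and the class name is made up for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// Hypothetical demo class; not part of jsoup or the original answer.
public class InstructorSelect {

    // Hypothetical markup modelled on the selector "div.chpstaff div p";
    // the live OCW page may be structured differently.
    static final String SAMPLE_HTML =
        "<html><body>"
        + "<div class=\"chpstaff\"><div><p>Prof. Zoltan Spakovszky</p></div></div>"
        + "</body></html>";

    // Returns the text of the first matching <p>, or null if nothing matches.
    static String instructorName(Document doc) {
        Elements matches = doc.select("div.chpstaff div p"); // always a collection
        Element first = matches.first();                      // null when empty
        return first == null ? null : first.text();
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(SAMPLE_HTML);
        System.out.println(instructorName(doc)); // Prof. Zoltan Spakovszky
    }
}
```

Checking for null before calling text() avoids a NullPointerException if the page layout changes and the selector stops matching.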

Answer 2

Maybe you've already solved it, but I worked on it, so I can't resist submitting:

import java.io.IOException;
import java.util.logging.*;
import org.jsoup.*;
import org.jsoup.nodes.*;
import org.jsoup.select.*;

public class JavaApplication17 {

    public static void main(String[] args) {
        try {
            String url = "http://ocw.mit.edu/courses/aeronautics-and-astronautics/16-050-thermal-energy-fall-2002/";
            // Fetch the live page over HTTP and parse it into a Document
            Document doc = Jsoup.connect(url).get();
            // Print the text of every <p> element on the page
            Elements paragraphs = doc.select("p");
            for (Element p : paragraphs) {
                System.out.println(p.text());
            }
        } catch (IOException ex) {
            Logger.getLogger(JavaApplication17.class.getName())
                  .log(Level.SEVERE, null, ex);
        }
    }
}

Is this what you meant?
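The key difference from the code in the question is how the document is loaded. Jsoup.parse(String) treats its argument as HTML markup and does not fetch anything, so passing a URL string produces a document whose body contains only that URL as text, which matches the output shown in the question. Jsoup.connect(url).get(), used above, downloads the page and parses it. The offline sketch below illustrates the difference; the class name and sample markup are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Hypothetical demo class illustrating Jsoup.parse(String) on a URL string.
public class ParseVsConnect {

    static final String URL =
        "http://ocw.mit.edu/courses/aeronautics-and-astronautics/16-050-thermal-energy-fall-2002/";

    // Jsoup.parse(String) parses the string itself as HTML; no network request is made.
    static Document parseString(String s) {
        return Jsoup.parse(s);
    }

    public static void main(String[] args) {
        Document fromUrlString = parseString(URL);
        // The body holds a single text node (the URL itself) and no <p> elements.
        System.out.println(fromUrlString.body().text());
        System.out.println(fromUrlString.select("p").size()); // 0

        // Parsing real markup gives the elements you would expect.
        Document fromMarkup = parseString("<p>Prof. Zoltan Spakovszky</p>");
        System.out.println(fromMarkup.select("p").size()); // 1

        // To fetch the live page instead (requires network access and throws IOException):
        // Document live = Jsoup.connect(URL).get();
    }
}
```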

Answer 3

Here's a short example:

// Connect to the website and parse it into a document
Document doc = Jsoup.connect("http://ocw.mit.edu/courses/aeronautics-and-astronautics/16-050-thermal-energy-fall-2002/").get();

// Select all elements you need (see below for documentation)
Elements elements = doc.select("div[class=chpstaff] p");

// Get the text of the first element (first() returns null if nothing matched)
String instructor = elements.first().text();

// eg. print the result
System.out.println(instructor);

Take a look at the documentation of the jsoup selector API here: jsoup Cookbook
It's not very difficult to use, but it is very powerful.
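Answer 1 uses div.chpstaff while this answer uses div[class=chpstaff]. In jsoup's selector syntax these are not equivalent: .chpstaff matches any element whose class list includes "chpstaff", whereas [class=chpstaff] compares the entire class attribute value. A minimal sketch, assuming hypothetical markup where the div carries a second class (the real page may not), with a made-up class name:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Hypothetical demo class comparing class and attribute selectors.
public class ClassSelectors {

    // Hypothetical markup: the div has a second class besides "chpstaff".
    static final String SAMPLE_HTML =
        "<div class=\"chpstaff sidebar\"><p>Prof. Zoltan Spakovszky</p></div>";

    // Counts the elements matched by the given selector query.
    static int count(String query) {
        Document doc = Jsoup.parse(SAMPLE_HTML);
        return doc.select(query).size();
    }

    public static void main(String[] args) {
        System.out.println(count("div.chpstaff p"));        // 1: class list contains "chpstaff"
        System.out.println(count("div[class=chpstaff] p")); // 0: attribute value is "chpstaff sidebar"
    }
}
```

If the element only ever has the single class, both selectors behave the same; the class form is the more robust choice when extra classes may be added.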

Answer 4

Here is some code:

Document document = Jsoup.connect("http://ocw.mit.edu/courses/aeronautics-and-astronautics/16-050-thermal-energy-fall-2002/").get();

Elements elements = document.select("p");
System.out.println(elements.html());

You can select all <p> tags using jsoup's select() method with a CSS-style selector. html() returns the text and tags inside the <p> elements.
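Elements.html() returns the inner HTML of the matched elements, markup included, whereas Elements.text() returns only their combined text. A small offline comparison, assuming a hypothetical paragraph with inline markup and a made-up class name:

```java
import org.jsoup.Jsoup;
import org.jsoup.select.Elements;

// Hypothetical demo class comparing html() and text().
public class HtmlVsText {

    // Hypothetical paragraph containing inline <b> markup.
    static final String SAMPLE_HTML = "<p>Prof. <b>Zoltan Spakovszky</b></p>";

    // Parses the sample and selects its <p> elements.
    static Elements paragraphs() {
        return Jsoup.parse(SAMPLE_HTML).select("p");
    }

    public static void main(String[] args) {
        Elements ps = paragraphs();
        System.out.println(ps.html()); // inner HTML: keeps the <b> tags
        System.out.println(ps.text()); // text only: tags stripped
    }
}
```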

Answer 5

Elements ele = doc.select("p");
String text = ele.text();
System.out.println(text);

Try this; I think it will work.
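Calling text() on the whole Elements collection joins the text of every matched paragraph into one string, separated by spaces. If each paragraph is needed separately (for example, to pick out the one holding the instructor's name), iterating over the collection keeps them apart. A minimal sketch on hypothetical markup, with a made-up class name:

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// Hypothetical demo class contrasting joined and per-paragraph text.
public class ParagraphTexts {

    // Hypothetical page fragment with two paragraphs.
    static final String SAMPLE_HTML =
        "<p>Prof. Zoltan Spakovszky</p><p>Fall 2002</p>";

    // All paragraph text merged into one space-separated string.
    static String joined() {
        return Jsoup.parse(SAMPLE_HTML).select("p").text();
    }

    // One entry per <p> element, in document order.
    static List<String> separate() {
        Elements ps = Jsoup.parse(SAMPLE_HTML).select("p");
        List<String> texts = new ArrayList<>();
        for (Element p : ps) {
            texts.add(p.text());
        }
        return texts;
    }

    public static void main(String[] args) {
        System.out.println(joined());   // Prof. Zoltan Spakovszky Fall 2002
        System.out.println(separate()); // [Prof. Zoltan Spakovszky, Fall 2002]
    }
}
```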

5 answers

solution1 3 2012-09-11 02:41:15
solution2 3 2013-06-19 07:38:24
solution3 2 2012-09-11 12:34:52
solution4 1 2012-09-14 12:16:35
solution5 0 2016-03-01 07:08:04
