简体   繁体   中英

How to get text from nested span using Jsoup?

I'm trying to get the text in the span

在此处输入图片说明

using this code below. However the output is behaving as if the nested spans don't exist

            Elements tags = document.select("div[id=tags]"); 

            for (Element tag:tags){


                Elements child_tags = tag.getElementsByTag("class");  

                String key = tag.html();
                System.out.println(key); //only as a test

                for (Element child_tag:child_tags){
                    System.out.println("\t" + child_tag.text());

                }

My output is

      <hr />Tags: 
      <span id="category"></span> 
      <span id="voteSelector" class="initially_hidden"> <br /> </span>      
Elements child_tags = tag.getElementsByTag("class");

With this line you are trying to get an element with tag class ie <class>...</class> , which dose not exist. Change that line to:

Elements child_tags = tag.getElementsByClass("tag");

to get elements by attribute value of class = tag or to:

Elements child_tags = tag.getElementsByTag("span"); 

to get elements by tag name = span.

Assuming you are trying the code on https://chesstempo.com/chess-problems/15 and the data you want is shown in the below image 在此处输入图片说明

Now, Using Jsoup you will get the data whatever is rendered as a source code in the browser,for confirmation you can press CTRL+U in browser which will open up a new window where the actual contents which Jsoup will get will be displayed. Now coming to your questions the part which you are trying to retrieve itself is not present in the browser source code check that by pressing CTRL+U .

If the contents are rendered using JAVASCRIPT those will not be visible to JSOUP and hence you have to use something else which will run the javascript and provide you the details.

JSoup does not run Javascript and is not a browser.

EDIT

There is a turnaround using SELENIUM . Below is the working code to get the exact source code of the url and the required data which you are looking for:

import java.io.IOException;
import java.io.PrintWriter;

import org.json.simple.parser.ParseException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;

public class JsoupDummy {
 public static void main(String[] args) throws IOException, ParseException {
    System.setProperty("webdriver.gecko.driver", "D:\\thirdPartyApis\\geckodriver-v0.19.1-win32\\geckodriver.exe");
    WebDriver driver = new FirefoxDriver();

    try {
        driver.get("https://chesstempo.com/chess-problems/15");
        Document doc = Jsoup.parse(driver.getPageSource());
        Elements elements = doc.select("span.ct-active-tag");
        for (Element element:elements){
             System.out.println(element.html());
        }

    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        /*write.flush();
        write.close();*/
        driver.quit();

    }
}
}

You need selenium web driver Selenium Web Driver which simulates the browser behaviour and allows you to render the html content written by scripts as well.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM