
How to read a particular string value using Jsoup

I have written code to read the entire content of an HTML page.

URL url = new URL("https://idms.sunamerica.com/v2/market/home.idms");
Document doc = Jsoup.parse(url, 5 * 1000);
TextNode tn = new TextNode(doc.body().html(), "");
String entireText = tn.getWholeText();

Now entireText contains the following text:

<tr class="evenrow" onmouseover="loadMiniChart(\'S&amp;P Midcap 400\',8318990,\'market_mini_chart\')">
       '); document.write('
       <td>
        <div align="left">
         S&amp;P Midcap 400 Index
        </div></td>'); document.write('
       <td>1254.56</td>'); document.write('
       <td><span class="negative">-2.83</span></td>'); document.write('
      </tr>');

Now I need to get the value 1254.56 by using the string "S&P Midcap 400 Index".

Is there any method to match the text? Help is appreciated. :)

I'm afraid there is not much you can do here with Jsoup alone, because the text you need to extract is inside a script node. It is JavaScript, not HTML, and Jsoup only parses HTML.

What you can do instead:

  • extract the needed text manually from the script source,
  • render the page in Selenium, then get the page source and parse it with Jsoup,
  • or open the page in your web browser, save it to disk, and then parse the saved file.
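
For the manual route, one possible first step is to undo the document.write wrapping so that only the markup is left. This is a minimal sketch using just the Java standard library. The names ScriptMarkupCleaner and recoverMarkup are made up for illustration, and it assumes the script joins its pieces with exactly the '); document.write(' pattern shown in the question.

```java
import java.util.regex.Pattern;

final class ScriptMarkupCleaner {
    // Matches the seam between two consecutive calls:  ');  document.write('
    private static final Pattern SEAM =
            Pattern.compile("'\\s*\\)\\s*;\\s*document\\.write\\s*\\(\\s*'");

    // Hypothetical helper: joins the string literals passed to document.write.
    static String recoverMarkup(String scriptText) {
        String joined = SEAM.matcher(scriptText).replaceAll("");
        // Drop the trailing  ');  left by the final call, if present.
        joined = joined.replaceAll("'\\s*\\)\\s*;\\s*$", "");
        // Undo JavaScript escaping of single quotes:  \'  becomes  '
        return joined.replace("\\'", "'");
    }

    public static void main(String[] args) {
        String script = "<td>1254.56</td>'); document.write('"
                + "<td><span class=\"negative\">-2.83</span></td>');";
        System.out.println(recoverMarkup(script));
    }
}
```

This only handles the simple concatenation seen in the question. A script that builds its strings differently would need a different cleanup step.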

Concerning your Jsoup example, there is no need to create a TextNode from the HTML. You already get the tree from doc.body() and can then navigate it with the Jsoup API: CSS selectors or tree methods (children, first, etc.).

You can use a regex for this kind of scenario.

Here is a solution for your question:

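// Requires: import java.util.regex.Matcher; import java.util.regex.Pattern;
//           import java.util.regex.PatternSyntaxException;
String resultString = null;
try {
    // The dot is escaped so it matches only a literal '.', and the parentheses
    // capture the number without the surrounding <td> tags.
    Pattern regex = Pattern.compile("<td>(\\d+\\.\\d+)</td>", Pattern.CASE_INSENSITIVE);
    // subjectString is the text to search, e.g. entireText from the question.
    Matcher regexMatcher = regex.matcher(subjectString);
    // find() stops at the first numeric cell anywhere in subjectString.
    if (regexMatcher.find()) {
        resultString = regexMatcher.group(1);
    }
} catch (PatternSyntaxException ex) {
    // Syntax error in the regular expression
}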
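
To tie the number to one specific row rather than to the first numeric cell on the page, the label can be made part of the pattern. This is a minimal sketch, assuming the text has the shape shown in the question. LabelValueExtractor and valueAfterLabel are hypothetical names. Note that the raw page source contains S&amp;amp;P rather than S&amp;P, so the label is entity-encoded before matching.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LabelValueExtractor {
    // Returns the first numeric <td> value that follows the given label, if any.
    static Optional<String> valueAfterLabel(String text, String label) {
        Pattern p = Pattern.compile(
                // Pattern.quote treats the label as literal text.
                Pattern.quote(label)
                        // Lazily skip everything (including newlines) up to the next numeric cell.
                        + ".*?<td>\\s*(-?\\d+(?:\\.\\d+)?)\\s*</td>",
                Pattern.DOTALL);
        Matcher m = p.matcher(text);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Shortened copy of the text from the question.
        String entireText =
                "<td>\n<div align=\"left\">\nS&amp;P Midcap 400 Index\n</div></td>'); document.write('\n"
                + "<td>1254.56</td>'); document.write('\n"
                + "<td><span class=\"negative\">-2.83</span></td>');";
        // Encode '&' the same way the page source does.
        String label = "S&P Midcap 400 Index".replace("&", "&amp;");
        System.out.println(valueAfterLabel(entireText, label).orElse("not found")); // prints 1254.56
    }
}
```

Because the skip is lazy, the pattern picks the first numeric cell after the label, which is the index value in the markup above. If the page layout changes, the pattern will need adjusting.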

thanks
