Using Jsoup to parse HTML

First off, I am very new to coding in Java, and I am using Android Studio. I am using Jsoup to go to a URL and grab the HTML source code. My code does this successfully; now I need to parse the HTML for one specific line. The string I need from the HTML is displayed as a link, but I do not need the address of the link, just the text that is shown for it. This is the code from the class I am using to accomplish this:

private class FetchAnton extends AsyncTask<Void, Void, Void> {

    String price;
    String url = "http://www.antoncoop.com/markets/cash.php";
    Elements hrefEles;
    String value = null;
    String html = null;
    Document doc = null;

    @Override
    protected Void doInBackground(Void... params) {

        try {
            //Connect to website
            html = Jsoup.connect(url).get().toString();

            if (html != null && html.length() > 0) {
                doc = Jsoup.parse(html);           
                if (doc != null) {
                    /** Get all A tag element with HREF attribute like '/markets/cashchart.php?c=2246' **/
                    hrefEles = doc.select("a[href*=/markets/cashchart.php?c=2246]");

                    if (hrefEles != null && hrefEles.size() > 0) {
                        for (Element e: hrefEles) {
                            //value = e.ownText();
                           // break;
                        }

                        price = value;
                    }
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        return null;
    }
}

Here is the section of the HTML that I am interested in:

</table>
<br />
<table class="homepage_quoteboard" cellspacing="0" cellpadding="0" border="0" width="100%">
<thead>
<tr class="section">
<td colspan="10">Wheat</td>
</tr>
<tr>
<td width="10%">Name</td>
<td width="10%">Delivery</td>
<td width="10%">Delivery End</td>
<td width="10%">Futures Month</td>
<td width="10%" align="right">Futures Price</td>
<td width="10%" align="right">Change</td>
<td width="10%" align="right">Basis</td>
<td width="10%" align="right">Cash Price</td>
<td width="10%" align="right">Settlement</td>
<td width="10%">Notes</td>
</tr>
</thead>
<tbody>
<script language="javascript">          
writeBidRow('Wheat',-60,false,false,false,0.5,'01/15/2015','02/26/2015','All','&nbsp;','&nbsp;',60,'even','c=2246&l=3519&d=G15',quotes['KEH15'], 0-0);
writeBidRow('Wheat',-65,false,false,false,0.5,'07/01/2015','07/31/2015','All','&nbsp;','&nbsp;',60,'odd','c=2246&l=3519&d=N15',quotes['KEN15'], 0-0);
</script>
</tbody>
</table>

The only thing I am interested in is getting the value $4.91 into the string called "price". It is in the line of HTML code that is indented farthest to the right. Can anyone tell me what code to use to accomplish this?

The code below does this; the comments explain each step.

@Override
protected Void doInBackground(Void... params) {
    String value = null;

    try {
        // Connect to the website; get() already returns a parsed Document,
        // so there is no need to convert it to a String and parse it again.
        Document doc = Jsoup.connect(url).get();

        // Get all <a> elements whose href contains '/markets/cashchart.php?c=2246'.
        Elements hrefEles = doc.select("a[href*=/markets/cashchart.php?c=2246]");

        // select() never returns null; it returns an empty list when nothing matches.
        if (!hrefEles.isEmpty()) {
            // ownText() is the text displayed for the link, not its address.
            value = hrefEles.first().ownText();
            price = value; // field declared in FetchAnton
            System.out.println("value: " + value);
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
    return null;
}
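
One caveat: the HTML excerpt in the question contains no <a> element. The rows are written by writeBidRow(...) calls inside a <script> block, and jsoup does not execute JavaScript. If that excerpt is what jsoup actually downloads, the selector above will match nothing and value will stay null. In that case the data has to be read from the script text itself. The following is only a minimal sketch of that idea, using java.util.regex; the class name BidRowArgs and its two helper methods are hypothetical, and the splitter assumes no argument contains a comma inside its quotes. Note that $4.91 is not among the literal arguments; the call references quotes['KEH15'], which is not shown in the excerpt, so this sketch recovers the arguments only, not the cash price.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of jsoup or of the original post.
public class BidRowArgs {

    // Matches one writeBidRow(...); call and captures the text between the parentheses.
    private static final Pattern CALL = Pattern.compile("writeBidRow\\((.*?)\\);");

    /** Returns the raw argument string of every writeBidRow call found in the text. */
    public static List<String> findCalls(String scriptText) {
        List<String> calls = new ArrayList<>();
        Matcher m = CALL.matcher(scriptText);
        while (m.find()) {
            calls.add(m.group(1));
        }
        return calls;
    }

    /** Splits on commas and strips surrounding single quotes (assumes no commas inside quoted values). */
    public static List<String> splitArgs(String args) {
        List<String> out = new ArrayList<>();
        for (String part : args.split(",")) {
            out.add(part.trim().replaceAll("^'|'$", ""));
        }
        return out;
    }

    public static void main(String[] args) {
        // The two calls copied from the question's HTML excerpt.
        String script =
            "writeBidRow('Wheat',-60,false,false,false,0.5,'01/15/2015','02/26/2015','All',"
          + "'&nbsp;','&nbsp;',60,'even','c=2246&l=3519&d=G15',quotes['KEH15'], 0-0);\n"
          + "writeBidRow('Wheat',-65,false,false,false,0.5,'07/01/2015','07/31/2015','All',"
          + "'&nbsp;','&nbsp;',60,'odd','c=2246&l=3519&d=N15',quotes['KEN15'], 0-0);\n";

        for (String call : findCalls(script)) {
            List<String> a = splitArgs(call);
            // Index 13 is the query string that would follow cashchart.php in the link.
            System.out.println(a.get(0) + " " + a.get(13) + " " + a.get(14));
        }
    }
}
```

With jsoup, the script text could be obtained from the parsed Document via doc.select("script") and Element.data(), then passed to findCalls. Whether the price can be recovered from there depends on where the page defines quotes, which the question does not show.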

