
Jsoup: extracting the text that is not inside the span class

Reference material

http://www.tptp.org/CASC/J9/WWWFiles/Results.html

I am using Jsoup to extract data from a webpage. However, I've run into a slight problem: the program prints the values and then fails with an exception. The output looks like this:

406 81%
401 80%
355 71%
209 41%
163 81%
162 81%
157 78%
92 46%Exception in thread "main" 
461 92%
454 90%
362 72%
350 70%
298 59%
256 51%
247 49%
143 28%
133 26%
126 25%
123 24%
122 24%
73 14%
50 10%
java.lang.IndexOutOfBoundsException: Index: 22, Size: 22
    at java.util.ArrayList.rangeCheck(Unknown Source)
    at java.util.ArrayList.get(Unknown Source)
    at org.jsoup.select.Elements.get(Elements.java:544)
    at test.Etest.main(Etest.java:44)

This is odd, because I never got this error when I did something similar before. Here is the code that I wrote:

Document doc = Jsoup.connect(html).get();
Elements tableElements = doc.select("table");
//get the other tables maybe?
Elements tableHeaderEles = tableElements.select("tr:contains(Solutions) > td");
for(int z = 0; 0 < tableHeaderEles.size(); z++) {
    System.out.println(tableHeaderEles.get(z).text());
}

The only cells I'm interested in are the ones in the Solutions row (not the Solutions column), and I want to leave the percentage behind. I started with this for loop just to get it going. I also only need the first six tables, but I can work that out later on my own. So from this line and lines similar to it, I just want the 406.

<td align="RIGHT" bgcolor="WHITE">406<span class="xxsmallfont">&nbsp;81%</span>

So, to summarize quickly, I have two questions.

 1. Why am I getting this error, especially that weird exception? It's extracting fine at the beginning; is it not going on to the other tables?
 2. How do I get just the 406? text() takes the percentage with it, and the 406 is outside the span, so selecting the span is not an option.

The unfortunate part about all this is that I was doing this an easier way, with an Excel sheet, but because of that &nbsp; I have to do it this way. Any help or pointers are appreciated. Sorry for the long post.

Another individual helped me out, and this is the way to achieve what I asked:

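// Solutions is the Elements collection holding the selected <td> cells
for (int z = 0; z < Solutions.size(); z++) {
    // full cell text, number and percentage together
    String a = Solutions.get(z).text();
    // text of the nested span only (the percentage)
    String b = Solutions.get(z).select("span").text();
    // remove the percentage and trim whatever whitespace is left
    String result = a.replace(b, "").trim();
    System.out.println(result);
}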
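As for the first question: the loop in my original code tested 0 < tableHeaderEles.size(), which never becomes false, so z kept growing until get(z) ran past the last element. The 22 printed values match the "Index: 22, Size: 22" in the trace, so all the cells were read before it failed, and the "Exception in thread" text in the middle of the numbers is most likely just the console interleaving the error stream with normal output. Here is a minimal sketch of the difference, using a plain List in place of jsoup's Elements; the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LoopBoundDemo {
    // Corrected condition: stop once z reaches items.size().
    static List<String> readAll(List<String> items) {
        List<String> seen = new ArrayList<>();
        for (int z = 0; z < items.size(); z++) {
            seen.add(items.get(z));
        }
        return seen;
    }

    // Same condition as in the question: 0 < items.size() does not depend
    // on z, so for a non-empty list the loop only ends when get(z) throws.
    static boolean buggyLoopThrows(List<String> items) {
        try {
            for (int z = 0; 0 < items.size(); z++) {
                items.get(z);
            }
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
        return false; // reached only when the list is empty
    }

    public static void main(String[] args) {
        List<String> cells = Arrays.asList("406 81%", "401 80%", "355 71%");
        System.out.println(readAll(cells));
        System.out.println(buggyLoopThrows(cells));
    }
}
```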

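For the second question, another option that does not depend on the span's text is to keep only what comes before the first space in the cell text. This is a sketch under the assumption that the cell text is always a number, a separator, and then the percentage; the helper name leadingToken is hypothetical. Depending on the jsoup version, the &nbsp; may come back as a regular space or as U+00A0, so the sketch handles both. jsoup also has Element.ownText(), which returns only the text belonging to the element itself and not to child elements such as the span, so that may be worth trying first.

```java
class LeadingCount {
    // Replaces non-breaking spaces (U+00A0) with ordinary spaces, trims,
    // then returns everything before the first space.
    static String leadingToken(String cellText) {
        String s = cellText.replace('\u00a0', ' ').trim();
        int end = s.indexOf(' ');
        return end < 0 ? s : s.substring(0, end);
    }

    public static void main(String[] args) {
        System.out.println(leadingToken("406 81%"));
        System.out.println(leadingToken("406\u00a081%"));
    }
}
```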
