
How to get rid of HTML in jsoup and only extract HTML table content?

I'm trying to access the table on this website, http://www.engin.umich.edu/htbin/wwwhostinfo?detail=0&display=all&sort=open, and turn it into an Elements object. I only need the first and fourth columns, so I'm using jsoup and doing this:

Document doc = Jsoup.connect("http://www.engin.umich.edu/htbin/wwwhostinfo?detail=0&display=all&sort=open").get();
Elements buildings = doc.select("td:eq(0),td:eq(3)");

This should select the first and fourth columns, and it does, but the result includes all the HTML markup as well. I need to skip all the initial material on the page ("The following report ..."). I only need two columns, Building and Open, so that I can initialize variables, assign the number of open computers in each building to them, and finally use a Toast or something similar to display the number of open computers in a building on the screen.

Currently I'm using a TextView to show the data, and it's showing me all the HTML I don't want as well.

TextView tv = new TextView(this);
tv.setText("" + buildings);
setContentView(tv);

Can individual values be extracted from Elements?

In short: how can I extract only the building names and the number of open computers, skipping ALL other data, and assign them to their own variables?

Any ideas on how to do this?

Thanks in advance - av

You could use the jsoup Cleaner and Whitelist for that task.
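As a rough illustration of that suggestion, here is a minimal sketch of cleaning with an allow-list that keeps only table tags. It assumes jsoup 1.14.1 or later, where the class is named Safelist (older releases call it Whitelist). The class name, method name, and sample HTML are hypothetical and not taken from the real page.

```java
import org.jsoup.Jsoup;
import org.jsoup.safety.Safelist;

class TableCleanerSketch {

    // Keeps only table structure tags. Every other tag is dropped,
    // but the text inside a dropped tag is kept in the output.
    static String keepTableMarkup(String bodyHtml) {
        Safelist tableOnly = new Safelist()
                .addTags("table", "tbody", "tr", "th", "td");
        return Jsoup.clean(bodyHtml, tableOnly);
    }

    public static void main(String[] args) {
        // Hypothetical fragment standing in for the real page.
        String html = "<p>The following report ...</p>"
                + "<table><tr><td><a href=\"/b\">Building A</a></td>"
                + "<td>x</td><td>y</td><td><b>12</b></td></tr></table>";
        System.out.println(keepTableMarkup(html));
    }
}
```

Note that cleaning removes tags, not text: the introductory sentence loses its paragraph tag but its words remain, so the cleaner alone may not be enough to skip the text before the table.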

Just define what should not be removed and you're good to go!
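To get individual values into variables rather than cleaned markup, another option is to walk the table rows and read each cell with text(), which returns the cell's text without tags. The following is only a sketch: the sample HTML, the middle column names, and the names OpenComputersSketch and parseOpenCounts are hypothetical, and the real page's layout is assumed from the question (building name in the first column, open count in the fourth).

```java
import java.util.LinkedHashMap;
import java.util.Map;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

class OpenComputersSketch {

    // Maps building name (first column) to open-computer count (fourth column).
    static Map<String, Integer> parseOpenCounts(String html) {
        Document doc = Jsoup.parse(html);
        Map<String, Integer> openByBuilding = new LinkedHashMap<>();
        for (Element row : doc.select("tr")) {
            Elements cells = row.children();   // direct th/td cells of this row
            if (cells.size() < 4) {
                continue;                      // not a data row
            }
            String building = cells.get(0).text();
            try {
                int open = Integer.parseInt(cells.get(3).text());
                openByBuilding.put(building, open);
            } catch (NumberFormatException e) {
                // Header row or non-numeric cell: skip it.
            }
        }
        return openByBuilding;
    }

    public static void main(String[] args) {
        // Hypothetical markup; the real page may be structured differently.
        String html = "<p>The following report ...</p><table>"
                + "<tr><th>Building</th><th>Total</th><th>In use</th><th>Open</th></tr>"
                + "<tr><td>Building A</td><td>20</td><td>8</td><td>12</td></tr>"
                + "<tr><td>Building B</td><td>10</td><td>7</td><td>3</td></tr>"
                + "</table>";
        parseOpenCounts(html).forEach((building, open) ->
                System.out.println(building + ": " + open + " open"));
    }
}
```

With a live page, the same loop could run over the Document returned by Jsoup.connect(...).get(), and each map entry could then be shown in a Toast or a TextView.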
