Parsing table data with jsoup
I am using jsoup in my Android app to parse HTML, but now I need to parse table data and I cannot get it to work. I have tried many approaches without success, so I am trying my luck here in case anyone has experience with this.
Here is part of my HTML:
<div id="editacia_jedla">
<h2>My header</h2>
<h3>My sub header</h3>
<table border="0" class="jedalny_listok_tabulka" cellpadding="2" cellspacing="1">
<tr>
<td width="100" class="menu_nazov neparna" align="left">Food Menu 1</td>
<td class="jedlo neparna" align="left">vegetable and beef
<div class="jedlo_box_alergeny">Allergens: <a href="#" class="alergen_1">1</a>, <a href="#" class="alergen_3">3</a></div>
</td>
</tr>
<tr>
<td width="100" class="menu_nazov parna" align="left">Food Menu 2</td>
<td class="jedlo parna" align="left">Potato salad and pork
<div class="jedlo_box_alergeny">Allergens: <a href="#" class="alergen_6">6</a></div>
</td>
</tr>
</table>
etc
</div>
My Java/Android code:
try {
String tableHtmlCode="";
Document fullHtmlDocument = Jsoup.connect(urlOfFoodDay).get();
Element elm1 = fullHtmlDocument.select("#editacia_jedla").first();
for( Element element : elm1.children() )
{
tableHtmlCode+=element.getElementsByIndexEquals(2); //this set table content because 0=h2, 1=h3
}
Document parsedTableDocument = Jsoup.parse(tableHtmlCode);
//Element th = parsedTableDocument.select("td[class=jedlo neparna]").first(); THIS IS BAD
String foodContent="";
String foodAllergens="";
} catch (IOException e) {
e.printStackTrace();
}
Now I want to extract the text "vegetable and beef" and save it to the string foodContent, and extract the numbers 1 and 3 (together) from the div with class jedlo_box_alergeny and save them to the string foodAllergens. Can someone help? I would be very grateful for any ideas.
Find the table with class jedalny_listok_tabulka in your document and loop over its td tags. Each td with class jedlo holds the dish text and is the parent of the a tags that contain the allergen values. Hence, you would loop over the a elements inside that td to get your numbers, something like:
// doc is the parsed page, e.g. Document doc = Jsoup.connect(urlOfFoodDay).get();
Elements myElements = doc.getElementsByClass("jedalny_listok_tabulka")
        .first().getElementsByTag("td");
for (Element element : myElements) {
    // Only the cells with class "jedlo" hold the dish text and the allergens
    if (element.hasClass("jedlo")) {
        String foodContent = element.ownText();
        String foodAllergen = "";
        for (Element href : element.getElementsByTag("a")) {
            foodAllergen += " " + href.text();
        }
        System.out.println(foodContent + " :" + foodAllergen);
    }
}
Output:
vegetable and beef : 1 3
Potato salad and pork : 6
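The same extraction can also be written with jsoup's CSS selectors instead of nested tag lookups. The following is only a minimal sketch: the class name MenuParser, the method parseMenu and the inline HTML string are made up for illustration (they are not from the question), and it assumes every dish cell carries the class jedlo as in the posted markup.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class, named here only for the example.
public class MenuParser {

    // Maps each dish text to its allergen numbers, in document order.
    static Map<String, String> parseMenu(String html) {
        Document doc = Jsoup.parse(html);
        Map<String, String> menu = new LinkedHashMap<>();
        // "td.jedlo" matches cells whose class list contains "jedlo",
        // whether the second class is "parna" or "neparna".
        for (Element cell : doc.select("table.jedalny_listok_tabulka td.jedlo")) {
            // ownText() skips the text of the nested allergen div.
            String foodContent = cell.ownText();
            String foodAllergens = String.join(", ",
                    cell.select("div.jedlo_box_alergeny a").eachText());
            menu.put(foodContent, foodAllergens);
        }
        return menu;
    }

    public static void main(String[] args) {
        // Shortened copy of the table from the question, used as test input.
        String html = "<table class=\"jedalny_listok_tabulka\"><tr>"
                + "<td class=\"menu_nazov neparna\">Food Menu 1</td>"
                + "<td class=\"jedlo neparna\">vegetable and beef"
                + "<div class=\"jedlo_box_alergeny\">Allergens: "
                + "<a href=\"#\" class=\"alergen_1\">1</a>, "
                + "<a href=\"#\" class=\"alergen_3\">3</a></div></td>"
                + "</tr></table>";
        parseMenu(html).forEach((food, allergens) ->
                System.out.println(food + " : " + allergens));
    }
}
```

Matching on the single class name jedlo avoids depending on the exact value of the class attribute, which differs between odd and even rows (jedlo neparna and jedlo parna).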