
How to fetch all the details of an object using the Jsoup library and save them to a bean?

I'm scraping the site https://hamrobazaar.com/c6-apparels-and-accessories and I want to store the details of all the sub-categories in a bean and print them. If I could also get the details of each object, that would be a great help.

Example:

(Image from the same site, i.e. https://hamrobazaar.com/c6-apparels-and-accessories)

I want to scrape the name of the mask as "Kn95 Mask (fda Certified)", the description as "We are Seller...", the seller name as "Birodh Pokhrel", the address as "Damak-5,Damak", the price as 210, the date, and the type as "Brand New".

If you are good at Jsoup and XPath, please help me achieve this. Thank you.

For the XPath part (jsoup had no XPath support when this was written, so maybe you can try xsoup; jsoup 1.14.3 and later also provide selectXpath):

Some selectors to grab the details from the ads, including the one with the yellow background, which stays the same on every page. In order, they select the article title, description, seller, address, price, and item condition:

//font[@style]/b
//b[.="Seller:"]/preceding-sibling::text()[normalize-space()]
//b[.="Seller:"]/following-sibling::a
//b[.="Seller:"]/following-sibling::font
//b[starts-with(.,"Rs.")]
//b[starts-with(.,"Rs.")]/following-sibling::font

Number of elements for each detail: 21
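
As a rough illustration of how these six expressions behave, here is a minimal sketch that runs them with the JDK's built-in javax.xml.xpath instead of jsoup or xsoup, so it needs no extra dependencies. The SAMPLE markup, the class name XPathListSketch and the select helper are all hypothetical: the markup is a small, well-formed stand-in shaped to fit the selectors, not the real page, which would first need an HTML parser. The XPath string literals use single quotes to avoid escaping inside Java strings; they mean the same as the double-quoted versions above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathListSketch {
    // Hypothetical stand-in markup (assumption): one highlighted ad inside
    // a table with an id, followed by one regular ad.
    static final String SAMPLE =
        "<html><body>"
        + "<table id='highlight'><tr>"
        + "<td><font style='color:blue'><b>Featured Ad</b></font><br/>Pinned description "
        + "<b>Seller:</b> <a href='#'>Pinned Seller</a> <font>Kathmandu</font></td>"
        + "<td><b>Rs. 100</b><br/><font>Used</font></td>"
        + "</tr></table>"
        + "<table><tr>"
        + "<td><font style='color:blue'><b>Kn95 Mask (fda Certified)</b></font><br/>We are Seller... "
        + "<b>Seller:</b> <a href='#'>Birodh Pokhrel</a> <font>Damak-5,Damak</font></td>"
        + "<td><b>Rs. 210</b><br/><font>Brand New</font></td>"
        + "</tr></table>"
        + "</body></html>";

    // Evaluates one XPath expression and returns the trimmed text of every match.
    static List<String> select(String markup, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent().trim());
            }
            return values;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed: " + expression, e);
        }
    }

    public static void main(String[] args) {
        String[] selectors = {
            "//font[@style]/b",                                                   // title
            "//b[.='Seller:']/preceding-sibling::text()[normalize-space()]",      // description
            "//b[.='Seller:']/following-sibling::a",                              // seller
            "//b[.='Seller:']/following-sibling::font",                           // address
            "//b[starts-with(.,'Rs.')]",                                          // price
            "//b[starts-with(.,'Rs.')]/following-sibling::font"                   // item condition
        };
        for (String selector : selectors) {
            System.out.println(selector + " -> " + select(SAMPLE, selector));
        }
    }
}
```

Each expression returns one list per detail, so the lists line up by index only as long as every ad contains every field.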

Some selectors to grab the details from the ads, excluding the one with the yellow background, which stays the same on every page. In order, they select the article title, description, seller, address, price, and item condition:

//font[@style][not(ancestor::table[@id])]/b
//b[.="Seller:"][not(ancestor::table[@id])]/preceding-sibling::text()[normalize-space()]
//b[.="Seller:"][not(ancestor::table[@id])]/following-sibling::a
//b[.="Seller:"][not(ancestor::table[@id])]/following-sibling::font
//b[not(ancestor::table[@id])][starts-with(.,"Rs.")]
//b[not(ancestor::table[@id])][starts-with(.,"Rs.")]/following-sibling::font

Number of elements for each detail: 20
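
The effect of the not(ancestor::table[@id]) predicate can be checked by counting matches with and without it. The sketch below does that with the JDK's javax.xml.xpath on a hypothetical, well-formed sample containing one highlighted ad (inside a table that has an id) and two regular ads. The markup, the class name HighlightFilterSketch and the count helper are assumptions made for illustration, not the real page structure.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class HighlightFilterSketch {
    // Hypothetical stand-in markup (assumption): the highlighted ad sits in
    // a table with an id attribute, the regular ads in a table without one.
    static final String SAMPLE =
        "<html><body>"
        + "<table id='highlight'><tr><td><font style='c'><b>Pinned Ad</b></font></td></tr></table>"
        + "<table>"
        + "<tr><td><font style='c'><b>Ad One</b></font></td></tr>"
        + "<tr><td><font style='c'><b>Ad Two</b></font></td></tr>"
        + "</table>"
        + "</body></html>";

    // Wraps the expression in XPath's count() function and returns the result.
    static int count(String markup, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Double n = (Double) xpath.evaluate(
                    "count(" + expression + ")", doc, XPathConstants.NUMBER);
            return n.intValue();
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // All titles, including the highlighted ad.
        System.out.println(count(SAMPLE, "//font[@style]/b"));
        // Titles outside any table that carries an id attribute.
        System.out.println(count(SAMPLE, "//font[@style][not(ancestor::table[@id])]/b"));
    }
}
```

On this sample the two counts differ by one, mirroring the 21 versus 20 reported for the real listing.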

Side note: be careful with the item condition. Some ads are missing this field, so the number of matches for it could be lower than 20 or 21.
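
One way to keep the fields aligned when the condition is missing is to select each ad's container first and then evaluate relative expressions (starting with .//) inside it, filling one bean per ad. The sketch below is only an illustration under stated assumptions: it uses the JDK's javax.xml.xpath rather than jsoup or xsoup, the Ad bean and the AdBeanSketch class are hypothetical names, and it assumes each ad lives in its own tr row of a well-formed sample, which may not match the real page. The price is reduced to the number with substring-after, and a missing condition comes back as an empty string.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class AdBeanSketch {
    // Hypothetical bean holding the details requested in the question.
    static class Ad {
        private String title, description, seller, address, price, condition;
        public String getTitle() { return title; }
        public void setTitle(String v) { title = v; }
        public String getDescription() { return description; }
        public void setDescription(String v) { description = v; }
        public String getSeller() { return seller; }
        public void setSeller(String v) { seller = v; }
        public String getAddress() { return address; }
        public void setAddress(String v) { address = v; }
        public String getPrice() { return price; }
        public void setPrice(String v) { price = v; }
        public String getCondition() { return condition; }
        public void setCondition(String v) { condition = v; }
        @Override public String toString() {
            return title + " | " + description + " | " + seller + " | "
                    + address + " | " + price + " | " + condition;
        }
    }

    // Hypothetical stand-in markup (assumption): one ad per row; the second
    // ad has no item condition.
    static final String SAMPLE =
        "<table>"
        + "<tr><td><font style='c'><b>Kn95 Mask (fda Certified)</b></font><br/>We are Seller... "
        + "<b>Seller:</b> <a href='#'>Birodh Pokhrel</a> <font>Damak-5,Damak</font></td>"
        + "<td><b>Rs. 210</b><br/><font>Brand New</font></td></tr>"
        + "<tr><td><font style='c'><b>Second Ad</b></font><br/>No condition listed "
        + "<b>Seller:</b> <a href='#'>Another Seller</a> <font>Kathmandu</font></td>"
        + "<td><b>Rs. 500</b></td></tr>"
        + "</table>";

    // Returns the trimmed string value of the first match, or "" if none.
    static String text(XPath xpath, String expression, Node context)
            throws XPathExpressionException {
        return xpath.evaluate(expression, context).trim();
    }

    static List<Ad> parse(String markup) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Assumption: every row containing a "Seller:" label is one ad.
            NodeList rows = (NodeList) xpath.evaluate(
                    "//tr[.//b[.='Seller:']]", doc, XPathConstants.NODESET);
            List<Ad> ads = new ArrayList<>();
            for (int i = 0; i < rows.getLength(); i++) {
                Node row = rows.item(i);
                Ad ad = new Ad();
                ad.setTitle(text(xpath, ".//font[@style]/b", row));
                ad.setDescription(text(xpath,
                        ".//b[.='Seller:']/preceding-sibling::text()[normalize-space()]", row));
                ad.setSeller(text(xpath, ".//b[.='Seller:']/following-sibling::a", row));
                ad.setAddress(text(xpath, ".//b[.='Seller:']/following-sibling::font", row));
                ad.setPrice(text(xpath,
                        "substring-after(.//b[starts-with(.,'Rs.')], 'Rs.')", row));
                ad.setCondition(text(xpath,
                        ".//b[starts-with(.,'Rs.')]/following-sibling::font", row));
                ads.add(ad);
            }
            return ads;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse the sample markup", e);
        }
    }

    public static void main(String[] args) {
        parse(SAMPLE).forEach(System.out::println);
    }
}
```

Because every field is read relative to its own row, an ad without a condition simply gets an empty value instead of shifting the values of the ads after it.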
