How to fetch all the details of an object using the Jsoup library and save them to a bean?
I'm scraping the site https://hamrobazaar.com/c6-apparels-and-accessories and I want to store the details of all the sub-categories in a bean and print them. Getting the details of each object accordingly would also be a great help.
Example:
I want to scrape the name of the mask as Kn95 Mask (fda Certified), the description as We are Seller..., the seller name as Birodh Pokhrel, the address as Damak-5,Damak, the price as 210, the date, and the type as Brand New.
If you are good at Jsoup and XPath, please help me achieve this. Thank you.
For the XPath part (jsoup didn't support it when this was written, so maybe you can try xsoup; newer jsoup releases add a selectXpath method):
Some selectors to grab the details from the ads, including the one with the yellow background, which stays the same on every page (article title, description, seller, address, price, item condition):
//font[@style]/b
//b[.="Seller:"]/preceding-sibling::text()[normalize-space()]
//b[.="Seller:"]/following-sibling::a
//b[.="Seller:"]/following-sibling::font
//b[starts-with(.,"Rs.")]
//b[starts-with(.,"Rs.")]/following-sibling::font
Number of elements for each detail: 21
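To show how these six selectors could fill a bean, here is a minimal sketch. It does not use jsoup or xsoup: it swaps in the JDK's built-in javax.xml.xpath so it runs without extra dependencies. The markup in SAMPLE is a hand-written, well-formed guess at the page structure (the second ad is made up), and the Ad bean, the parse and texts helpers, and all field names are hypothetical. The XPath strings use single quotes instead of double quotes only to avoid escaping in Java.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class AdXPathSketch {

    /** Plain bean for one ad; the field names are illustrative only. */
    static class Ad {
        private String title, description, seller, address, price, condition;
        public String getTitle() { return title; }
        public void setTitle(String v) { title = v; }
        public String getDescription() { return description; }
        public void setDescription(String v) { description = v; }
        public String getSeller() { return seller; }
        public void setSeller(String v) { seller = v; }
        public String getAddress() { return address; }
        public void setAddress(String v) { address = v; }
        public String getPrice() { return price; }
        public void setPrice(String v) { price = v; }
        public String getCondition() { return condition; }
        public void setCondition(String v) { condition = v; }
        @Override public String toString() {
            return title + " | " + description + " | " + seller + " | "
                    + address + " | " + price + " | " + condition;
        }
    }

    // ASSUMPTION: hand-written stand-in for the listing markup, not the real page.
    // The first ad uses the values from the question; the second one is made up.
    static final String SAMPLE = "<table>"
        + "<tr><td><font style='c'><b>Kn95 Mask (fda Certified)</b></font><br/>"
        + "We are Seller... <b>Seller:</b> <a href='#'>Birodh Pokhrel</a> "
        + "<font>Damak-5,Damak</font></td>"
        + "<td><b>Rs. 210</b><br/><font>Brand New</font></td></tr>"
        + "<tr><td><font style='c'><b>Sample Item</b></font><br/>"
        + "Sample text <b>Seller:</b> <a href='#'>Sample Seller</a> "
        + "<font>Sample Address</font></td>"
        + "<td><b>Rs. 100</b><br/><font>Used</font></td></tr>"
        + "</table>";

    /** Evaluates one selector and returns the trimmed text of every match. */
    static List<String> texts(Document doc, XPath xp, String expr)
            throws XPathExpressionException {
        NodeList nodes = (NodeList) xp.evaluate(expr, doc, XPathConstants.NODESET);
        List<String> out = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            out.add(nodes.item(i).getTextContent().trim());
        }
        return out;
    }

    /** Runs the six selectors and pairs the results up by position. */
    static List<Ad> parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xp = XPathFactory.newInstance().newXPath();
            List<String> titles = texts(doc, xp, "//font[@style]/b");
            List<String> descriptions = texts(doc, xp,
                    "//b[.='Seller:']/preceding-sibling::text()[normalize-space()]");
            List<String> sellers = texts(doc, xp, "//b[.='Seller:']/following-sibling::a");
            List<String> addresses = texts(doc, xp, "//b[.='Seller:']/following-sibling::font");
            List<String> prices = texts(doc, xp, "//b[starts-with(.,'Rs.')]");
            List<String> conditions = texts(doc, xp,
                    "//b[starts-with(.,'Rs.')]/following-sibling::font");
            // Pairing by index only works if every ad has every field.
            for (List<String> l : Arrays.asList(descriptions, sellers, addresses, prices, conditions)) {
                if (l.size() != titles.size()) {
                    throw new IllegalStateException("field counts differ, cannot pair by index");
                }
            }
            List<Ad> ads = new ArrayList<>();
            for (int i = 0; i < titles.size(); i++) {
                Ad ad = new Ad();
                ad.setTitle(titles.get(i));
                ad.setDescription(descriptions.get(i));
                ad.setSeller(sellers.get(i));
                ad.setAddress(addresses.get(i));
                ad.setPrice(prices.get(i).replace("Rs.", "").trim()); // keep only the amount
                ad.setCondition(conditions.get(i));
                ads.add(ad);
            }
            return ads;
        } catch (RuntimeException e) {
            throw e;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (Ad ad : parse(SAMPLE)) {
            System.out.println(ad);
        }
    }
}
```

On the real site you would first fetch and parse the page (for example with jsoup) and hand the result to whatever XPath engine you pick; the sketch only covers the step from selector results to beans.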
Some selectors to grab the details from the ads, excluding the one with the yellow background, which stays the same on every page (article title, description, seller, address, price, item condition):
//font[@style][not(ancestor::table[@id])]/b
//b[.="Seller:"][not(ancestor::table[@id])]/preceding-sibling::text()[normalize-space()]
//b[.="Seller:"][not(ancestor::table[@id])]/following-sibling::a
//b[.="Seller:"][not(ancestor::table[@id])]/following-sibling::font
//b[not(ancestor::table[@id])][starts-with(.,"Rs.")]
//b[not(ancestor::table[@id])][starts-with(.,"Rs.")]/following-sibling::font
Number of elements for each detail: 20
Side note: be careful with the item condition. Some ads are missing this field, so the number of elements could be lower than 20 or 21.
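One way to cope with the missing item condition is to select each ad's container first and then evaluate the field selectors relative to it, so a missing field becomes an empty value instead of shifting every later ad. The sketch below again uses the JDK's javax.xml.xpath rather than jsoup or xsoup. It assumes, without having checked the real page, that each ad sits in its own tr; the SAMPLE markup, the container selector //tr[.//b[.='Seller:']], the parse helper, and the map keys are all hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class PerAdXPathSketch {

    // ASSUMPTION: hand-written stand-in markup with one <tr> per ad.
    // The second (made-up) ad has no item condition.
    static final String SAMPLE = "<table>"
        + "<tr><td><font style='c'><b>Kn95 Mask (fda Certified)</b></font><br/>"
        + "We are Seller... <b>Seller:</b> <a href='#'>Birodh Pokhrel</a> "
        + "<font>Damak-5,Damak</font></td>"
        + "<td><b>Rs. 210</b><br/><font>Brand New</font></td></tr>"
        + "<tr><td><font style='c'><b>Sample Item</b></font><br/>"
        + "Sample text <b>Seller:</b> <a href='#'>Sample Seller</a> "
        + "<font>Sample Address</font></td>"
        + "<td><b>Rs. 100</b></td></tr>"
        + "</table>";

    /** Returns one map per ad; "condition" is null when the ad lacks it. */
    static List<Map<String, String>> parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xp = XPathFactory.newInstance().newXPath();
            // Hypothetical container selector: every row that holds a "Seller:" label.
            NodeList rows = (NodeList) xp.evaluate(
                    "//tr[.//b[.='Seller:']]", doc, XPathConstants.NODESET);
            List<Map<String, String>> ads = new ArrayList<>();
            for (int i = 0; i < rows.getLength(); i++) {
                Node row = rows.item(i);
                Map<String, String> ad = new LinkedHashMap<>();
                // The leading "." keeps each lookup inside the current row;
                // a bare "//" would search the whole document again.
                ad.put("title", xp.evaluate(".//font[@style]/b", row).trim());
                ad.put("price", xp.evaluate(".//b[starts-with(.,'Rs.')]", row)
                        .replace("Rs.", "").trim());
                // The string form of evaluate yields "" when nothing matches.
                String condition = xp.evaluate(
                        ".//b[starts-with(.,'Rs.')]/following-sibling::font", row).trim();
                ad.put("condition", condition.isEmpty() ? null : condition);
                ads.add(ad);
            }
            return ads;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (Map<String, String> ad : parse(SAMPLE)) {
            System.out.println(ad);
        }
    }
}
```

The remaining fields (description, seller, address) could be read the same way by prefixing the corresponding selectors with a dot.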