
Extract image id with Jsoup

I am trying to extract a specific captcha image id using the Jsoup API. The HTML image tag looks like this:

<img id="wlspispHIPBimg03256465465dsd5456" style="display: inline; width: 200px; height: 100px;" aria-hidden="true" src="https://users/hip/data/rnd=435cb60d0a6b63ef4">

This is my code to obtain the attribute id="wlspispHIPBimg03256465465dsd5456":

// timeout(0) disables the timeout, so the request waits indefinitely
Document doc = Jsoup.connect("http://go.microsoft.com/fwlink/?LinkID=614866&clcid")
                .timeout(0).get();

Elements images = doc.select("img[src~=(?i)]");
for (Element image : images) {
    System.out.println(image.attr("id"));
}

The problem is that I can't get the id of the captcha image.

You need to find something in the HTML that distinguishes this img tag from every other tag in the document. That can't be deduced from the code you posted, so I'm using my imagination here:

Element imageEl = doc.select("img[src*=rnd]").first();

This exploits the fact that the image's source contains "rnd" in its path. To get the best solution you must look at the page yourself. It also helps a lot to learn Jsoup's CSS selectors.
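To make the selector idea concrete, here is a minimal sketch that parses the img tag from the question as a string instead of fetching the live page. The second img tag ("logo"), the class name CaptchaIdExample and the helper firstImageId are made up for illustration, and treating "wlspispHIPBimg" as a stable id prefix is only an assumption based on the single tag shown above.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class CaptchaIdExample {
    // First tag is copied from the question; the "logo" tag is a hypothetical
    // second image so the selectors have something to discriminate against.
    static final String HTML =
        "<img id=\"wlspispHIPBimg03256465465dsd5456\" "
        + "style=\"display: inline; width: 200px; height: 100px;\" "
        + "aria-hidden=\"true\" src=\"https://users/hip/data/rnd=435cb60d0a6b63ef4\">"
        + "<img id=\"logo\" src=\"https://example.com/logo.png\">";

    // Returns the id of the first element matching the CSS selector,
    // or null when nothing matches.
    static String firstImageId(String html, String cssSelector) {
        Document doc = Jsoup.parse(html);
        Element img = doc.select(cssSelector).first();
        return img == null ? null : img.id();
    }

    public static void main(String[] args) {
        // [src*=rnd]: the src attribute contains the substring "rnd"
        System.out.println(firstImageId(HTML, "img[src*=rnd]"));
        // [id^=...]: the id attribute starts with the given prefix (assumed stable)
        System.out.println(firstImageId(HTML, "img[id^=wlspispHIPBimg]"));
    }
}
```

Both selectors pick the captcha tag and skip the logo. Which one is more robust depends on what actually stays constant in the real page, so check the HTML you receive before settling on one.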

I think you simply can't accomplish this using only Jsoup: the DOM is modified at runtime with JavaScript, and Jsoup simply does not execute it.
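A small sketch of what that means in practice, using a made-up page (not the real Microsoft one) where the img tag is only created by a script. The class name NoScriptExecutionExample and the markup are hypothetical; the point is only that Jsoup parses the script body as raw text and never runs it.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class NoScriptExecutionExample {
    // Hypothetical page: the <img> would exist only after a browser ran the script.
    static final String HTML =
        "<html><body><div id=\"captcha\"></div>"
        + "<script>document.getElementById('captcha').innerHTML = "
        + "'<img id=\"generated\" src=\"x.png\">';</script>"
        + "</body></html>";

    // Counts the <img> elements Jsoup sees in the parsed (static) document.
    static int countImages(String html) {
        Document doc = Jsoup.parse(html);
        return doc.select("img").size();
    }

    public static void main(String[] args) {
        // The script is never executed, so the generated image is not in the DOM.
        System.out.println(countImages(HTML));
    }
}
```

If the captcha on the real page is injected this way, no selector will find it in the Jsoup document, and you would need a tool that actually runs JavaScript to see the final DOM.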

See also this other question.
