Jsoup image tag extraction

Question

I need to extract an image tag using jsoup from this HTML:

<div class="picture"> 
    <img src="http://asdasd/aacb.jpgs" title="picture" alt="picture" />
</div>

I need to extract the src of this img tag. I am using this code, but I am getting a null value:

Element masthead2 = doc.select("div.picture").first();
String linkText = masthead2.outerHtml();
Document doc1 = Jsoup.parse(linkText);
Element masthead3 = doc1.select("img[src]").first();
String linkText1 = masthead3.html();

Answer 1

Here's an example to get the image source attribute:

public static void main(String... args) {
    Document doc = Jsoup.parse("<div class=\"picture\"><img src=\"http://asdasd/aacb.jpgs\" title=\"picture\" alt=\"picture\" /></div>");
    Element img = doc.select("div.picture img").first();
    String imgSrc = img.attr("src");
    System.out.println("Img source: " + imgSrc);
}

The div.picture img selector finds the image element under the div.

The main extract methods on an element are:

attr(name), which gets the value of an element's attribute,
text(), which gets the text content of an element (e.g. in <p>Hello</p>, text() is "Hello"),
html(), which gets an element's inner HTML (for <div><img></div>, html() = <img>), and
outerHtml(), which gets an element's full HTML (for <div><img></div>, outerHtml() = <div><img></div>).
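To see the four methods side by side, here is a minimal sketch. The class name ExtractMethodsDemo and the extra <p>Hello</p> inside the div are my own additions for illustration, not part of the question's HTML. The exact whitespace in the html() and outerHtml() output depends on jsoup's pretty-printing, so no expected output is shown for those.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ExtractMethodsDemo {
    // Hypothetical sample: the question's div plus a <p> so that text() has something to return.
    static final String HTML =
        "<div class=\"picture\"><img src=\"http://asdasd/aacb.jpgs\" title=\"picture\" alt=\"picture\" /><p>Hello</p></div>";

    // Parses the sample and returns the first div.picture element.
    static Element pictureDiv() {
        Document doc = Jsoup.parse(HTML);
        return doc.select("div.picture").first();
    }

    public static void main(String[] args) {
        Element div = pictureDiv();
        Element img = div.select("img").first();

        System.out.println("attr:      " + img.attr("src")); // value of the src attribute
        System.out.println("text:      " + div.text());      // text content only, tags stripped
        System.out.println("html:      " + div.html());      // markup inside the <div>
        System.out.println("outerHtml: " + div.outerHtml()); // markup including the <div> itself
    }
}
```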

You don't need to reparse the HTML as in your current example. Either select the correct element in the first place with a more specific selector, or call element.select(String) on the element you already have to narrow the search.
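A rough sketch of the second option, narrowing with select on the element you already have instead of reparsing its outerHtml(). The class and method names (NarrowSelectDemo, pictureSrc) are made up for this example; the null checks are there because first() returns null when nothing matches.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class NarrowSelectDemo {
    // Returns the src of the first <img> inside the first div.picture,
    // or null if either element is missing.
    static String pictureSrc(String html) {
        Document doc = Jsoup.parse(html);
        Element picture = doc.select("div.picture").first();
        if (picture == null) {
            return null;
        }
        // Search only within the div; no second Jsoup.parse call is needed.
        Element img = picture.select("img[src]").first();
        return img == null ? null : img.attr("src");
    }

    public static void main(String[] args) {
        String html = "<div class=\"picture\"> <img src=\"http://asdasd/aacb.jpgs\" title=\"picture\" alt=\"picture\" /> </div>";
        System.out.println("Img source: " + pictureSrc(html));
    }
}
```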

Answer 2

With the following code I can extract the image correctly:

    Document doc = Jsoup.parse("<div class=\"picture\"> <img src=\"http://asdasd/aacb.jpgs\" title=\"picture\" alt=\"picture\" /> </div>");

    Element elem = doc.select("div.picture img").first();

    System.out.println("elem: " + elem.attr("src"));

I'm using jsoup release 1.2.2, the latest one.

Maybe you're trying to print the inner HTML of an empty tag like img.

From the documentation: "html() - Retrieves the element's inner HTML".

For the second portion of HTML you can use:
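If that guess is right, it is easy to check. A small sketch (EmptyTagDemo and firstImg are names I made up): html() on the img comes back as an empty string because the element has no children, while attr("src") holds the value you are after.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

public class EmptyTagDemo {
    static final String HTML =
        "<div class=\"picture\"> <img src=\"http://asdasd/aacb.jpgs\" title=\"picture\" alt=\"picture\" /> </div>";

    // Returns the first <img> that has a src attribute, or null if there is none.
    static Element firstImg(String html) {
        return Jsoup.parse(html).select("img[src]").first();
    }

    public static void main(String[] args) {
        Element img = firstImg(HTML);
        System.out.println("html(): [" + img.html() + "]"); // nothing between the brackets
        System.out.println("src:    " + img.attr("src"));   // the attribute value
    }
}
```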

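    // The row is wrapped in <table> because newer jsoup releases (HTML5 parser) discard a bare <tr> outside a table.
    Document doc2 = Jsoup.parse("<table><tr>  <td class=\"blackNoLine\" nowrap=\"nowrap\" valign=\"top\" width=\"25\" align=\"left\"><b>CAST: </b></td>  <td class=\"blackNoLine\" valign=\"top\" width=\"416\">Jay, Shazahn Padamsee&nbsp;</td>  </tr></table>");
    Elements trElems = doc2.select("tr");
    // select() returns an empty Elements rather than null, so no null check is needed.
    for (Element element : trElems) {
        Elements tds = element.select("td");
        // Skip rows that do not have a second cell.
        if (tds.size() > 1) {
            Element secondTd = tds.get(1);

            System.out.println("name: " + secondTd.text());
        }
    }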

which prints "Jay, Shazahn Padamsee".

Answer 3

<tr>  <td class="blackNoLine" nowrap="nowrap" valign="top" width="25" align="left"><b>CAST: </b></td>  <td class="blackNoLine" valign="top" width="416">Jay, Shazahn Padamsee&nbsp;</td>  </tr>

You can use:

Document doc = Jsoup.parse(...);
Elements els = doc.select("td[class=blackNoLine]");
Element el= els.get(1);
String castName = el.text();
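For completeness, a runnable sketch of the same idea. Two things here are my assumptions and not part of the answer above: the row is wrapped in <table> (an HTML5-style parser, which current jsoup uses, drops a bare <tr> outside a table), and the trailing non-breaking space from &nbsp; is normalized by hand so the result does not depend on how a given jsoup version treats it in text(). CastNameDemo and castName are hypothetical names.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class CastNameDemo {
    // Assumption: wrapped in <table> so the <tr> and <td> tags survive parsing.
    static final String HTML = "<table><tr>"
        + "<td class=\"blackNoLine\" nowrap=\"nowrap\" valign=\"top\" width=\"25\" align=\"left\"><b>CAST: </b></td>"
        + "<td class=\"blackNoLine\" valign=\"top\" width=\"416\">Jay, Shazahn Padamsee&nbsp;</td>"
        + "</tr></table>";

    // Returns the text of the second td.blackNoLine cell, or null if there are fewer than two.
    static String castName(String html) {
        Document doc = Jsoup.parse(html);
        Elements els = doc.select("td[class=blackNoLine]");
        if (els.size() < 2) {
            return null;
        }
        // Replace any non-breaking space with a plain space, then trim.
        return els.get(1).text().replace('\u00a0', ' ').trim();
    }

    public static void main(String[] args) {
        System.out.println("cast: " + castName(HTML));
    }
}
```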

3 answers

Answer 1: 6, 2010-08-04 10:19:33

Answer 2: 1, 2010-08-02 21:25:51

Answer 3: 1, 2010-08-03 00:09:37
