
How to get an attribute's content with Jsoup?

I have

 <meta itemprop="datePublished" content="2015-01-26 12:37:00">

and I want to select the value of its content attribute. I tried this without success:

Document doc = Jsoup.connect("http://www.somesite.com/index.html").get();
Element link = doc.select("meta").first();
String content = link.attr("content");

But in my html I have:

<div style="overflow: visible;" itemscope="" itemtype="http://schema.org/Article">
<meta itemprop="url" content="http://www.somesite.com/index.html">
<meta itemprop="headline" content="some text">
<meta itemprop="datePublished" content="2015-01-26 12:37:00">
<meta itemprop="dateModified" content="2015-01-26 14:03:16">

As you can see, I'm looking for the third meta tag, and I can't select it.

Element link= doc.select("meta").first(); 

This selects only the first meta element found. Since your second HTML snippet contains more than one, you get the wrong result: the content of the url meta rather than that of datePublished.
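To see this concretely, here is a minimal sketch that parses the markup from the question and returns what doc.select("meta").first() actually yields. The class name FirstMetaDemo and the helper firstMetaContent are made up for illustration, and the sketch assumes jsoup is on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class FirstMetaDemo {

    // The markup from the question, reduced to the relevant elements.
    static final String HTML =
            "<div itemscope=\"\" itemtype=\"http://schema.org/Article\">\n"
            + "<meta itemprop=\"url\" content=\"http://www.somesite.com/index.html\">\n"
            + "<meta itemprop=\"headline\" content=\"some text\">\n"
            + "<meta itemprop=\"datePublished\" content=\"2015-01-26 12:37:00\">\n"
            + "<meta itemprop=\"dateModified\" content=\"2015-01-26 14:03:16\">\n"
            + "</div>";

    // Hypothetical helper: content of the first <meta> in document order,
    // or an empty string if the document contains no <meta> at all.
    static String firstMetaContent(String html) {
        Document doc = Jsoup.parse(html);
        Element first = doc.select("meta").first();
        return first == null ? "" : first.attr("content");
    }

    public static void main(String[] args) {
        // Prints the content of the url meta, not the publication date.
        System.out.println(firstMetaContent(HTML));
    }
}
```

The unqualified meta selector matches all four elements, so first() returns the one with itemprop="url".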

Here is an example that selects the right element:

final String html = "<div style=\"overflow: visible;\" itemscope=\"\" itemtype=\"http://schema.org/Article\">\n"
        + "<meta itemprop=\"url\" content=\"http://www.somesite.com/index.html\">\n"
        + "<meta itemprop=\"headline\" content=\"some text\">\n"
        + "<meta itemprop=\"datePublished\" content=\"2015-01-26 12:37:00\">\n"
        + "<meta itemprop=\"dateModified\" content=\"2015-01-26 14:03:16\">";

Document doc = Jsoup.parse(html);

Element meta = doc.select("meta[itemprop=datePublished]").first();
String content = meta.attr("content");

System.out.println(content);

Output: 2015-01-26 12:37:00

The selector meta[itemprop=datePublished] matches every meta element whose itemprop attribute has the value datePublished. first() takes the first match (or returns null if nothing matched), and attr("content") then reads the value of that element's content attribute.
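Because first() returns null when nothing matches, calling attr() on the result directly can throw a NullPointerException on pages that lack the tag. A null-safe variant might look like the sketch below. The class MetaContentDemo and the helper metaContent are hypothetical names, and the sketch assumes a jsoup version that provides Element.selectFirst (1.11.1 or later, as far as I know); on older versions, select(...).first() behaves the same way.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.Optional;

public class MetaContentDemo {

    // Hypothetical helper: returns the content attribute of the first
    // <meta itemprop="..."> element, or Optional.empty() if there is no
    // such element or it has no content attribute.
    // Assumption: itemprop is a simple identifier such as "datePublished",
    // so it can be placed in the selector unquoted.
    static Optional<String> metaContent(Document doc, String itemprop) {
        Element meta = doc.selectFirst("meta[itemprop=" + itemprop + "]");
        if (meta == null || !meta.hasAttr("content")) {
            return Optional.empty();
        }
        return Optional.of(meta.attr("content"));
    }

    public static void main(String[] args) {
        String html =
                "<div itemscope=\"\" itemtype=\"http://schema.org/Article\">\n"
                + "<meta itemprop=\"headline\" content=\"some text\">\n"
                + "<meta itemprop=\"datePublished\" content=\"2015-01-26 12:37:00\">\n"
                + "</div>";
        Document doc = Jsoup.parse(html);

        // Present: prints the publication date.
        System.out.println(metaContent(doc, "datePublished").orElse("(missing)"));
        // Absent: prints the fallback instead of throwing.
        System.out.println(metaContent(doc, "author").orElse("(missing)"));
    }
}
```

Returning an Optional makes the caller decide what to do when the tag is missing, instead of failing at the attr() call.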
