
Extract text in order using jsoup

I want to extract the text inside the "jobtitle" class and the text inside the "summary" class. Many elements share these class names, so I want the job title of the first listing followed by its summary, then the job title of the next listing followed by its summary, and so on, in that order.

The following code works, but it prints all the titles first and then the text of all the summary elements. I want the first job title and the first summary, then the second job title and the second summary, and so on. How do I modify the code to do this?

 <div class="  row  result" id="p_64c5268586001bd2" data-jk="64c5268586001bd2" itemscope="" itemtype="http://schema.org/JobPosting" data-tn-component="organicJob">
 <h2 id="jl_64c5268586001bd2" class="jobtitle">
 <a rel="nofollow" href="/rc/clk?jk=64c5268586001bd2" target="_blank" onmousedown="return rclk(this,jobmap[0],0);" onclick="return rclk(this,jobmap[0],true,0);" itemprop="title" title="Fashion Assistant" class="turnstileLink" data-tn-element="jobTitle"><b>Fashion</b> Assistant</a>
 </h2>
 <span class="company" itemprop="hiringOrganization" itemtype="http://schema.org/Organization">
    <span itemprop="name">
    <a href="/cmp/Itv?from=SERP&amp;campaignid=serp-linkcompanyname&amp;fromjk=64c5268586001bd2&amp;jcid=3bf3e8a57da58ff5" target="_blank">
ITV Jobs</a></span>
   </span>

     <a data-tn-element="reviewStars" data-tn-variant="cmplinktst2" class="turnstileLink " href="/cmp/Itv/reviews?jcid=3bf3e8a57da58ff5" title="Itv Jobs reviews" onmousedown="this.href = appendParamsOnce(this.href, '?campaignid=cmplinktst2&amp;from=SERP&amp;jt=Fashion+Assistant&amp;fromjk=64c5268586001bd2');" target="_blank">
    <span class="ratings"><span class="rating" style="width:49.5px;"><!-- -->        </span></span><span class="slNoUnderline">28 reviews</span></a>
<span itemprop="jobLocation" itemscope="" itemtype="http://schema.org/Place">      <span class="location" itemprop="address" itemscope="" itemtype="http://schema.org/Postaladdress"><span itemprop="addressLocality">London</span></span></span>
 <table cellpadding="0" cellspacing="0" border="0">
 <tbody><tr>
 <td class="snip">
 <div>
 <span class="summary" itemprop="description">
  Do you have a passion for <b>Fashion</b>? You will be responsible for     running our <b>fashion</b> cupboard, managing a team of interns and liaising with press officers to...</span>
   </div>

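// Requires: org.jsoup.Jsoup, org.jsoup.nodes.Document, org.jsoup.select.Elements
Document doc = Jsoup.connect("http://www.indeed.co.uk/jobs?q=fashion&l=England").timeout(5000).get();
Elements f = doc.select(".jobtitle");  // every job title on the page
Elements e = doc.select(".summary");   // every summary on the page
// Elements.text() combines the text of all matched elements, so titles and summaries print as two separate blocks
System.out.println("Title: " + f.text());
System.out.println("Details: " + e.text());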

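Iterate over the titles and, for each one, look up the summary inside the same parent element:

for (Element title : doc.select(".jobtitle")) {
    // The title's parent is the listing container, which also holds that listing's summary.
    Element summary = title.parent().select(".summary").first();
    if (summary == null) continue; // first() returns null when the listing has no summary

    System.out.format("Title: %s. Summary: %s%n", title.text(), summary.text());
}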
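As a variation on the loop above, it may be more robust to iterate over the listing containers themselves and pick the title and summary out of each one, so the pairing does not depend on the title being a direct child of the container. The following is only a sketch under assumptions: the class name `JobListings`, the method `pairs`, and the trimmed-down `SAMPLE_HTML` (including the second "Stylist" listing) are hypothetical and exist just to make the example runnable offline, and `selectFirst` assumes jsoup 1.11.1 or later.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class JobListings {

    // Hypothetical markup modeled on the snippet in the question (not real page data).
    static final String SAMPLE_HTML =
        "<div class='row result'>"
      + "<h2 class='jobtitle'><a><b>Fashion</b> Assistant</a></h2>"
      + "<table><tbody><tr><td class='snip'><div>"
      + "<span class='summary'>Do you have a passion for <b>Fashion</b>?</span>"
      + "</div></td></tr></tbody></table>"
      + "</div>"
      + "<div class='row result'>"
      + "<h2 class='jobtitle'><a>Stylist</a></h2>"
      + "<span class='summary'>Second summary.</span>"
      + "</div>";

    // Returns one "Title: ... Summary: ..." line per listing, in document order.
    static List<String> pairs(String html) {
        Document doc = Jsoup.parse(html);
        List<String> lines = new ArrayList<>();
        // Each div.result is one job listing; search only inside it.
        for (Element result : doc.select("div.result")) {
            Element title = result.selectFirst(".jobtitle");
            Element summary = result.selectFirst(".summary");
            if (title == null || summary == null) {
                continue; // skip listings that lack either part
            }
            lines.add("Title: " + title.text() + ". Summary: " + summary.text());
        }
        return lines;
    }

    public static void main(String[] args) {
        pairs(SAMPLE_HTML).forEach(System.out::println);
    }
}
```

To run this against the live page, the `Jsoup.parse(html)` call could be swapped for the `Jsoup.connect(...).get()` call from the question, assuming the page still uses the `result`, `jobtitle`, and `summary` class names.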

