
How to get all class attributes of span tags using Jsoup in Java

I am using Jsoup to extract data from this page:

https://www.justdial.com/Indore/Shahi-Bhog-Caterers-Opposite-Sayaji-Behind-Hotel-Park-Vijay-Nagar-Vijay-Nagar/0731PX731-X731-120525133215-B7M1_BZDET?xid=SW5kb3JlIENhdGVyZXJz

Now I want to get the class attributes of all the span tags inside the contact number element, but I can't get all of them. I get only one attribute value, seemingly at random, and it is repeated 3 times. I don't know why.

package scrapers;

import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

/**
 * @author kushagrabehere
 */
public class ReviewCounter {

    void ReviewCounters() throws IOException {
        Document doc = Jsoup.connect("https://www.justdial.com/Indore/Shahi-Bhog-Caterers-Opposite-Sayaji-Behind-Hotel-Park-Vijay-Nagar-Vijay-Nagar/0731PX731-X731-120525133215-B7M1_BZDET?xid=SW5kb3JlIENhdGVyZXJz").get();
        // All anchors that carry both the "tel" and "ttel" classes.
        Elements contactNumber = doc.select("a.tel.ttel");
        System.out.println("contact :");
        String cContact;

        for (Element numbers : contactNumber) {
            // This is the line that yields only one class value per anchor.
            cContact = numbers.getElementsByTag("span").attr("class");
            System.out.println("contact :" + cContact);
        }
    }
}


I want to get all of the class names of the spans that make up the contact number.

First of all, you need to check whether you are getting the complete HTML.

If the HTML is rendered by JavaScript, Jsoup cannot access those elements, because Jsoup does not execute JavaScript itself.

To check, write the fetched HTML to a file and search it for the DOM elements you need.

For that, try something like the following:

// Needs java.io.FileWriter in addition to the Jsoup imports from the question.
Document doc = Jsoup.connect("https://www.justdial.com/Indore/Shahi-Bhog-Caterers-Opposite-Sayaji-Behind-Hotel-Park-Vijay-Nagar-Vijay-Nagar/0731PX731-X731-120525133215-B7M1_BZDET?xid=SW5kb3JlIENhdGVyZXJz").get();

// Dump the HTML that Jsoup actually received so it can be inspected by hand.
try (FileWriter fw = new FileWriter("C:/demo.html")) {
    fw.write(doc.toString());
}

// select() takes a CSS selector; getElementsByClass() expects a single bare class name.
Elements contactNumber = doc.select("a.tel.ttel");
System.out.println("contact :");
String cContact;

for (Element numbers : contactNumber) {
    cContact = numbers.getElementsByTag("span").attr("class");
    System.out.println("contact :" + cContact);
}

First, do as Alpesh Jikadra suggests in his answer and check whether this information is loaded dynamically.

You do this:

Elements contactNumber=doc.select("a.tel.ttel");

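This selects all anchor elements that have both the tel and ttel classes. When I open the page in the browser, I find only one such element, so when you loop over all the a elements found, the loop body probably runs only once.

What you want to do is find all span elements within each a element inside the loop. Calling attr("class") on an Elements collection returns the value from the first matching element only, which is why you see a single class string. Something like this: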

for (Element numbers : contactNumber) {
    // Every digit span inside this anchor, in document order.
    Elements digitSpans = numbers.select("span.mobilesv");
    for (Element digitSpan : digitSpans) {
        String digitClasses = digitSpan.attr("class");
        // look up which digit each icon class stands for, or do whatever you need
    }
}

Note that the above code is untested; I typed it directly into the answer box and did not run it.

BTW: you can also get the individual class names in Jsoup via Element.classNames() (see "Jsoup get class name").
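
The lookup step mentioned in the loop's comment could be sketched roughly as follows. This is plain Java with no Jsoup dependency, so it can be tried on its own: it splits a span's class attribute into individual class names (essentially what Element.classNames() gives you) and looks each one up in a table. The class IconDigitDecoder, its methods, and above all the icon-aa / icon-bb / icon-cc names are made up for illustration; the real class names and the digit each one stands for would have to be read from the page's stylesheet.

```java
import java.util.List;
import java.util.Map;

public class IconDigitDecoder {

    // HYPOTHETICAL mapping from icon class name to the character it renders.
    // These names are placeholders, not the ones the real page uses.
    static final Map<String, Character> ICON_TO_CHAR = Map.of(
            "icon-aa", '0',
            "icon-bb", '1',
            "icon-cc", '2');

    // Decodes one span's class attribute, e.g. "mobilesv icon-bb".
    // Returns null when none of its class names is in the lookup table.
    static Character decodeSpan(String classAttr) {
        for (String className : classAttr.trim().split("\\s+")) {
            Character c = ICON_TO_CHAR.get(className);
            if (c != null) {
                return c;
            }
        }
        return null;
    }

    // Decodes the class attributes of all digit spans, in document order.
    // Spans whose classes are not in the table are rendered as '?'.
    static String decodeAll(List<String> classAttrs) {
        StringBuilder sb = new StringBuilder();
        for (String attr : classAttrs) {
            Character c = decodeSpan(attr);
            sb.append(c != null ? c : '?');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-ins for the values of digitSpan.attr("class") collected in the loop.
        List<String> classAttrs = List.of(
                "mobilesv icon-bb",
                "mobilesv icon-aa",
                "mobilesv icon-zz");
        System.out.println(decodeAll(classAttrs)); // prints 10?
    }
}
```

Inside the Jsoup loop you would collect digitSpan.attr("class") for each span into a list and pass that list to decodeAll. Rendering unknown classes as '?' rather than skipping them keeps the number's length intact, which makes a missing table entry easy to spot.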


 