
Parsing in android using Jsoup?

I have a site with the following source code:

<article id="post-438" class="post-438 post type-post status-publish format-standard has-post-thumbnail hentry category-history tag-africa tag-asia tag-europe tag-maps tag-middle-east tag-mongol-empire tag-ottoman-empire tag-rise-of-islam">
    <header class="entry-header">

                <div class="entry-meta smallPart">
            <span class="posted-on"><i class="fa fa-clock-o spaceRight"></i><a href="https://muslimmemo.com/map-rise-islam/" rel="bookmark"><time class="entry-date published updated" datetime="2015-07-23T00:26:22+00:00">July 23, 2015</time></a></span><span class="byline"> <i class="fa fa-user spaceLeftRight"></i><span class="author vcard"><a class="url fn n" href="https://muslimmemo.com/author/sufyan/">Sufyan bin Uzayr</a></span></span><span class="comments-link"><i class="fa fa-comments-o spaceLeftRight"></i><a href="https://muslimmemo.com/map-rise-islam/#respond"><span class="dsq-postid" data-dsqidentifier="438 https://muslimmemo.com/?p=438">Leave a comment</span></a></span>       </div><!-- .entry-meta -->
                <h1 class="entry-title"><a href="https://muslimmemo.com/map-rise-islam/" rel="bookmark">Map Showing The Rise of Islam Down The Ages</a></h1>    </header><!-- .entry-header -->

    <div class="entry-summary">
        <p>This is a rather interesting map that shows the spread of Islam across Asia, Europe and Africa, down the ages. The earliest period is marked in shades of brown and red, followed by shades of yellow. South-east Asia is shown separately as an inset using shades of blue. While this map is far from perfect&#8230;</p>
    </div><!-- .entry-summary -->

    <footer class="entry-footer smallPart">
        <div class="cruzy-bottom-content">
            <span class="cat-links"><i class="fa fa-folder-open spaceRight"></i><a href="https://muslimmemo.com/content/history/" rel="category tag">History</a></span>                     <span class="read-link">
                <a class="readMoreLink invertPart" href="https://muslimmemo.com/map-rise-islam/">Read More<i class="fa fa-angle-double-right spaceLeft"></i></a>
            </span>
        </div>
    </footer><!-- .entry-footer -->
</article>

I need to fetch the following from each post:

  1. entry title -> document.getElementsByClass("entry-title")
  2. entry link -> document.select("h1.entry-title > a[href]")
  3. entry summary -> document.getElementsByClass("entry-summary")
  4. author's link -> document.select("span.author > a[href]")
  5. author's name -> document.getElementsByClass("author")
  6. category -> document.getElementsByClass("cat-links")
  7. category's link -> document.select("span.cat-links > a[href]")
  8. posting date -> document.getElementsByClass("published")

I am doing it this way:

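        import org.jsoup.Jsoup;
        import org.jsoup.nodes.Document;
        import org.jsoup.select.Elements;

        // Download and parse the page (the HTTP request happens here)
        Document document = Jsoup.connect(url).get();

        // Each call below searches the whole document
        Elements heading = document.getElementsByClass("entry-title");
        Elements headingLink = document.select("h1.entry-title > a[href]");
        Elements headingSummary = document.getElementsByClass("entry-summary");
        Elements author = document.getElementsByClass("author");
        Elements authorLinks = document.select("span.author > a[href]");
        Elements category = document.getElementsByClass("cat-links");
        Elements categoryLinks = document.select("span.cat-links > a[href]");
        Elements published = document.getElementsByClass("published");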

It works, but very slowly. How should I change my code to make it faster? Please help me.

Some hints from luksch:

From my experience, Jsoup does a pretty good job speed-wise, although a SAX-based approach should be a bit faster. Anyway, I use Jsoup a lot and have never found it slow. Network access, however, can be very slow, depending on many factors, some of which you don't have much control over. I advise you to check the connection over which you retrieve the data. Maybe that is the culprit, not Jsoup parsing.
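One way to check whether the network or the parsing is the bottleneck is to time the two steps separately. The sketch below assumes the jsoup library is on the classpath; `Connection.execute()` performs only the HTTP request, and the downloaded body is then parsed with `Jsoup.parse(html, baseUri)`. The class name `FetchTimer` and the 10-second timeout are illustrative choices, not part of the original answer.

```java
import java.io.IOException;

import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class FetchTimer {

    /** Parses already-downloaded HTML, runs one lookup, and returns elapsed milliseconds. */
    static long timeParse(String html, String baseUri) {
        long start = System.nanoTime();
        Document doc = Jsoup.parse(html, baseUri);
        doc.getElementsByClass("entry-title"); // include one lookup in the measurement
        return (System.nanoTime() - start) / 1_000_000;
    }

    /** Downloads the page, then reports network time and parse time separately. */
    static void timeFetchAndParse(String url) throws IOException {
        long t0 = System.nanoTime();
        // execute() only performs the HTTP request; nothing is parsed yet
        Connection.Response response = Jsoup.connect(url).timeout(10_000).execute();
        String body = response.body();
        long networkMs = (System.nanoTime() - t0) / 1_000_000;

        long parseMs = timeParse(body, url);
        System.out.printf("network: %d ms, parse+select: %d ms%n", networkMs, parseMs);
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: FetchTimer <url>");
            return;
        }
        timeFetchAndParse(args[0]);
    }
}
```

If the network figure dominates, speeding up the selectors will not help much.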


Your Jsoup usage seems okay to me. At least I don't see a way to speed it up by much. One thing you could do is restrict the search for elements by starting not at the document level but at a suitable inner node: element.select(".whatever") starts at element, not at the document. If your document is very big, this might help.
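A minimal sketch of that idea, assuming the page contains one `<article>` element per post as in the HTML above: select each `article` once, then run the per-field selectors relative to it with `Element.selectFirst`, which also keeps each post's fields together. The `PostScraper` class, the `Post` holder, and the `parsePosts` method are hypothetical names; the `abs:href` attribute key asks jsoup to resolve the link against the document's base URI.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class PostScraper {

    /** Hypothetical holder for the fields listed in the question. */
    static class Post {
        String title, link, summary, authorName, authorLink, category, categoryLink, published;
    }

    /** Text of the first match under root, or "" if nothing matches. */
    static String text(Element root, String css) {
        Element e = root.selectFirst(css);
        return e == null ? "" : e.text();
    }

    /** Attribute of the first match under root, or "" if nothing matches. */
    static String attr(Element root, String css, String key) {
        Element e = root.selectFirst(css);
        return e == null ? "" : e.attr(key);
    }

    static List<Post> parsePosts(Document doc) {
        List<Post> posts = new ArrayList<>();
        // One document-level search; every other lookup starts at the <article> node
        for (Element article : doc.select("article")) {
            Post p = new Post();
            p.title = text(article, "h1.entry-title");
            p.link = attr(article, "h1.entry-title > a[href]", "abs:href");
            p.summary = text(article, "div.entry-summary");
            p.authorName = text(article, "span.author");
            p.authorLink = attr(article, "span.author > a[href]", "abs:href");
            p.category = text(article, "span.cat-links");
            p.categoryLink = attr(article, "span.cat-links > a[href]", "abs:href");
            p.published = attr(article, "time.published", "datetime");
            posts.add(p);
        }
        return posts;
    }

    public static void main(String[] args) {
        String html = "<article><header><h1 class=\"entry-title\">"
                + "<a href=\"/map-rise-islam/\">Map</a></h1></header>"
                + "<div class=\"entry-summary\"><p>Summary</p></div></article>";
        Document doc = Jsoup.parse(html, "https://muslimmemo.com/");
        for (Post p : parsePosts(doc)) {
            System.out.println(p.title + " -> " + p.link);
        }
    }
}
```

Returning plain strings instead of `Elements` collections also means the caller does not have to match up, say, the third title with the third author link by index.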
