
Parsing in Android using Jsoup?

I have a site with the following source code:

<article id="post-438" class="post-438 post type-post status-publish format-standard has-post-thumbnail hentry category-history tag-africa tag-asia tag-europe tag-maps tag-middle-east tag-mongol-empire tag-ottoman-empire tag-rise-of-islam">
    <header class="entry-header">

                <div class="entry-meta smallPart">
            <span class="posted-on"><i class="fa fa-clock-o spaceRight"></i><a href="https://muslimmemo.com/map-rise-islam/" rel="bookmark"><time class="entry-date published updated" datetime="2015-07-23T00:26:22+00:00">July 23, 2015</time></a></span><span class="byline"> <i class="fa fa-user spaceLeftRight"></i><span class="author vcard"><a class="url fn n" href="https://muslimmemo.com/author/sufyan/">Sufyan bin Uzayr</a></span></span><span class="comments-link"><i class="fa fa-comments-o spaceLeftRight"></i><a href="https://muslimmemo.com/map-rise-islam/#respond"><span class="dsq-postid" data-dsqidentifier="438 https://muslimmemo.com/?p=438">Leave a comment</span></a></span>       </div><!-- .entry-meta -->
                <h1 class="entry-title"><a href="https://muslimmemo.com/map-rise-islam/" rel="bookmark">Map Showing The Rise of Islam Down The Ages</a></h1>    </header><!-- .entry-header -->

    <div class="entry-summary">
        <p>This is a rather interesting map that shows the spread of Islam across Asia, Europe and Africa, down the ages. The earliest period is marked in shades of brown and red, followed by shades of yellow. South-east Asia is shown separately as an inset using shades of blue. While this map is far from perfect&#8230;</p>
    </div><!-- .entry-summary -->

    <footer class="entry-footer smallPart">
        <div class="cruzy-bottom-content">
            <span class="cat-links"><i class="fa fa-folder-open spaceRight"></i><a href="https://muslimmemo.com/content/history/" rel="category tag">History</a></span>                     <span class="read-link">
                <a class="readMoreLink invertPart" href="https://muslimmemo.com/map-rise-islam/">Read More<i class="fa fa-angle-double-right spaceLeft"></i></a>
            </span>
        </div>
    </footer><!-- .entry-footer -->
</article>

I need to fetch:

  • What I need to fetch from the site:
    1. entry title -> document.getElementsByClass("entry-title");
    2. entry link -> document.select("h1.entry-title > a[href]")
    3. summary of the entry -> document.getElementsByClass("entry-summary");
    4. author's link -> document.select("span.author > a[href]")
    5. author's name -> document.getElementsByClass("author");
    6. category -> document.getElementsByClass("cat-links");
    7. category's link -> document.select("span.cat-links > a[href]")
    8. posting date -> document.getElementsByClass("published");

I am doing it this way:

        // Download and parse the page in one call (this includes the network round trip).
        Document document = Jsoup.connect(url).get();

        // Each lookup searches the whole document and returns org.jsoup.select.Elements.
        heading = document.getElementsByClass("entry-title");
        headingLink = document.select("h1.entry-title > a[href]");
        headingSummary = document.getElementsByClass("entry-summary");
        author = document.getElementsByClass("author");
        authorLinks = document.select("span.author > a[href]");
        category = document.getElementsByClass("cat-links");
        categoryLinks = document.select("span.cat-links > a[href]");
        published = document.getElementsByClass("published");

It works, but it is very slow. How should I change my code to make it faster? Please help me.

Some hints from luksch:

In my experience, Jsoup does a pretty good job speed-wise, although a SAX-based approach should be a bit faster. Anyway, I use Jsoup a lot and have never found it slow. Network access, however, can be very slow, depending on many parameters, some of which you have little control over. I advise you to check the connection over which you retrieve the data. Maybe this is the culprit, and not the Jsoup parsing.
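One way to check whether the network or the parsing is the slow part is to time the two phases separately. The following is only a rough sketch: the class name `FetchVsParseTiming`, the helper `timeParseNanos`, and the URL are assumptions made for illustration, since the question does not show the value of `url`. On Android, the network call would also have to run off the main thread.

```java
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

import java.io.IOException;

public class FetchVsParseTiming {

    /** Parses an already-downloaded HTML string, runs one query, and returns the elapsed nanoseconds. */
    static long timeParseNanos(String html, String baseUri) {
        long start = System.nanoTime();
        Document document = Jsoup.parse(html, baseUri);
        document.select("h1.entry-title > a[href]"); // same kind of query as in the question
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws IOException {
        // Assumed URL for illustration; the question does not show what url holds.
        String url = "https://muslimmemo.com/";

        // Phase 1: network only. execute() and body() download the response without building a DOM.
        long fetchStart = System.nanoTime();
        Connection.Response response = Jsoup.connect(url).execute();
        String html = response.body();
        long fetchNanos = System.nanoTime() - fetchStart;

        // Phase 2: parsing and selecting only, on the string that is already in memory.
        long parseNanos = timeParseNanos(html, url);

        System.out.printf("fetch: %d ms, parse + select: %d ms%n",
                fetchNanos / 1_000_000, parseNanos / 1_000_000);
    }
}
```

If the first number dominates, changing the selectors is unlikely to make a noticeable difference.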


Your Jsoup usage seems okay to me. At least I don't see a way to speed it up by much. One option is to restrict the search for elements by starting not at the document level but at a suitable inner node: element.select(".whatever") starts at element, not at the document. If your document is very big, this might help.
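Restricting the search to an inner node could look roughly like the sketch below, which selects each `<article>` once and then runs the remaining queries on that element only. The class name `ScopedSelect`, the helper `summarize`, and the trimmed `SAMPLE` markup (cut down from the question's HTML) are assumptions made for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ScopedSelect {

    // Trimmed-down version of the markup from the question, used here instead of a live page.
    static final String SAMPLE =
            "<article id=\"post-438\" class=\"post-438 post\"><header class=\"entry-header\">"
            + "<span class=\"author vcard\"><a class=\"url fn n\" "
            + "href=\"https://muslimmemo.com/author/sufyan/\">Sufyan bin Uzayr</a></span>"
            + "<h1 class=\"entry-title\"><a href=\"https://muslimmemo.com/map-rise-islam/\">"
            + "Map Showing The Rise of Islam Down The Ages</a></h1>"
            + "</header></article>";

    /** Returns "title | link | author", searching only inside the given article element. */
    static String summarize(Element article) {
        Element titleLink = article.select("h1.entry-title > a[href]").first();
        Element authorLink = article.select("span.author > a[href]").first();
        if (titleLink == null || authorLink == null) {
            return ""; // the article does not have the expected structure
        }
        return titleLink.text() + " | " + titleLink.attr("href") + " | " + authorLink.text();
    }

    public static void main(String[] args) {
        Document document = Jsoup.parse(SAMPLE);

        // One document-level query to find the articles; everything else is scoped to each article.
        for (Element article : document.select("article.post")) {
            System.out.println(summarize(article));
        }
        // prints: Map Showing The Rise of Islam Down The Ages | https://muslimmemo.com/map-rise-islam/ | Sufyan bin Uzayr
    }
}
```

Scoping per article also keeps each title paired with its own author and link when a page lists several posts, which separate document-wide queries do not guarantee.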
