
Jsoup Android parsing

I am parsing a web page, and now that I am done I can see I am doing something incredibly stupid. Could anyone point out what it is, and what the right direction would be? :)

I have an Android app that uses Jsoup. It works great, but it is terribly slow! I know the reason: in onCreate I basically have 20 to 30 Jsoup getElement requests...

 private class Task extends AsyncTask<Void, Void, Void>{
    String linkText;
    @Override
    protected Void doInBackground(Void... params) {
        Initdata();

        return null;
    }
    @Override
    protected void onPostExecute(Void param) {

        mProgressBarHandler.hide();           

        redraw();
        inflatedView.invalidate();
    }

    @Override
    protected void onPreExecute() {
        super.onPreExecute();
        mProgressBarHandler.show();
    }
}

The Initdata() method contains those 20-30 Jsoup requests. Even with AsyncTask it is very slow; the only difference now is that I am not blocking the UI thread. That is great, but I still need to somehow "optimize" the parsing of these elements.

private void Initdata(){

    loadImages();
    players = new String[] {util.GetElement("div.item-2:first-child", "http://www.istinomer.rs/", 0),
            util.GetElement("div.item-2:nth-child(2)", "http://www.istinomer.rs/", 0),
            util.GetElement("div.item-2:nth-child(3)", "http://www.istinomer.rs/", 0),
            util.GetElement("div.item-2:nth-child(4)","http://www.istinomer.rs/",0),
            util.GetElement("div.item-2:nth-child(5)","http://www.istinomer.rs/",0),
            util.GetElement("div.item-2:nth-child(6)","http://www.istinomer.rs/",0),
            util.GetElement("div.item-2:nth-child(7)","http://www.istinomer.rs/",0),
            util.GetElement("div.item-2:nth-child(8)","http://www.istinomer.rs/",0),
            util.GetElement("div.item-2:nth-child(9)","http://www.istinomer.rs/",0),
            util.GetElement("div.item-2:nth-child(10)","http://www.istinomer.rs/",0)
    };
    vestiDescription1 = util.GetElement("div.item-big h2", "http://www.istinomer.rs/", 0) + System.getProperty("line.separator")
            + util.GetElement("div.item-big h3","http://www.istinomer.rs/",0);

    vestiDescription2 = util.GetElement("div.grid-8 h2 a", "http://www.istinomer.rs/", 0) + System.getProperty("line.separator")
            + util.GetElement2("div.grid-8 h3","http://www.istinomer.rs/",0);

    vestiDescription3 = util.GetElement(
            "div.gd-container-1:nth-child(6) > div:nth-child(4) > div:nth-child(1) > div:nth-child(4) > div:nth-child(2) > h3:nth-child(2)", "http://www.istinomer.rs/", 0);

    vestiDescription4 = util.GetElement(
            "div.gd-container-1:nth-child(6) > div:nth-child(4) > div:nth-child(1) > div:nth-child(5) > div:nth-child(2) > h3:nth-child(2)", "http://www.istinomer.rs/", 0);

    vestiDescription5 = util.GetElement(
            "div.gd-container-1:nth-child(6) > div:nth-child(4) > div:nth-child(1) > div:nth-child(6) > div:nth-child(2) > h3:nth-child(2)", "http://www.istinomer.rs/", 0);

    vestiDescription6 = util.GetElement(
            "div.gd-container-1:nth-child(6) > div:nth-child(5) > div:nth-child(1) > div:nth-child(2) > h2:nth-child(4)", "http://www.istinomer.rs/", 0) + System.getProperty("line.separator")
            + util.GetElement2("div.gd-container-1:nth-child(6) > div:nth-child(5) > div:nth-child(1) > div:nth-child(2) > h3:nth-child(5)","http://www.istinomer.rs/",0);

    vestiDescription7 = util.GetElement(
            "div.gd-container-1:nth-child(6) > div:nth-child(5) > div:nth-child(1) > div:nth-child(3) > div:nth-child(2) > h3:nth-child(2)", "http://www.istinomer.rs/", 0);

    vestiDescription8 = util.GetElement(
            "div.gd-container-1:nth-child(6) > div:nth-child(5) > div:nth-child(1) > div:nth-child(4) > div:nth-child(2) > h3:nth-child(2)", "http://www.istinomer.rs/", 0);

    vestiDescription9 = util.GetElement(
            "div.gd-container-1:nth-child(6) > div:nth-child(5) > div:nth-child(1) > div:nth-child(5) > div:nth-child(2) > h3:nth-child(2)", "http://www.istinomer.rs/", 0);

    vestiDescription10 = util.GetElement(
            "div.gd-container-1:nth-child(6) > div:nth-child(5) > div:nth-child(1) > div:nth-child(6) > div:nth-child(2) > h3:nth-child(2)", "http://www.istinomer.rs/", 0);
    currency = new String[]{
            vestiDescription1,
            vestiDescription2,
            vestiDescription3,
            vestiDescription4,
            vestiDescription5,
            vestiDescription6,
            vestiDescription7,
            vestiDescription8,
            vestiDescription9,
            vestiDescription10
    };
}

public String GetElement(String Element, String site, int mode) {
    try {

        Elements newsHeadlines = null;

        if (mode == 0) {
            Document doc = Jsoup.connect(site).timeout(600000).get();
            newsHeadlines = doc.select(Element);
        }
        //1 gets link from class
        else if (mode == 1) {
            Document doc = Jsoup.connect(site).timeout(600000).get();
            String link = doc.select(Element).toString();
            return link;
        }

        //Log.d("TMS", "Data is " + html2text(newsHeadlines.toString()));

        String returnData = html2text(newsHeadlines.toString());
        return returnData;
    }
    catch (Exception e) {
        Log.d("TMS", "EXCEPTION GetElement: " + Element);
        e.printStackTrace();
        return "Error";
    }
}

Any idea how I can speed this up?

You're requesting and parsing the same document again on every call to GetElement! Each call runs Jsoup.connect(site).get(), which downloads the whole page and parses it from scratch. Of course it's slow!

Instead, make ONE call to Jsoup to fetch the document, then use the Document object it returns for all the queries against that document.
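As a rough sketch of that idea (not the asker's actual util class): the page is fetched once, and every selector then runs against the same in-memory Document. The class name SinglePageScraper and the helpers fetch, textOf and textOfAll are made up for illustration, the 10-second timeout is an arbitrary choice, and Elements.text() stands in for the html2text helper, which isn't shown in the question.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Hypothetical helper class; the names here are illustrative, not from the question.
public class SinglePageScraper {

    /** Downloads and parses the page. Call this once per page, not once per selector. */
    public static Document fetch(String url) throws IOException {
        // 10 s timeout is an assumption; the question used 600000 ms.
        return Jsoup.connect(url).timeout(10_000).get();
    }

    /** Runs one CSS selector against an already-parsed document. No network access. */
    public static String textOf(Document doc, String cssQuery) {
        // Elements.text() is used in place of the question's html2text helper.
        return doc.select(cssQuery).text();
    }

    /** Runs several selectors against the same document, in order. */
    public static List<String> textOfAll(Document doc, String... cssQueries) {
        List<String> results = new ArrayList<>();
        for (String query : cssQueries) {
            results.add(textOf(doc, query));
        }
        return results;
    }

    public static void main(String[] args) throws IOException {
        // One HTTP request for the whole page...
        Document doc = fetch("http://www.istinomer.rs/");

        // ...then as many in-memory queries as needed.
        List<String> players = textOfAll(doc,
                "div.item-2:first-child",
                "div.item-2:nth-child(2)",
                "div.item-2:nth-child(3)");
        String headline = textOf(doc, "div.item-big h2")
                + System.lineSeparator()
                + textOf(doc, "div.item-big h3");

        System.out.println(players);
        System.out.println(headline);
    }
}
```

In the app, the fetch call would still belong in doInBackground, since it does network I/O; the selector calls are cheap by comparison because they only walk the parsed tree.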

