
Jsoup: extract an absolute URL in Android

I've been looking everywhere and tried a lot of "solutions", but none of them helped. I need to extract the URL of a sub-page from HTML code. The code contains a lot of URLs, so I need to narrow the result list somehow so that only the links I need remain.

Details:

 <li class="container results-list-item clear-me ">
            <div class="job-offer-content container h-card">
                <div class="position-head container">
                  <div class="container  ">
                      <h2 class="p-job-title">
                          <a href="/praca/android-developer-junior-senior/wroclaw/11636002" rel="nofollow" 
                          title="praca Android developer (junior/senior) dolnośląskie" class="job-offer ">
                              <strong class="keyword">Android</strong> <strong class="keyword">developer</strong> (junior/senior)
                          </a>
                      </h2>
                          <h3 class="p-name company">
                                  <a href="/pracodawca/starware-firma-informatyczna-praca/843242">
                                      Starware Firma Informatyczna
                              </a>
                          </h3>

This is only part of the HTML code. As I said, it contains a lot of URLs, so ideas like doc.select("a").first(); will not help.

I want to extract all URLs from the <h2 class="p-job-title"> section (it occurs multiple times in the code, because the page is a search result from a certain website). I also tried doc.select("h2.p-job-title a[href]"); but the output is Android developer (junior/senior) and I need /pracodawca/starware-firma-informatyczna-praca/843242, ideally in absolute form (I think www.mywebsite + url could be built with a simple concat or something, so it shouldn't be too hard).

EDIT: My whole activity class's code:

public class ListaActivity extends ActionBarActivity{
    StartActivity startActiv;
    private List<String> mLista = new ArrayList<>();
    private ListView mListView;
    private MiastaListAdapter mAdapter;
    public Elements jobName, jobName2;

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_lista_miast);  
        mListView = (ListView) findViewById(R.id.lista_miast);  
        new NewThread().execute();
        mAdapter = new MiastaListAdapter(this, mLista);
        mListView.setAdapter(mAdapter);
    }

    public class NewThread extends AsyncTask<String, Void, String> {
        @Override
        protected String doInBackground(String... arg) {
            String doURLwork = startActiv.nazwaStanowiska;
            String doURLplace = startActiv.nazwaMiejscowosci;

            Document doc;
            try {
                doc = (Document) Jsoup.connect("http://www.infopraca.pl/praca?q=" + doURLwork + "&lc=" + doURLplace)
                        .userAgent("Mozilla/5.0 (Windows NT 6.1; WOW64; rv:5.0) Gecko/20100101 Firefox/5.0").get();

                jobName = doc.select("h2.p-job-title a[href]"); //infopraca

                    for (Element jobNames : jobName) {
                        mLista.add(jobNames.text() + "\n");
                    }

            } catch (IOException e) {
                e.printStackTrace();
            }
            return null;
        }

        @Override
        protected void onPostExecute(String result) {
            mAdapter.notifyDataSetChanged();
        }
    }
}

You are trying to get the text from your selected elements with mLista.add(jobNames.text() + "\n"); which is wrong. If you need the links, you have to read the href attribute from the selected elements.

Try something like this:

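// "class" is a reserved word in Java, so the variables need other names
Elements titles = doc.getElementsByClass("p-job-title");
Elements links = titles.select("a[href]");
// attr() on Elements returns the value from the first matching element
String url = links.attr("href");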
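
The href values on this page are relative (they start with /praca/...), so the snippet above still yields a relative path. Jsoup can resolve it directly: since the document was fetched with Jsoup.connect(...).get(), it has a base URI, and jobNames.absUrl("href") (or attr("abs:href")) inside the question's loop should return the absolute form. For anyone who prefers to see what that resolution does, here is a minimal sketch using only java.net.URI. The class name AbsoluteLinks, the method toAbsolute, and the query values in the sample page URL are made up for illustration; the href comes from the HTML in the question.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Jsoup or of the question's code.
final class AbsoluteLinks {

    // Resolves each (possibly relative) href against the URL of the page it was found on.
    static List<String> toAbsolute(String pageUrl, List<String> hrefs) {
        URI base = URI.create(pageUrl);
        List<String> result = new ArrayList<>();
        for (String href : hrefs) {
            // resolve() keeps the scheme and host of the base and swaps in the new path
            result.add(base.resolve(href).toString());
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed search URL; the q and lc values are placeholders.
        String pageUrl = "http://www.infopraca.pl/praca?q=android&lc=wroclaw";
        List<String> hrefs = Arrays.asList(
                "/praca/android-developer-junior-senior/wroclaw/11636002");

        for (String url : toAbsolute(pageUrl, hrefs)) {
            System.out.println(url);
        }
        // prints http://www.infopraca.pl/praca/android-developer-junior-senior/wroclaw/11636002
    }
}
```

Resolving against the page URL is safer than plain string concatenation, because it also handles hrefs that are already absolute or that are relative to the current path.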
