
JSoup, how to return data from a dynamic <a href> tag

Very new to JSoup, trying to retrieve a changeable value that is stored within an <a> tag, specifically from the following website and HTML. [Image: snapshot of the HTML]

The results after "constituency/" are changeable and depend on the user's input. I am able to retrieve the h2 tags themselves but not the information within them. At the moment the best return I can get is just the tags, using the method below.

The desired return would be something that I can substring down into

Dublin Bay South

The actual return is

<well.col-md-4.h2></well.col-md-4.h2>

    private String jSoupTDRequest(String aLine1, String aLine3) throws IOException {
        String constit = "";
        String h2 = "h2";
        String url = "https://www.whoismytd.com/search?utf8=✓&form-input=" + aLine1 + "%2C+" + aLine3 + "+Ireland";
        //Switch to try catch if time
        Document doc = Jsoup.connect(url)
                .timeout(6000).get();

        //Scrape elements from relevant section
        Elements body = doc.select("well.col-md-4.h2");
        Element e = new Element("well.col-md-4.h2");
        constit = e.toString();

        return constit;
    }

I am extremely new to JSoup and scraping in general. I would appreciate any input from someone who knows what they're doing, or any alternative ways to get the desired result.

Change the code under your "Scrape elements from relevant section" comment as follows:

  • Select the very first <div class="well"> element first.

     Element tdsDiv = doc.select("div.well").first();
  • Next, select the very first <a> link element. This link points to the constituency.

     Element constLink = tdsDiv.select("a").first();
  • Get the constituency name by grabbing this link's text content.

     constit = constLink.text();
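
The three steps above can be tried offline against an inline HTML string. This is only a minimal sketch: the markup in SAMPLE_HTML is an assumption about what the results page roughly looks like (it is not copied from the site), and the class name ConstituencyExtractor and both method names are hypothetical. It also hints at why the original attempt printed empty tags: "well.col-md-4.h2" asks for an element whose tag name is "well", and new Element(...) creates a fresh, empty element that has nothing to do with the fetched document.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ConstituencyExtractor {

    // Hypothetical markup, assumed for illustration only; the real page may differ.
    static final String SAMPLE_HTML =
            "<div class=\"well col-md-4\">"
          + "<h2><a href=\"/constituency/dublin-bay-south\">Dublin Bay South</a></h2>"
          + "</div>";

    // Follows the three steps: first div.well, then its first <a>, then the link text.
    static String extractConstituency(Document doc) {
        Element tdsDiv = doc.select("div.well").first();
        if (tdsDiv == null) {
            return ""; // no matching <div class="well"> on the page
        }
        Element constLink = tdsDiv.select("a").first();
        return constLink == null ? "" : constLink.text();
    }

    // Alternative: match any link whose href starts with "/constituency/",
    // so the changeable part of the href does not need to be known in advance.
    static String extractByHrefPrefix(Document doc) {
        Element link = doc.selectFirst("a[href^=/constituency/]");
        return link == null ? "" : link.text();
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(SAMPLE_HTML);
        System.out.println(extractConstituency(doc));
        System.out.println(extractByHrefPrefix(doc));
    }
}
```

In the method from the question, passing the Document returned by Jsoup.connect(url).get() to either helper would replace the three lines that build constit. The null checks are there because first() and selectFirst() return null when nothing matches.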

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.junit.jupiter.api.Assertions;
import org.junit.jupiter.api.DisplayName;
import org.junit.jupiter.api.Test;

import java.io.IOException;

@DisplayName("JSoup, how to return data from a dynamic <a href> tag")
class JsoupQuestionTest {
    private static final String URL = "https://www.whoismytd.com/search?utf8=%E2%9C%93&form-input=Kildare%20Street%2C%20Dublin%2C%20Ireland";

    @Test
    void findSomeText() throws IOException {
        String expected = "Dublin Bay South";
        // Fetch the live results page for a fixed address
        Document document = Jsoup.connect(URL).get();
        // Find the link by its exact href and read its text content
        String actual = document.getElementsByAttributeValue("href", "/constituency/dublin-bay-south").text();
        Assertions.assertEquals(expected, actual);
    }
}
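
The URL in the test above is percent-encoded by hand, while the method in the question concatenates raw user input and a literal "✓" into the query string. One possible way to build the same kind of URL from dynamic input is sketched below using java.net.URLEncoder from the standard library (the Charset overload needs Java 10 or later). The class name SearchUrlBuilder and the method buildSearchUrl are hypothetical, and the parameter names simply reuse aLine1 and aLine3 from the question.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class SearchUrlBuilder {

    private static final String BASE = "https://www.whoismytd.com/search";

    // Builds the search URL, encoding each query value so that spaces,
    // commas and non-ASCII characters are safe to send.
    static String buildSearchUrl(String aLine1, String aLine3) {
        // Same shape as the question's query: "<line1>, <line3> Ireland"
        String address = aLine1 + ", " + aLine3 + " Ireland";
        return BASE
                + "?utf8=" + URLEncoder.encode("✓", StandardCharsets.UTF_8)
                + "&form-input=" + URLEncoder.encode(address, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(buildSearchUrl("Kildare Street", "Dublin"));
    }
}
```

URLEncoder applies form encoding, so spaces become "+" rather than "%20"; both forms appear in the URLs used in this thread.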
