JSOUP select <div> with specific ID

I'm making a small Android application for a class that finds cancer-related events on the American Cancer Society's website. I've been using JSoup to get basic information about the events, and I've tried to use the select() method to pull specific details from each event page. However, my current selector grabs far more HTML nodes than I want, and I couldn't figure out why. The markup that I'm trying to grab looks like this:

EDIT: I realized that the div with id="pnlResults" does not end at that table; it ends about three tables later, and all of them contain information that I would like to grab. Here is the markup again:

    <div id="pnlResults">

        <h2><span id="lblEventName">American Cancer Society 44th Annual Walter Hagen Golf Tournament</span></h2>
        <!-- General Information Box -->
        <div class="text-box boxed wide">
            <h3 class="head" style="width:97%;">
                General Information
            </h3>
            <div class="content">


                <p>
                    <label>Event Times:</label><span id="lblStartDate">Monday, July 30, 2012</span><span id="lblEndDate"></span><br />
                    <label>&nbsp;</label><span id="lblStartTime">10:00 AM</span> - <span id="lblEndTime">9:00 PM</span>
                </p>
                <p>
                    <label>Time Zone:</label><span id="lblTimeZone">Eastern</span>

                </p>
                <p>
                    <label>Description:</label><span id="lblDesc" class="fieldData long">The American Cancer Society Walter Hagen Golf Tournament highlights the Society’s role in supporting research and patient care here in Rochester. Funds raised through this event help us make a difference in patents’ lives every day though programs including Road to Recovery and Patient Navigation as well as support grants to our research institutions.  144 golfers will play a round of golf and then enjoy cocktails, dinner, and silent auction following the tournament. </span>
                </p>
                <p>
                    <label>Agenda:</label><span id="lblAgenda" class="fieldData long">10:00am - Check-in, 11:00am - Lunch, 12:15pm - Shot gun start, 6:00 - Cocktails and silent auction, 7:00pm Dinner and program</span>
                </p>

            </div>
        </div>

        <div id="pnlStandardDisplay">


        <!-- Event Location Box -->
        <div class="text-box boxed wide line">
            <h3 class="head" style="width:97%;">
                Event Location
            </h3>
            <div class="content" style="display:inline-block; width:97%;">


                <div >
                    <div id="mapOutsideContainer" class="resource-map">
                       <div id="map_canvas" class="resource-map" ></div>
                    </div> 
                    <script  type="text/javascript">
                        var mapDataPoints = [{ "lat":43.1075545,"lng":-77.5164518, "title":"Golf Event","content":"<b>American Cancer Society 44th Annual Walter Hagen Golf Tournament<\/b><br/><\/br>4045 East Avenue<br /><br/>Rochester, New York  14618<br /><br />Phone: <br />Fax: "} ];
                        buildMap(mapDataPoints, -5);
                    </script>
                </div>

                <h4><span id="lblLocationName">Irondequoit Country Club</span></h4>
                <p>

                    <label>Address:</label><span id="lblAddress" class="fieldData" style="width:150px;">4045 East Avenue<br />Rochester, New York 14618</span>
                </p>
                <p>
                    <label nowrap="nowrap">Handicap Accessible:</label><span id="lblHandicapAccesible">Yes</span>
                </p>
            </div>

        </div>

        <!-- Primary Contact Box -->
        <div class ="line" >
        <div id="eventPrimaryContact_divContact" class="text-box boxed wide">
                    <h3 class="head" style="width:97%;">
                        Primary Contact
                    </h3>
                    <div class="content">

                        <p>

                            <label>Contact:</label><span id="eventPrimaryContact_lblContact">Katerina Kormas (<a href="mailto:katerina.kormas@cancer.org?subject=American Cancer Society 44th Annual Walter Hagen Golf Tournament">Contact ACS for Details</a>)</span>

                        </p>
                        <p>
                            <label>Contact Type:</label><span id="eventPrimaryContact_lblContactType">ACS Staff</span>
                        </p>
                        <p>

                            <label>Phone:</label><span id="eventPrimaryContact_lblContactPhone">(585) 288-1950</span>
                        </p>
                        <p>
                            <label>Additional Information:</label><span id="eventPrimaryContact_lblContactAddlInfo" class="fieldData long">Direct line is 585-224-4919 or cell 585-645-8912</span>
                        </p>
                    </div>
                </div>

        </div>

        <!-- Registration Information Box -->

        <div class="text-box boxed wide line">
            <h3 class="head" style="width:97%;">
                Registration Information
            </h3>
            <div class="content">

                <p>
                    <label nowrap="nowrap">Registration Required?: </label><span id="lblRegRequired">Yes</span>

                </p>
            </div>
        </div>       

        <!-- Event Cost Box -->
        <div class ="line" >
        <div id="eventCost_divCost" class="text-box boxed wide">
                    <h3 class="head" style="width:97%;">
                        Event Cost
                    </h3>
                    <div class="content">

                        <p>
                            <label>Cost/Registration Fee: </label><span id="eventCost_lblCostRegFee" class="fieldData long">$350 per golfer</span>
                        </p>
                        <p>
                            <label>Payment Type: </label><span id="eventCost_lblPaymentTypes" class="fieldData">Cash, Check, American Express, Mastercard, Visa, Discover</span>
                        </p>
                        <p>

                            <label>Check Payable To: </label><span id="eventCost_lblCheckPayable" class="fieldData">American Cancer Society</span>
                        </p>
                        <p>
                            <label>Memo Line: </label><span id="eventCost_lblCheckMemo" class="fieldData">American Cancer Society 44th Annual Walter Hagen Golf Tourna</span>
                        </p>
                        <p>
                            <label>Mail Check To:</label><span id="eventCost_lblCheckMailTo" class="fieldData">American Cancer Society<br />1120 South Goodman St<br />Rochester, New York 14620</span>

                        </p>
                    </div>
                </div>

        </div>

        <!-- Tax Deduction Information Box -->
        <div class="line">

                <div class="text-box boxed wide">
                    <h3 class="head" style="width:97%;">
                        Tax Deduction Information
                    </h3>

                    <div class="content">
                        <p>
                            $210  per golfer is tax deductible
                        </p>
                    </div>
                </div>  

        </div>



</div> <!-- end standard display -->
         <!-- end daffodil display -->

EDIT: Given these new tables, I would like to extract the General Information and Event Location boxes. How would I go about doing that? Could I call select() again on the subset I just selected, matching only the boxes whose headers are the ones I want?

The code where I'm using select() is shown below. As I said before, I tried to use

select("div[id=pnlResults]");

but the returned data is much more than just the div whose id is pnlResults.

public ArrayList<Event> results()
{
    ArrayList<Event> results = new ArrayList<Event>();
    Document doc = Jsoup.parse(page);
    Elements links = doc.select("a[href*=event-details]");

    for(Element e: links)
    {
        String title = e.text();
        String link = "http://www.cancer.org/involved/participate/app/"+e.attr("href");
        try{
            Document eventInfo = Jsoup.connect(link).get();
            Elements info = eventInfo.select("div[id*=pnlResults");


        }
        catch(MalformedURLException exception)
        {
            exception.printStackTrace();
        }
        catch(IOException exception)
        {
            exception.printStackTrace();
        }

    }
    return results;
}

Any help would be greatly appreciated.

Try:

 Elements info = eventInfo.select("div#pnlResults");
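To see why the result looks larger than expected, here is a minimal sketch. It assumes a made-up HTML string rather than the real event page, and the class name `IdSelectorDemo` and helper `count` are only illustrative. It suggests that `div#pnlResults` and `div[id=pnlResults]` each match exactly one element, and that the single match simply carries all of its nested children with it.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class IdSelectorDemo {
    // Made-up markup standing in for the event page (an assumption, not the real site).
    static final String HTML =
        "<div id=\"header\">nav</div>"
        + "<div id=\"pnlResults\"><h2>Event</h2>"
        + "<div class=\"content\"><p>Details</p></div></div>";

    // Number of elements in the sample markup that match the CSS query.
    static int count(String css) {
        return Jsoup.parse(HTML).select(css).size();
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(HTML);
        Elements byHash = doc.select("div#pnlResults");
        Elements byAttr = doc.select("div[id=pnlResults]");

        // Both queries target the same single div.
        System.out.println(byHash.size());
        System.out.println(byAttr.size());

        // The one matched div still contains its children (the h2 and the content div),
        // so printing it shows everything nested inside pnlResults.
        System.out.println(byHash.first().children().size());
        System.out.println(byHash.first().outerHtml());
    }
}
```

So the selector is probably doing what was asked; narrowing the output means selecting again inside the matched element.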

Update, in response to your edit:

Since you now have more data, and since the HTML itself isn't that great, you'll just have to work through it to pick out your data. If the elements you need all have id values, use those ids to select the elements and get their text.
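As a rough sketch of the id-based approach, the following collects the text of a few spans by id. The ids are copied from the HTML in the question; the class name `EventFields`, the method `extract`, and the shortened sample markup in `main` are assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class EventFields {
    // Span ids taken from the markup posted in the question.
    private static final String[] IDS = {
        "lblEventName", "lblStartDate", "lblStartTime", "lblEndTime",
        "lblLocationName", "lblAddress"
    };

    // Returns id -> text for every id that exists in the document; missing ids are skipped.
    static Map<String, String> extract(Document doc) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String id : IDS) {
            Element span = doc.getElementById(id); // null when no element has this id
            if (span != null) {
                fields.put(id, span.text());
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        // Shortened sample markup (an assumption); a real run would use
        // Jsoup.connect(link).get() as in the question.
        String html = "<div id=\"pnlResults\">"
            + "<h2><span id=\"lblEventName\">Golf Tournament</span></h2>"
            + "<p><span id=\"lblStartDate\">Monday, July 30, 2012</span></p>"
            + "<h4><span id=\"lblLocationName\">Irondequoit Country Club</span></h4>"
            + "</div>";
        for (Map.Entry<String, String> e : extract(Jsoup.parse(html)).entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue());
        }
    }
}
```

Skipping absent ids keeps the loop from throwing a NullPointerException on event pages that omit a field, which seems likely given that lblEndDate is empty in the sample.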

If you want to get the content of the div with id "pnlResults", JSoup provides the getElementById method.

For example, if you want to get that content and put it in a string, you can do it like this:

Document document = Jsoup.connect(LINK_TO_WEBSITE).get();
String content = document.getElementById("pnlResults").outerHtml();

Then you can put this content in Android's WebView, and it will work nicely.
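If only some of the boxes inside pnlResults are wanted (the question mentions General Information and Event Location), one possible refinement is to select again inside the element that getElementById returns. The sketch below is an assumption-laden example: the class `BoxByHeading` and method `boxText` are hypothetical, the sample markup is a trimmed copy of the question's HTML, and the title is assumed to contain no parentheses because it is pasted directly into the selector. It relies on jsoup's `:has(...)` and `:contains(...)` pseudo-selectors.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class BoxByHeading {
    // Returns the text of the "content" div in the box whose h3 heading contains
    // the given title, or null when the container or the box cannot be found.
    static String boxText(Document doc, String title) {
        Element results = doc.getElementById("pnlResults");
        if (results == null) {
            return null; // getElementById returns null when the id is absent
        }
        // :contains matches case-insensitively on the heading text.
        String css = "div.text-box:has(h3.head:contains(" + title + ")) div.content";
        Element content = results.select(css).first(); // first() is null when nothing matches
        return content == null ? null : content.text();
    }

    public static void main(String[] args) {
        // Trimmed sample markup (an assumption, not the live page).
        String html = "<div id=\"pnlResults\">"
            + "<div class=\"text-box boxed wide\"><h3 class=\"head\">General Information</h3>"
            + "<div class=\"content\"><p>Time Zone: Eastern</p></div></div>"
            + "<div class=\"text-box boxed wide line\"><h3 class=\"head\">Event Location</h3>"
            + "<div class=\"content\"><h4>Irondequoit Country Club</h4></div></div>"
            + "</div>";
        Document doc = Jsoup.parse(html);
        System.out.println(boxText(doc, "General Information"));
        System.out.println(boxText(doc, "Event Location"));
    }
}
```

Matching on heading text is more fragile than matching on ids, so it is probably best kept for boxes such as Tax Deduction Information whose content has no id of its own.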

Hope this helps someone!

This worked for me:

import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class DivStuff {
   public static final String MY_PAGE = "http://www.cancer.org/Involved/Participate/app" +
        "/event-search.aspx?zip=28590&city=&state=&local-radius=20&textsrch=&startdate=" +
        "11%2F13%2F2011&enddate=&all=1";
   private static final String[] HEADINGS = {"Event", "Location", "City, State", "Date", "Distance"};
   private String page;


   public static void main(String[] args) throws IOException {
      Document doc = Jsoup.connect(MY_PAGE).get();

      Elements links = doc.select("table");
      Elements links2 = links.select("tr");

      if (links2.size() < 2) {
         return;
      }

      for (int i = 1; i < links2.size(); i++) {
         Elements innerDetails = links2.get(i).select("td");
         if (innerDetails.size() != 5) {
            break;
         }
         for (int j = 0; j < HEADINGS.length; j++) {
            System.out.print(HEADINGS[j] + ": ");
            if (j == 0) {
               System.out.println(innerDetails.get(j).select("a").get(0).text());
            } else {
               System.out.println(innerDetails.get(j).text());
            }
         }
         System.out.println();
      }
   }
}
