
How do I extract cells from HTML tables and organize them based on other cells in the table row in Java?

I have the following HTML extracted from a website, stored as a String variable in Java. I want to look at every table row and, if a row has a data cell containing the words "Current Assignments Report", look at the other data cells in that row, add the course name to an ArrayList, and add the number that follows javascript:rlViewItm in the href to another ArrayList. Here is an example of that line:

<a href="javascript:rlViewItm('2049144736880355316');">View</a>

Here is an example to clear up what I'm trying to get. The program starts with the HTML below, which is held in a String. It should look at each table, and then at each table row separately. If a table row has a data cell that says "Current Assignments Report", it should look at the other data cells in that row and find the link shown below, where only the number changes from row to row. I want these numbers stored in a separate ArrayList.

<a href="javascript:rlViewItm('2049145027227690148');">View</a>

I have worked with sorting strings in Java before, but I don't understand how to store each item separately in an ArrayList based on particular criteria of an HTML table.

I would greatly appreciate help from anyone who can do this in Java!

  <div class="ed-formArea">
  <div class="ed-formHeader noText">
  </div>
  <div class="ed-formContent">
<!--SECTION CODE null Section #1  ENDS - DO NOT MODIFY -->
<!--SECTION CODE null CUSTOM CODE BEGIN -->


<form method="post" name="resourceLabelForm" action="/post/UserDocList.page">
<table summary="" border="0" class="ed-formTable" cellspacing="0" cellpadding="5">
<tbody>

    <tr>
<td><div class="ed-tdSpacer"></div></td>
<td class="ed-tdEnd">
            Private Reports


                <small><small>&nbsp;(1-40 of 40&nbsp;items)</small></small>

        </td></tr>
</tbody>
</table>

 </form>

<form method="post" name="userDocListTableForm" action="/post/UserDocList.page">
  <input type="hidden" name="selectAllEvent" value="" />
  <input type="hidden" name="deselectAllEvent" value="" />
  <table summary="" border="0" class="ed-formTable" cellspacing="0" cellpadding="5">
<tbody>


</tbody>
</table>



<table summary="" border="0" class="ed-formTable" cellspacing="0" cellpadding="5">
<tbody>

    <tr>
<td><div class="ed-tdSpacer"></div></td>
<td>&nbsp;</td><td valign="bottom" width="12%">
          <div class="smaller"><strong>
            Report Date
          </strong></div>
        </td><td valign="bottom" width="8%">
          <div class="smaller"><strong>Report</strong></div>
        </td><td valign="bottom" width="25%">
          <div class="smaller"><strong>View Home Page</strong></div>
        </td><td valign="bottom" width="25%">
          <div class="smaller"><strong>Report Name</strong></div>
        </td><td valign="bottom" width="2%" class="ed-tdEnd">&nbsp;</td></tr>

    <tr class="ed-alternateRow">
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          04/11/14
        </td><td>
          <a href="javascript:rlViewItm('2049145027192329860');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/5151_8701"> 
      PRINS OF ENGIN B
      </a> 
    </td><td>

            Current Assignments Report
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr>
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          04/11/14
        </td><td>
          <a href="javascript:rlViewItm('2049145027227690148');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/3540_0002"> 
      ADV SCI 4 BIO B
      </a> 
    </td><td>

            Current Assignments Report
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr class="ed-alternateRow">
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          04/11/14
        </td><td>
          <a href="javascript:rlViewItm('2049145027213095124');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/3042_0010"> 
      MAG FUNCTIONS B
      </a> 
    </td><td>

            Current Assignments Report
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr>
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          04/11/14
        </td><td>
          <a href="javascript:rlViewItm('2049145027201539636');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/2954_8702"> 
      Algorithms &amp; Data Structures X/Y TBD
      </a> 
    </td><td>

            Current Assignments Report
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr class="ed-alternateRow">
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          04/10/14
        </td><td>
          <a href="javascript:rlViewItm('2049145027226480084');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/1324_0005"> 
      HON ENGLISH 10B
      </a> 
    </td><td>

            Current Assignments Report
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr>
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          04/09/14
        </td><td>
          <a href="javascript:rlViewItm('2049145027229871460');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/3538_0001"> 
      ADV SCI 3 E/SS B
      </a> 
    </td><td>

            Current Assignments Report
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr class="ed-alternateRow">
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          04/09/14
        </td><td>
          <a href="javascript:rlViewItm('2049145027216196756');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/1743_0006"> 
      HON SPANISH 3B
      </a> 
    </td><td>

            Current Assignments Report
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr>
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          04/09/14
        </td><td>
          <a href="javascript:rlViewItm('2049144831908197844');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School"> 
      Local High School
      </a> 
    </td><td>

            Student Grades and Graduation Credit Report
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr class="ed-alternateRow">
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          04/07/14
        </td><td>
          <a href="javascript:rlViewItm('2049145027196480420');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/2105_8701"> 
      AP GOVPL US NSL B
      </a> 
    </td><td>

            Current Assignments Report
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr>
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          04/02/14
        </td><td>
          <a href="javascript:rlViewItm('2049144736912474660');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/9151_0027"> 
      HOMEROOM
      </a> 
    </td><td>

            Current Absences Report
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr class="ed-alternateRow">
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          04/01/14
        </td><td>
          <a href="javascript:rlViewItm('2049144936031942836');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/5151_8701"> 
      PRINS OF ENGIN B
      </a> 
    </td><td>

            Marking Period 3 as of Mar 31 2014
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr>
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          04/01/14
        </td><td>
          <a href="javascript:rlViewItm('2049144936031809620');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/3540_0002"> 
      ADV SCI 4 BIO B
      </a> 
    </td><td>

            Marking Period 3 as of Mar 31 2014
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr class="ed-alternateRow">
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          04/01/14
        </td><td>
          <a href="javascript:rlViewItm('2049144936025439028');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/3538_0001"> 
      ADV SCI 3 E/SS B
      </a> 
    </td><td>

            Marking Period 3 as of Mar 31 2014
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr>
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          04/01/14
        </td><td>
          <a href="javascript:rlViewItm('2049144936016776612');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/3042_0010"> 
      MAG FUNCTIONS B
      </a> 
    </td><td>

            Marking Period 3 as of Mar 31 2014
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr class="ed-alternateRow">
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          04/01/14
        </td><td>
          <a href="javascript:rlViewItm('2049144936060013524');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/2954_8702"> 
      Algorithms &amp; Data Structures X/Y TBD
      </a> 
    </td><td>

            Marking Period 3 as of Mar 31 2014
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr>
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          04/01/14
        </td><td>
          <a href="javascript:rlViewItm('2049144936025100916');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/2105_8701"> 
      AP GOVPL US NSL B
      </a> 
    </td><td>

            Marking Period 3 as of Mar 31 2014
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr class="ed-alternateRow">
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          04/01/14
        </td><td>
          <a href="javascript:rlViewItm('2049144936022815204');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/1743_0006"> 
      HON SPANISH 3B
      </a> 
    </td><td>

            Marking Period 3 as of Mar 31 2014
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr>
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          04/01/14
        </td><td>
          <a href="javascript:rlViewItm('2049144936043227972');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/1324_0005"> 
      HON ENGLISH 10B
      </a> 
    </td><td>

            Marking Period 3 as of Mar 31 2014
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr class="ed-alternateRow">
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          04/01/14
        </td><td>
          <a href="javascript:rlViewItm('2049145025811761220');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/9151_0027"> 
      HOMEROOM
      </a> 
    </td><td>

            Marking Period 3 Absences as of Mar 31, 2014
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr>
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          03/08/14
        </td><td>
          <a href="javascript:rlViewItm('2049144992192941348');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/9151_0027"> 
      HOMEROOM
      </a> 
    </td><td>

            Interim Report MP3 as of Feb 28
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr class="ed-alternateRow">
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          01/25/14
        </td><td>
          <a href="javascript:rlViewItm('2049144934670566308');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/9151_0027"> 
      HOMEROOM
      </a> 
    </td><td>

            Marking Period 2 Absences as of Jan 24, 2014
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr>
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          01/25/14
        </td><td>
          <a href="javascript:rlViewItm('2049144824058685812');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/5150_8701"> 
      PRINS OF ENGIN A
      </a> 
    </td><td>

            Marking Period 2 as of Jan 24 2014
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr class="ed-alternateRow">
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          01/25/14
        </td><td>
          <a href="javascript:rlViewItm('2049144824085227764');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/3539_0002"> 
      ADV SCI 4 BIO A
      </a> 
    </td><td>

            Marking Period 2 as of Jan 24 2014
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr>
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          01/25/14
        </td><td>
          <a href="javascript:rlViewItm('2049144824074464628');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/3537_0001"> 
      ADV SCI 3 E/SS A
      </a> 
    </td><td>

            Marking Period 2 as of Jan 24 2014
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr class="ed-alternateRow">
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          01/25/14
        </td><td>
          <a href="javascript:rlViewItm('2049144824082665540');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/3047_0010"> 
      MAGNET PRECALC C
      </a> 
    </td><td>

            Marking Period 2 as of Jan 24 2014
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr>
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          01/25/14
        </td><td>
          <a href="javascript:rlViewItm('2049144824049900244');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/2953_8702"> 
      Old Algorithms &amp; Data Structures Y
      </a> 
    </td><td>

            Marking Period 2 as of Jan 24 2014
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr class="ed-alternateRow">
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          01/25/14
        </td><td>
          <a href="javascript:rlViewItm('2049144824039718948');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/2104_8701"> 
      Period 9 AP NSL
      </a> 
    </td><td>

            Marking Period 2 as of Jan 24 2014
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr>
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          01/25/14
        </td><td>
          <a href="javascript:rlViewItm('2049144824065741444');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/1733_0006"> 
      HON SPANISH 3A
      </a> 
    </td><td>

            Marking Period 2 as of Jan 24 2014
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr class="ed-alternateRow">
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          01/25/14
        </td><td>
          <a href="javascript:rlViewItm('2049144824083064244');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/1323_0005"> 
      HON ENGLISH 10A
      </a> 
    </td><td>

            Marking Period 2 as of Jan 24 2014
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr>
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          12/13/13
        </td><td>
          <a href="javascript:rlViewItm('2049144874776524020');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/9151_0027"> 
      HOMEROOM
      </a> 
    </td><td>

            Interim Report MP2 as of Dec 06
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr class="ed-alternateRow">
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          11/05/13
        </td><td>
          <a href="javascript:rlViewItm('2049144822701443172');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/9151_0027"> 
      HOMEROOM
      </a> 
    </td><td>

            Marking Period 1 Absences as of Nov 04, 2013
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr>
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          11/05/13
        </td><td>
          <a href="javascript:rlViewItm('2049144736860489172');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/5150_8701"> 
      PRINS OF ENGIN A
      </a> 
    </td><td>

            Marking Period 1 as of Nov 04 2013
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr class="ed-alternateRow">
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          11/05/13
        </td><td>
          <a href="javascript:rlViewItm('2049144736881890916');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/3539_0002"> 
      ADV SCI 4 BIO A
      </a> 
    </td><td>

            Marking Period 1 as of Nov 04 2013
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr>
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          11/05/13
        </td><td>
          <a href="javascript:rlViewItm('2049144736862291156');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/3537_0001"> 
      ADV SCI 3 E/SS A
      </a> 
    </td><td>

            Marking Period 1 as of Nov 04 2013
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr class="ed-alternateRow">
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          11/05/13
        </td><td>
          <a href="javascript:rlViewItm('2049144736866166628');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/3047_0010"> 
      MAGNET PRECALC C
      </a> 
    </td><td>

            Marking Period 1 as of Nov 04 2013
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr>
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          11/05/13
        </td><td>
          <a href="javascript:rlViewItm('2049144736903239140');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/2953_8702"> 
      Old Algorithms &amp; Data Structures Y
      </a> 
    </td><td>

            Marking Period 1 as of Nov 04 2013
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr class="ed-alternateRow">
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          11/05/13
        </td><td>
          <a href="javascript:rlViewItm('2049144736880355316');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/2104_8701"> 
      Period 9 AP NSL
      </a> 
    </td><td>

            Marking Period 1 as of Nov 04 2013
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr>
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          11/05/13
        </td><td>
          <a href="javascript:rlViewItm('2049144736894413524');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/1733_0006"> 
      HON SPANISH 3A
      </a> 
    </td><td>

            Marking Period 1 as of Nov 04 2013
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr class="ed-alternateRow">
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          11/05/13
        </td><td>
          <a href="javascript:rlViewItm('2049144736870593220');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/1323_0005"> 
      HON ENGLISH 10A
      </a> 
    </td><td>

            Marking Period 1 as of Nov 04 2013
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr>
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          10/04/13
        </td><td>
          <a href="javascript:rlViewItm('2049144777895089844');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/9151_0027"> 
      HOMEROOM
      </a> 
    </td><td>

            Interim Report MP1 as of Sep 27
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    </tbody>
</table>

Disclaimer: Do not use regular expressions to parse HTML.


If the HTML is as strictly formatted as in your posted code, you can follow these steps:

Compile the following pattern with the Pattern.DOTALL flag and search the entire string with it:

<tr>(.*?)<td> Current Assignments Report </td>.*?</tr>

Iterating over the matches with Matcher.find() puts each assignment's row data into capture group one. Example match:

 <td>
  <div class="ed-tdSpacer"></div></td>
 <td valign="center">&nbsp;</td>
 <td> 04/02/14 </td>
 <td> <a href="javascript:rlViewItm('2049145027229871460');">View</a> </td>
 <td> <a class="lochomepage" href="/pages/Local_High_School/Classes/3538_0001"> Item 6 </a> </td>
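The first step can be sketched in Java as follows. This is only a minimal illustration, assuming input that is spaced exactly as the pattern expects; the class name `AssignmentRowFinder` and the method `findRows` are made up for this example and are not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AssignmentRowFinder {
    // Pattern from the step above; DOTALL lets '.' match line breaks too.
    static final Pattern ROW = Pattern.compile(
        "<tr>(.*?)<td> Current Assignments Report </td>.*?</tr>",
        Pattern.DOTALL);

    /** Returns capture group one (the cells before the report name) of every match. */
    static List<String> findRows(String html) {
        List<String> rows = new ArrayList<>();
        Matcher m = ROW.matcher(html);
        while (m.find()) {
            rows.add(m.group(1));
        }
        return rows;
    }

    public static void main(String[] args) {
        // Hypothetical, strictly spaced sample input.
        String html =
            "<tr> <td> 04/02/14 </td> <td> Current Assignments Report </td> </tr>\n"
          + "<tr> <td> 04/01/14 </td> <td> Current Assignments Report </td> </tr>";
        for (String row : findRows(html)) {
            System.out.println(row.trim());
        }
    }
}
```

One caveat with this pattern: `(.*?)` begins at the first `<tr>` the matcher finds, so a non-matching row that sits directly before a matching one is swept into the same capture group. It is safest on input where that cannot happen.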

In this text, search for each instance of <td> (.*?) </td>. The contents of each data cell are placed in capture group one. Searching the above text results in these matches:

04/02/14
<a href="javascript:rlViewItm('2049145027229871460');">View</a>
<a class="lochomepage" href="/pages/Local_High_School/Classes/3538_0001"> Item 6 </a>

The date can be taken pretty much as is; the other two items need further parsing, depending on what you want to get out of them.
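As a rough sketch of this second step, the code below splits a row into its cells and then pulls the item number and the link text out of the two anchor cells. The class `CellParser`, its methods, and the two helper patterns `ITEM_ID` and `LINK_TEXT` are assumptions made for illustration; only the `<td> (.*?) </td>` pattern comes from the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CellParser {
    // Cell pattern from the answer; expects a single space inside the tags.
    static final Pattern CELL = Pattern.compile("<td> (.*?) </td>", Pattern.DOTALL);
    // Hypothetical helper: the digits passed to rlViewItm('...').
    static final Pattern ITEM_ID = Pattern.compile("rlViewItm\\('([0-9]+)'\\)");
    // Hypothetical helper: the visible text of an <a> element.
    static final Pattern LINK_TEXT = Pattern.compile(">\\s*([^<]+?)\\s*</a>");

    /** Returns the contents of every <td> cell found in one row. */
    static List<String> cells(String rowHtml) {
        List<String> out = new ArrayList<>();
        Matcher m = CELL.matcher(rowHtml);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    /** Returns the item number from a "View" link cell, or null if absent. */
    static String itemId(String cell) {
        Matcher m = ITEM_ID.matcher(cell);
        return m.find() ? m.group(1) : null;
    }

    /** Returns the display text of the link in a cell, or null if absent. */
    static String linkText(String cell) {
        Matcher m = LINK_TEXT.matcher(cell);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String row =
            " <td> 04/02/14 </td>\n"
          + " <td> <a href=\"javascript:rlViewItm('2049145027229871460');\">View</a> </td>\n"
          + " <td> <a class=\"lochomepage\" href=\"/pages/Local_High_School/Classes/3538_0001\"> Item 6 </a> </td>";
        List<String> c = cells(row);
        System.out.println(c.get(0));           // the date cell, unchanged
        System.out.println(itemId(c.get(1)));   // digits from the View link
        System.out.println(linkText(c.get(2))); // text of the course link
    }
}
```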

But again, if your input is really as strict as you imply, it shouldn't be that bad.


Updated: With your most recent input (the long file you posted), this regex captures each item, as best I understand your needs:

<td>\s*([0-9]{2}/[0-9]{2}/[0-9]{2})\s*</td>\s*<td>\s*<a href="javascript:rlViewItm\('([0-9]+)'\);">View</a>\s*</td>\s*<td>\s*<a class="lochomepage" href="([^"]+)">\s*([^<]+?)\s*</a>\s*</td>\s*<td>\s*Current Assignments Report\s*</td>


Debuggex Demo

Note this takes a while to load, because the input is so long.

Capture groups:

  1. Date
  2. Item number
  3. The lochomepage URL
  4. The link's display text (the course name)
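The capture groups can be collected into the two lists the question asks for. The sketch below is one possible way to do it, not a definitive implementation. It uses a pattern written in the order the cells appear in the posted rows (date, View link, course link, then report name), so each match stays inside a single row. The class `ReportExtractor`, the method `collect`, and the `sampleRow` helper that builds test input are all hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReportExtractor {
    // Groups: 1 = date, 2 = item number, 3 = lochomepage URL, 4 = course name.
    static final Pattern ROW = Pattern.compile(
        "<td>\\s*([0-9]{2}/[0-9]{2}/[0-9]{2})\\s*</td>\\s*"
      + "<td>\\s*<a href=\"javascript:rlViewItm\\('([0-9]+)'\\);\">View</a>\\s*</td>\\s*"
      + "<td>\\s*<a class=\"lochomepage\" href=\"([^\"]+)\">\\s*([^<]+?)\\s*</a>\\s*</td>\\s*"
      + "<td>\\s*Current Assignments Report\\s*</td>");

    /** Appends the course name and item number of every matching row to the lists. */
    static void collect(String html, List<String> courseNames, List<String> itemIds) {
        Matcher m = ROW.matcher(html);
        while (m.find()) {
            itemIds.add(m.group(2));
            // Course names may contain the &amp; entity; decode just that one here.
            courseNames.add(m.group(4).replace("&amp;", "&"));
        }
    }

    // Hypothetical helper: builds one row laid out like the rows in the posted HTML.
    static String sampleRow(String date, String id, String href, String name, String report) {
        return "<tr>\n<td><div class=\"ed-tdSpacer\"></div></td>\n"
             + "<td valign=\"center\">&nbsp;</td><td>\n  " + date + "\n  </td><td>\n"
             + "  <a href=\"javascript:rlViewItm('" + id + "');\">View</a>\n  </td><td>\n"
             + "  <a class=\"lochomepage\" href=\"" + href + "\"> \n  " + name + "\n  </a> \n"
             + "  </td><td>\n\n  " + report + "\n  </td><td class=\"ed-tdEnd\">&nbsp;</td></tr>\n";
    }

    public static void main(String[] args) {
        String html =
            sampleRow("04/11/14", "2049145027192329860",
                      "/pages/Local_High_School/Classes/5151_8701",
                      "PRINS OF ENGIN B", "Current Assignments Report")
          + sampleRow("04/02/14", "2049144736912474660",
                      "/pages/Local_High_School/Classes/9151_0027",
                      "HOMEROOM", "Current Absences Report");
        List<String> courseNames = new ArrayList<>();
        List<String> itemIds = new ArrayList<>();
        collect(html, courseNames, itemIds);
        System.out.println(courseNames);
        System.out.println(itemIds);
    }
}
```

The item numbers are kept as strings on purpose: the 19-digit values in the posted HTML fit in a `long` but not in an `int`, and nothing in the question requires doing arithmetic on them.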

I know it's been a while since you asked this. Maybe it still helps...
