
How to extract the following lines after pattern match

The web source looks like this:

<div class="MT12">
    <table class="tblchart" border="0" cellspacing="0" cellpadding="0">
        <tr>
            <th rowspan="2" width="100" align="left" valign="top">Date</th>
            <th rowspan="2" width="100" style="text-align:right;" valign="top">Open</th>
            <th rowspan="2" width="100" style="text-align:right;" valign="top">High</th>
            <th rowspan="2" width="100" style="text-align:right;" valign="top">Low</th>
            <th rowspan="2" width="100" style="text-align:right;" valign="top">Close</th>
            <th colspan="2" style="text-align:center;" valign="top">- SPREAD -</th>
        </tr>
        <tr>
            <th width="100" style="text-align:right;" valign="top">(High-Low)</th>
            <th width="100" style="text-align:right;" valign="top" class="last">(Open-Close)</th>
        </tr>
        <tr>
            <td align="left" valign="top">2019-12-24</td>
            <td valign="top" style="text-align:right;">12269.25</td>
            <td valign="top" class="b_12vv" style="text-align:right">12283.70</td>
            <td valign="top" style="text-align:right;">12202.10</td>
            <td valign="top" style="text-align:right;">12214.55</td>
            <td valign="top" style="text-align:right;">81.60</td>
            <td align="right" valign="top" class="last" style="text-align:right;">54.70</td>
        </tr>
        <tr>
            <td align="left" valign="top">2019-12-23</td>
            <td valign="top" style="text-align:right;">12235.45</td>
            <td valign="top" class="b_12vv" style="text-align:right">12287.15</td>
            <td valign="top" style="text-align:right;">12213.25</td>
            <td valign="top" style="text-align:right;">12262.75</td>
            <td valign="top" style="text-align:right;">73.90</td>
            <td align="right" valign="top" class="last" style="text-align:right;">-27.30</td>
        </tr>
        <tr>
            <td align="left" valign="top">2019-12-20</td>
            <td valign="top" style="text-align:right;">12266.45</td>
            <td valign="top" class="b_12vv" style="text-align:right">12293.90</td>
            <td valign="top" style="text-align:right;">12252.75</td>
            <td valign="top" style="text-align:right;">12271.80</td>
            <td valign="top" style="text-align:right;">41.15</td>
            <td align="right" valign="top" class="last" style="text-align:right;">-5.35</td>
        </tr>
    </table>
</div>

I want to extract four numbers for every date. For example, for the date 2019-12-24 I need 12269.25, 12283.70, 12202.10 and 12214.55, and then the same for each following date.

I am having difficulty because I need to select the 4 lines that follow each date on the page, and their XPaths are not closely related to the date's XPath, as shown above. The page can contain anywhere from a single date to 100-200 dates.

Can anybody please help with a WebDriver code snippet for this?

Thanks a lot

Does this meet your needs?

from simplified_scrapy.simplified_doc import SimplifiedDoc 
html = '''<div class="MT12">
    <table class="tblchart" border="0" cellspacing="0" cellpadding="0">
        <tr>
            <th rowspan="2" width="100" align="left" valign="top">Date</th>
            <th rowspan="2" width="100" style="text-align:right;" valign="top">Open</th>
            <th rowspan="2" width="100" style="text-align:right;" valign="top">High</th>
            <th rowspan="2" width="100" style="text-align:right;" valign="top">Low</th>
            <th rowspan="2" width="100" style="text-align:right;" valign="top">Close</th>
            <th colspan="2" style="text-align:center;" valign="top">- SPREAD -</th>
        </tr>
        <tr>
            <th width="100" style="text-align:right;" valign="top">(High-Low)</th>
            <th width="100" style="text-align:right;" valign="top" class="last">(Open-Close)</th>
        </tr>
        <tr>
            <td align="left" valign="top">2019-12-24</td>
            <td valign="top" style="text-align:right;">12269.25</td>
            <td valign="top" class="b_12vv" style="text-align:right">12283.70</td>
            <td valign="top" style="text-align:right;">12202.10</td>
            <td valign="top" style="text-align:right;">12214.55</td>
            <td valign="top" style="text-align:right;">81.60</td>
            <td align="right" valign="top" class="last" style="text-align:right;">54.70</td>
        </tr>
        <tr>
            <td align="left" valign="top">2019-12-23</td>
            <td valign="top" style="text-align:right;">12235.45</td>
            <td valign="top" class="b_12vv" style="text-align:right">12287.15</td>
            <td valign="top" style="text-align:right;">12213.25</td>
            <td valign="top" style="text-align:right;">12262.75</td>
            <td valign="top" style="text-align:right;">73.90</td>
            <td align="right" valign="top" class="last" style="text-align:right;">-27.30</td>
        </tr>
        <tr>
            <td align="left" valign="top">2019-12-20</td>
            <td valign="top" style="text-align:right;">12266.45</td>
            <td valign="top" class="b_12vv" style="text-align:right">12293.90</td>
            <td valign="top" style="text-align:right;">12252.75</td>
            <td valign="top" style="text-align:right;">12271.80</td>
            <td valign="top" style="text-align:right;">41.15</td>
            <td align="right" valign="top" class="last" style="text-align:right;">-5.35</td>
        </tr>
    </table>
</div>'''
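doc = SimplifiedDoc(html)
table = doc.getElement(tag='table', value='tblchart')  # locate the price table
trs = table.trs.notContains('<th')  # data rows only: skip rows that contain <th> header cells
for tr in trs:
    tds = tr.tds  # all <td> cells in this row
    data = [td.text for td in tds]
    # date, open, high, low, close
    print(data[0], data[1], data[2], data[3], data[4])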
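
If you would rather not add a third-party package, the same idea can be sketched with Python's standard library only. This is a minimal sketch, not a definitive implementation: the names RowCollector and extract_ohlc are made up for illustration, the sample string is a trimmed copy of the table above, and it assumes every data row starts with the date cell followed by the Open, High, Low and Close cells.

```python
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []       # finished rows, each a list of cell strings
        self._row = None     # cells of the row currently being parsed
        self._in_td = False  # True while inside a <td> element

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_td = True
            self._row.append("")

    def handle_data(self, data):
        if self._in_td:
            self._row[-1] += data.strip()

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows hold only <th> cells, so they stay empty
                self.rows.append(self._row)
            self._row = None
            self._in_td = False


def extract_ohlc(html):
    """Return {date: (open, high, low, close)} for each data row (hypothetical helper)."""
    parser = RowCollector()
    parser.feed(html)
    parser.close()
    return {row[0]: tuple(float(v) for v in row[1:5])
            for row in parser.rows if len(row) >= 5}


# Trimmed copy of the table from the question, used only as sample input.
sample = '''<table class="tblchart">
<tr><th>Date</th><th>Open</th><th>High</th><th>Low</th><th>Close</th></tr>
<tr><td>2019-12-24</td><td>12269.25</td><td>12283.70</td><td>12202.10</td><td>12214.55</td><td>81.60</td><td>54.70</td></tr>
<tr><td>2019-12-23</td><td>12235.45</td><td>12287.15</td><td>12213.25</td><td>12262.75</td><td>73.90</td><td>-27.30</td></tr>
</table>'''

for date, values in extract_ohlc(sample).items():
    print(date, *values)
```

Since you asked about WebDriver: with Selenium, driver.page_source returns the current page's HTML as a string, so passing that to extract_ohlc should work the same way for 1 date or 200, assuming the table on the live page has the same layout as the snippet in the question.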

