
Python Beautiful Soup only returning the first row of a table

I'm just learning Python and Beautiful Soup, and I'm trying to scrape river water flows from a number of sites. I've had success with most of them, but one in particular is giving me problems: http://hydro.marlborough.govt.nz/reports/riverreport.html . I'm trying to get the 24th row of data from the main table.

The code below seems to select the table, but it only returns the header and the first row.

tMain_table = soup.select_one("table:nth-of-type(1)")

print(tMain_table)
    <table class="table table-striped table-bordered table-hover">
    <thead style="background-color: #4d4c4f;color: white;">
    <tr>
    <th class="text-center">Site Name</th>
    <th class="text-center"><div>Date/Time </div>(NZST)</th>
    <th class="text-center"><div>Flow</div>(m3/s)</th>
    <th class="text-center" nowrap="nowrap"><div>7 Day</div>Peak Flow</th>
    <th class="text-center"><div>Stage</div>(m)</th>
    <th class="text-center"><div>Change</div>(mm/hr)</th>
    <th class="text-center" nowrap="nowrap"><div>7 Day </div>Peak Stage</th>
    <th class="text-center"><div>Peak</div>Date/Time</th>
    </tr>
    </thead>
    <tbody>
    <tr ng-repeat="item in data ">
    <td nowrap="nowrap">{{item.SiteName}}</td>
    <td class="text-center" nowrap="nowrap">{{item.LastUpdate | asDate | date:'d MMM yy         HH:mm'}} </td>
    <td class="text-center" nowrap="nowrap">{{item.Flow}}</td>
    <td class="text-center" nowrap="nowrap">{{item.PeakFlow}}</td>
    <td class="text-center" nowrap="nowrap">{{item.Stage}}</td>
    <td class="text-center" nowrap="nowrap">{{item.StageChange}}</td>
    <td class="text-center" nowrap="nowrap">{{item.PeakStage}}</td>
    <td class="text-center" nowrap="nowrap">{{item.PeakStageDate | asDate | date:'d MMM yy HH:mm'}}</td>
    </tr>
    </tbody>
    </table>

Likewise, the code below also returns only the header and the first row.

table = soup.find_all('tr')

print(table)
    [<tr>
    <th class="text-center">Site Name</th>
    <th class="text-center"><div>Date/Time </div>(NZST)</th>
    <th class="text-center"><div>Flow</div>(m3/s)</th>
    <th class="text-center" nowrap="nowrap"><div>7 Day</div>Peak Flow</th>
    <th class="text-center"><div>Stage</div>(m)</th>
    <th class="text-center"><div>Change</div>(mm/hr)</th>
    <th class="text-center" nowrap="nowrap"><div>7 Day </div>Peak Stage</th>
    <th class="text-center"><div>Peak</div>Date/Time</th>
    </tr>, <tr ng-repeat="item in data ">
    <td nowrap="nowrap">{{item.SiteName}}</td>
    <td class="text-center" nowrap="nowrap">{{item.LastUpdate | asDate | date:'d MMM yy HH:mm'}} </td>
    <td class="text-center" nowrap="nowrap">{{item.Flow}}</td>
    <td class="text-center" nowrap="nowrap">{{item.PeakFlow}}</td>
    <td class="text-center" nowrap="nowrap">{{item.Stage}}</td>
    <td class="text-center" nowrap="nowrap">{{item.StageChange}}</td>
    <td class="text-center" nowrap="nowrap">{{item.PeakStage}}</td>
    <td class="text-center" nowrap="nowrap">{{item.PeakStageDate | asDate | date:'d MMM yy HH:mm'}} 
    </td>
    </tr>]

Any help appreciated

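The table rows are filled in by JavaScript in the browser. The ng-repeat attribute and the {{item...}} placeholders in your output look like an unrendered AngularJS template, so the HTML that Beautiful Soup receives contains only that single template row. A webpage with dynamic content like this can be rendered first, for example with Selenium.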
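A quick way to tell whether a downloaded page still needs rendering is to look for leftover template markers. The sketch below assumes AngularJS-style {{ ... }} placeholders like the ones in the question's output; has_unrendered_placeholders is a hypothetical helper name and the "rendered" row is made up for illustration.

```python
import re

# Matches AngularJS-style placeholders such as {{item.Flow}}
PLACEHOLDER = re.compile(r"\{\{.*?\}\}")


def has_unrendered_placeholders(html: str) -> bool:
    """Return True if the HTML still contains {{ ... }} template markers."""
    return PLACEHOLDER.search(html) is not None


# Template row as it appears in the static HTML from the question
static_html = '<tr ng-repeat="item in data "><td>{{item.SiteName}}</td></tr>'
# Made-up example of what a rendered row might look like
rendered_html = "<tr><td>Example Site</td></tr>"

print(has_unrendered_placeholders(static_html))    # True
print(has_unrendered_placeholders(rendered_html))  # False
```

If the check returns True for the HTML you fetched, the page has to be rendered before there is anything to scrape.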

A minimal example with Selenium:

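from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait

url = "http://hydro.marlborough.govt.nz/reports/riverreport.html"  # page from the question

with webdriver.Firefox() as driver:  # there are other drivers available
    driver.get(url)  # get() returns None, so page_source is read separately
    # wait (up to 10 s) until the script has added data rows to the table
    WebDriverWait(driver, 10).until(
        lambda d: len(d.find_elements(By.CSS_SELECTOR, "table tbody tr")) > 1
    )
    rendered_wpage = driver.page_source

soup = BeautifulSoup(rendered_wpage, 'lxml')
# scrape here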
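Once the rendered HTML is in soup, getting the 24th row is ordinary list indexing. This is a minimal sketch, assuming the data rows sit in the tbody of the first table on the page; nth_row_cells is a hypothetical helper, and the sample table is made up so the snippet runs without a browser. It uses the built-in html.parser so lxml is not required.

```python
from bs4 import BeautifulSoup


def nth_row_cells(soup, n):
    """Return the cell texts of the n-th (1-based) body row of the first table."""
    table = soup.find("table")
    if table is None:
        raise ValueError("no <table> found in the document")
    rows = table.select("tbody tr")
    if not 1 <= n <= len(rows):
        raise IndexError(f"table has {len(rows)} data rows, asked for row {n}")
    return [td.get_text(strip=True) for td in rows[n - 1].find_all("td")]


# Made-up stand-in for the rendered page: 30 data rows with two columns each
body = "".join(f"<tr><td>Site {i}</td><td>{i * 1.5}</td></tr>" for i in range(1, 31))
sample = (
    "<table><thead><tr><th>Site Name</th><th>Flow</th></tr></thead>"
    f"<tbody>{body}</tbody></table>"
)

soup = BeautifulSoup(sample, "html.parser")
print(nth_row_cells(soup, 24))  # ['Site 24', '36.0']
```

Applied to the soup built from the Selenium page source, nth_row_cells(soup, 24) should give the cells of row 24, provided the page had finished rendering when page_source was read.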
