I'm just learning Python and Beautiful Soup, and I'm trying to scrape river water flows from a number of sites. I've had success with most of them, but one in particular is giving me problems: http://hydro.marlborough.govt.nz/reports/riverreport.html . I'm trying to get the 24th row of data from the main table.
The code below seems to select the table, but it only returns the header and the first row.
tMain_table = soup.select_one("table:nth-of-type(1)")
print (tMain_table)
<table class="table table-striped table-bordered table-hover">
<thead style="background-color: #4d4c4f;color: white;">
<tr>
<th class="text-center">Site Name</th>
<th class="text-center"><div>Date/Time </div>(NZST)</th>
<th class="text-center"><div>Flow</div>(m3/s)</th>
<th class="text-center" nowrap="nowrap"><div>7 Day</div>Peak Flow</th>
<th class="text-center"><div>Stage</div>(m)</th>
<th class="text-center"><div>Change</div>(mm/hr)</th>
<th class="text-center" nowrap="nowrap"><div>7 Day </div>Peak Stage</th>
<th class="text-center"><div>Peak</div>Date/Time</th>
</tr>
</thead>
<tbody>
<tr ng-repeat="item in data ">
<td nowrap="nowrap">{{item.SiteName}}</td>
<td class="text-center" nowrap="nowrap">{{item.LastUpdate | asDate | date:'d MMM yy HH:mm'}} </td>
<td class="text-center" nowrap="nowrap">{{item.Flow}}</td>
<td class="text-center" nowrap="nowrap">{{item.PeakFlow}}</td>
<td class="text-center" nowrap="nowrap">{{item.Stage}}</td>
<td class="text-center" nowrap="nowrap">{{item.StageChange}}</td>
<td class="text-center" nowrap="nowrap">{{item.PeakStage}}</td>
<td class="text-center" nowrap="nowrap">{{item.PeakStageDate | asDate | date:'d MMM yy HH:mm'}}</td>
</tr>
</tbody>
</table>
Likewise, the code below also returns only the header and the first row.
table = soup.findAll('tr')
print (table)
[<tr>
<th class="text-center">Site Name</th>
<th class="text-center"><div>Date/Time </div>(NZST)</th>
<th class="text-center"><div>Flow</div>(m3/s)</th>
<th class="text-center" nowrap="nowrap"><div>7 Day</div>Peak Flow</th>
<th class="text-center"><div>Stage</div>(m)</th>
<th class="text-center"><div>Change</div>(mm/hr)</th>
<th class="text-center" nowrap="nowrap"><div>7 Day </div>Peak Stage</th>
<th class="text-center"><div>Peak</div>Date/Time</th>
</tr>, <tr ng-repeat="item in data ">
<td nowrap="nowrap">{{item.SiteName}}</td>
<td class="text-center" nowrap="nowrap">{{item.LastUpdate | asDate | date:'d MMM yy HH:mm'}} </td>
<td class="text-center" nowrap="nowrap">{{item.Flow}}</td>
<td class="text-center" nowrap="nowrap">{{item.PeakFlow}}</td>
<td class="text-center" nowrap="nowrap">{{item.Stage}}</td>
<td class="text-center" nowrap="nowrap">{{item.StageChange}}</td>
<td class="text-center" nowrap="nowrap">{{item.PeakStage}}</td>
<td class="text-center" nowrap="nowrap">{{item.PeakStageDate | asDate | date:'d MMM yy HH:mm'}}
</td>
</tr>]
Any help appreciated
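The data rows are filled in by JavaScript in the browser, so the HTML that Beautiful Soup receives contains only the template row. A webpage with dynamic content like this can be rendered first, for example with Selenium.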
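As a quick check before reaching for a browser, one can look for unrendered template markers in the HTML that was actually downloaded. The sketch below uses only the standard library. `looks_unrendered` is a hypothetical helper name, and the two markers it searches for (double-brace placeholders and an `ng-repeat` attribute) are simply the ones visible in the question's output, not a general test for every JavaScript framework.

```python
import re

# Markers seen in the question's output: {{item.Flow}}-style placeholders
# and the ng-repeat attribute on the template row.
PLACEHOLDER = re.compile(r"\{\{.*?\}\}")
NG_REPEAT = re.compile(r"\bng-repeat\s*=")


def looks_unrendered(html: str) -> bool:
    """Return True if the HTML still contains client-side template markers."""
    return bool(PLACEHOLDER.search(html) or NG_REPEAT.search(html))


# Fragment taken from the output shown in the question.
raw_row = '<tr ng-repeat="item in data "><td nowrap="nowrap">{{item.SiteName}}</td></tr>'
# Hypothetical rendered row, made up for illustration only.
rendered_row = '<tr><td nowrap="nowrap">Example Site</td></tr>'

print(looks_unrendered(raw_row))       # True
print(looks_unrendered(rendered_row))  # False
```

If the check returns True for the HTML fetched without a browser, the rows have to come from a rendered page instead.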
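A minimal example using Selenium:
from bs4 import BeautifulSoup
from selenium import webdriver

# URL taken from the question
url = "http://hydro.marlborough.govt.nz/reports/riverreport.html"

with webdriver.Firefox() as driver:  # there are other drivers available
    # driver.implicitly_wait(10)
    driver.get(url)  # get() returns None, so page_source is read separately
    # the rows load asynchronously, so a wait may be needed before this line
    rendered_wpage = driver.page_source

soup = BeautifulSoup(rendered_wpage, 'lxml')
# scrape here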
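Once `soup` holds the rendered page, the 24th data row asked about in the question can be picked out by indexing the body rows. The sketch below runs against a small synthetic table rather than the live site, so the site names and flow values are made up. `nth_row_cells` is a hypothetical helper, and the selector `tbody tr` assumes the rendered page keeps the table and tbody structure shown in the question.

```python
from bs4 import BeautifulSoup


def nth_row_cells(soup, n):
    """Return the cell texts of the n-th (1-based) body row of the first table."""
    table = soup.select_one("table")
    if table is None:
        raise ValueError("no table found")
    rows = table.select("tbody tr")
    if not 1 <= n <= len(rows):
        raise IndexError(f"table has {len(rows)} data rows, asked for row {n}")
    return [td.get_text(strip=True) for td in rows[n - 1].find_all("td")]


# Synthetic stand-in for the rendered page: 30 made-up rows.
body = "".join(
    f"<tr><td>Site {i}</td><td>{i * 1.5}</td></tr>" for i in range(1, 31)
)
html = (
    "<table><thead><tr><th>Site Name</th><th>Flow</th></tr></thead>"
    f"<tbody>{body}</tbody></table>"
)
demo_soup = BeautifulSoup(html, "html.parser")

print(nth_row_cells(demo_soup, 24))  # ['Site 24', '36.0']
```

With the Selenium example, the same call would be `nth_row_cells(soup, 24)`, provided the page had finished loading its data before `page_source` was read.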