I am trying to scrape a table in Python 2.7 using Beautiful Soup and/or Selenium (no pandas or lxml). Specific columns from the table need to be written to a CSV file. I have looked at most of the similar questions (12548793, 30734963, 33448974, 32434378 and more), but nothing has worked for me so far. This is my first attempt at scraping anything, so I don't pretend to understand half of what I am doing.
The code below works somewhat:
import urllib2
import bs4
from bs4 import BeautifulSoup
import csv
url = "http://data.dnr.nebraska.gov/RealTime/Gage/Index?StationSource=1&StationType=3&RiverBasin="
page = urllib2.urlopen(url).read()
soup = BeautifulSoup(page, "html.parser")
#get table headers for the columns of interest
#Data of interest:['Station_Name', 'Station_number', 'Date_time', 'Stage', 'Discharge'])
table1 = soup.find("table", id="StationNames")
ths = table1.findAll('th')
headers = (ths[0].text, ths[1].text, ths[2].text, ths[3].text, ths[4].text)
#print headers
#get measurements
table = soup.find_all('table', {"class":"btn-NDNR BlueUnderline"})
for tr in soup.find_all('tr')[2:]:
    tds = tr.find_all('td')
    ncontent = (tds[0].text, tds[1].text, tds[2].text, tds[3].text, tds[4].text)
    #print ncontent
#write the csv file
with open('E:/test/nebraska.csv', 'a') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(headers)
    writer.writerow(ncontent)
    #writer.writerow([value.get_text(strip=True).encode("utf-8") for value in ncontent])
However, the CSV file comes out empty, and when I print the values, this is what I get:
(u'\r\n Station Name\r\n ', u'\r\n Station Number\r\n ', u'\r\n Date Time (UTC)\r\n ', u'\r\n Stage\r\n ', u'\r\n Discharge\r\n ')
(u'\nBig Blue River at Beatrice - NDNR ', u'\r\n 6881500\r\n ', u'\r\n 01/05/2016 14:45 \r\n ', u'\r\n 4.27\r\n ', u'\r\n 524.62\r\n ')
Also, is there a more efficient and faster way of doing this?
Thank you in advance - any help will be greatly appreciated.
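Several errors:
The cell text is never stripped of its surrounding whitespace. Use tds[0].text.strip(), and do the same for the other cells.
The ncontent variable is overwritten on every pass through the loop, so only the last row is left by the time the file is written.
Fix these errors and you will be good to go.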
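The two fixes above can be sketched as follows. This is only an illustration, not the asker's final script: it is written for Python 3 (an assumption, since the question uses 2.7), the helper names clean_cells and write_table are hypothetical, and the raw cell texts are copied from the printed output in the question instead of being fetched from the site.

```python
import csv
import io


def clean_cells(cells, n=5):
    """Return the first n cell texts with surrounding whitespace stripped."""
    return [text.strip() for text in cells[:n]]


def write_table(headers, rows, out):
    """Write one header row followed by every data row to a file-like object."""
    writer = csv.writer(out)
    writer.writerow(headers)
    writer.writerows(rows)


# Raw cell texts copied from the printed output in the question.
raw_headers = [u'\r\n Station Name\r\n ', u'\r\n Station Number\r\n ',
               u'\r\n Date Time (UTC)\r\n ', u'\r\n Stage\r\n ',
               u'\r\n Discharge\r\n ']
raw_rows = [
    [u'\nBig Blue River at Beatrice - NDNR ', u'\r\n 6881500\r\n ',
     u'\r\n 01/05/2016 14:45 \r\n ', u'\r\n 4.27\r\n ', u'\r\n 524.62\r\n '],
]

headers = clean_cells(raw_headers)

rows = []
for raw in raw_rows:
    # Append each cleaned row instead of reassigning one variable,
    # so earlier rows are not lost when the loop moves on.
    rows.append(clean_cells(raw))

# With BeautifulSoup, each raw row would come from something like
#   [td.text for td in tr.find_all('td')]
# inside the loop over the table rows.

buffer = io.StringIO()  # stands in for the real CSV file
write_table(headers, rows, buffer)
print(buffer.getvalue())
```

To write a real file in Python 3, open it with open(path, 'w', newline='') and pass it in place of the buffer. In Python 2.7 the csv module expects a file opened in binary mode ('wb'), and unicode cell text has to be encoded (for example to UTF-8) before it is written.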