
Scraping a table with Beautiful Soup and Python

I would like to retrieve the tr rows of the nested table #timeTable on this webpage (the URL is in the code below).

I've tried the following, but it returns an empty list.

import urllib.request
from bs4 import BeautifulSoup

nlg_timetable_url = "https://navlib.forth-crs.gr/italian_b2c/npgres.exe?func=TT&ReservationType=npgres.exe%3FPM%3DBO&Leg1i=PRJ&Leg1ii=BEV&Leg1Date=26%2F02%2F2019&TotalPassengers=1&TotalPassengersHuman=1&TotalPassengersAcce=0&TotalVehicles=0"
headers = {'user-agent':'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.3'}
# Send the request with a browser-like User-Agent header
request = urllib.request.Request(nlg_timetable_url, headers=headers)
html = urllib.request.urlopen(request).read()  # raw response body (bytes)
soup = BeautifulSoup(html, 'html.parser')
# Every <tr> that is a descendant of the element with id="timeTable"
ngl_timetable_table = list(soup.select('#timeTable tr'))
print(ngl_timetable_table)

Output

[]
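An empty list from soup.select means no element in the parsed HTML matched '#timeTable tr'. Before changing the selector, it can help to check whether the HTML you actually received contains an element with id="timeTable" at all; if it does not, no selector will find it. (One common cause in general is a table built by JavaScript, which neither urllib nor requests executes.) The sketch below does that check with only the standard library's html.parser. The names IdRowCounter and inspect_html are hypothetical, and it assumes reasonably well-formed markup in which every non-void tag is closed.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class IdRowCounter(HTMLParser):
    """Report whether an element with `target_id` exists and count the <tr>
    tags inside it, including rows of nested tables (like "#timeTable tr").

    Hypothetical helper; assumes every non-void tag has a matching end tag.
    """

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.found = False
        self.row_count = 0
        self._depth = 0  # open tags inside the target element; 0 = outside it

    def handle_starttag(self, tag, attrs):
        if self._depth == 0:
            # Outside the target: only look for the element carrying the id.
            if dict(attrs).get("id") == self.target_id:
                self.found = True
                self._depth = 1
            return
        if tag == "tr":
            self.row_count += 1
        if tag not in VOID_TAGS:
            self._depth += 1

    def handle_endtag(self, tag):
        # Closing tags only matter while we are inside the target element.
        if self._depth > 0 and tag not in VOID_TAGS:
            self._depth -= 1


def inspect_html(html, target_id="timeTable"):
    """Return (found, row_count) for the element with id `target_id`."""
    parser = IdRowCounter(target_id)
    parser.feed(html)
    parser.close()
    return parser.found, parser.row_count
```

With the question's code, it could be called as inspect_html(html.decode("utf-8", errors="replace")); decoding as UTF-8 is an assumption about the page's encoding. A result of (False, 0) points at the response itself rather than at the selector.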

I would use the requests module instead:

import requests
from bs4 import BeautifulSoup

nlg_timetable_url = "https://navlib.forth-crs.gr/italian_b2c/npgres.exe?func=TT&ReservationType=npgres.exe%3FPM%3DBO&Leg1i=PRJ&Leg1ii=BEV&Leg1Date=26%2F02%2F2019&TotalPassengers=1&TotalPassengersHuman=1&TotalPassengersAcce=0&TotalVehicles=0"
headers = {'user-agent':'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.3'}
res = requests.get(nlg_timetable_url, headers=headers)
# res.content is the raw response body as bytes
soup = BeautifulSoup(res.content, 'html.parser')
# Print the text of every <tr> inside the element with id="timeTable"
for item in soup.select('#timeTable tr'):
    print(item.text)
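Both snippets hand-encode a long query string. As a sketch, the same URL can be built from a dict with urllib.parse.urlencode, which percent-encodes characters such as "?", "=" and "/" inside values. The parameter values are copied from the question's URL; their meaning is not documented there, and the names BASE_URL and params are only illustrative. With requests, the dict could instead be passed as requests.get(BASE_URL, params=params, headers=headers), letting requests do the encoding.

```python
from urllib.parse import urlencode

BASE_URL = "https://navlib.forth-crs.gr/italian_b2c/npgres.exe"

# Query parameters decoded from the URL in the question (values copied as-is).
params = {
    "func": "TT",
    "ReservationType": "npgres.exe?PM=BO",
    "Leg1i": "PRJ",
    "Leg1ii": "BEV",
    "Leg1Date": "26/02/2019",
    "TotalPassengers": 1,
    "TotalPassengersHuman": 1,
    "TotalPassengersAcce": 0,
    "TotalVehicles": 0,
}

# urlencode turns "?" into %3F, "=" into %3D and "/" into %2F inside values,
# reproducing the hand-encoded query string; dict order is preserved.
nlg_timetable_url = BASE_URL + "?" + urlencode(params)
print(nlg_timetable_url)
```

Changing the date or route then only means editing one dict entry instead of re-encoding the URL by hand.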
