
“HTTP Error 500: Internal Server Error” - Web scraping

Solved using the answer from QHarr!

I am trying to extract some information (starting with the title) from a website. The code below works fine with http://google.com, but not with the link I need (url).

Error code: "HTTP Error 500: Internal Server Error"

Am I doing something wrong? Is it possible to do this another way?

from urllib.request import urlopen
import urllib.error
import bs4
import time

url = "http://st.atb.no/New/minskjerm/FST.aspx?visMode=1&cTit=&c1=1&s1=16011301&sv1=&cn1=&template=2&cmhb=FF6600&cmhc=00FF00&cshb=3366FF&cshc=FFFFFF&arb=000000&rows=1&period=&" 


for i in range(5):  # try 5 times to reach the page
    try:
        html = urlopen(url)
    except urllib.error.HTTPError as exc:
        print('Error code: ', exc)
        time.sleep(1)  # wait 1 second, then make the HTTP request again
        continue
    else:
        print('Success')
        break
else:
    # every attempt failed, so `html` was never assigned; stop here
    raise SystemExit('Giving up after 5 failed attempts')


soup = bs4.BeautifulSoup(html, 'lxml')
title = soup.find('title')
print(title.getText())
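The retry loop can also be packaged as a reusable function. The sketch below is one way to do it, not the asker's code: `fetch_with_retries` is a hypothetical helper name, and it re-raises the last `HTTPError` when every attempt fails, so the caller never goes on to parse an undefined response. Retrying only helps with transient failures; if the server answers every request with 500, all attempts will fail.

```python
import time
import urllib.error
import urllib.request


def fetch_with_retries(url, attempts=5, delay=1.0):
    """Open url, retrying on HTTPError up to `attempts` times.

    Hypothetical helper. Returns the response on the first success;
    if every attempt fails, the last HTTPError is re-raised.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_exc = None
    for attempt in range(1, attempts + 1):
        try:
            return urllib.request.urlopen(url)
        except urllib.error.HTTPError as exc:
            last_exc = exc
            print(f"Attempt {attempt} failed: {exc}")
            if attempt < attempts:
                time.sleep(delay)  # pause before the next attempt
    raise last_exc


# Usage with the question's variables (requires bs4 and lxml):
# html = fetch_with_retries(url)
# soup = bs4.BeautifulSoup(html, 'lxml')
```

Raising instead of returning `None` keeps the failure visible at the call site rather than surfacing later as a confusing error inside BeautifulSoup.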


Hey jacobara, I think something is wrong with the site itself. You can still read the body of the error response with this:

# uses the imports and `url` from the question
for i in range(5):  # try 5 times to reach the page
    try:
        html = urlopen(url)
    except urllib.error.HTTPError as exc:
        print('Error code: ', exc)
        content = exc.read()  # body the server sent along with the 500 status
        print(content)
        time.sleep(1)  # wait 1 second, then make the HTTP request again
        continue
    else:
        print('Success')
        break
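Reading the body works because `urllib.error.HTTPError` is itself a file-like response object: `.code` gives the status and `.read()` gives whatever the server sent. The sketch below demonstrates this offline against a local `http.server` that always answers 500; `FailingHandler` and `read_error_body` are hypothetical names, and the HTML body is made up for the demo.

```python
import http.server
import threading
import urllib.error
import urllib.request


class FailingHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical stand-in for a server that always answers 500 with a body."""

    def do_GET(self):
        body = b"<html><title>Server Error</title></html>"
        self.send_response(500)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


def read_error_body(url):
    """Return (status, body) for url, reading the body even on HTTP errors."""
    try:
        with urllib.request.urlopen(url) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as exc:
        # HTTPError doubles as a response: .code and .read() work on it
        try:
            return exc.code, exc.read()
        finally:
            exc.close()


server = http.server.HTTPServer(("127.0.0.1", 0), FailingHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
status, body = read_error_body(f"http://127.0.0.1:{server.server_address[1]}/")
server.shutdown()
server.server_close()
print(status, body)
```

Inspecting that body is often the quickest way to tell whether the 500 comes from missing parameters, missing headers, or a genuinely broken server.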

The page makes a POST request that you can mimic directly:

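import requests
from bs4 import BeautifulSoup as bs

# form data mirroring the POST request the page itself makes
body = {"terminal": "1,16011301,,", "rows": 1, "visMode": 1}
r = requests.post('http://st.atb.no/New/minskjerm/DataHandler.ashx?type=departureTimes&lang=no', data=body)
r.raise_for_status()  # fail loudly instead of parsing an error page
soup = bs(r.content, 'lxml')  # the 'lxml' parser requires the lxml package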
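If you would rather stay with `urllib` as in the question, the same POST can be built with the standard library. This is a sketch, assuming the endpoint accepts exactly the form fields shown in the `requests` version; it only builds the request and does not send it, since that needs network access to the live endpoint.

```python
import urllib.parse
import urllib.request

# Same form fields as the requests example.
body = {"terminal": "1,16011301,,", "rows": 1, "visMode": 1}
url = "http://st.atb.no/New/minskjerm/DataHandler.ashx?type=departureTimes&lang=no"

# requests.post(..., data=dict) sends an application/x-www-form-urlencoded
# body; urlencode produces the equivalent encoding by hand.
encoded = urllib.parse.urlencode(body).encode("ascii")

req = urllib.request.Request(
    url,
    data=encoded,  # a non-None data argument makes this a POST
    headers={"Content-Type": "application/x-www-form-urlencoded"},
)

print(req.get_method())  # POST
print(encoded)

# To actually send it (needs network access and a live endpoint):
# with urllib.request.urlopen(req) as resp:
#     html = resp.read()
```

Note that the commas in the `terminal` value are percent-encoded as `%2C`, which is standard form encoding and what the server should expect to decode.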


 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM