HTTP Error 400: Bad Request (urllib)

Question

I'm writing a script to get information about buildings in NYC. I know that my code works and returns what I'd like it to: I was previously entering addresses manually, and that worked. Now I'm trying to have it read addresses from a text file and query the website with that information, and I'm getting this error:

urllib.error.HTTPError: HTTP Error 400: Bad Request

I believe it has something to do with the website not liking lots of access from something that isn't a browser. I've heard something about User Agents but don't know how to use them. Here is my code:

from bs4 import BeautifulSoup
import urllib.request

f = open("FILE PATH GOES HERE")

def getBuilding(link):
    r = urllib.request.urlopen(link).read()
    soup = BeautifulSoup(r, "html.parser")
    print(soup.find("b",text="KEYWORDS IM SEARCHING FOR GO HERE:").find_next("td").text)


def main():
    for line in f:
        num, name = line.split(" ", 1)
        newName = name.replace(" ", "+")
        link = "LINK GOES HERE (constructed from num and newName variables)"
        getBuilding(link)      
    f.close()

if __name__ == "__main__":
    main()

Answer 1

A 400 error means that the server cannot understand your request (e.g., because of malformed syntax). That said, it's up to the developers which status code they return and, unfortunately, not everyone strictly follows the intended meanings.
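Since a 400 points at a malformed request, one thing worth checking is the URL built from each line of the file. Iterating over a file leaves the trailing newline on every line, and replacing spaces with "+" does not remove it, so the link may end up containing a raw newline that manual entry never produced. The following is only a sketch of one way to build the link: the question elides the real URL, so BASE_URL and the parameter names houseno and street are hypothetical placeholders.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; the real link is not given in the question.
BASE_URL = "https://example.com/buildings"


def build_link(line, base_url=BASE_URL):
    # strip() drops the trailing newline that file iteration leaves on each line
    num, name = line.strip().split(" ", 1)
    # urlencode percent-encodes unsafe characters and turns spaces into '+'
    # (the parameter names here are assumptions for illustration)
    query = urlencode({"houseno": num, "street": name})
    return base_url + "?" + query


print(build_link("123 Main Street\n"))
```

If the site expects a different URL layout, the same two steps (strip the line, then encode each piece) should still apply.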

Check out this page for more details on HTTP Status Codes.

As for how to set a User Agent: a user agent is set in the request header and, basically, identifies the client making the request. Here is a list of recognized User Agents. In Python 2 you will need to use urllib2 rather than urllib; in Python 3 the same Request class lives in urllib.request. Both are built-in packages. I will show you how to update the getBuilding function to set the header using those modules. But I would recommend checking out the requests library. I just find that to be super straightforward, and it is widely adopted and supported.
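For comparison, here is a rough sketch of what setting the same header could look like with the requests library. It assumes requests is installed (it is a third-party package, not part of the standard library), and the function name get_building_html is made up for illustration.

```python
import requests

# Same User-Agent value as in the urllib examples
HEADERS = {"User-Agent": "Mozilla/5.0"}


def get_building_html(link):
    # Send the GET request with the custom header; give up after 10 seconds
    response = requests.get(link, headers=HEADERS, timeout=10)
    # Raise requests.HTTPError for 4xx/5xx responses instead of failing silently
    response.raise_for_status()
    return response.text
```

The returned HTML can then be passed to BeautifulSoup exactly as in getBuilding.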

Python 2:

from bs4 import BeautifulSoup
from urllib2 import Request, urlopen

def getBuilding(link):
    q = Request(link)
    q.add_header('User-Agent', 'Mozilla/5.0')
    r = urlopen(q).read()
    soup = BeautifulSoup(r, "html.parser")
    print(soup.find("b",text="KEYWORDS IM SEARCHING FOR GO HERE:").find_next("td").text)

Python 3:

from bs4 import BeautifulSoup
from urllib.request import Request, urlopen

def getBuilding(link):
    q = Request(link)
    q.add_header('User-Agent', 'Mozilla/5.0')
    r = urlopen(q).read()
    soup = BeautifulSoup(r, "html.parser")
    print(soup.find("b",text="KEYWORDS IM SEARCHING FOR GO HERE:").find_next("td").text)

Note: The only difference between the Python 2 and Python 3 versions is the import statement.
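If the 400 persists after setting the header, it can help to see exactly which link the server rejected. Below is a minimal Python 3 sketch, using only the standard library, that catches the HTTPError and prints the offending link with repr() so that stray whitespace or newlines become visible. The helper names make_request and fetch are invented for this example.

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def make_request(link, user_agent="Mozilla/5.0"):
    # Passing headers to the constructor is equivalent to calling add_header
    return Request(link, headers={"User-Agent": user_agent})


def fetch(link):
    try:
        with urlopen(make_request(link)) as response:
            return response.read()
    except HTTPError as err:
        # repr() exposes hidden characters such as '\n' in the link
        print("Request failed:", repr(link), err.code, err.reason)
        return None
```

fetch returns the raw bytes on success and None on an HTTP error, so the loop over the file can skip bad addresses instead of stopping at the first one.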
