Trouble using requests library in python

I am attempting to check for active web site folders against a list that was created from robots.txt (this is for learning security; I'm doing this on a server that I own and control). I am using Python 2.7 on Kali Linux.

My code works if I just do one web address at a time, as I get a proper 200 or 404 response for folders that are active and not working, respectively.

When I attempt to do this against the entire list, I get a string of 404 errors. When I print out the actual addresses that the script is creating, everything looks correct.

Here is the code that I am using:

import requests
attempt = open('info.txt', 'r')
folders = attempt.readlines()

for line in folders:
    host = 'http://10.0.1.66/mutillidae'+line
    attempt = requests.get(host)
    print attempt

This results in a string of 404 errors. If I take the loop out, and try each one individually, I get a 200 response back showing that it is up and running.

I have also printed out the addresses using the same loop against the text document that contains the correct folders, and they look fine; I verified this by copying and pasting them. I have tried this with a file containing multiple folders and with a file listing a single folder, and I always get a 404 when reading from the file.

The info.txt file contains the following:

/passwords/
/classes/
/javascript/
/config
/owasp-esapi-php/
/documentation/

Any advice is appreciated.

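Lines returned by file.readlines() contain trailing newlines, which you must remove before passing them to requests.get. The newline becomes part of the requested URL, so the server looks for a path that does not exist and returns 404. Replace the statement: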

host = 'http://10.0.1.66/mutillidae'+line

with:

host = 'http://10.0.1.66/mutillidae' + line.rstrip()

and the problem will go away.
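A minimal sketch of why the printed addresses look correct yet still fail: print hides the trailing newline, while repr() exposes it. The in-memory file below is a stand-in for info.txt (an assumption for illustration, so that no file or network access is needed), and the names sample, raw_urls and clean_urls are hypothetical.

```python
import io

# Stand-in for info.txt, so the example runs without a file on disk.
sample = io.StringIO(u"/passwords/\n/classes/\n/config\n")
lines = sample.readlines()  # each element keeps its trailing newline

base = 'http://10.0.1.66/mutillidae'

# Without rstrip() the newline ends up inside the URL.
raw_urls = [base + line for line in lines]

# With rstrip() the trailing whitespace is removed first.
clean_urls = [base + line.rstrip() for line in lines]

# repr() makes the difference visible; plain print would hide it.
print(repr(raw_urls[0]))
print(repr(clean_urls[0]))
```

The first repr() ends with an escaped newline and the second does not, which is the only difference between the URL that returns 404 and the one that returns 200.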

Note that your code would be easier to read if you refrained from reusing a generic variable name such as attempt for different purposes in the same scope. Also, try to use variable names that reflect their usage; for example, host would be better named url, as it doesn't hold the host name but the entire URL.
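One possible way to apply both the rstrip() fix and the naming advice above is sketched here. The function names build_urls and check_folders and the constant BASE_URL are hypothetical, chosen for illustration; skipping blank lines is an extra assumption that the original code does not make. The print call is written so that it behaves the same under Python 2.7 and Python 3.

```python
BASE_URL = 'http://10.0.1.66/mutillidae'  # base address from the question


def build_urls(lines, base_url=BASE_URL):
    """Return one URL per non-empty line, with trailing whitespace removed."""
    urls = []
    for line in lines:
        folder = line.rstrip()
        if folder:  # assumption: blank lines in the file should be ignored
            urls.append(base_url + folder)
    return urls


def check_folders(path, base_url=BASE_URL):
    """Request each URL built from the file at `path` and print its status."""
    import requests  # imported here so build_urls works without the library

    with open(path) as folder_file:
        urls = build_urls(folder_file.readlines(), base_url)
    for url in urls:
        response = requests.get(url)
        print('%s %s' % (response.status_code, url))
```

Calling check_folders('info.txt') would then print one status code and URL per folder. Each name now has a single purpose: folder_file is the open file, url is the full address, and response is the result of requests.get.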
