
“Do like readlines()” to a Session object (Python)

I want to pick out a few lines of information from some web pages. My approach is to open each page, iterate over its lines, check each line for a keyword, and copy the information I want when I find it.

The pages require a session.

import requests

def getpage():
    home = 'website'  # placeholder for the site's base URL
    extension1 = '/input/page'
    extension2 = '/output/page'
    indexnumber = '11100'

    sess = requests.Session()
    getter = sess.get(home + extension1)
    payload = {'foo': 'bar', 'indexnumber': indexnumber}
    getter = sess.post(home + extension2, data=payload)

    return sess

As I tried to say in the title, I need something like readlines() for the response returned by .get():

a = getpage()

a.get(somePage).readlines()       # Could I do this...
# or
a.get(somePage).text.readlines()  # ...or this?

# I don't think I want the following, for performance reasons; correct me if I am wrong.
with open(someNewFile, mode='w') as f:
    f.write(a.get(somePage).text)
with open(someNewFile) as f:
    lines = f.readlines()  # All that just to turn it into a file I can call readlines() on?

Thanks.

When I try

a.get(somePage).readlines()

I get

AttributeError: 'Response' object has no attribute 'readlines'

There are a few ways to do this, but the most Requests-y way is to use a streaming request along with Response.iter_lines():

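import requests

# With the session from getpage(), use a.get(somePage, stream=True) instead.
r = requests.get(somePage, stream=True)

for line in r.iter_lines(chunk_size=1024):
    # Each line is a bytes object by default. Do stuff on this line.
    pass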
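As a rough sketch of the keyword scan from the question built on iter_lines(), assuming the page is UTF-8 and that you want each whole matching line: `collect_matching_lines` and `FakeResponse` are hypothetical names. The helper only calls iter_lines(), so the same code should work with a streamed response from requests.get(..., stream=True) or from the session's a.get(..., stream=True); the fake object just lets the sketch run offline.

```python
def collect_matching_lines(response, keyword, encoding="utf-8"):
    """Return decoded lines from a streamed response that contain `keyword`.

    `response` is assumed to be a requests.Response fetched with stream=True;
    only its iter_lines() method is used. The UTF-8 default is an assumption
    about the page, not something requests guarantees.
    """
    matches = []
    for raw_line in response.iter_lines():
        # iter_lines() yields bytes by default, so decode before comparing.
        line = raw_line.decode(encoding, errors="replace")
        if keyword in line:
            matches.append(line)
    return matches


class FakeResponse:
    # Hypothetical stand-in with the same iter_lines() shape, for offline testing.
    def iter_lines(self):
        return iter([b"<html>", b"indexnumber: 11100", b"</html>"])


print(collect_matching_lines(FakeResponse(), "indexnumber"))  # ['indexnumber: 11100']
```

With a live session this would look something like `collect_matching_lines(a.get(somePage, stream=True), "keyword")`.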

Beyond @Lukasa's excellent and entirely correct way of doing this, you could also do:

import io

import requests

r = requests.get(some_page)
file_like_obj = io.StringIO(r.text)
lines = file_like_obj.readlines()

Note that r.text is absolutely the right attribute to use on the Response object: io.StringIO requires unicode on Python 2 and a native str on Python 3 (which is unicode by default), and r.text gives you the decoded text in both cases.
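On the performance worry in the question: io.StringIO wraps a string that is already in memory, so no temporary file is written. A quick stdlib-only check (the sample text is invented) shows that its readlines() keeps the line endings and, for "\n"-terminated text, matches str.splitlines(keepends=True):

```python
import io

# Sample text standing in for r.text (hypothetical page content).
page_text = "first line\nkeyword: value\nlast line\n"

# Wrap the in-memory string; nothing is written to disk.
lines = io.StringIO(page_text).readlines()
print(lines)  # ['first line\n', 'keyword: value\n', 'last line\n']

# For plain "\n" line endings this equals splitlines(keepends=True).
assert lines == page_text.splitlines(keepends=True)

# Strip the trailing newline when copying values out.
matches = [line.rstrip("\n") for line in lines if "keyword" in line]
print(matches)  # ['keyword: value']
```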

From the docs, keep in mind:

Warning

iter_lines() is not reentrant safe. Calling this method multiple times causes some of the received data being lost. In case you need to call it from multiple places, use the resulting iterator object instead:

lines = r.iter_lines()
# Save the first line for later or just skip it

first_line = next(lines)

for line in lines:
    print(line)
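One detail the docs snippet glosses over: if the body is empty, next(lines) raises StopIteration, while next(lines, None) returns None instead. The sketch below uses a plain generator, `fake_iter_lines` (a hypothetical stand-in for r.iter_lines(), with invented sample lines), to show the header-then-rest pattern working on one shared iterator, and that once the iterator is consumed a second pass yields nothing:

```python
def fake_iter_lines(data):
    # Hypothetical stand-in for r.iter_lines(): yields each line once, like a stream.
    for line in data:
        yield line


lines = fake_iter_lines([b"header", b"row 1", b"row 2"])

# Take the first line (or None if there is nothing), then keep using the SAME iterator.
first_line = next(lines, None)
rest = list(lines)
print(first_line, rest)  # b'header' [b'row 1', b'row 2']

# The iterator is now exhausted; a second pass yields nothing.
print(list(lines))  # []

# With an empty body, next() with a default returns None instead of raising.
print(next(fake_iter_lines([]), None))  # None
```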

For the sake of simplicity, I use the following approach:

import requests

text = requests.get(somePage).text
r_lines = text.split("\n")

for line in r_lines:
    # line logic goes here
    pass
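A caveat with split("\n"): pages served with "\r\n" line endings leave a stray "\r" on each line, and a trailing newline produces an empty last element. str.splitlines() handles both (it also splits on a few other line-boundary characters). A quick comparison on made-up text, including copying out the value after a hypothetical "keyword:" prefix:

```python
text = "alpha\r\nkeyword: 42\r\nomega\r\n"  # invented sample with CRLF endings

print(text.split("\n"))   # ['alpha\r', 'keyword: 42\r', 'omega\r', '']
print(text.splitlines())  # ['alpha', 'keyword: 42', 'omega']

value = None
for line in text.splitlines():
    if line.startswith("keyword:"):
        # Copy out the value after the keyword.
        value = line.split(":", 1)[1].strip()
print(value)  # 42
```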

The technical posts on this site are licensed under CC BY-SA 4.0.
