
How do I make the response from Python's requests package be a “file-like object”

I am hitting a webservice with Python's requests library and the endpoint is returning a (very large) CSV file which I then want to stream into a database. The code looks like this:

response = requests.get(url, auth=auth, stream=True)
if response.status_code == 200:
    stream_csv_into_database(response)

When the database is MongoDB, the loading works perfectly using a csv.DictReader:

def stream_csv_into_database(response):
    ...
    for record in csv.DictReader(response.iter_lines(), delimiter='\t'):
        product_count += 1
        product = {k:v for (k,v) in record.iteritems() if v}
        product['_id'] = product_count
        collection.insert(product)

However, I am switching from MongoDB to Amazon RedShift, which I can already access just fine using psycopg2. I can open connections and run simple queries, but what I want to do is use the streamed response from the webservice with psycopg2's copy_expert to load the RedShift table. Here is what I tried so far:

def stream_csv_into_database(response, campaign, config):
    print 'Loading product feed for {0}'.format(campaign)
    conn = new_redshift_connection(config) # My own helper, works fine.
    table = 'products.' + campaign
    cur = conn.cursor()
    reader = response.iter_lines()
    # Error on following line:
    cur.copy_expert("COPY {0} FROM STDIN WITH CSV HEADER DELIMITER '\t'".format(table), reader)
    conn.commit()
    cur.close()
    conn.close()

The error that I get is:

file must be a readable file-like object for COPY FROM; a writable file-like object for COPY TO.

I understand what the error is saying; in fact, I can see from the psycopg2 documentation that copy_expert calls copy_from , which:

Reads data from a file-like object appending them to a database table (COPY table FROM file syntax). The source file must have both read() and readline() methods.

My problem is that I cannot find a way to make the response object be a file-like object! I tried both .data and .iter_lines without success. I certainly do not want to download the entire multi-gigabyte file from the webservice and then upload it to RedShift. There must be a way to use the streaming response as a file-like object that psycopg2 can copy into RedShift. Anyone know what I am missing?

You could use the response.raw file object, but take into account that any content encoding (such as GZIP or Deflate compression) will still be in place unless you set the decode_content flag to True when calling .read(), which psycopg2 will not do.

You can set the flag on the raw file object to change the default to decompressing-while-reading:

response.raw.decode_content = True

and then pass the response.raw file object to cur.copy_expert().
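Putting that together with the question's loader, here is a minimal sketch of stream_csv_into_database rewritten to pass response.raw straight to copy_expert. The argument names and the table-name format are illustrative; conn is assumed to be an already-open psycopg2 connection and response a requests response fetched with stream=True:

```python
def stream_csv_into_database(response, conn, table):
    """Stream a tab-delimited HTTP response into a table via COPY FROM STDIN.

    response -- a requests.Response obtained with stream=True
    conn     -- an open psycopg2 connection (e.g. to RedShift)
    table    -- target table name, e.g. 'products.campaign'
    """
    # Tell urllib3 to transparently decompress gzip/deflate on .read(),
    # since psycopg2 reads the raw stream directly.
    response.raw.decode_content = True
    cur = conn.cursor()
    # response.raw exposes read() and readline(), which is exactly
    # what copy_expert() requires of its file argument.
    cur.copy_expert(
        "COPY {0} FROM STDIN WITH CSV HEADER DELIMITER '\t'".format(table),
        response.raw)
    conn.commit()
    cur.close()
```

Nothing is buffered to disk or held in memory: psycopg2 pulls from the socket as the COPY proceeds, so the multi-gigabyte file streams end to end.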
