
What's the fastest way to get movie info from OMDb with Python?

I have about 200,000 IMDb IDs (imdb_id) in a file, and I want to fetch the JSON information for each of them using the OMDb API.

I wrote this code and it works correctly, but it's very slow: about 3 seconds per ID, so the whole file would take roughly 166 hours.
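The 166-hour figure follows directly from the per-request latency. Assuming the time is spent almost entirely waiting on the network (and ignoring any API rate limit), running k requests concurrently divides the wall-clock time by roughly k:

```latex
T_{\text{seq}} = 200\,000 \times 3\,\text{s} = 600\,000\,\text{s} \approx 166.7\,\text{h},
\qquad
T_{k} \approx \frac{T_{\text{seq}}}{k}, \quad \text{e.g. } T_{20} \approx 8.3\,\text{h}.
```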

import csv
import urllib.request

API_KEY = "??????"  # your OMDb API key

with open('a.csv', encoding='utf-8') as csvinput, \
        open('b.csv', 'w', encoding='utf-8', newline='') as csvoutput:
    reader = csv.reader(csvinput)
    writer = csv.writer(csvoutput)
    header = next(reader)
    writer.writerow(header + ["movie_info"])
    id_col = header.index("item_id")
    for row in reader:
        # One blocking HTTP request per row: the next request only starts
        # after the previous response has arrived, which is why this is slow.
        url = "http://www.omdbapi.com/?i=tt" + row[id_col] + "&apikey=" + API_KEY
        with urllib.request.urlopen(url) as response:
            movie_info = response.read().decode('utf-8')
        writer.writerow(row + [movie_info])

What's the fastest way to get movie info from OMDb with Python?

Edit: I wrote the following code, and after receiving 1022 URL responses I get this error:

import grequests

urls = open("a.csv").readlines()
api_key = '??????'


def exception_handler(request, exception):
    print("Request failed")


# build one OMDb request URL per line (each line holds an IMDb ID without "tt")
for i in range(len(urls)):
    urls[i] = "http://www.omdbapi.com/?i=tt" + str(urls[i]).rstrip('\n') + "&apikey=" + api_key
requests = (grequests.get(u) for u in urls)
responses = grequests.map(requests, exception_handler=exception_handler)
with open('b.json', 'wb') as outfile:
    for response in responses:
        outfile.write(response.content)

The error is:

Traceback (most recent call last):
  File "C:/python_apps/omdb_async.py", line 18, in <module>
    outfile.write(response.content)
AttributeError: 'NoneType' object has no attribute 'content'

How can I solve this error?
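The None comes from grequests.map: when a request raises an exception, map calls exception_handler and appends the handler's return value to the result list in place of a response. Because exception_handler here only prints, that value is None, so response.content fails at the first failed request. The loop therefore has to skip (or record) those entries. Below is a minimal sketch of that pattern using the standard library's concurrent.futures.ThreadPoolExecutor instead of grequests, so it runs without third-party packages; fetch_movie, fetch_all, main, OMDB_URL, the 10-second timeout and max_workers=20 are illustrative assumptions, not part of the original code.

```python
import http.client
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Illustrative values (assumptions): adjust the key, timeout and worker count.
OMDB_URL = "http://www.omdbapi.com/?i=tt{imdb_id}&apikey={api_key}"
API_KEY = "??????"


def fetch_movie(imdb_id):
    """Fetch one movie; return (imdb_id, body), with body None on failure."""
    url = OMDB_URL.format(imdb_id=imdb_id, api_key=API_KEY)
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return imdb_id, resp.read().decode("utf-8")
    except (OSError, http.client.HTTPException) as exc:
        # URLError, HTTPError and timeouts are all OSError subclasses
        print(f"Request for {imdb_id} failed: {exc}")
        return imdb_id, None


def fetch_all(ids, fetch=fetch_movie, max_workers=20):
    """Fetch IDs concurrently; return ({id: body}, [failed ids])."""
    results, failed = {}, []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input IDs
        for imdb_id, body in pool.map(fetch, ids):
            if body is None:  # same role as the None entries from grequests.map
                failed.append(imdb_id)
            else:
                results[imdb_id] = body
    return results, failed


def main(path_in="a.csv", path_out="b.json"):
    # One IMDb ID (without the "tt" prefix) per line, as in the grequests version.
    with open(path_in, encoding="utf-8") as f:
        ids = [line.strip() for line in f if line.strip()]
    results, failed = fetch_all(ids)
    # Write one JSON object per line (JSON Lines) so the output stays parseable.
    with open(path_out, "w", encoding="utf-8") as outfile:
        for body in results.values():
            outfile.write(body.strip() + "\n")
    return failed
```

Calling `failed = main()` writes the successful responses and returns the IDs whose requests failed, so they can be retried later instead of crashing the whole run.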

This code is I/O-bound and would benefit greatly from Python's async/await capabilities. You can loop over your collection of URLs and create an asynchronously executing request for each one, much like the example in this SO question.
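A minimal sketch of that idea, assuming Python 3.9+ and only the standard library: each blocking urllib call is handed to a worker thread with asyncio.to_thread, and an asyncio.Semaphore caps how many downloads run at once. A dedicated async HTTP client such as aiohttp is the more common choice but is a third-party package. get_json_text, fetch_one, fetch_many and limit=20 are hypothetical names and values chosen for illustration.

```python
import asyncio
import http.client
import urllib.request

API_KEY = "??????"  # placeholder; use your own OMDb key


def get_json_text(url):
    """Blocking download of one URL (runs in a worker thread)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8")


async def fetch_one(imdb_id, sem, get=get_json_text):
    url = f"http://www.omdbapi.com/?i=tt{imdb_id}&apikey={API_KEY}"
    async with sem:  # at most `limit` downloads are in flight at once
        try:
            return imdb_id, await asyncio.to_thread(get, url)
        except (OSError, http.client.HTTPException):
            return imdb_id, None  # keep failures instead of crashing later


async def fetch_many(ids, limit=20, get=get_json_text):
    sem = asyncio.Semaphore(limit)
    # gather returns results in input order, so they line up with ids
    return await asyncio.gather(*(fetch_one(i, sem, get) for i in ids))
```

Run it with `results = asyncio.run(fetch_many(ids))`; each entry is an `(imdb_id, body)` pair, with `body` set to None for failed requests. Note that asyncio.to_thread uses the event loop's default ThreadPoolExecutor, whose size (min(32, os.cpu_count() + 4) by default) also bounds the real concurrency. For 200,000 IDs it may be preferable to process them in chunks rather than creating every coroutine up front.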

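Once you're making these requests asynchronously, you may need to throttle your request rate to stay within the OMDb API limit. The free OMDb key is limited to 1,000 requests per day, which may explain why requests start failing after about 1,000 responses.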
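Capping concurrency limits how many requests are in flight, but not how many start per second. One simple way to cap the rate as well is to space out request starts. The sketch below uses a hypothetical RateLimiter class and run_throttled helper (not part of any library) that let at most `rate` requests start per second; which rate is safe depends on your OMDb key and is an assumption here.

```python
import asyncio
import time


class RateLimiter:
    """Let at most `rate` callers through per second, evenly spaced (hypothetical helper)."""

    def __init__(self, rate):
        self.interval = 1.0 / rate
        self._lock = asyncio.Lock()  # serializes callers in FIFO order
        self._next = 0.0  # earliest monotonic time the next caller may proceed

    async def acquire(self):
        async with self._lock:
            now = time.monotonic()
            wait = self._next - now
            if wait > 0:
                await asyncio.sleep(wait)
            # schedule the following slot one interval after this one
            self._next = max(now, self._next) + self.interval


async def run_throttled(items, rate, work):
    """Run async work(item) for every item, starting at most `rate` per second."""
    limiter = RateLimiter(rate)  # created inside the running event loop

    async def one(item):
        await limiter.acquire()
        return await work(item)

    return await asyncio.gather(*(one(item) for item in items))
```

In a request coroutine, `await limiter.acquire()` goes just before each request is sent. The limiter is created inside the running event loop (as run_throttled does) because on Python versions before 3.10 an asyncio.Lock created outside the loop can end up attached to a different loop.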
