
How to Extract Data from TMDb Using Python

I have a dataset from MovieLens found here. My goal is to add to this dataset the overviews of all the movies whose ids appear in it (the dataset provides a movie id for TMDb and other databases).
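MovieLens releases typically include a links.csv file whose header is movieId,imdbId,tmdbId. Assuming your copy has that layout (check it first), a minimal stdlib sketch like the following collects the TMDb ids and skips movies that have none. The function name load_tmdb_ids and the sample rows are illustrative, not part of MovieLens.

```python
import csv
import io
from typing import Dict, TextIO


def load_tmdb_ids(f: TextIO) -> Dict[int, int]:
    """Map MovieLens movieId -> TMDb id.

    Assumes a links.csv-style header (movieId, imdbId, tmdbId).
    Rows with an empty or missing tmdbId are skipped.
    """
    ids: Dict[int, int] = {}
    for row in csv.DictReader(f):
        tmdb = (row.get("tmdbId") or "").strip()
        if tmdb:  # some movies have no TMDb id
            # float() first so values re-saved as "862.0" still parse
            ids[int(row["movieId"])] = int(float(tmdb))
    return ids


# illustrative rows in the assumed format; the second has no TMDb id
sample = io.StringIO(
    "movieId,imdbId,tmdbId\n"
    "1,0114709,862\n"
    "2,0000001,\n"
)
print(load_tmdb_ids(sample))  # {1: 862}
```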

I have studied the TMDb developer documentation, but I have only managed to extract data for one movie at a time. My goal is to write a loop over all the movie ids in my dataframe and get the "overview" of each movie.

The closest I have gotten is:

# install the package first, from a shell: pip install tmdbv3api
from tmdbv3api import TMDb, Movie

tmdb = TMDb()
tmdb.api_key = 'my API Key'
tmdb.language = 'en'
tmdb.debug = True

movie = Movie()
# for example, the movie with id=862
m = movie.details(862)
print(m.overview)

This gives me the desired overview for an individual movie, but when I try to write a loop like the following, it fails completely. The loop is most likely wrong, and I don't even know whether this is possible with the TMDb API:

movie = Movie()
id = movie.details(int(movies.tmdbId))

for id in movies["tmdbId"]: # my dataframe
   if id in tmdb.Movies(int(tmdb_id)): # tmdb database
      print (m.overview)

I also know this can be done by requesting the JSON responses with urllib.request; the closest example I have seen is this, which again handles only one movie at a time.
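For the urllib.request route, a minimal sketch might look like the following. It assumes TMDb's v3 movie-details endpoint (https://api.themoviedb.org/3/movie/{movie_id}) with api_key and language as query parameters, and that the JSON response carries an "overview" field; the helper names movie_url, parse_overview and fetch_overview are hypothetical. URL building and JSON parsing are kept separate from the network call so they can be checked offline.

```python
import json
import urllib.error
import urllib.parse
import urllib.request
from typing import Optional

# assumed TMDb v3 movie-details endpoint
BASE_URL = "https://api.themoviedb.org/3/movie/"


def movie_url(tmdb_id: int, api_key: str, language: str = "en") -> str:
    """Build the request URL for one movie (int() drops a float's '.0')."""
    query = urllib.parse.urlencode({"api_key": api_key, "language": language})
    return f"{BASE_URL}{int(tmdb_id)}?{query}"


def parse_overview(body: bytes) -> Optional[str]:
    """Return the 'overview' field of a movie-details JSON body, if present."""
    return json.loads(body.decode("utf-8")).get("overview")


def fetch_overview(tmdb_id: int, api_key: str) -> Optional[str]:
    """Fetch one overview; returns None on any HTTP or network error,
    e.g. an error response for an id that no longer exists."""
    try:
        with urllib.request.urlopen(movie_url(tmdb_id, api_key), timeout=10) as resp:
            return parse_overview(resp.read())
    except urllib.error.URLError:  # HTTPError is a subclass of URLError
        return None
```

Looping is then just one call per id, e.g. `{i: fetch_overview(i, "my API Key") for i in ids}`.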

Please excuse any mistakes; I am new to this field.

Thank you in advance.


@ibbs, thank you! With your help I finally arrived at the following, which seems to work:

movie = Movie()
for id in movies["tmdbId"]:
    try:
        m = movie.details(int(id))
        print(m.overview)
    except Exception:
        # movie ids from the csv file that are no longer on TMDb
        pass
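The loop above only prints each overview. Since the goal is to add the overviews to the dataset, one option is to collect them in a dict keyed by TMDb id and map that back onto the dataframe. In the sketch below, fetch is a callable you supply, for example `lambda i: movie.details(i).overview` using the tmdbv3api Movie object from the question; the name collect_overviews is hypothetical. Passing the fetch function in also lets the loop be tested without network access.

```python
import math
from typing import Callable, Dict, Iterable, Optional


def collect_overviews(
    tmdb_ids: Iterable[Optional[float]],
    fetch: Callable[[int], Optional[str]],
) -> Dict[int, Optional[str]]:
    """Call fetch(tmdb_id) once per distinct id and keep the result.

    Missing ids (None, or NaN as pandas uses for empty cells) are skipped,
    and any exception from fetch (e.g. an id no longer on TMDb) is stored
    as None instead of stopping the loop.
    """
    overviews: Dict[int, Optional[str]] = {}
    for raw_id in tmdb_ids:
        if raw_id is None or (isinstance(raw_id, float) and math.isnan(raw_id)):
            continue
        tmdb_id = int(raw_id)  # 862.0 -> 862
        if tmdb_id in overviews:
            continue  # already fetched; avoid a duplicate request
        try:
            overviews[tmdb_id] = fetch(tmdb_id)
        except Exception:
            overviews[tmdb_id] = None
    return overviews
```

With pandas, something along the lines of `movies["overview"] = movies["tmdbId"].map(overviews)` should then attach the results as a new column, leaving skipped ids as NaN.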

I don't have enough reputation to comment, so I'll point this out here: it seems that you don't quite understand how for loops work. The id variable that you defined before the loop is overwritten by each value of movies['tmdbId'] as the for loop runs.
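A quick illustration of that rebinding, using made-up ids: whatever the loop variable held before the loop is replaced on the first iteration, and after the loop it still holds the last element. The sketch uses the name tmdb_id rather than id, since id shadows Python's built-in id() function.

```python
tmdb_ids = [862, 8844, 15602]  # illustrative ids

tmdb_id = "value set before the loop"
seen = []
for tmdb_id in tmdb_ids:
    # each iteration rebinds tmdb_id to the next element;
    # the value assigned before the loop is simply discarded
    seen.append(tmdb_id)

print(seen)     # [862, 8844, 15602]
print(tmdb_id)  # 15602 (the last element, still bound after the loop)
```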

As for a solution, try this:

movie = Movie()
# id = movie.details(int(movies.tmdbId))  # Not sure why this variable is defined? The loop below overwrites it anyway.

for id in movies["tmdbId"]:  # my dataframe; I am assuming this is iterable
    m = movie.details(int(id))  # tmdb database: one request per id
    if m:
        print(m.overview)

The technical posts on this site follow the CC BY-SA 4.0 license. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM