Convert a dictionary with lists of differing lengths to a pandas DataFrame

Question

I'm retrieving JSON about a movie from the OMDB API, and I'm trying to add it to another dictionary that holds information scraped from a review site (http://www.rogerebert.com/reviews).

The dict holding the scraped information has this structure:

{
     'movie_title': [],
     'review_text': [],
     'review_url': [],
     'reviewed_by': [],
     'score': []
}

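I'm dynamically adding keys to the dictionary, each with an empty list as its value, by looping over the response from the OMDB API, like so:

import json
import requests

api_key = ''
# Concatenate the key into the URL instead of embedding the literal text 'api_key'
omdb_data = requests.get('http://www.omdbapi.com/?apikey=' + api_key + '&t=Basmati+Blues&plot=full')
# Parse the response body, not the Response object itself
omdb_json = json.loads(omdb_data.content)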
for curr_key in omdb_json.keys():
    movie_review_dict[curr_key] = []

The dict now has this structure:

{
     u'Actors': [],
     u'Awards': [],
     u'BoxOffice': [],
     u'Country': [],
     u'DVD': [],
     u'Director': [],
     u'Genre': [],
     u'Language': [],
     u'Metascore': [],
     u'Plot': [],
     u'Poster': [],
     u'Production': [],
     u'Rated': [],
     u'Ratings': [],
     u'Released': [],
     u'Response': [],
     u'Runtime': [],
     u'Title': [],
     u'Type': [],
     u'Website': [],
     u'Writer': [],
     u'Year': [],
     u'imdbID': [],
     u'imdbRating': [],
     u'imdbVotes': [],
     'movie_title': [],
     'review_text': [],
     'review_url': [],
     'reviewed_by': [],
     'score': []
}

I have a function which reads the reviews page, uses the BeautifulSoup module and adds elements to the dict. I'm also adding data from the OMDB response at the same time.

def read_html_page(home_page='http://www.rogerebert.com/reviews'):
    movie_details = movie_review_dict
    result = requests.get(url=home_page)
    # Parse the body of the response
    soup_obj = BeautifulSoup(result.content, 'html5lib')
    wrapper_class = soup_obj.find('div', id='review-list')
    for curr_movie_dom in wrapper_class.find_all('figure'):
        movie_title = curr_movie_dom.find('h5', class_='title').a.get_text()
        movie_critic = curr_movie_dom.find('p', class_='byline').get_text().strip()
        # get_omdb_data (not shown) fetches the OMDB response for one movie
        omdb_dict = get_omdb_data(movie_title=movie_title)
        for curr_key in omdb_dict.keys():
            # Create the list the first time a key is seen, then append
            movie_details.setdefault(curr_key, []).append(omdb_dict[curr_key])
    return movie_details

I'm trying to store the dict into a pandas DataFrame, but I'm getting the error

ValueError('arrays must all be same length')

That's because some attributes from the OMDB response, like 'Language' and 'Website', exist for some movies and not for others, so the lists end up with different lengths.
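
A minimal reproduction of the problem, using made-up data (the titles and URL below are placeholders, not real OMDB output). The exact wording of the error message may differ between pandas versions, so the sketch only checks that a ValueError is raised:

```python
import pandas as pd

# Placeholder data: 'Website' was present for only one of the two movies,
# so its list is shorter than the 'Title' list.
uneven = {
    'Title': ['Movie A', 'Movie B'],
    'Website': ['http://example.com'],
}

try:
    pd.DataFrame(uneven)
    raised = False
except ValueError as exc:
    # pandas refuses to build columns of unequal length
    raised = True
    print(exc)
```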

I've tried

movie_df = pd.DataFrame(movie_review_dict)
movie_df = pd.DataFrame.from_dict(movie_details)

Both run into the same error.

Answer 1

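You can try appending to an empty DataFrame using pandas.DataFrame.append. Appending a dict requires ignore_index=True, and DataFrame.append was removed in pandas 2.0, so this only works on older versions.

df = pd.DataFrame()
df = df.append(movie_review_dict, ignore_index=True)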
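
Two other ways to handle lists of differing lengths are sketched below. This is not from the original post: the contents of movie_review_dict and records are hypothetical stand-ins for the scraped data. The first option wraps each list in a pandas Series so shorter columns are padded with NaN. The second collects one dict per movie and lets pandas fill in missing keys.

```python
import pandas as pd

# Hypothetical stand-in for movie_review_dict after scraping.
movie_review_dict = {
    'Title': ['Movie A', 'Movie B', 'Movie C'],
    'Website': ['http://example.com'],  # only one movie had a website
    'score': [],                        # never filled in
}

# Option 1: wrap each list in a Series. pandas aligns the columns on the
# index and pads the shorter ones with NaN. dtype='object' keeps the
# empty list from producing a column of a different dtype.
padded_df = pd.DataFrame(
    {key: pd.Series(values, dtype='object')
     for key, values in movie_review_dict.items()}
)

# Option 2: keep one dict per movie (hypothetical records). Keys missing
# from a record become NaN in that movie's own row.
records = [
    {'Title': 'Movie A'},
    {'Title': 'Movie B', 'Website': 'http://example.com'},
    {'Title': 'Movie C'},
]
aligned_df = pd.DataFrame(records)
```

Option 1 pads from the top, so a value that belongs to the second movie can land in the first row. Option 2 keeps every value in the row of the movie it came from, which is probably what you want when attributes are missing for some movies only.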

solution1
1 ACCEPTED 2018-02-15 07:30:08
