
Convert a complex dictionary to a Pandas dataframe in Python

I've got a dictionary structured like this:
d = {'series_1': ['id_series', [['season_1', [['ep1_title', 'ep_url'], ['ep2_title', 'ep_url'], …]], ['season_2', [['ep1_title', 'ep_url'], ['ep2_title', 'ep_url'], …]], …]], 'series_2': ['id_series', [['season_1', [['ep1_title', 'ep_url'], ['ep2_title', 'ep_url'], …]], ['season_2', [['ep1_title', 'ep_url'], ['ep2_title', 'ep_url'], …]], …]], …}

Here's a sample:

{'Scooby-Doo! Mystery Incorporated': ['1660055', [['season 1', [['Pawn of Shadows', 'https://dl.opensubtitles.org/it/download/sub/4797725'], ['All Fear the Freak', 'https://dl.opensubtitles.org/it/download/sub/4797755']]], ['season 2', [['Through the Curtain', 'https://dl.opensubtitles.org/it/download/sub/5465599'], ['Come Undone', 'https://dl.opensubtitles.org/it/download/sub/5465681']]]]], 'Scooby e Scrappy Doo': ['0084970', [['season 1', [["Scooby Roo/Scooby's Gold Medal Gambit", 'https://dl.opensubtitles.org/it/download/sub/6086643'], ['The Mark of Scooby/The Crazy Carnival Caper', 'https://dl.opensubtitles.org/it/download/sub/6086649']]]]]}
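To make the three levels of nesting concrete, here is a minimal sketch that binds the sample above to a variable (the name `d` is only an assumption for illustration) and drills down to a single episode by unpacking one level at a time:

```python
# The question's sample, bound to an illustrative variable name `d`
d = {
    'Scooby-Doo! Mystery Incorporated': ['1660055', [
        ['season 1', [['Pawn of Shadows', 'https://dl.opensubtitles.org/it/download/sub/4797725'],
                      ['All Fear the Freak', 'https://dl.opensubtitles.org/it/download/sub/4797755']]],
        ['season 2', [['Through the Curtain', 'https://dl.opensubtitles.org/it/download/sub/5465599'],
                      ['Come Undone', 'https://dl.opensubtitles.org/it/download/sub/5465681']]],
    ]],
    'Scooby e Scrappy Doo': ['0084970', [
        ['season 1', [["Scooby Roo/Scooby's Gold Medal Gambit", 'https://dl.opensubtitles.org/it/download/sub/6086643'],
                      ['The Mark of Scooby/The Crazy Carnival Caper', 'https://dl.opensubtitles.org/it/download/sub/6086649']]],
    ]],
}

series_id, seasons = d['Scooby-Doo! Mystery Incorporated']  # level 1: [id, list of seasons]
season_name, episodes = seasons[1]                          # level 2: [season name, list of episodes]
ep_title, ep_url = episodes[0]                              # level 3: [episode title, episode url]
print(series_id, season_name, ep_title, ep_url)
```

Every level is a two-item list, which is why plain tuple unpacking works all the way down.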

I want to store this data in a pandas DataFrame laid out like this:

series_title    id_series   #season   ep_title      ep_url
series_1        #           1         title_1       #
series_1        #           1         title_2       #
series_1        #           2         title_1       #
series_2        #           1         title_1       #
series_2        #           2         title_1       #
series_2        #           2         title_2       #

etc.

I tried to apply solutions from other questions (such as "Construct pandas DataFrame from items in nested dictionary"), but their data is structured too differently and I didn't manage to adapt them. Can anybody help me? Thanks.

Because each series maps to a list whose first element is the series ID and whose second element is a nested list of seasons, the simple automatic loading approaches won't produce this layout directly. I would recommend unpacking the dictionary yourself and building a list of records, one per episode (the dictionary is assumed to be stored in `d`):
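For illustration, here is roughly what a direct constructor gives on a trimmed copy of the sample (one episode per series; the trimming is only to keep it short). `pd.DataFrame.from_dict` with `orient="index"` produces one row per series and leaves the entire season list inside a single cell, so nothing below the series level gets flattened:

```python
import pandas as pd

# Trimmed copy of the question's sample: one episode per series
d = {
    'Scooby-Doo! Mystery Incorporated': ['1660055', [
        ['season 1', [['Pawn of Shadows', 'https://dl.opensubtitles.org/it/download/sub/4797725']]],
    ]],
    'Scooby e Scrappy Doo': ['0084970', [
        ['season 1', [["Scooby Roo/Scooby's Gold Medal Gambit", 'https://dl.opensubtitles.org/it/download/sub/6086643']]],
    ]],
}

# orient="index": each key becomes a row, each element of the value list a column,
# so column 0 holds the series ID and column 1 holds the whole nested season list
naive = pd.DataFrame.from_dict(d, orient="index")
print(naive.shape)  # (2, 2)
print(naive.iloc[0, 1])
```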

import pandas as pd

records = []
for series_name, (series_id, seasons) in d.items():
    # each season is a two-item list: [season_name, episode_list]
    for season_name, episode_list in seasons:
        for episode_name, episode_url in episode_list:
            records.append([series_name, series_id, season_name, episode_name, episode_url])
df = pd.DataFrame.from_records(records, columns=["series_title", "series_id", "season_name", "ep_title", "ep_url"])

To match the exact format, with the season stored as an int (so "season 1" becomes 1):

import pandas as pd

records = []
for series_name, (series_id, seasons) in d.items():
    for season_name, episode_list in seasons:
        # take the last word rather than the last character, so "season 10" gives 10, not 0
        season_number = int(season_name.split()[-1])
        for episode_name, episode_url in episode_list:
            records.append([series_name, series_id, season_number, episode_name, episode_url])
df = pd.DataFrame.from_records(
    records, columns=["series_title", "id_series", "season", "ep_title", "ep_url"]
)
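If you'd rather stay inside pandas, the same flattening can be sketched with `DataFrame.explode`, one explode per nesting level (the `ignore_index` argument needs pandas 1.1 or later). This is an alternative to the loop, not the approach described above; `d` again holds the question's sample, and the helper column names `seasons` and `episodes` are my own choices:

```python
import pandas as pd

d = {
    'Scooby-Doo! Mystery Incorporated': ['1660055', [
        ['season 1', [['Pawn of Shadows', 'https://dl.opensubtitles.org/it/download/sub/4797725'],
                      ['All Fear the Freak', 'https://dl.opensubtitles.org/it/download/sub/4797755']]],
        ['season 2', [['Through the Curtain', 'https://dl.opensubtitles.org/it/download/sub/5465599'],
                      ['Come Undone', 'https://dl.opensubtitles.org/it/download/sub/5465681']]],
    ]],
    'Scooby e Scrappy Doo': ['0084970', [
        ['season 1', [["Scooby Roo/Scooby's Gold Medal Gambit", 'https://dl.opensubtitles.org/it/download/sub/6086643'],
                      ['The Mark of Scooby/The Crazy Carnival Caper', 'https://dl.opensubtitles.org/it/download/sub/6086649']]],
    ]],
}

# One row per series, keeping the nested season list in a helper column
df = pd.DataFrame(
    [(title, series_id, seasons) for title, (series_id, seasons) in d.items()],
    columns=["series_title", "id_series", "seasons"],
)

# One row per season; .str[i] indexes into each [season_name, episodes] pair
df = df.explode("seasons", ignore_index=True)
df["season"] = df["seasons"].str[0].str.split().str[-1].astype(int)
df["episodes"] = df["seasons"].str[1]

# One row per episode; split each [ep_title, ep_url] pair into two columns
df = df.explode("episodes", ignore_index=True)
df["ep_title"] = df["episodes"].str[0]
df["ep_url"] = df["episodes"].str[1]

# Drop the helper columns and keep the requested order
df = df[["series_title", "id_series", "season", "ep_title", "ep_url"]]
print(df)
```

The explicit loop is probably easier to read and debug; the `explode` version mainly helps if the data already lives in a DataFrame.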

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.
