I've got a dictionary structured like this:
dict={'series_1':['id_series',[['season_1',[['ep1_title','ep_url'],['ep2_title','ep_url']…],['season_2',[['ep1_title','ep_url'],['ep2_title','ep_url']…],…]],'series_2':['id_series',[['season_1',[['ep1_title','ep_url'],['ep2_title','ep_url']…],['season_2',[['ep1_title','ep_url'],['ep2_title','ep_url']…],…]],…}
Here's a sample:
{'Scooby-Doo! Mystery Incorporated': ['1660055', [['season 1', [['Pawn of Shadows', 'https://dl.opensubtitles.org/it/download/sub/4797725'], ['All Fear the Freak', 'https://dl.opensubtitles.org/it/download/sub/4797755']]], ['season 2', [['Through the Curtain', 'https://dl.opensubtitles.org/it/download/sub/5465599'], ['Come Undone', 'https://dl.opensubtitles.org/it/download/sub/5465681']]]]], 'Scooby e Scrappy Doo': ['0084970', [['season 1', [["Scooby Roo/Scooby's Gold Medal Gambit", 'https://dl.opensubtitles.org/it/download/sub/6086643'], ['The Mark of Scooby/The Crazy Carnival Caper', 'https://dl.opensubtitles.org/it/download/sub/6086649']]]]]}
I want to store this data in a pandas DataFrame laid out like this:
series_title  id_series  #season  ep_title  ep_url
series_1      #          1        title_1   #
series_1      #          1        title_2   #
series_1      #          2        title_1   #
series_2      #          1        title_1   #
series_2      #          2        title_1   #
series_2      #          2        title_2   #
etc.
I tried to apply solutions from other questions (such as "Construct pandas DataFrame from items in nested dictionary"), but their data structures are too different from mine and I couldn't adapt them. Can anybody help me? Thanks
Having the season identifier (e.g. 'season 1') as the first element of a list whose second element is itself nested makes the simple automatic loading approaches difficult in this case. I would recommend just unpacking the nested dict with explicit loops and building a list of records.
import pandas as pd

# d is the nested dictionary from the question
records = []
for series_name, seasons in d.items():
    series_id = seasons[0]
    # each season is a [season_name, episode_list] pair
    for season_name, episode_list in seasons[1]:
        for episode_name, episode_url in episode_list:
            records.append([series_name, series_id, season_name, episode_name, episode_url])

df = pd.DataFrame.from_records(records, columns=["series_title", "series_id", "season_number", "ep_title", "ep_url"])
To match the exact format, with the column names from the question and the season as an int:
records = []
for series_name, seasons in d.items():
    series_id = seasons[0]
    for season_name, episode_list in seasons[1]:
        # last character of e.g. "season 1"; only valid for single-digit seasons
        season_number = int(season_name.strip()[-1])
        for episode_name, episode_url in episode_list:
            records.append([series_name, series_id, season_number, episode_name, episode_url])

df = pd.DataFrame.from_records(
    records, columns=["series_title", "id_series", "season", "ep_title", "ep_url"]
)
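One caveat with `int(season_name.strip()[-1])` is that it reads only the last character, so a label such as "season 12" would come out as 2. Below is a minimal sketch of a more tolerant variant, assuming each season is a two-element `[season_name, episode_list]` pair as in the question's sample. The helper names `season_number` and `flatten` are made up for illustration, and the sample is a trimmed copy of the question's data.

```python
import re


def season_number(label):
    """Return the trailing integer of a label such as 'season 1' or 'season_12'."""
    match = re.search(r"(\d+)\s*$", label)
    if match is None:
        raise ValueError(f"no season number found in {label!r}")
    return int(match.group(1))


def flatten(d):
    """Yield one [series_title, id_series, season, ep_title, ep_url] row per episode."""
    for series_name, (series_id, seasons) in d.items():
        for season_name, episode_list in seasons:
            for episode_name, episode_url in episode_list:
                yield [series_name, series_id, season_number(season_name),
                       episode_name, episode_url]


# Trimmed copy of the sample data from the question.
sample = {
    "Scooby-Doo! Mystery Incorporated": [
        "1660055",
        [
            ["season 1", [["Pawn of Shadows",
                           "https://dl.opensubtitles.org/it/download/sub/4797725"]]],
            ["season 2", [["Through the Curtain",
                           "https://dl.opensubtitles.org/it/download/sub/5465599"]]],
        ],
    ],
}

records = list(flatten(sample))
```

The resulting `records` list can be passed to `pd.DataFrame.from_records` with the same `columns` argument as before.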