Efficient way to create dictionary from pandas column with list entries where list elements are keys
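I am trying to create a mapping from the values of list elements to the rows that contain them. For example, given a pandas DataFrame like this: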
>>> book_df
                name  genre
0       Harry Potter  ["fantasy", "young adult"]
1  Lord of the Rings  ["fantasy", "adventure", "classics"]
2           I, Robot  ["science fiction", "classics"]
3        Animal Farm  ["classics", "fantasy"]
4    A Monster Calls  ["fantasy", "young adult"]
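I want to generate a dictionary that maps each genre to the list of book names under that genre.
So what I want to end up with is something like this: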
>>> genre_to_book_map
{
    "fantasy": ["Harry Potter", "Lord of the Rings", "Animal Farm", "A Monster Calls"],
    "young adult": ["Harry Potter", "A Monster Calls"],
    "classics": ["Lord of the Rings", "I, Robot", "Animal Farm"],
    "science fiction": ["I, Robot"],
    "adventure": ["Lord of the Rings"]
}
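I have managed to achieve this by exploding the lists and then building a dictionary from the result (based on "Pandas column of lists, create a row for each list element" and "Pandas groupby two columns then get dict for values"), like this: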
import numpy as np
import pandas as pd
# Repeat each name once per genre, then flatten the genre lists alongside it
exploded_genres = pd.DataFrame({
    "name": np.repeat(book_df["name"].values, book_df["genre"].str.len())
}).assign(**{"genres": np.concatenate(book_df["genre"].values)})
genre_to_name_map = exploded_genres.groupby("genres")["name"].apply(lambda x: x.tolist())
But I would like to know whether there is a more efficient way, since this seems like a relatively simple thing to do.
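To see what np.repeat and np.concatenate each contribute in the approach above, here is a minimal sketch on a two-book subset. The variable names (names, genres, repeated_names, flat_genres) are illustrative assumptions and do not appear in the original post.

```python
import numpy as np

# Hypothetical two-row subset of the question's data
names = np.array(["Harry Potter", "Lord of the Rings"])
genres = [["fantasy", "young adult"], ["fantasy", "adventure", "classics"]]

# Number of genres attached to each book
lengths = [len(g) for g in genres]

# Repeat every name once per genre so both arrays line up element by element
repeated_names = np.repeat(names, lengths)

# Flatten the list of lists into a single 1-D array
flat_genres = np.concatenate(genres)

for genre, name in zip(flat_genres, repeated_names):
    print(genre, "->", name)
```

The two arrays have the same length (the total number of genre entries), which is what lets them sit side by side as columns before the groupby.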
Use the good old collections.defaultdict object:
In [410]: from collections import defaultdict
In [411]: d = defaultdict(list)
In [412]: for idx, row in df.iterrows():
     ...:     for g in row['genre']:
     ...:         d[g].append(row['name'])
     ...:
In [413]: dict(d)
Out[413]:
{'fantasy': ['Harry Potter',
  'Lord of the Rings',
  'Animal Farm',
  'A Monster Calls'],
 'young adult': ['Harry Potter', 'A Monster Calls'],
 'adventure': ['Lord of the Rings'],
 'classics': ['Lord of the Rings', 'I, Robot', 'Animal Farm'],
 'science fiction': ['I, Robot']}
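A possible variant of the loop above zips the two columns directly instead of calling iterrows, which constructs a Series object for every row. This is only a sketch: the helper name build_genre_map, its parameters, and the three-row sample frame are assumptions made for illustration, not part of the original answer.

```python
from collections import defaultdict

import pandas as pd


def build_genre_map(frame, key_col="genre", value_col="name"):
    """Map each element of the list column to the values of the rows containing it."""
    mapping = defaultdict(list)
    # zip walks over plain Python objects rather than per-row Series
    for value, keys in zip(frame[value_col], frame[key_col]):
        for key in keys:
            mapping[key].append(value)
    return dict(mapping)


# Hypothetical sample data mirroring the first three rows of the question
book_df = pd.DataFrame({
    "name": ["Harry Potter", "Lord of the Rings", "I, Robot"],
    "genre": [
        ["fantasy", "young adult"],
        ["fantasy", "adventure", "classics"],
        ["science fiction", "classics"],
    ],
})

genre_to_book_map = build_genre_map(book_df)
print(genre_to_book_map["fantasy"])  # ['Harry Potter', 'Lord of the Rings']
```

As with the iterrows version, keys appear in the order they are first encountered, and names within each list follow row order.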
Since pandas 0.25, you can use explode to expand the lists:
book_df.explode('genre').groupby('genre')['name'].apply(list).to_dict()
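The one-liner above can be unpacked into its two steps. The sketch below rebuilds the question's sample frame (the intermediate name exploded is an assumption for illustration) and requires pandas 0.25 or later for DataFrame.explode.

```python
import pandas as pd

# Sample data from the question
book_df = pd.DataFrame({
    "name": ["Harry Potter", "Lord of the Rings", "I, Robot", "Animal Farm", "A Monster Calls"],
    "genre": [
        ["fantasy", "young adult"],
        ["fantasy", "adventure", "classics"],
        ["science fiction", "classics"],
        ["classics", "fantasy"],
        ["fantasy", "young adult"],
    ],
})

# Step 1: one row per (name, genre) pair; the original index values are repeated
exploded = book_df.explode("genre")

# Step 2: collect the names for each genre and convert the result to a dict
genre_to_book_map = (
    exploded.groupby("genre")["name"]
    .apply(list)
    .to_dict()
)

print(genre_to_book_map["classics"])  # ['Lord of the Rings', 'I, Robot', 'Animal Farm']
```

Because groupby sorts its keys by default, the dictionary keys come out alphabetically here; passing sort=False to groupby keeps them in order of first appearance instead.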
You need to melt the lists into a single column, then group by genre and output to a dictionary.
import pandas as pd
df = pd.DataFrame({
    'name': [
        'Harry Potter',
        'Lord of the Rings',
        'I, Robot',
        'Animal Farm',
        'A Monster Calls'
    ],
    'genre': [
        ["fantasy", "young adult"],
        ["fantasy", "adventure", "classics"],
        ["science fiction", "classics"],
        ["classics", "fantasy"],
        ["fantasy", "young adult"]
    ]
})
# create a Series object, give it a name.
s = df.genre.apply(pd.Series).stack().reset_index(level=-1, drop=True)
s.name = 'genres'
# merge and groupby and output to dict.
d = (
    df.loc[:, ['name']]
    .merge(s, left_index=True, right_index=True)
    .groupby('genres')['name']
    .apply(list)
    .to_dict()
)