Mapping dictionary to pandas dataframe with lists
I have a dictionary that I would like to map onto a large pandas dataframe. The issue is that the column I want to map on is wrapped in double quotes, and it sometimes holds more than one item.
Original:
dict_id = {
'College1': ['1256511'],
'College2': ['1200582'],
'College3': ['1256618'],
'College10': ['1256621']
}
id1 id2 college_name
0 01 01 "College1, College2"
1 01 02 "College10, College12"
2 01 03 "College19"
Desired:
id1 id2 college_name id_college
01 01 "College1, College2" 1256511, 1200582
01 02 "College10, College12" 1256621
01 03 "College19"
Your data is better formatted, in my opinion, after explode, but I put it all back to how it was at the end.
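import pandas as pd
# df and dict_id are the dataframe and dictionary from the question.
# Strip the wrapping quotes, split each cell into a list, then explode to one name per row.
df.college_name = df.college_name.str[1:-1].str.split(', ')
df = df.explode('college_name')
# Turn the dictionary into a lookup frame indexed by college name.
df2 = pd.DataFrame.from_dict(dict_id, 'index', columns=['id_college'], dtype=str)
df = df.merge(df2, left_on='college_name', right_index=True, how='left')
# Rejoin names and ids per (id1, id2) pair; unmatched names become empty strings.
df = df.fillna('').groupby(['id1', 'id2'], as_index=False).agg(', '.join)
# Restore the wrapping quotes.
df.college_name = '"' + df.college_name + '"'
print(df)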
Output:
id1 id2 college_name id_college
0 01 01 "College1, College2" 1256511, 1200582
1 01 02 "College10, College12" 1256621,
2 01 03 "College19"
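Row 1 of that output ends in a trailing comma because the unmatched name (College12) is filled with an empty string before the join. If that matters, one possible variant is to drop the missing ids before joining. The following is a minimal sketch, assuming the sample data from the question; the names flat_ids, long_df, ids_per_row and result are illustrative and not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
dict_id = {
    'College1': ['1256511'],
    'College2': ['1200582'],
    'College3': ['1256618'],
    'College10': ['1256621'],
}
df = pd.DataFrame({
    'id1': ['01', '01', '01'],
    'id2': ['01', '02', '03'],
    'college_name': ['"College1, College2"', '"College10, College12"', '"College19"'],
})

# Assumption: every dictionary value is a one-element list, as in the question.
flat_ids = {name: ids[0] for name, ids in dict_id.items()}

# One row per college name, with the wrapping quotes removed.
# explode keeps the original row index, so rows can be regrouped later.
long_df = df.assign(
    college_name=df['college_name'].str.strip('"').str.split(', ')
).explode('college_name')
long_df['id_college'] = long_df['college_name'].map(flat_ids)

# Join only the ids that were found; unmatched names contribute nothing.
ids_per_row = long_df.groupby(level=0)['id_college'].agg(
    lambda s: ', '.join(s.dropna())
)

# Attach the joined ids to the untouched original frame (aligned on index).
result = df.assign(id_college=ids_per_row)
print(result)
```

Grouping on the original index rather than on (id1, id2) avoids merging rows that happen to share both ids, and leaves college_name exactly as it was, quotes included.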
Let DF1 be your dictionary of college names and ids, and DF2 the massive dataframe in which college_name is sometimes a comma-delimited list of college names.
You're going to want to set the new column in DF2 with a function that builds a list of ids from DF1 and the DF2 college_name column.
def genIds(df, df_col):
    # df is the dictionary mapping a college name to a list of ids;
    # df_col is an iterable of college_name strings
    id_list = []
    for collegeName in df_col:
        temp_list = []
        # drop the wrapping double quotes, then split on commas
        for cName in collegeName.strip('"').split(','):
            cName = cName.strip()
            if cName in df.keys():
                # if this is an actual pandas df do
                # if cName in df['college_names']
                temp_list.extend(df[cName])
        # names that are not in the dictionary contribute nothing
        id_list.append(", ".join(temp_list))
    return id_list

df2['ids'] = genIds(df1, df2['college_name'].values)
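The same per-cell lookup can also be written as a small function passed to Series.apply. The following is a minimal sketch, assuming the sample data from the question; lookup_ids is a hypothetical helper name, not something from the original answer.

```python
import pandas as pd

# Sample data copied from the question.
dict_id = {
    'College1': ['1256511'],
    'College2': ['1200582'],
    'College3': ['1256618'],
    'College10': ['1256621'],
}
df = pd.DataFrame({
    'id1': ['01', '01', '01'],
    'id2': ['01', '02', '03'],
    'college_name': ['"College1, College2"', '"College10, College12"', '"College19"'],
})


def lookup_ids(cell, mapping):
    """Return the ids of every known college named in one quoted cell."""
    # Remove the wrapping quotes, split on commas, trim stray spaces.
    names = [name.strip() for name in cell.strip('"').split(',')]
    found = []
    for name in names:
        # Unknown names fall back to an empty list and add nothing.
        found.extend(mapping.get(name, []))
    return ', '.join(found)


df['id_college'] = df['college_name'].apply(lambda cell: lookup_ids(cell, dict_id))
print(df)
```

This runs a Python function once per row, so on a very large dataframe the explode and merge approach is likely to scale better; the loop is mainly easier to read and adapt.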