How to transform dictionary keys into a dataframe column based on the values if the values are lists
I have a dictionary where the keys are numbers and the values are lists of strings. I want to create a dataframe column whose values are the dictionary keys, where the key for each row is selected by matching the value of another column in that row to an item in one of the dictionary's value lists. See the example code below.
Sample starting dataframe and dictionary:
import pandas as pd
dict_x = {1: ['a'], 2: ['b', 'c', 'e'], 3: ['d', 'f']}
df = pd.DataFrame({'ID': ['a', 'b', 'c', 'd', 'e', 'f']})
Desired output:
df = pd.DataFrame({'ID': ['a', 'b', 'c', 'd', 'e', 'f'], 'Number': [1, 2, 2, 3, 2, 3]})
I thought something like
df['Number'] = df['ID'].apply(lambda x: ???)
would work, but I'm struggling with the condition here. I also tried writing some for loops, but ran into an issue where only the last iteration of the loop was preserved when I wrote the column.
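For reference, both ideas from the question can be made to work. The sketch below is a minimal illustration assuming the sample data above; the column name `Number_loop` is hypothetical and only used to keep the two results side by side. The `apply` version searches every list for every row, so it is fine for small data but slower than inverting the dictionary. In the loop version, each iteration writes only the rows whose ID is in the current list. A likely cause of the "only the last iteration is preserved" problem (an assumption, since the original loop isn't shown) is assigning a scalar to the whole column inside the loop, which overwrites every row on each pass.

```python
import pandas as pd

dict_x = {1: ['a'], 2: ['b', 'c', 'e'], 3: ['d', 'f']}
df = pd.DataFrame({'ID': ['a', 'b', 'c', 'd', 'e', 'f']})

# apply: for each ID, return the first key whose list contains it
# (None if no list contains it)
df['Number'] = df['ID'].apply(
    lambda x: next((k for k, v in dict_x.items() if x in v), None)
)

# loop: only write the rows whose ID is in the current list, so earlier
# iterations are not overwritten ('Number_loop' is a hypothetical name)
df['Number_loop'] = None
for k, v in dict_x.items():
    df.loc[df['ID'].isin(v), 'Number_loop'] = k

print(df)
```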
Simply invert the dictionary dict_x by switching the roles of keys and values (loop over the list elements to do that).
import pandas as pd

# set up the dictionary and dataframe properly
dict_x = {1: ['a'], 2: ['b', 'c', 'e'], 3: ['d', 'f']}
df = pd.DataFrame({'ID': ['a', 'b', 'c', 'd', 'e', 'f']})

# reverse the dictionary: each list element becomes a key pointing to its original key
rev_dict_x = dict()
for k, v in dict_x.items():
    for v_elem in v:
        rev_dict_x[v_elem] = k

# replace each ID with its number
df['Number'] = df['ID'].replace(rev_dict_x)
print(df)
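The same inversion can be written as a dict comprehension, and the lookup can be done with `Series.map` instead of `Series.replace`. A sketch assuming the sample data, with an extra ID `'z'` added (hypothetical) that is in none of the lists: `map` turns IDs with no entry into NaN, whereas `replace` would leave them unchanged as the original string, which can silently mix strings into the numeric column.

```python
import pandas as pd

dict_x = {1: ['a'], 2: ['b', 'c', 'e'], 3: ['d', 'f']}
# 'z' is a hypothetical ID that appears in none of the lists
df = pd.DataFrame({'ID': ['a', 'b', 'c', 'd', 'e', 'f', 'z']})

# same inversion as the loop, written as a dict comprehension
rev_dict_x = {elem: k for k, v in dict_x.items() for elem in v}

# map() looks each ID up in rev_dict_x; IDs without an entry become NaN
df['Number'] = df['ID'].map(rev_dict_x)
print(df)
```

Because of the NaN, the resulting column is float; if every ID is guaranteed to match, it stays integer.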
Note that this assumes each element appears in only one of the lists. Otherwise, building rev_dict_x will overwrite the value for that element, so the key of the last list containing it wins.
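To see the overwriting concretely, here is a small sketch with a hypothetical dictionary `dict_dup` in which `'c'` appears under two keys. It also shows one possible alternative, an assumption rather than part of the answer: collect every matching key into a list with `collections.defaultdict`.

```python
from collections import defaultdict

# hypothetical input where 'c' appears under both key 2 and key 3
dict_dup = {1: ['a'], 2: ['b', 'c'], 3: ['c', 'd']}

# plain inversion: the later key (3) overwrites the earlier one (2) for 'c'
rev_last = {elem: k for k, v in dict_dup.items() for elem in v}

# alternative: keep every key by mapping each element to a list of keys
rev_all = defaultdict(list)
for k, v in dict_dup.items():
    for elem in v:
        rev_all[elem].append(k)

print(rev_last)       # {'a': 1, 'b': 2, 'c': 3, 'd': 3}
print(dict(rev_all))  # {'a': [1], 'b': [2], 'c': [2, 3], 'd': [3]}
```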
I hope I've understood you correctly:
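import pandas as pd

dict_x = {1: ['a'], 2: ['b', 'c', 'e'], 3: ['d', 'f']}

# one row per (key, list element) pair
df = pd.DataFrame(
    [(k, i) for k, v in dict_x.items() for i in v], columns=["Number", "ID"]
)
print(df)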
Prints:
  Number ID
0      1  a
1      2  b
2      2  c
3      2  e
4      3  d
5      3  f
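This builds a new frame rather than adding a column to the existing one. If the goal is the column on the original `df`, as in the question, one option (a sketch assuming the sample data; `lookup` is a hypothetical name) is to left-merge the long-form table back on `ID`. A left merge keeps every row of `df` in its original order. If an ID appeared in several lists, the merge would duplicate that row once per match.

```python
import pandas as pd

dict_x = {1: ['a'], 2: ['b', 'c', 'e'], 3: ['d', 'f']}
df = pd.DataFrame({'ID': ['a', 'b', 'c', 'd', 'e', 'f']})

# long-form lookup table: one row per (Number, ID) pair
lookup = pd.DataFrame(
    [(k, i) for k, v in dict_x.items() for i in v], columns=["Number", "ID"]
)

# left merge keeps all rows of df, in df's order, and adds the Number column
df = df.merge(lookup, on="ID", how="left")
print(df)
```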
Or:
df = (
pd.DataFrame([dict_x])
.melt()
.explode("value")
.rename(columns={"variable": "Number", "value": "ID"})
)
print(df)
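One detail of the melt/explode version: `explode` repeats the index label of each melted row, so the result is indexed 0, 1, 1, 1, 2, 2 rather than 0 to 5. A sketch of the same chain with `ignore_index=True`, which gives the exploded rows a fresh 0..n-1 index:

```python
import pandas as pd

dict_x = {1: ['a'], 2: ['b', 'c', 'e'], 3: ['d', 'f']}

df = (
    pd.DataFrame([dict_x])       # one row, one column per key, cells hold lists
    .melt()                      # columns 'variable' (key) and 'value' (list)
    .explode("value", ignore_index=True)  # one row per list element, new 0..n-1 index
    .rename(columns={"variable": "Number", "value": "ID"})
)
print(df)
```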