New DataFrame column using the key of a dictionary as row value when one of its values is found in a given row
I have a Pandas DataFrame with a large number of unique values. I would like to group these values under a more general column. By doing so I expect to add hierarchies to my data and thus make analysis easier.
One thing that worked was to copy the column and replace the values as follows:
data.loc[data['new_col'].str.contains('string0|string1'), 'new_col']\
= 'substitution'
However, I am trying to find a way to reproduce this easily without adding a condition for each entry.
I also tried several other methods, without success.
I would appreciate your advice on how to approach this.
import pandas as pd
# My DataFrame looks similar to this:
>>> df = pd.DataFrame({'A': ['a', 'w', 'c', 'd', 'z']})
# The dictionary where I store the generalization:
>>> subs = {'g1': ['a', 'b', 'c', 'd'],
... 'g2': ['w', 'x', 'y', 'z']}
# Desired output:
>>> df
   A   H
0  a  g1
1  w  g2
2  c  g1
3  d  g1
4  z  g2
Create a new dict by swapping each key with the values in its list, so that every list element points back to its key. Next, map df.A with the swapped dict.
swap_dict = {x: k for k, v in d.items() for x in v}
Out[1054]:
{'a': 's1',
 'b': 's1',
 'c': 's1',
 'd': 's1',
 'w': 's2',
 'x': 's2',
 'y': 's2',
 'z': 's2'}
df['H'] = df.A.map(swap_dict)
Out[1058]:
   A   H
0  a  s1
1  w  s2
2  c  s1
3  d  s1
4  z  s2
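Putting the two steps together with the sample data from the question, a minimal self-contained sketch could look like the following. It uses the question's `subs` dict directly, so the labels come out as `g1`/`g2` rather than the `s1`/`s2` shown in the output above; the loop variable names `group`, `values` and `value` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'w', 'c', 'd', 'z']})
subs = {'g1': ['a', 'b', 'c', 'd'],
        'g2': ['w', 'x', 'y', 'z']}

# Invert the dict: every list element becomes a key pointing at its group.
swap_dict = {value: group for group, values in subs.items() for value in values}

# Exact-match lookup; entries of A that are not in swap_dict become NaN.
df['H'] = df['A'].map(swap_dict)
print(df)
```

Because `Series.map` with a dict is an exact lookup, any value of `A` that is missing from every list ends up as `NaN` in `H`, which makes gaps in the dictionary easy to spot.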
Note: I directly use the keys of your dict as the values of H instead of g1, g2, ... because I think they are enough to identify each group of values. If you still want g1, g2, ..., it is easy to accomplish. Just let me know.
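One possible way to get different labels, sketched under assumptions: suppose the dict keys are `s1`/`s2` and a second, hypothetical dict `labels` says which name each key should show up as in H. Neither `labels` nor its contents come from the original post.

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'w', 'c', 'd', 'z']})

# Assumed inputs: group keys 's1'/'s2' plus a display label for each key.
d = {'s1': ['a', 'b', 'c', 'd'],
     's2': ['w', 'x', 'y', 'z']}
labels = {'s1': 'g1', 's2': 'g2'}  # hypothetical naming, for illustration only

# Same inversion as before, but store the display label instead of the key.
swap_dict = {x: labels[k] for k, v in d.items() for x in v}
df['H'] = df['A'].map(swap_dict)
print(df)
```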
I also named your dict d in my code.
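The mapping above relies on exact matches. If the cells hold longer strings and a group should apply whenever one of its values appears as a substring, as in the `str.contains` attempt from the question, a loop over the dictionary is one option. This is only a sketch: the column name `col`, the sample strings and the `groups` dict are made up for illustration.

```python
import re
import pandas as pd

# Made-up sample data; only 'new_col' and 'substitution' echo the question.
data = pd.DataFrame({'col': ['xx string0 yy', 'string3', 'nothing here', 'a string1']})
groups = {'substitution': ['string0', 'string1'],
          'other': ['string2', 'string3']}

# Start from a copy of the column, then overwrite the rows that match a group.
data['new_col'] = data['col']
for label, needles in groups.items():
    # Escape each value so it is matched literally, then OR them together.
    pattern = '|'.join(re.escape(n) for n in needles)
    mask = data['col'].str.contains(pattern)
    data.loc[mask, 'new_col'] = label
print(data)
```

The masks are computed on the untouched `col`, so a label written in an earlier iteration cannot accidentally match a later pattern. Rows that match no group keep their original value.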