How to get columns containing names of pre-defined equivalence classes of values in each row of a Pandas dataframe?
# import package
import pandas as pd
I have a dataframe:
data = {'row1': ['a', 'A', 'B', 'b'],
'row2': ['a', 'b', 'c', 'd'],
'row3': ['a', 'b', 'd', 'D']}
df = pd.DataFrame.from_dict(data, orient='index', columns=['col'+str(x) for x in range(4)])
which looks like:
     col0 col1 col2 col3
row1    a    A    B    b
row2    a    b    c    d
row3    a    b    d    D
I also have a dict of equivalence classes. Each equivalence class consists of items that are treated as equivalent.
equivalenceClasses={'classA':['a','A'],
'classB':['b','B'],
'classC':['c','C'],
'classD':['d','D']}
I would like to create a dataframe in which each row of the above dataframe is replaced by the names of the equivalence classes that the letters in that row belong to. (Each equivalence class should appear at most once per row, and rows in which not every column is filled by a class name should be post-padded with NaNs.) I.e. I want this output:
I achieve the goal by:
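def differentClasses(colvalues):
    # For every value, collect the name of each class that contains it;
    # dict.fromkeys then drops repeats while keeping first-seen order
    # (a plain set() would return the names in arbitrary order)
    return list(dict.fromkeys([equivalenceClassName for colvalue in colvalues
                               for equivalenceClassName, equivalenceClass in equivalenceClasses.items()
                               if colvalue in equivalenceClass]))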
(On list comprehensions; on nested list comprehensions.)
df['classes'] = df.apply(lambda row : differentClasses(row['col'+str(x)] for x in range(4)), axis = 1)
(Influenced by this post.)
At this point, df looks like this:
Finish by:
result_df = pd.DataFrame(df['classes'].tolist(),index=df.index,columns=['classcol'+str(x) for x in range(4)])
result_df is the desired output above.
Is there a more standard way of doing this? Something like:
df.equivalenceClassify(equivalenceClassList)
and I get my output?
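As a minimal sketch of that idea (equivalence_classify is a hypothetical helper, not a pandas method), the whole pipeline can be wrapped in one plain function: invert the class dict into a value-to-class lookup, map each row, deduplicate in first-seen order, and post-pad with NaN. It assumes each value belongs to at most one class and that values in no class should simply be skipped.

```python
import pandas as pd

def equivalence_classify(df, classes):
    """Hypothetical helper: replace each row by the distinct names of the
    equivalence classes its values belong to (first-seen order),
    post-padded with NaN to the original number of columns."""
    # Invert {class_name: [members]} into {member: class_name}
    lookup = {member: name
              for name, members in classes.items()
              for member in members}
    width = df.shape[1]
    rows = []
    for values in df.itertuples(index=False):
        # Map each value to its class; values that belong to no class are skipped
        names = [lookup[v] for v in values if v in lookup]
        # dict.fromkeys removes repeats while keeping first-seen order
        unique = list(dict.fromkeys(names))
        # Post-pad with NaN so every row has the same length
        rows.append(unique + [float('nan')] * (width - len(unique)))
    return pd.DataFrame(rows, index=df.index,
                        columns=['classcol' + str(i) for i in range(width)])
```

Usage would be equivalence_classify(df, equivalenceClasses). If the df.method(...) call style matters, a function like this could be exposed through pandas' pd.api.extensions.register_dataframe_accessor, but a plain function is the simpler choice.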
We need to create a new dict from your original equivalenceClasses that maps each value to the name of its class, then just use replace:
from collections import ChainMap
d = dict(ChainMap(*[dict.fromkeys(y,x) for x , y in equivalenceClasses.items()]))
df = df.replace(d)
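The ChainMap line builds one {member: class_name} dict per class and merges them into a single flat lookup. A plain nested dict comprehension gives the same lookup without ChainMap; the sketch below compares the two on the classes from the question. The two differ only if a value appears in more than one class: ChainMap keeps the first class, while the comprehension keeps the last.

```python
from collections import ChainMap

equivalenceClasses = {'classA': ['a', 'A'],
                      'classB': ['b', 'B'],
                      'classC': ['c', 'C'],
                      'classD': ['d', 'D']}

# ChainMap version from the answer: one {member: class_name} dict per class,
# merged into a single flat dict
d_chain = dict(ChainMap(*[dict.fromkeys(y, x) for x, y in equivalenceClasses.items()]))

# Equivalent nested dict comprehension, without ChainMap
d_comp = {member: name
          for name, members in equivalenceClasses.items()
          for member in members}

print(d_comp['A'], d_comp['d'])  # classA classD
```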
Out[299]:
col0 col1 col2 col3
row1 classA classA classB classB
row2 classA classB classC classD
row3 classA classB classD classD
Then:
df = df.mask(df.apply(pd.Series.duplicated, axis=1))
Out[307]:
col0 col1 col2 col3
row1 classA NaN classB NaN
row2 classA classB classC classD
row3 classA classB classD NaN
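The masked frame leaves the NaNs where the duplicates were, rather than post-padding them as the question asks. One way to close that gap, sketched below on the masked frame hard-coded from the output above, is to keep each row's non-missing names in order and pad the rest with NaN.

```python
import pandas as pd

# Masked frame as printed above: duplicate class names already replaced by NaN
nan = float('nan')
masked = pd.DataFrame(
    {'col0': ['classA', 'classA', 'classA'],
     'col1': [nan, 'classB', 'classB'],
     'col2': ['classB', 'classC', 'classD'],
     'col3': [nan, 'classD', nan]},
    index=['row1', 'row2', 'row3'])

width = masked.shape[1]
# For each row keep the non-missing class names in their original order,
# then post-pad with NaN back to the original width
packed_rows = [row.dropna().tolist() + [nan] * (width - row.count())
               for _, row in masked.iterrows()]
packed = pd.DataFrame(packed_rows, index=masked.index, columns=masked.columns)
print(packed)
```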