How to get, for each row of a Pandas dataframe, columns containing the names of the predefined equivalence classes of the values?

Question

# import package
import pandas as pd

The problem

I have a dataframe:

data = {'row1': ['a', 'A', 'B', 'b'],
        'row2': ['a', 'b', 'c', 'd'],
        'row3': ['a', 'b', 'd', 'D']}
df = pd.DataFrame.from_dict(data, orient='index', columns=['col'+str(x) for x in range(4)])

which looks like:

     col0 col1 col2 col3
row1    a    A    B    b
row2    a    b    c    d
row3    a    b    d    D

I also have a dictionary of equivalence classes, keyed by class name. Each equivalence class consists of items that are treated as equivalent.

equivalenceClasses={'classA':['a','A'],
                    'classB':['b','B'],
                    'classC':['c','C'],
                    'classD':['d','D']}

I would like to create a dataframe in which each row of the above dataframe is replaced by the names of the equivalence classes that the letters in that row belong to. (Each equivalence class should appear at most once per row, and rows in which not every column is filled by a class name should be post-padded with NaNs.) That is, I want this output:

My method

I achieve the goal by:

def differentClasses(colvalues):
    # Collect the name of every class that contains one of the row's values.
    # The set removes duplicates, so the order of the names is not guaranteed.
    return list({equivalenceClassName
                 for colvalue in colvalues
                 for equivalenceClassName, equivalenceClass in equivalenceClasses.items()
                 if colvalue in equivalenceClass})

(Background: list comprehensions and nested list comprehensions.)

df['classes'] = df.apply(lambda row : differentClasses(row['col'+str(x)] for x in range(4)), axis = 1)

(Influenced by this.)

The df at this point looks like this:

Finish by:

result_df = pd.DataFrame(df['classes'].tolist(),index=df.index,columns=['classcol'+str(x) for x in range(4)])

result_df is the desired output above.
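Because a `set` does not preserve order, the class names produced by this method can come out in any order within a row. A minimal sketch of an order-preserving variant follows, assuming the names should appear in the order in which their members first occur in the row. The helper name `ordered_classes` is illustrative and not part of the original code; the only real change is swapping `set` for `dict.fromkeys`.

```python
import pandas as pd

data = {'row1': ['a', 'A', 'B', 'b'],
        'row2': ['a', 'b', 'c', 'd'],
        'row3': ['a', 'b', 'd', 'D']}
df = pd.DataFrame.from_dict(data, orient='index',
                            columns=['col' + str(x) for x in range(4)])

equivalenceClasses = {'classA': ['a', 'A'],
                      'classB': ['b', 'B'],
                      'classC': ['c', 'C'],
                      'classD': ['d', 'D']}

def ordered_classes(values):
    # Hypothetical helper: same lookup as differentClasses, but
    # dict.fromkeys keeps only the first occurrence of each name, in order.
    names = (name
             for value in values
             for name, members in equivalenceClasses.items()
             if value in members)
    return list(dict.fromkeys(names))

# One list of class names per row; shorter lists are padded when the frame is built
rows = [ordered_classes(row) for row in df.to_numpy()]
result_df = pd.DataFrame(rows, index=df.index)
result_df.columns = ['classcol' + str(i) for i in range(result_df.shape[1])]
print(result_df)
```

Rows with fewer distinct classes than columns are padded on the right with missing values when the lists are turned into a dataframe.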

The question

Is there a more standard way of doing this? Something like:

df.equivalenceClassify(equivalenceClassList)

and I get my output?

Answer 1

We need to create a new dict from your original equivalenceClasses that maps each member to its class name, then just call replace:

from collections import ChainMap
# Invert the mapping: every member ('a', 'A', ...) becomes a key pointing to its class name
d = dict(ChainMap(*[dict.fromkeys(y, x) for x, y in equivalenceClasses.items()]))
df = df.replace(d)
df
Out[299]: 
        col0    col1    col2    col3
row1  classA  classA  classB  classB
row2  classA  classB  classC  classD
row3  classA  classB  classD  classD

Then mask the duplicates within each row:

# Within each row, replace every repeat of an already-seen class name with NaN
df = df.mask(df.apply(pd.Series.duplicated, axis=1))
df
Out[307]: 
        col0    col1    col2    col3
row1  classA     NaN  classB     NaN
row2  classA  classB  classC  classD
row3  classA  classB  classD     NaN
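The masked frame keeps each NaN where the duplicate was, whereas the question asks for rows that are post-padded with NaNs. A minimal sketch of one way to shift the remaining names to the left is given below; the helper `left_align` and the variable `masked` are illustrative names, and the frame is rebuilt by hand so that the example runs on its own.

```python
import numpy as np
import pandas as pd

# The masked frame from the step above, rebuilt by hand
masked = pd.DataFrame(
    [['classA', np.nan, 'classB', np.nan],
     ['classA', 'classB', 'classC', 'classD'],
     ['classA', 'classB', 'classD', np.nan]],
    index=['row1', 'row2', 'row3'],
    columns=['col0', 'col1', 'col2', 'col3'])

def left_align(row):
    # Hypothetical helper: keep the non-missing values in order,
    # then pad with NaN on the right up to the original row length.
    kept = row.dropna().tolist()
    padded = kept + [np.nan] * (len(row) - len(kept))
    return pd.Series(padded, index=row.index)

result = masked.apply(left_align, axis=1)
print(result)
```

Row-wise `apply` is simple but not vectorised, so this is best suited to small or medium frames.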

Answer 1: 2 · accepted · 2020-08-03 21:18:42