How to get columns containing names of pre-defined equivalence classes of values in each row of a Pandas dataframe?

# import package
import pandas as pd

The problem

I have a dataframe:

data = {'row1': ['a', 'A', 'B', 'b'],
        'row2': ['a', 'b', 'c', 'd'],
        'row3': ['a', 'b', 'd', 'D']}
df = pd.DataFrame.from_dict(data, orient='index', columns=['col'+str(x) for x in range(4)])

which looks like:

     col0 col1 col2 col3
row1    a    A    B    b
row2    a    b    c    d
row3    a    b    d    D

I also have a list of equivalence classes. Each equivalence class consists of items which are taken as equivalent.

equivalenceClasses={'classA':['a','A'],
                    'classB':['b','B'],
                    'classC':['c','C'],
                    'classD':['d','D']}

I would like to create a dataframe in which the rows of the above dataframe are replaced by the names of the equivalence classes that the letters in the row belong to. (Each equivalence class should appear no more than once in a row, and we should use NaNs to post-pad rows in which not all columns are filled by the name of an equivalence class.) I.e. I want this output:

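     classcol0 classcol1 classcol2 classcol3
row1    classA    classB       NaN       NaN
row2    classA    classB    classC    classD
row3    classA    classB    classD       NaN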


My method

I achieve the goal by:

def differentClasses(colvalues):
    return list(set([equivalenceClassName for colvalue in colvalues
                                          for equivalenceClassName, equivalenceClass in zip(equivalenceClasses.keys(),
                                                                                   equivalenceClasses.values())
                                          if colvalue in equivalenceClass]))

(On list comprehension, on nested list comprehension.)

df['classes'] = df.apply(lambda row : differentClasses(row['col'+str(x)] for x in range(4)), axis = 1) 

(Influenced by this.)

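At this point df has an extra column, classes, which holds for each row the list of distinct equivalence class names found in that row (in no guaranteed order, since the list is built from a set).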

Finish by:

result_df = pd.DataFrame(df['classes'].tolist(),index=df.index,columns=['classcol'+str(x) for x in range(4)])

result_df is the desired output above.


The question

Is there a more standard way of doing this? Something like:

df.equivalenceClassify(equivalenceClassList)

and I get my output?

We need to create a new dict based on your original equivalenceClasses, mapping each letter to the name of its class, and then just call replace:

from collections import ChainMap
d = dict(ChainMap(*[dict.fromkeys(y,x) for x , y in equivalenceClasses.items()]))
df = df.replace(d)
Out[299]: 
        col0    col1    col2    col3
row1  classA  classA  classB  classB
row2  classA  classB  classC  classD
row3  classA  classB  classD  classD
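
For reference, the dict d above is simply the inverse of equivalenceClasses: every member letter maps to the name of its class. A minimal sketch of the same mapping built with a plain dict comprehension instead of ChainMap, assuming each letter belongs to exactly one class (if a letter appeared in two classes, the later class would win here). The variable name replaced is only used for illustration:

```python
import pandas as pd

equivalenceClasses = {'classA': ['a', 'A'],
                      'classB': ['b', 'B'],
                      'classC': ['c', 'C'],
                      'classD': ['d', 'D']}

# Invert the dict: member letter -> class name.
# Assumption: each member appears in only one class.
d = {member: name
     for name, members in equivalenceClasses.items()
     for member in members}

data = {'row1': ['a', 'A', 'B', 'b'],
        'row2': ['a', 'b', 'c', 'd'],
        'row3': ['a', 'b', 'd', 'D']}
df = pd.DataFrame.from_dict(data, orient='index',
                            columns=['col' + str(x) for x in range(4)])

# Replace every letter by the name of its equivalence class.
replaced = df.replace(d)
print(replaced)
```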

Then

df = df.mask(df.apply(pd.Series.duplicated, axis=1))
Out[307]: 
        col0    col1    col2    col3
row1  classA     NaN  classB     NaN
row2  classA  classB  classC  classD
row3  classA  classB  classD     NaN
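
Note that the result above leaves the NaNs where the duplicates were, while the question asks for them to be post-padded. One possible sketch that does the mapping, the per-row de-duplication and the padding in one go is below. The helper name equivalence_classify and the classcol prefix are my own choices for illustration, not a pandas API, and the code assumes each letter belongs to at most one class:

```python
import numpy as np
import pandas as pd

def equivalence_classify(df, classes, prefix='classcol'):
    """Hypothetical helper: class names per row, de-duplicated, NaN post-padded."""
    # Invert the dict: member letter -> class name.
    member_to_class = {m: name for name, members in classes.items() for m in members}
    # Values without a class become NaN.
    mapped = df.apply(lambda col: col.map(member_to_class))

    width = df.shape[1]
    rows = []
    for values in mapped.to_numpy():
        # dict.fromkeys drops duplicates while keeping first-seen order.
        unique = list(dict.fromkeys(v for v in values if pd.notna(v)))
        # Pad on the right so every row has the same number of columns.
        rows.append(unique + [np.nan] * (width - len(unique)))

    columns = [prefix + str(i) for i in range(width)]
    return pd.DataFrame(rows, index=df.index, columns=columns)

equivalenceClasses = {'classA': ['a', 'A'],
                      'classB': ['b', 'B'],
                      'classC': ['c', 'C'],
                      'classD': ['d', 'D']}
data = {'row1': ['a', 'A', 'B', 'b'],
        'row2': ['a', 'b', 'c', 'd'],
        'row3': ['a', 'b', 'd', 'D']}
df = pd.DataFrame.from_dict(data, orient='index',
                            columns=['col' + str(x) for x in range(4)])

result = equivalence_classify(df, equivalenceClasses)
print(result)
```

Unlike the set-based version in the question, this keeps the class names in the order in which they first appear in each row.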
