Create a lambda function to apply to select df columns

Question

I have the following df:

id  header1      header2        diabetes  obesity  hypertension/high blood pressure  . . .
 1  metabolism   diabetes       no        no       no
 2  heart issue  heart disease  None      None     None
 3  obesity      diabetes       yes       no       no
 4  metabolism   hypertension   no        no       yes
 5  heart issue  heart disease  no        no       yes
 6  obesity      diabetes       yes       yes      no
 7  obesity      diabetes       no        no       yes

I want to create a lambda function that goes through header1 and header2 and checks whether either cell is a substring of one of the column names. Depending on whether that column holds yes, no, or null, it should return a flag value in a new column.

For each row, if the value in header1 or header2 matches a substring of a column name and that column contains yes, set the new column to 2. If any of the category columns contains yes but none of them matches header1 or header2, set it to 1. Otherwise, leave it blank.

Example:

Attempt:

cols = [x for x in df.columns if x not in ['header1', 'header2']]

df['flag'] = df.apply(lambda x: 2 if df['header1'] or df['header2'] in cols and cols == yes, 1 elif df['header1'] not in df['header2'] in cols and cols == yes, None else

Desired result:

id  header1      header2        diabetes  obesity  hypertension/high blood pressure  flag
 1  metabolism   diabetes       no        no       no                                None
 2  heart issue  heart disease  None      None     None                              None
 3  obesity      diabetes       yes       no       no                                2
 4  metabolism   hypertension   no        no       yes                               2
 5  heart issue  heart disease  no        no       yes                               1
 6  obesity      diabetes       yes       yes      no                                2
 7  obesity      diabetes       no        no       yes                               1

Constructor

Please note that my actual df has a dynamic number of yes/no columns, but only two header columns.

import numpy as np
import pandas as pd

data = np.array([('metabolism','diabetes','no','no', 'no'), 
                 ('heart issue', 'heart disease', None, None, None),
                 ('obesity','diabetes','yes','no','no'),
                 ('metabolism','hypertension','no','no','yes'),
                 ('heart issue', 'heart disease','no','no','yes'),
                 ('obesity', 'diabetes','yes','yes', 'no'),
                 ('obesity', 'diabetes', 'no','no', 'yes')])


df = pd.DataFrame(data, columns=['header1', 'header2','diabetes','obesity','hypertension/high blood pressure'])

cols = [x for x in df.columns if x not in ['header1', 'header2']]

Answer 1

First, create an index of the disease columns and a matching index of disease names (the latter is needed to capture "hypertension").

Then apply a function that first counts the "yes" answers and, if there are any, searches header1 and header2 for the names of the diseases answered "yes".

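headers = ['header1', 'header2']
# Every column that is not a header is treated as a disease (yes/no) column.
disease_cols = df.columns.difference(headers)
# Keep only the text before '/', so 'hypertension/high blood pressure' becomes 'hypertension'.
disease_names = disease_cols.str.split('/').str[0]

def get_flag(row):
    # Boolean mask of the disease columns answered 'yes' in this row.
    yes = row[disease_cols].eq('yes')
    if sum(yes) > 0:
        # 2 if a header cell contains one of the 'yes' disease names, otherwise 1.
        return 2 if row[headers].str.contains('|'.join(disease_names[yes])).any() else 1
    else:
        # No 'yes' answers: leave the flag empty.
        return np.nan


df['flag'] = df.apply(get_flag, axis=1)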
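To see why the disease names are split on '/', it may help to look at the two intermediate objects on their own. This is a small sketch that rebuilds only the column labels from the question (the variable name columns is introduced here for illustration). Note that Index.difference returns the labels sorted, which does not matter here because the names are derived from the same index and stay aligned with it.

```python
import pandas as pd

# Column labels from the question's constructor (assumed to be the full set here).
columns = pd.Index(['header1', 'header2', 'diabetes', 'obesity',
                    'hypertension/high blood pressure'])
headers = ['header1', 'header2']

# Everything that is not a header column; the result comes back sorted.
disease_cols = columns.difference(headers)
# Text before the first '/', one name per disease column, in the same order.
disease_names = disease_cols.str.split('/').str[0]

print(list(disease_cols))   # ['diabetes', 'hypertension/high blood pressure', 'obesity']
print(list(disease_names))  # ['diabetes', 'hypertension', 'obesity']
```

Without the split, the pattern searched for in header2 would be the full label 'hypertension/high blood pressure', which the cell 'hypertension' does not contain.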

Output:

       header1        header2 diabetes obesity hypertension/high blood pressure   flag
0   metabolism       diabetes       no      no                       no           NaN
1  heart issue  heart disease       no      no                       no           NaN
2      obesity       diabetes      yes      no                       no           2.0
3   metabolism   hypertension       no      no                      yes           2.0
4  heart issue  heart disease       no      no                      yes           1.0
5      obesity       diabetes      yes     yes                       no           2.0
6      obesity       diabetes       no      no                      yes           1.0
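The flag column prints as 2.0 / 1.0 because NaN forces a float dtype. If integer flags with a missing marker are preferred, one option is pandas' nullable Int64 dtype. The following is a self-contained sketch under that assumption, not part of the original answer. It also escapes the disease names with re.escape, a precaution added here in case a column label contains regex metacharacters.

```python
import re

import numpy as np
import pandas as pd

rows = [
    ('metabolism', 'diabetes', 'no', 'no', 'no'),
    ('heart issue', 'heart disease', None, None, None),
    ('obesity', 'diabetes', 'yes', 'no', 'no'),
    ('metabolism', 'hypertension', 'no', 'no', 'yes'),
    ('heart issue', 'heart disease', 'no', 'no', 'yes'),
    ('obesity', 'diabetes', 'yes', 'yes', 'no'),
    ('obesity', 'diabetes', 'no', 'no', 'yes'),
]
df = pd.DataFrame(rows, columns=['header1', 'header2', 'diabetes', 'obesity',
                                 'hypertension/high blood pressure'])

headers = ['header1', 'header2']
disease_cols = df.columns.difference(headers)
disease_names = disease_cols.str.split('/').str[0]

def get_flag(row):
    # Mask of disease columns answered 'yes'; None compares as False.
    yes = row[disease_cols].eq('yes').to_numpy()
    if not yes.any():
        return np.nan
    # Escape each name so it is matched literally inside the regex alternation.
    pattern = '|'.join(re.escape(name) for name in disease_names[yes])
    return 2 if row[headers].str.contains(pattern).any() else 1

# The nullable integer dtype keeps 2 and 1 as integers and shows gaps as <NA>.
df['flag'] = df.apply(get_flag, axis=1).astype('Int64')
print(df['flag'].tolist())
```

With the question's data, the first two rows come out as missing and the remaining rows as 2, 2, 1, 2, 1, which matches the desired result.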

Solution 1
0 2022-01-26 21:46:54
