New DataFrame boolean column that checks whether or not any of certain columns equal 1

I have the following pd.DataFrame and list of columns:

import pandas as pd

col_list = ["med_a", "med_c"]
df = pd.DataFrame.from_dict({'med_a': [0, 0, 1, 0], 'med_b': [0, 0, 1, 1], 'med_c': [0, 1, 1, 0]})

print(df)
>>>
    med_a   med_b   med_c
0   0       0       0
1   0       0       1
2   1       1       1
3   0       1       0

I want to make a new column (new_col) that holds True/False (or 0/1) for each row, depending on whether any of the columns in col_list is equal to 1. So the result should become:

    med_a   med_b   med_c   new_col
0   0       0       0       0
1   0       0       1       1
2   1       1       1       1
3   0       1       0       0

I know how to select only those rows where at least one of the columns is equal to 1, but that checks every column rather than only those in col_list, and it doesn't create a new column:

print(df[(df == 1).any(axis=1)])
>>>
    med_a   med_b   med_c
1   0       0       1
2   1       1       1
3   0       1       0

How would I achieve the desired result? Any help is appreciated.

You're so close! Just select the columns in col_list before calling any(axis=1), then convert the result with astype(int).

import numpy as np
import pandas as pd

col_list = ["med_a", "med_c"]
df = pd.DataFrame.from_dict({'med_a': [0, 0, 1, 0],
                             'med_b': [0, 0, 1, 1],
                             'med_c': [0, 1, 1, 0]})


df['new_col'] = df[col_list].any(axis=1).astype(int)

print(df)

Or via np.where:

df['new_col'] = np.where(df[col_list].any(axis=1), 1, 0)

   med_a  med_b  med_c  new_col
0      0      0      0        0
1      0      0      1        1
2      1      1      1        1
3      0      1      0        0
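
One caveat worth noting: any(axis=1) flags every non-zero value, which matches "equal to 1" only because the sample data holds nothing but 0 and 1. The sketch below uses a hypothetical variant of the data (a 2 placed in med_a, not part of the original question) to show how comparing with eq(1) first keeps the check strict. It is an illustration under that assumption, not a correction of the answer above.

```python
import pandas as pd

col_list = ["med_a", "med_c"]

# Hypothetical data: the 2 in med_a is an assumption added for illustration
df = pd.DataFrame({"med_a": [0, 0, 2, 0],
                   "med_b": [0, 0, 1, 1],
                   "med_c": [0, 1, 0, 0]})

# any() alone treats every non-zero value as True, so row 2 is flagged
truthy = df[col_list].any(axis=1)

# eq(1) compares first, so only values exactly equal to 1 are flagged
strict = df[col_list].eq(1).any(axis=1)

df["new_col"] = strict                  # boolean column (True/False)
df["new_col_int"] = strict.astype(int)  # integer column (0/1)

print(df)
```

If the columns are guaranteed to contain only 0 and 1, both forms give the same result and the shorter one is fine.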

Timing information via perfplot:

[Figure: perfplot performance chart of np.where vs. astype(int)]

np.where is faster than astype(int) up to about 100,000 rows, at which point the two perform about the same.

import numpy as np
import pandas as pd
import perfplot

np.random.seed(5)
col_list = ["med_a", "med_c"]


def gen_data(n):
    return pd.DataFrame.from_dict({'med_a': np.random.choice([0, 1], size=n),
                                   'med_b': np.random.choice([0, 1], size=n),
                                   'med_c': np.random.choice([0, 1], size=n)})


def np_where(df):
    df['new_col'] = np.where(df[col_list].any(axis=1), 1, 0)
    return df


def astype_int(df):
    df['new_col'] = df[col_list].any(axis=1).astype(int)
    return df


if __name__ == '__main__':
    out = perfplot.bench(
        setup=gen_data,
        kernels=[
            np_where,
            astype_int
        ],
        labels=[
            'np_where',
            'astype_int'
        ],
        n_range=[2 ** k for k in range(25)],
        equality_check=None
    )
    out.save('perfplot_results.png', transparent=False)
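
Because the benchmark above passes equality_check=None, perfplot does not compare the outputs of the two kernels. A minimal sanity check, assuming the same column names and a randomly generated frame (the generator and the size of 1000 are arbitrary choices for this sketch), could look like this:

```python
import numpy as np
import pandas as pd

col_list = ["med_a", "med_c"]

# Random 0/1 data; the seed and row count are arbitrary choices for this sketch
rng = np.random.default_rng(5)
df = pd.DataFrame({name: rng.integers(0, 2, size=1000)
                   for name in ["med_a", "med_b", "med_c"]})

# The two approaches being benchmarked
via_where = np.where(df[col_list].any(axis=1), 1, 0)
via_astype = df[col_list].any(axis=1).astype(int)

# Compare values only; the integer dtypes may differ between platforms
same = np.array_equal(via_where, via_astype.to_numpy())
print(same)
```

Note that np.where returns a NumPy array while astype(int) returns a pandas Series, so the comparison converts the Series with to_numpy() first.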
