
Creating a new column with column names in a dataframe based on the maximum value of two columns

I have a dataframe as follows:

Col1    Val1    Val2
A       1       0
B       2       3
C       0       4
D       3       2

I need the following output:

Col1    Val1    Val2    Type
A       1       0       Val1
B       2       3       Val2
C       0       4       Val2
D       3       2       Val1

The Type column indicates which of Val1 and Val2 holds the larger value in each row.

I am not sure how to approach this.

You can do it with:


df['Type'] = df.apply(lambda x: 'Val1' if x.Val1 > x.Val2 else 'Val2', axis=1)
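For reference, here is a minimal self-contained sketch that rebuilds the sample dataframe from the question and applies the line above. The dataframe construction is an assumption based on the table in the question. Note that because the comparison is a strict `>`, a row where both values are equal would be labelled 'Val2'.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'Col1': ['A', 'B', 'C', 'D'],
    'Val1': [1, 2, 0, 3],
    'Val2': [0, 3, 4, 2],
})

# Row-wise comparison: label each row with the column holding the larger value.
df['Type'] = df.apply(lambda x: 'Val1' if x.Val1 > x.Val2 else 'Val2', axis=1)
print(df)
```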

Special case: if you want to return None when Val1 == Val2:


def get_max_col(x):
    if x.Val1 > x.Val2:
        return 'Val1'
    elif x.Val1 == x.Val2:
        return None
    else:
        return 'Val2'


df['Type'] = df.apply(get_max_col, axis=1)
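If the dataframe is large, the same tie handling can probably be done without a row-wise `apply`. The following is only a sketch of one way to do it, not part of the original answer: it assumes a fifth row (E) with equal values for illustration, and it marks ties as a missing value (NaN) rather than a literal None.

```python
import pandas as pd

# Sample data from the question plus a hypothetical tie row (E).
df = pd.DataFrame({
    'Col1': ['A', 'B', 'C', 'D', 'E'],
    'Val1': [1, 2, 0, 3, 5],
    'Val2': [0, 3, 4, 2, 5],
})

# Boolean mask of rows where the two values are equal.
is_tie = df['Val1'] == df['Val2']

# Vectorised label for every row, ignoring ties for the moment.
winner = (df['Val1'] > df['Val2']).map({True: 'Val1', False: 'Val2'})

# Blank out the tied rows; Series.mask replaces them with a missing value.
df['Type'] = winner.mask(is_tie)
print(df)
```

Downstream code can test for the tie with `pd.isna`, which treats None and NaN alike.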

(df['Val1'] >= df['Val2']).map({True: 'Val1', False: 'Val2'})
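As a usage sketch, the expression can be assigned straight to the new column. The tie row (E) below is an assumption added for illustration; it shows that with `>=` a tie is labelled 'Val1', unlike the strict `>` in the `apply` version.

```python
import pandas as pd

# Sample data from the question plus a hypothetical tie row (E).
df = pd.DataFrame({
    'Col1': ['A', 'B', 'C', 'D', 'E'],
    'Val1': [1, 2, 0, 3, 5],
    'Val2': [0, 3, 4, 2, 5],
})

# Compare the two columns element-wise, then map the booleans to column names.
df['Type'] = (df['Val1'] >= df['Val2']).map({True: 'Val1', False: 'Val2'})
print(df)
```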

In [43]: df = pd.DataFrame(np.random.randint(0, 20, (10_000, 2)), columns=['val1', 'val2'])
    ...: %timeit (df['val1'] >= df['val2']).map({True: 'val1', False: 'val2'})
    ...: %timeit df.apply(lambda x: 'val1' if x.val1 >= x.val2 else 'val2', axis=1)
    ...: %timeit df.loc[:, ['val1', 'val2']].idxmax(axis=1)
1.27 ms ± 45.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
123 ms ± 836 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
5.73 ms ± 95.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
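Before comparing timings, it may be worth confirming that the three expressions return the same labels. The sketch below is an assumption-laden check rather than part of the original benchmark: it uses a seeded generator and a smaller frame so that it runs quickly outside IPython.

```python
import numpy as np
import pandas as pd

# Seeded random data so the check is repeatable; the size is arbitrary.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 20, (1_000, 2)), columns=['val1', 'val2'])

# The three expressions from the timing session above.
by_map = (df['val1'] >= df['val2']).map({True: 'val1', False: 'val2'})
by_apply = df.apply(lambda x: 'val1' if x.val1 >= x.val2 else 'val2', axis=1)
by_idxmax = df.loc[:, ['val1', 'val2']].idxmax(axis=1)

# Compare as plain lists to sidestep any dtype differences between the results.
same = by_map.tolist() == by_apply.tolist() == by_idxmax.tolist()
print(same)
```

All three label a tie as 'val1' here: the first two because of `>=`, and `idxmax` because it returns the first column that holds the maximum.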

Run:

df['Type'] = df.iloc[:, 1:].idxmax(axis=1)

This code works regardless of how many value columns there are or what they are called, as long as every column after the first is one you want to compare.

iloc[:, 1:] "filters out" column 0 (Col1), so only the value columns are compared.
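To illustrate the point about the number of columns, here is a small sketch with a third value column. The `Val3` column and the row values are hypothetical; they are chosen so that the last row contains a tie.

```python
import pandas as pd

# Hypothetical data with three value columns; row C ties between Val1 and Val2.
df = pd.DataFrame({
    'Col1': ['A', 'B', 'C'],
    'Val1': [1, 2, 5],
    'Val2': [0, 3, 5],
    'Val3': [4, 1, 2],
})

# Skip column 0 and take, per row, the name of the column with the largest value.
df['Type'] = df.iloc[:, 1:].idxmax(axis=1)
print(df)
```

On a tie, `idxmax` returns the first column holding the maximum, so row C is labelled 'Val1'. Also keep in mind that once `Type` exists, running the same line again would include that string column in the `iloc[:, 1:]` selection, so selecting the value columns by name is the safer choice for code that may run more than once.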

If you want to compare only these two columns, the alternatives are:

df['Type'] = df.iloc[:, 1:3].idxmax(axis=1)

or

df['Type'] = df[['Val1', 'Val2']].idxmax(axis=1)

