Pandas/Python: How to create a new column based on values from other columns and apply an extra condition to this new column

Question

I have a pandas DataFrame and I want to create a new column BB based on the conditions below.

Create a new column BB. If the value in column TGR1 is 0, assign 0 to BB.
Otherwise (the value in TGR1 is not 0), look up the column ('1', '2' or '3') whose name matches the value in TGR1 and assign the value in that column to the new column BB.

I was able to achieve the first step using

df.loc[df['TGR1'] == 0, 'BB'] = 0

I also tried to use np.where, but I can't figure out the right way to go about this.

df['BB'] = np.where(df.TGR1 == 0,0, df.columns == test.TGR1.value )
    
    

Dist    Track    EVENT_ID      Date       1      2        3   TGR1 TGR2
                            
311m    Cran    174331755   2020-10-19  34.00   5.18    19.10   1   0
311m    Cran    174331755   2020-10-19  34.00   5.18    19.10   2   1
311m    Cran    174331755   2020-10-19  34.00   5.18    19.10   0   2
311m    Cran    174331755   2020-10-19  34.00   5.18    19.10   3   1
311m    Cran    174331755   2020-10-19  34.00   5.18    19.10   2   2
311m    Cran    174331755   2020-10-19  34.00   5.18    19.10   1   2

Expected Output:

Dist    Track    EVENT_ID      Date       1      2        3   TGR1 TGR2    BB     
                            
311m    Cran    174331755   2020-10-19  34.00   5.18    19.10   1   0     34.00        
311m    Cran    174331755   2020-10-19  34.00   5.18    19.10   2   1     5.18     
311m    Cran    174331755   2020-10-19  34.00   5.18    19.10   0   2       0
311m    Cran    174331755   2020-10-19  34.00   5.18    19.10   3   1     19.10

Answer 1

One way is to use numpy advanced indexing:

import numpy as np
# extract columns 1,2,3 into a numpy array with a zeros column stacked on the left
vals = np.column_stack((np.zeros(len(df)), df[list('123')]))

vals
array([[ 0.  , 34.  ,  5.18, 19.1 ],
       [ 0.  , 34.  ,  5.18, 19.1 ],
       [ 0.  , 34.  ,  5.18, 19.1 ],
       [ 0.  , 34.  ,  5.18, 19.1 ],
       [ 0.  , 34.  ,  5.18, 19.1 ],
       [ 0.  , 34.  ,  5.18, 19.1 ]])

# use TGR1 values as the column index to extract corresponding values
df['BB'] = vals[np.arange(len(df)), df.TGR1.values]

df
   Dist Track   EVENT_ID        Date     1     2     3  TGR1  TGR2     BB
0  311m  Cran  174331755  2020-10-19  34.0  5.18  19.1     1     0  34.00
1  311m  Cran  174331755  2020-10-19  34.0  5.18  19.1     2     1   5.18
2  311m  Cran  174331755  2020-10-19  34.0  5.18  19.1     0     2   0.00
3  311m  Cran  174331755  2020-10-19  34.0  5.18  19.1     3     1  19.10
4  311m  Cran  174331755  2020-10-19  34.0  5.18  19.1     2     2   5.18
5  311m  Cran  174331755  2020-10-19  34.0  5.18  19.1     1     2  34.00
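
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the relevant part of the sample data by hand and applies the same indexing. The DataFrame constructor and its dtypes are an assumption (the Dist, Track, EVENT_ID and Date columns are left out because they play no part in the lookup).

```python
import numpy as np
import pandas as pd

# Hand-built copy of the question's sample rows (dtypes are an assumption)
df = pd.DataFrame({
    "1": [34.00] * 6,
    "2": [5.18] * 6,
    "3": [19.10] * 6,
    "TGR1": [1, 2, 0, 3, 2, 1],
    "TGR2": [0, 1, 2, 1, 2, 2],
})

# Column 0 of `vals` is all zeros, so TGR1 == 0 picks 0 without a special case
vals = np.column_stack((np.zeros(len(df)), df[["1", "2", "3"]].to_numpy()))

# Pair each row number with that row's TGR1 value to pick one cell per row
df["BB"] = vals[np.arange(len(df)), df["TGR1"].to_numpy()]

print(df["BB"].tolist())  # [34.0, 5.18, 0.0, 19.1, 5.18, 34.0]
```

The zeros column is what makes the "TGR1 is 0" rule fall out of the indexing itself, so no separate mask is needed.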

Answer 2

Here you can play a numpy trick, as in this answer.

We first define a matrix with the values from columns 1, 2 and 3, and add a first column of zeros.

import pandas as pd
import numpy as np

# we first define a matrix 
# with len(df) rows and 4 columns
mat = np.zeros((len(df), 4))

# Then we fill the last 3 columns 
# with values from df
mat[:,1:] = df[["1", "2", "3"]].values

# Then a vector with values from df["TGR1"]
v = df["TGR1"].values


# Finally we take the given index
# from each row on matrix
df["BB"] = np.take_along_axis(mat, v[:,None], axis=1)

Timing

I compared the timing of some of the answers here. I just took a df 10_000 times larger than the original one

df_bk = pd.concat([df for i in range(10_000)], ignore_index=True)

and before running each test I do df = df_bk.copy()

@wwnde's solution

CPU times: user 430 ms, sys: 12.1 ms, total: 442 ms
Wall time: 452 ms

@cookesd's solution

CPU times: user 746 ms, sys: 0 ns, total: 746 ms
Wall time: 746 ms

@rpanai's solution

CPU times: user 5.54 ms, sys: 0 ns, total: 5.54 ms
Wall time: 4.84 ms

@Psidom's solution

CPU times: user 5.93 ms, sys: 141 µs, total: 6.07 ms
Wall time: 5.61 ms

Psidom's solution and mine have basically the same timing.
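
If you want to repeat a comparison like this yourself, the following is a rough sketch of a timing harness, not the exact setup used for the numbers above. The frame `small`, the function names `with_take_along_axis` and `with_list_comprehension`, and the use of time.perf_counter are all assumptions made for illustration; absolute times will depend on your machine and library versions.

```python
import time

import numpy as np
import pandas as pd

# Small stand-in for the question's data (an assumption), repeated 10_000 times
small = pd.DataFrame({"1": [34.0, 34.0, 34.0], "2": [5.18, 5.18, 5.18],
                      "3": [19.1, 19.1, 19.1], "TGR1": [1, 0, 3]})
df_bk = pd.concat([small] * 10_000, ignore_index=True)

def with_take_along_axis(df):
    # Vectorised lookup: zeros in column 0, then one pick per row
    mat = np.zeros((len(df), 4))
    mat[:, 1:] = df[["1", "2", "3"]].to_numpy()
    idx = df["TGR1"].to_numpy()[:, None]
    return np.take_along_axis(mat, idx, axis=1).ravel()

def with_list_comprehension(df):
    # Row-by-row lookup through df.loc (relies on the default 0..n-1 index)
    return [0 if t == 0 else df.loc[i, str(t)]
            for i, t in enumerate(df["TGR1"].to_numpy())]

results, timings = {}, {}
for name, func in [("take_along_axis", with_take_along_axis),
                   ("list_comprehension", with_list_comprehension)]:
    df = df_bk.copy()  # fresh copy before each test
    start = time.perf_counter()
    results[name] = np.asarray(func(df), dtype=float)
    timings[name] = time.perf_counter() - start
    print(f"{name}: {timings[name] * 1000:.2f} ms")
```

Checking that every candidate returns the same values before comparing speeds is a cheap way to make sure the comparison is fair.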

Answer 3

You can create the column using a list comprehension with your if-else logic

import random
import pandas as pd

# Sample data
df = pd.DataFrame({'TGR1':[random.randint(0,3) for i in range(10)],
                   '1':[random.randint(0,100) for i in range(10)],
                   '2':[random.randint(101,200) for i in range(10)],
                   '3':[random.randint(201,300) for i in range(10)]})
# creating the column
df['BB'] = [0 if tgr1_val == 0 else df.loc[ind,str(tgr1_val)]
            for ind,tgr1_val in enumerate(df['TGR1'].values)]

df

#    TGR1   1    2    3   BB
# 0     0  54  107  217    0
# 1     2  71  128  277  128
# 2     1  25  103  269   25
# 3     0  80  112  279    0
# 4     2  98  167  228  167
# 5     3  26  192  285  285
# 6     0  27  107  228    0
# 7     2  13  103  298  103
# 8     3  28  196  289  289
# 9     2  72  186  251  186
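
One thing to be aware of: the comprehension above passes the position from enumerate to df.loc, so it assumes the frame has the default 0..n-1 index. The sketch below is one possible variant that does not depend on the index labels, iterating over the rows as plain dicts instead. The sample values and the custom index are made up for illustration.

```python
import pandas as pd

# Made-up sample with a non-default index to show the lookup is label-independent
df = pd.DataFrame({"TGR1": [0, 2, 1, 3],
                   "1": [54, 71, 25, 26],
                   "2": [107, 128, 103, 192],
                   "3": [217, 277, 269, 285]},
                  index=[10, 20, 30, 40])

# Each record is a dict {column name: value}; TGR1 names the column to read
df["BB"] = [0 if row["TGR1"] == 0 else row[str(row["TGR1"])]
            for row in df.to_dict("records")]

print(df["BB"].tolist())  # [0, 128, 25, 285]
```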

Answer 4

This is easily done with a boolean mask, as you did in your first step:

# use .loc rather than chained indexing, which may assign to a temporary copy
df.loc[df['TGR1'] == 0, 'BB'] = 0

For the other values, greater than 0:

# for each TGR1 value, copy the matching column into BB on the masked rows
df.loc[df['TGR1'] == 1, 'BB'] = df.loc[df['TGR1'] == 1, '1']
df.loc[df['TGR1'] == 2, 'BB'] = df.loc[df['TGR1'] == 2, '2']
df.loc[df['TGR1'] == 3, 'BB'] = df.loc[df['TGR1'] == 3, '3']

output:
    1         2       3   TGR1   BB
0   34.0    5.18    19.1    1   34.00
1   34.0    5.18    19.1    2   5.18
2   34.0    5.18    19.1    0   0.00
3   34.0    5.18    19.1    3   19.10
4   34.0    5.18    19.1    2   5.18

This is probably quite readable.
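
If the number of lookup columns grows, the same mask-per-value idea can be written once with np.select instead of one line per value. This is only a sketch of an alternative, not part of the approach above; the tuple `codes` is a name introduced here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"1": [34.0] * 5, "2": [5.18] * 5, "3": [19.1] * 5,
                   "TGR1": [1, 2, 0, 3, 2]})

codes = (1, 2, 3)  # TGR1 values that have a matching column (assumed)

# One boolean mask per code, paired with the column that code refers to
conditions = [(df["TGR1"] == k).to_numpy() for k in codes]
choices = [df[str(k)].to_numpy() for k in codes]

# Rows matching no condition (TGR1 == 0) fall back to the default of 0
df["BB"] = np.select(conditions, choices, default=0)

print(df["BB"].tolist())  # [34.0, 5.18, 0.0, 19.1, 5.18]
```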

Answer 5

Drop TGR2 temporarily, look up the columns using TGR1, and that should do it. Code below:

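# Drop TGR2 and keep only the columns whose names contain a digit ('1', '2', '3', 'TGR1')
s = df.astype(str).drop(columns='TGR2').filter(regex=r'\d', axis=1).reset_index()
# Find the position of the column named by each TGR1 value; '0' matches no column and gives -1
i = s.astype(str).columns.get_indexer(s.TGR1)
# Position -1 selects the last column (TGR1 itself), which holds '0' on those rows
df['BB'] = s.values[s.index, i]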

   Dist Track   EVENT_ID        Date     1     2     3 TGR1 TGR2    BB
0  311m  Cran  174331755  2020-10-19  34.0  5.18  19.1    1    0  34.0
1  311m  Cran  174331755  2020-10-19  34.0  5.18  19.1    2    1  5.18
2  311m  Cran  174331755  2020-10-19  34.0  5.18  19.1    0    2     0
3  311m  Cran  174331755  2020-10-19  34.0  5.18  19.1    3    1  19.1
4  311m  Cran  174331755  2020-10-19  34.0  5.18  19.1    2    2  5.18
5  311m  Cran  174331755  2020-10-19  34.0  5.18  19.1    1    2  34.0
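
Note that BB comes out as strings here, because the whole frame is cast with astype(str) first. If you need numbers, a possible variant is sketched below: it uses get_indexer only on the three lookup columns and replaces the "not found" positions (-1) with 0 explicitly. The names `lookup_cols`, `pos` and `picked` are introduced for illustration, and the sample frame is a cut-down copy of the question's data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"1": [34.0] * 6, "2": [5.18] * 6, "3": [19.1] * 6,
                   "TGR1": [1, 2, 0, 3, 2, 1]})

lookup_cols = pd.Index(["1", "2", "3"])

# Position of each TGR1 value among the lookup columns; -1 means "no such column"
pos = lookup_cols.get_indexer(df["TGR1"].astype(str))

vals = df[list(lookup_cols)].to_numpy()
# -1 would silently pick the last column, so those rows are overwritten with 0
picked = vals[np.arange(len(df)), pos]
df["BB"] = np.where(pos == -1, 0.0, picked)

print(pos.tolist())       # [0, 1, -1, 2, 1, 0]
print(df["BB"].tolist())  # [34.0, 5.18, 0.0, 19.1, 5.18, 34.0]
```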
