Pandas/Python: How to create new column based on values from other columns and apply extra condition to this new column

I have a pandas DataFrame and I want to create a new column BB based on the conditions below.

  1. Create a new column BB. If the value in column TGR1 is 0, assign 0 to BB.
  2. Otherwise (the value in TGR1 is not 0), look up the column ('1', '2' or '3') that corresponds to the value in TGR1 and assign the value in that column to the new column BB.

I was able to achieve the first step using

df.loc[df['TGR1'] == 0, 'BB'] = 0

I also tried to use np.where, but I can't figure out the right way to go about this.

df['BB'] = np.where(df.TGR1 == 0,0, df.columns == test.TGR1.value )
    
    

Dist    Track    EVENT_ID      Date       1      2        3   TGR1 TGR2
                            
311m    Cran    174331755   2020-10-19  34.00   5.18    19.10   1   0
311m    Cran    174331755   2020-10-19  34.00   5.18    19.10   2   1
311m    Cran    174331755   2020-10-19  34.00   5.18    19.10   0   2
311m    Cran    174331755   2020-10-19  34.00   5.18    19.10   3   1
311m    Cran    174331755   2020-10-19  34.00   5.18    19.10   2   2
311m    Cran    174331755   2020-10-19  34.00   5.18    19.10   1   2

Expected Output:

Dist    Track    EVENT_ID      Date       1      2        3   TGR1 TGR2    BB     
                            
311m    Cran    174331755   2020-10-19  34.00   5.18    19.10   1   0     34.00        
311m    Cran    174331755   2020-10-19  34.00   5.18    19.10   2   1     5.18     
311m    Cran    174331755   2020-10-19  34.00   5.18    19.10   0   2       0
311m    Cran    174331755   2020-10-19  34.00   5.18    19.10   3   1     19.10     

One way is to use numpy advanced indexing:

import numpy as np
# extract columns 1,2,3 into a numpy array with a zeros column stacked on the left
vals = np.column_stack((np.zeros(len(df)), df[list('123')]))

vals
array([[ 0.  , 34.  ,  5.18, 19.1 ],
       [ 0.  , 34.  ,  5.18, 19.1 ],
       [ 0.  , 34.  ,  5.18, 19.1 ],
       [ 0.  , 34.  ,  5.18, 19.1 ],
       [ 0.  , 34.  ,  5.18, 19.1 ],
       [ 0.  , 34.  ,  5.18, 19.1 ]])

# use TGR1 values as the column index to extract corresponding values
df['BB'] = vals[np.arange(len(df)), df.TGR1.values]

df
   Dist Track   EVENT_ID        Date     1     2     3  TGR1  TGR2     BB
0  311m  Cran  174331755  2020-10-19  34.0  5.18  19.1     1     0  34.00
1  311m  Cran  174331755  2020-10-19  34.0  5.18  19.1     2     1   5.18
2  311m  Cran  174331755  2020-10-19  34.0  5.18  19.1     0     2   0.00
3  311m  Cran  174331755  2020-10-19  34.0  5.18  19.1     3     1  19.10
4  311m  Cran  174331755  2020-10-19  34.0  5.18  19.1     2     2   5.18
5  311m  Cran  174331755  2020-10-19  34.0  5.18  19.1     1     2  34.00
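
To run the snippet above end to end, the sample frame from the question can be rebuilt by hand. This is a minimal sketch, assuming the value columns are named with the strings '1', '2' and '3' and that TGR1 holds integers in the range 0 to 3. Only the columns needed for the lookup are included.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question (only the columns used in the lookup).
df = pd.DataFrame({
    "1": [34.00] * 6,
    "2": [5.18] * 6,
    "3": [19.10] * 6,
    "TGR1": [1, 2, 0, 3, 2, 1],
})

# Column 0 is all zeros, so TGR1 == 0 picks 0; columns 1..3 hold the data.
vals = np.column_stack((np.zeros(len(df)), df[["1", "2", "3"]].to_numpy()))

# Pair each row number with its TGR1 value to pick one element per row.
df["BB"] = vals[np.arange(len(df)), df["TGR1"].to_numpy()]
print(df["BB"].tolist())
```

Because TGR1 is used directly as a column position, this approach relies on the column names being consecutive integers starting at 1.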

Here you can try a numpy trick, as in this answer.

We first define a matrix with the values from columns 1, 2 and 3 and add a first column of zeros.

import pandas as pd
import numpy as np

# we first define a matrix 
# with len(df) rows and 4 columns
mat = np.zeros((len(df), 4))

# Then we fill the last 3 columns 
# with values from df
mat[:,1:] = df[["1", "2", "3"]].values

# Then a vector with values from df["TGR1"]
v = df["TGR1"].values


# Finally we take the given index
# from each row on matrix
df["BB"] = np.take_along_axis(mat, v[:,None], axis=1)
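
To see what np.take_along_axis does here, a small standalone sketch may help. The array and index values below are made up for illustration. Note that the result keeps a trailing axis of length 1, which can be flattened with ravel() if a plain one-dimensional column is wanted.

```python
import numpy as np

# Illustrative matrix: a zeros column followed by the three value columns.
mat = np.array([[0.0, 34.0, 5.18, 19.1],
                [0.0, 34.0, 5.18, 19.1],
                [0.0, 34.0, 5.18, 19.1]])
v = np.array([1, 0, 3])

# v[:, None] has shape (3, 1): one column position per row.
picked = np.take_along_axis(mat, v[:, None], axis=1)
print(picked.shape)

# Flatten to shape (3,) so it behaves like an ordinary column of values.
bb = picked.ravel()
print(bb.tolist())
```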

Timing

I compared the timing of some of the answers here. I just took a df 10_000 times larger than the original one

df_bk = pd.concat([df for i in range(10_000)], ignore_index=True)

and before running each test I do df = df_bk.copy()

@wwnde's solution

CPU times: user 430 ms, sys: 12.1 ms, total: 442 ms
Wall time: 452 ms

@cookesd's solution

CPU times: user 746 ms, sys: 0 ns, total: 746 ms
Wall time: 746 ms

@rpanai's solution

CPU times: user 5.54 ms, sys: 0 ns, total: 5.54 ms
Wall time: 4.84 ms

@Psidom's solution

CPU times: user 5.93 ms, sys: 141 µs, total: 6.07 ms
Wall time: 5.61 ms

Psidom's solution and mine have basically the same timing. Here is a plot: [image]
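
A comparison of this kind can be reproduced outside a notebook with the standard timeit module. The sketch below is only an illustration: the function names bb_fancy_index and bb_take_along_axis are hypothetical wrappers around the two NumPy-based answers, the sample frame is rebuilt from the question's rows, and the numbers it prints will differ from machine to machine.

```python
import timeit

import numpy as np
import pandas as pd

base = pd.DataFrame({
    "1": [34.00] * 6,
    "2": [5.18] * 6,
    "3": [19.10] * 6,
    "TGR1": [1, 2, 0, 3, 2, 1],
})
# Same enlargement as in the timing above: 10_000 copies of the sample.
df_bk = pd.concat([base] * 10_000, ignore_index=True)


def bb_fancy_index(df):
    # Advanced indexing: one (row, column) pair per row.
    vals = np.column_stack((np.zeros(len(df)), df[["1", "2", "3"]].to_numpy()))
    return vals[np.arange(len(df)), df["TGR1"].to_numpy()]


def bb_take_along_axis(df):
    # take_along_axis: pick one column position per row, then flatten.
    mat = np.zeros((len(df), 4))
    mat[:, 1:] = df[["1", "2", "3"]].to_numpy()
    idx = df["TGR1"].to_numpy()[:, None]
    return np.take_along_axis(mat, idx, axis=1).ravel()


runs = 20
for fn in (bb_fancy_index, bb_take_along_axis):
    seconds = timeit.timeit(lambda: fn(df_bk), number=runs)
    print(f"{fn.__name__}: {seconds / runs * 1e3:.3f} ms per call")
```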

You can create the column using a list comprehension with your if-else logic

import random
import pandas as pd

# Sample data
df = pd.DataFrame({'TGR1':[random.randint(0,3) for i in range(10)],
                   '1':[random.randint(0,100) for i in range(10)],
                   '2':[random.randint(101,200) for i in range(10)],
                   '3':[random.randint(201,300) for i in range(10)]})
# creating the column
df['BB'] = [0 if tgr1_val == 0 else df.loc[ind,str(tgr1_val)]
            for ind,tgr1_val in enumerate(df['TGR1'].values)]

df

#    TGR1   1    2    3   BB
# 0     0  54  107  217    0
# 1     2  71  128  277  128
# 2     1  25  103  269   25
# 3     0  80  112  279    0
# 4     2  98  167  228  167
# 5     3  26  192  285  285
# 6     0  27  107  228    0
# 7     2  13  103  298  103
# 8     3  28  196  289  289
# 9     2  72  186  251  186
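
As a variation on the comprehension above (not the method from the answer itself), the per-row df.loc lookup can be replaced by iterating over plain Python dictionaries. This is a minimal sketch with a small fixed frame instead of random data, so that the result is reproducible. The name records is only illustrative.

```python
import pandas as pd

# Small fixed sample in place of the random data.
df = pd.DataFrame({
    "TGR1": [0, 2, 1, 3],
    "1": [54, 71, 25, 26],
    "2": [107, 128, 103, 192],
    "3": [217, 277, 269, 285],
})

# One dict per row, keyed by column name.
records = df[["1", "2", "3"]].to_dict("records")

# Same if-else logic as above, but each lookup is a plain dict access.
df["BB"] = [0 if t == 0 else row[str(t)]
            for t, row in zip(df["TGR1"], records)]
print(df["BB"].tolist())
```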

This is easily done with a boolean mask, as you did in your step one:

# use .loc rather than chained indexing, which may assign to a temporary copy
df.loc[df['TGR1'] == 0, 'BB'] = 0

For the other values greater than 0:

df.loc[df['TGR1'] == 1, 'BB'] = df.loc[df['TGR1'] == 1, '1']
df.loc[df['TGR1'] == 2, 'BB'] = df.loc[df['TGR1'] == 2, '2']
df.loc[df['TGR1'] == 3, 'BB'] = df.loc[df['TGR1'] == 3, '3']

output:
    1         2       3   TGR1   BB
0   34.0    5.18    19.1    1   34.00
1   34.0    5.18    19.1    2   5.18
2   34.0    5.18    19.1    0   0.00
3   34.0    5.18    19.1    3   19.10
4   34.0    5.18    19.1    2   5.18

This is probably quite readable.
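
The one-assignment-per-value pattern can also be written as a loop, which avoids repeating the line for each column. This is a minimal sketch, assuming the value columns are named '1', '2' and '3' and using only the columns from the sample that matter here.

```python
import pandas as pd

df = pd.DataFrame({
    "1": [34.00] * 6,
    "2": [5.18] * 6,
    "3": [19.10] * 6,
    "TGR1": [1, 2, 0, 3, 2, 1],
})

# Start from 0 everywhere; rows with TGR1 == 0 keep this default.
df["BB"] = 0.0

# For each possible TGR1 value, copy from the column with the matching name.
for k in (1, 2, 3):
    mask = df["TGR1"] == k
    df.loc[mask, "BB"] = df.loc[mask, str(k)]

print(df["BB"].tolist())
```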

Drop TGR2 temporarily, then look up the columns using TGR1; that should do it. Code below:

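# Drop TGR2 and keep only the columns whose names contain a digit ('1', '2', '3', 'TGR1')
s = df.astype(str).drop(columns='TGR2').filter(regex=r'\d', axis=1).reset_index()
# Position of the column named by each TGR1 value; '0' matches no column, giving -1,
# which selects the last column (TGR1 itself, whose value is '0')
i = s.astype(str).columns.get_indexer(s.TGR1)
df['BB'] = s.values[s.index, i]
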
   Dist Track   EVENT_ID        Date     1     2     3 TGR1 TGR2    BB
0  311m  Cran  174331755  2020-10-19  34.0  5.18  19.1    1    0  34.0
1  311m  Cran  174331755  2020-10-19  34.0  5.18  19.1    2    1  5.18
2  311m  Cran  174331755  2020-10-19  34.0  5.18  19.1    0    2     0
3  311m  Cran  174331755  2020-10-19  34.0  5.18  19.1    3    1  19.1
4  311m  Cran  174331755  2020-10-19  34.0  5.18  19.1    2    2  5.18
5  311m  Cran  174331755  2020-10-19  34.0  5.18  19.1    1    2  34.0
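
The lookup above depends on get_indexer returning -1 for labels that are not found. A minimal sketch of that behaviour follows, handling the -1 case explicitly and keeping BB numeric. The names value_cols and pos are illustrative, and only the relevant columns of the sample are rebuilt.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "1": [34.00] * 6,
    "2": [5.18] * 6,
    "3": [19.10] * 6,
    "TGR1": [1, 2, 0, 3, 2, 1],
})

value_cols = ["1", "2", "3"]

# Position of each TGR1 label among value_cols; "0" is not a column, so it maps to -1.
pos = pd.Index(value_cols).get_indexer(df["TGR1"].astype(str).to_numpy())
print(pos.tolist())

vals = df[value_cols].to_numpy()
# A position of -1 would silently pick the last column, so those rows are overwritten with 0.
picked = vals[np.arange(len(df)), pos]
df["BB"] = np.where(pos == -1, 0.0, picked)
print(df["BB"].tolist())
```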
