[英]How to create a new column based on multiple conditions in another column
In pandas, How can I create a new column B
based on a column A
in df
, such that:在 pandas 中,如何根据df
中的A
列创建新的B
列,这样:
B(i)=1
if A_(i-1)-A_(i) >= 0
when A_(i) <= 10
B(i)=1
如果A_(i-1)-A_(i) >= 0
当A_(i) <= 10
B(i)=1
if A_(i-1)-A_(i) >= 2
when 10 < A_(i) <= 20
B(i)=1
如果A_(i-1)-A_(i) >= 2
当10 < A_(i) <= 20
B(i)=1
if A_(i-1)-A_(i) >= 5
when 20 < A_(i)
B(i)=1
如果A_(i-1)-A_(i) >= 5
当20 < A_(i)
B(i)=0
for any other case对于任何其他情况, B(i)=0
However, the first B_i
value is always two但是,第一个B_i
值总是两个
Example:例子:
A一种 | B乙 |
---|---|
5 5个 | 2 (the first B_i) 2(第一个B_i) |
12 12 | 0 0 |
14 14 | 0 0 |
22 22 | 0 0 |
20 20 | 1 1个 |
33 33 | 0 0 |
11 11 | 1 1个 |
8 8个 | 1 1个 |
15 15 | 0 0 |
11 11 | 1 1个 |
You can use Pandas.shift
for creating A_(i-1)
and use Numpy.select
for checking multiple conditions like below:您可以使用Pandas.shift
创建A_(i-1)
并使用Numpy.select
检查多个条件,如下所示:
import pandas as pd
import numpy as np
df = pd.DataFrame({'A':[5,12,14,22,20,33,11,8,15,11]})
df['A_prv'] = df['A'].shift(1)
conditions = [
(df.index==0),
((df['A_prv'] - df['A'] >= 0) & (df['A'].le(10))),
((df['A_prv'] - df['A'] >= 2) & (df['A'].between(10, 20, inclusive='right'))),
# ^^^ 10 < df['A'] <= 20 ^^^
((df['A_prv'] - df['A'] >= 5) & (df['A'].ge(20)))
]
choices = [2, 1, 1, 1]
df['B'] = np.select(conditions, choices, default=0)
print(df)
Output: Output:
A A_prv B
0 5 NaN 2
1 12 5.0 0
2 14 12.0 0
3 22 14.0 0
4 20 22.0 1
5 33 20.0 0
6 11 33.0 1
7 8 11.0 1
8 15 8.0 0
9 11 15.0 1
The most intuitive way is to iterate trough the lines testing all the three conditions in a single-line if-else
(as B(i) is 1 for all the true conditions).最直观的方法是在单行if-else
中遍历测试所有三个条件的行(因为对于所有真实条件,B(i) 为 1)。
import pandas as pd
df = pd.DataFrame({'A':[5,12,14,22,20,33,11,8,15,11]})
B = [2]
for i in range(1,len(df['A'])):
newvalue = 1 if (df['A'][i-1]-df['A'][i]>=0 and df['A'][i]<=10) or (df['A'][i-1]-df['A'][i]>=2 and df['A'][i]>10 and df['A'][i]<=20) or (df['A'][i-1]-df['A'][i]>=5 and df['A'][i]>20) else 0
B.append(newvalue)
df['B'] = B
print(df)
Output: Output:
A B
0 5 2
1 12 0
2 14 0
3 22 0
4 20 1
5 33 0
6 11 1
7 8 1
8 15 0
9 11 1
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.