Create a new column based on other column values - conditional forward fill?
I have the following dataframe:
import pandas as pd

d = {'c_1': [0,0,0,1,0,0,0,1,0,0,0,0],
     'c_2': [0,0,0,0,0,1,0,0,0,0,0,1]}
df = pd.DataFrame(d)
I want to create another column 'f' that returns 1 from the row where c_1 == 1 until c_2 == 1, at which point the value in 'f' will be 0.
Desired output as follows:
c_1 c_2 f
0 0 0 0
1 0 0 0
2 0 0 0
3 1 0 1
4 0 0 1
5 0 1 0
6 0 0 0
7 1 0 1
8 0 0 1
9 0 0 1
10 0 0 1
11 0 1 0
I think this requires some kind of conditional forward fill. I have looked at previous questions but haven't been able to arrive at the desired output.
Edit: I have come across a related scenario where the inputs differ and the current solutions do not work. I will mark this as answered, but would appreciate any input on the case below.
d = {'c_1': [0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0],
'c_2': [1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]}
df = pd.DataFrame(d)
Desired output as follows. Same as before, I want to create another column 'f' that returns 1 from the row where c_1 == 1 until c_2 == 1, at which point the value in 'f' will be 0.
c_1 c_2 f
0 0 1 0
1 0 1 0
2 0 1 0
3 0 0 0
4 0 0 0
5 0 0 0
6 1 0 1
7 0 0 1
8 0 1 0
9 0 1 0
10 0 1 0
11 0 1 0
12 0 1 0
13 0 0 0
14 0 0 0
15 0 0 0
16 1 0 1
17 0 0 1
18 1 0 1
19 1 0 1
20 0 0 1
21 0 0 1
22 0 0 1
23 0 0 1
24 0 1 0
You can try:
df['f'] = df[['c_1','c_2']].sum(1).cumsum().mod(2)
print(df)
c_1 c_2 f
0 0 0 0
1 0 0 0
2 0 0 0
3 1 0 1
4 0 0 1
5 0 1 0
6 0 0 0
7 1 0 1
8 0 0 1
9 0 0 1
10 0 0 1
11 0 1 0
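A note on why this works, and where it stops working: each row sum is 1 whenever c_1 or c_2 fires, so the cumulative sum counts events and its parity flips on every event. That only gives the right flag if start and stop signals strictly alternate, beginning with a c_1. As a rough check (the names `df2`, `events`, `parity` and `desired` are just for illustration), the second dataset from the question's edit breaks that assumption:

```python
import pandas as pd

# Second dataset from the question's edit.
d2 = {'c_1': [0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0],
      'c_2': [1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]}
df2 = pd.DataFrame(d2)

# Desired 'f' column as listed in the question.
desired = [0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0,
           0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0]

# Count events row by row, then take the parity of the running count.
events = df2[['c_1', 'c_2']].sum(axis=1)
parity = events.cumsum().mod(2)

# Rows where the parity trick disagrees with the desired output.
mismatches = [i for i, (p, w) in enumerate(zip(parity, desired)) if p != w]
print(mismatches)
```

Row 0 is already wrong here: c_2 fires before any c_1, so the parity becomes 1 although the desired value is 0. Repeated c_2 rows and back-to-back c_1 rows shift the parity in the same way.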
You can also try it like this:
df.loc[df['c_2'].shift().ne(1), 'f'] = df['c_1'].replace(to_replace=0, method='ffill')
c_1 c_2 f
0 0 0 0.0
1 0 0 0.0
2 0 0 0.0
3 1 0 1.0
4 0 0 1.0
5 0 1 1.0 # <--- these values should be set to zero
6 0 0 NaN
7 1 0 1.0
8 0 0 1.0
9 0 0 1.0
10 0 0 1.0
11 0 1 1.0 # <---
Add one more condition if you don't want to include the end position.
Final:
df.loc[df['c_2'].shift().ne(1) & df['c_2'].ne(1), 'f'] = df['c_1'].replace(to_replace=0, method='ffill')
df = df.fillna(0)
c_1 c_2 f
0 0 0 0.0
1 0 0 0.0
2 0 0 0.0
3 1 0 1.0
4 0 0 1.0
5 0 1 0.0
6 0 0 0.0
7 1 0 1.0
8 0 0 1.0
9 0 0 1.0
10 0 0 1.0
11 0 1 0.0
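As a side note, newer pandas releases deprecate the `method` argument of `replace`, so the lines above may warn or fail depending on the installed version. A different way to get the same column for this dataset, sketched here with `where`, `mask` and `ffill` rather than the answer's own approach (the intermediate name `state` is just for illustration): mark start rows with 1, stop rows with 0, leave everything else empty, and forward fill.

```python
import pandas as pd

d = {'c_1': [0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0],
     'c_2': [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1]}
df = pd.DataFrame(d)

# 1.0 on rows where c_1 fires, NaN everywhere else.
state = df['c_1'].where(df['c_1'].eq(1))
# 0 on rows where c_2 fires (if both fire on one row, the stop wins).
state = state.mask(df['c_2'].eq(1), 0)
# Carry the last start/stop marker forward; rows before any marker become 0.
df['f'] = state.ffill().fillna(0).astype(int)

print(df)
```

Because stop rows are written as explicit zeros before the fill, rows between a stop and the next start stay 0 no matter how many there are, and the result is an integer column rather than floats.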
This should work for both scenarios:
df['f'] = df['c_1'].groupby(df[['c_1','c_2']].sum(1).cumsum()).transform('first')
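To see what the grouping does: the cumulative sum of events gives every start or stop row a new group id, and the quiet rows after it share that id. Taking the first c_1 of each group then tells you whether the group was opened by a start (1) or a stop (0). A small sketch that runs it on both datasets from the question (the helper name `flag_between` is made up for this example):

```python
import pandas as pd

def flag_between(df):
    """Return 1 from each c_1 == 1 row until the next c_2 == 1 row, else 0."""
    # Every row where c_1 or c_2 fires bumps the counter and opens a new group.
    group_id = df[['c_1', 'c_2']].sum(axis=1).cumsum()
    # The first c_1 in a group is 1 if a start opened it, 0 if a stop did.
    return df['c_1'].groupby(group_id).transform('first')

df1 = pd.DataFrame({'c_1': [0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0],
                    'c_2': [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1]})
df2 = pd.DataFrame({
    'c_1': [0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0],
    'c_2': [1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]})

df1['f'] = flag_between(df1)
df2['f'] = flag_between(df2)
print(df1)
print(df2)
```

Both results match the desired outputs listed in the question. One case not covered by either dataset is a row where c_1 and c_2 are both 1; there the group opens with c_1 == 1, so this version would treat it as a start.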