Create a new column based on other column values - conditional forward fill?

I have the following DataFrame:

import pandas as pd

d = {'c_1': [0,0,0,1,0,0,0,1,0,0,0,0],
     'c_2': [0,0,0,0,0,1,0,0,0,0,0,1]}

df = pd.DataFrame(d)

I want to create another column 'f' that is 1 from the row where c_1 == 1 until the row where c_2 == 1, at which point 'f' goes back to 0.

The desired output is as follows:

    c_1 c_2 f
0   0   0   0
1   0   0   0
2   0   0   0
3   1   0   1
4   0   0   1
5   0   1   0
6   0   0   0
7   1   0   1
8   0   0   1
9   0   0   1
10  0   0   1
11  0   1   0

I think this requires some kind of conditional forward fill. I have looked at previous questions but haven't been able to arrive at the desired output.

Edit: I have come across a related scenario where the inputs differ and the current solutions do not work. I will mark the question as answered, but would appreciate any input on the case below.

d = {'c_1': [0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0],
     'c_2': [1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]}

df = pd.DataFrame(d)

The desired output is as follows. As before, I want to create another column 'f' that is 1 from the row where c_1 == 1 until the row where c_2 == 1, at which point 'f' is 0.

   c_1  c_2 f
0   0   1   0
1   0   1   0
2   0   1   0
3   0   0   0
4   0   0   0
5   0   0   0
6   1   0   1
7   0   0   1
8   0   1   0
9   0   1   0
10  0   1   0
11  0   1   0
12  0   1   0
13  0   0   0
14  0   0   0
15  0   0   0
16  1   0   1
17  0   0   1
18  1   0   1
19  1   0   1
20  0   0   1
21  0   0   1
22  0   0   1
23  0   0   1
24  0   1   0

You can try:

df['f'] = df[['c_1','c_2']].sum(1).cumsum().mod(2)

print(df)

    c_1  c_2  f
0     0    0  0
1     0    0  0
2     0    0  0
3     1    0  1
4     0    0  1
5     0    1  0
6     0    0  0
7     1    0  1
8     0    0  1
9     0    0  1
10    0    0  1
11    0    1  0
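To see why the one-liner above works on the first input, and why it may not carry over to the edited scenario, here is a small sketch. It treats every 1 in either column as an "event" and uses the parity of the running event count as the flag. The variable names (events, f_first, f_second) are mine, not from the answer.

```python
import pandas as pd

first = pd.DataFrame({'c_1': [0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0],
                      'c_2': [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1]})

# Each 1 in either column counts as one event. While starts (c_1) and
# stops (c_2) strictly alternate, the running count is odd inside a
# start..stop window and even outside it.
events = first[['c_1', 'c_2']].sum(axis=1)
f_first = events.cumsum().mod(2).tolist()

# The second input from the question opens with three consecutive c_2 == 1
# rows, so the alternation assumption no longer holds.
second = pd.DataFrame({
    'c_1': [0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0],
    'c_2': [1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]})
f_second = second[['c_1', 'c_2']].sum(axis=1).cumsum().mod(2).tolist()

print(f_first)       # matches the desired 'f' for the first input
print(f_second[:3])  # first three rows toggle instead of staying 0
```

So the parity trick is compact, but it seems to depend on c_1 and c_2 events alternating one for one.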

You can also try it like this:

df.loc[df['c_2'].shift().ne(1), 'f'] = df['c_1'].replace(to_replace=0, method='ffill')


    c_1 c_2 f
0   0   0   0.0
1   0   0   0.0
2   0   0   0.0
3   1   0   1.0
4   0   0   1.0
5   0   1   1.0 # <--- these values should be set to zero
6   0   0   NaN
7   1   0   1.0
8   0   0   1.0
9   0   0   1.0
10  0   0   1.0
11  0   1   1.0 # <---

Add one more condition if you don't want to include the end position.

Final:

df.loc[df['c_2'].shift().ne(1) & df['c_2'].ne(1), 'f'] = df['c_1'].replace(to_replace=0, method='ffill')
df = df.fillna(0)

    c_1 c_2 f
0   0   0   0.0
1   0   0   0.0
2   0   0   0.0
3   1   0   1.0
4   0   0   1.0
5   0   1   0.0
6   0   0   0.0
7   1   0   1.0
8   0   0   1.0
9   0   0   1.0
10  0   0   1.0
11  0   1   0.0
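As far as I know, recent pandas releases (2.1 and later) deprecate the method argument of replace, so the lines above may emit a warning or stop working. Below is a sketch of the same idea using where and ffill instead; the names filled and keep are mine. It keeps the same logic as the answer and also returns integers rather than floats.

```python
import pandas as pd

df = pd.DataFrame({'c_1': [0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0],
                   'c_2': [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1]})

# Blank out the zeros in c_1, then carry each 1 forward.
# Rows before the first 1 stay NaN.
filled = df['c_1'].where(df['c_1'].eq(1)).ffill()

# Same mask as in the answer: drop rows where c_2 == 1 and the row
# immediately after such a row.
keep = df['c_2'].shift().ne(1) & df['c_2'].ne(1)

# Everything masked out or never filled becomes 0.
df['f'] = filled.where(keep).fillna(0).astype(int)
print(df)
```

Note that, like the original, this only zeroes the single row right after a c_2 == 1 row, so it appears to rely on the next c_1 == 1 arriving soon after each stop. It matches the first input but is not a general solution.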

This should work for both scenarios:

df['f'] = df['c_1'].groupby(df[['c_1','c_2']].sum(1).cumsum()).transform('first')
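A quick way to check the groupby approach against both inputs from the question is sketched below. The helper names flag_groupby and flag_loop are hypothetical; flag_loop is a plain-Python reference I added for comparison, assuming a c_2 == 1 row always resets the flag.

```python
import pandas as pd

def flag_groupby(df):
    # A new group starts at every row where c_1 or c_2 is 1; every row in
    # a group then takes the c_1 value of the group's first row.
    group_id = df[['c_1', 'c_2']].sum(axis=1).cumsum()
    return df['c_1'].groupby(group_id).transform('first')

def flag_loop(df):
    # Reference implementation: a simple on/off state.
    # Assumption: if both columns were 1 in the same row, c_2 would win here.
    state, out = 0, []
    for c1, c2 in zip(df['c_1'], df['c_2']):
        if c2 == 1:
            state = 0
        elif c1 == 1:
            state = 1
        out.append(state)
    return out

first = pd.DataFrame({'c_1': [0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0],
                      'c_2': [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1]})
second = pd.DataFrame({
    'c_1': [0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0],
    'c_2': [1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]})

result_first = flag_groupby(first).tolist()
result_second = flag_groupby(second).tolist()
print(result_first == flag_loop(first), result_second == flag_loop(second))
```

The two helpers agree on both inputs here. They would differ on a row where c_1 and c_2 are both 1, which does not occur in the question's data.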
