Create new columns based on other columns and update each row's value in the new column based on the previous row
I have been learning pandas for the past couple of months. I have a data frame like this:
index Random id diff pct
0 2018-01-01 31 1 3 1
1 2018-01-02 11 1 2 2
2 2018-01-03 21 1 4 0
3 2018-01-04 23 2 1 0
4 2018-01-05 43 2 6 3
5 2018-01-06 42 2 1 1
6 2018-01-07 51 3 2 5
7 2018-01-08 47 3 2 0
8 2018-01-09 49 3 3 2
9 2018-01-10 22 3 1 3
I want to create a 'recommend' column holding 'Yes' or 'No' based on a condition over other columns, which I can do. But I also need to update the value of the 'Random' column on each row (or create a new column) when 'recommend' is 'Yes'. For instance, the condition is: if pct < diff, then 'recommend' is 'Yes' and 'Random'/'Random_new' becomes Random + diff; otherwise 'recommend' is 'No' and 'Random'/'Random_new' takes the value from the previous row. FYI, when 'recommend' is 'Yes' for a row, the 'Random'/'Random_new' column has to be updated for that row and for the later rows of the same id. The expected output should look like this:
index Random id diff pct recommend Random_new
0 2018-01-01 31 1 3 1 Y 32
1 2018-01-02 31 1 2 2 N 32
2 2018-01-03 31 1 4 0 Y 36
3 2018-01-04 23 2 1 0 Y 24
4 2018-01-05 23 2 6 3 Y 27
5 2018-01-06 23 2 1 1 N 27
6 2018-01-07 51 3 2 5 N 51
7 2018-01-08 51 3 2 0 Y 53
8 2018-01-09 51 3 3 2 Y 56
9 2018-01-10 51 3 1 3 N 56
I have tried np.where, which only creates the column but doesn't update the row values for 'Random_new'. I feel like I need a for loop with an if/else condition, but I have not managed to write it so far. The condition as bullet points:
First, I'm not sure how you filled in the values in your example, but shouldn't the first Random_new be equal to 31+3=34 instead of 32?
Anyway, you can first create your recommend column (a boolean seems better suited than Y/N), then create Random_new with apply (only where recommend is True), and finally forward-fill (ffill) the values within each id group:
df['recommend'] = df['pct'] < df['diff']
df['Random_new'] = df.apply(lambda x: x['Random'] + x['diff'] if x['recommend'] else None, axis=1)
df = df.groupby('id').ffill()
Output:
index Random diff pct recommend Random_new
0 2018-01-01 31 3 1 True 34.0
1 2018-01-02 11 2 2 False 34.0
2 2018-01-03 21 4 0 True 25.0
3 2018-01-04 23 1 0 True 24.0
4 2018-01-05 43 6 3 True 49.0
5 2018-01-06 42 1 1 False 49.0
6 2018-01-07 51 2 5 False NaN
7 2018-01-08 47 2 0 True 49.0
8 2018-01-09 49 3 2 True 52.0
9 2018-01-10 22 1 3 False 52.0
Edit: if you want to keep the id column, replace the last line with:
df = pd.concat([df['id'], df.groupby('id').ffill()], axis=1)
(The as_index=False keyword argument doesn't help in this case.)
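For reference, the same three steps can probably be written without apply, which also keeps the id column in place. This is only a sketch, not the code from the answer: the frame is rebuilt by hand from the question's sample data, the name `candidate` is made up for illustration, and it follows the answer's reading of the rule (Random + diff where pct < diff, otherwise carry the last value forward within the same id).

```python
import pandas as pd

# Sample data retyped from the question (the date column is named "index" there).
df = pd.DataFrame({
    "index": pd.date_range("2018-01-01", periods=10),
    "Random": [31, 11, 21, 23, 43, 42, 51, 47, 49, 22],
    "id": [1, 1, 1, 2, 2, 2, 3, 3, 3, 3],
    "diff": [3, 2, 4, 1, 6, 1, 2, 2, 3, 1],
    "pct": [1, 2, 0, 0, 3, 1, 5, 0, 2, 3],
})

# Step 1: boolean flag, True where pct < diff.
df["recommend"] = df["pct"] < df["diff"]

# Step 2: Random + diff on recommended rows, NaN everywhere else.
# "candidate" is a hypothetical helper name, not part of the original answer.
candidate = (df["Random"] + df["diff"]).where(df["recommend"])

# Step 3: forward-fill inside each id group only, so values never leak
# from one id into the next. Only the new column is filled, so "id" stays.
df["Random_new"] = candidate.groupby(df["id"]).ffill()

print(df)
```

As in the output above, the first row of id 3 stays NaN because there is no earlier recommended row in that group to fill from. If the original Random value is wanted there instead, something like `df["Random_new"].fillna(df["Random"])` would be one option.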