Conditionally replace values in a column

Question

I have a pandas dataframe, where the 2nd, 3rd and 6th columns look like so:

start	end	strand
108286	108361	+
734546	734621	-
761233	761309	+

I'm trying to implement a conditional where, if strand is +, the value in end becomes the corresponding value in start plus 1, and if strand is -, the value in start becomes the corresponding value in end minus 1, so the output should look like this:

start	end	strand
108286	108287	+
734620	734621	-
761233	761234	+

The pseudocode might look like this:

if df["strand"] == "+": 
        df["end"] = df["start"] + 1
        
else:
        df["start"] = df["end"] - 1

I imagine this might be best done with loc/iloc or numpy.where, but I can't seem to get it to work. As always, any help is appreciated!

Answer 1

You are correct, loc is the indexer you are looking for:

df.loc[df.strand=='+','end'] = df.loc[df.strand=='+','start']+1
df.loc[df.strand=='-','start'] = df.loc[df.strand=='-','end']-1
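
For reference, here is a minimal self-contained sketch of the same loc approach, using the three sample rows from the question. The mask names plus and minus are introduced here for readability and are not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "start": [108286, 734546, 761233],
    "end": [108361, 734621, 761309],
    "strand": ["+", "-", "+"],
})

# Build each boolean mask once and reuse it on both sides of the assignment.
plus = df["strand"] == "+"
minus = df["strand"] == "-"

# For "+" rows, end becomes start + 1; for "-" rows, start becomes end - 1.
df.loc[plus, "end"] = df.loc[plus, "start"] + 1
df.loc[minus, "start"] = df.loc[minus, "end"] - 1

print(df)
```

Because the two assignments touch disjoint sets of rows, their order does not matter here.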

Answer 2

You could also use numpy.where:

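import numpy as np
# Where strand is '-', take (end - 1, end); otherwise take (start, start + 1).
# The single-column condition broadcasts across both output columns.
df[['start', 'end']] = np.where(df[['strand']]=='-', df[['end','end']]-[1,0], df[['start','start']]+[0,1])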

Note that this assumes strand can have one of two values: + or -. If it can have any other values, we can use numpy.select instead.

Output:

    start     end strand
0  108286  108287      +
1  734620  734621      -
2  761233  761234      +
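
The numpy.select variant mentioned in the note above could be sketched roughly as follows. This is an illustration rather than part of the original answer: the fourth row with strand "." is hypothetical data added to show a value that is neither + nor -, and leaving such rows unchanged is an assumption about the desired behaviour.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "start": [108286, 734546, 761233, 500],
    "end": [108361, 734621, 761309, 600],
    "strand": ["+", "-", "+", "."],  # "." is a hypothetical third value
})

# Work on plain NumPy arrays so np.select receives boolean and integer arrays.
start = df["start"].to_numpy()
end = df["end"].to_numpy()
plus = (df["strand"] == "+").to_numpy()
minus = (df["strand"] == "-").to_numpy()

# Compute both new columns from the original values before assigning,
# so neither result depends on an already modified column.
new_start = np.select([plus, minus], [start, end - 1], default=start)
new_end = np.select([plus, minus], [start + 1, end], default=end)

df["start"] = new_start
df["end"] = new_end

print(df)
```

Rows whose strand matches neither condition fall through to the default and keep their original start and end.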

Answer 1: score 3, accepted, 2022-04-01 16:52:45

Answer 2: score 3
