Conditionally replacing values in a column
I have a pandas DataFrame where the 2nd, 3rd and 6th columns look like this:
start | end | strand |
---|---|---|
108286 | 108361 | + |
734546 | 734621 | - |
761233 | 761309 | + |
I'm trying to implement a conditional where, if strand is +, the value in end becomes the corresponding value in start plus 1, and if strand is -, the value in start becomes the value in end minus 1, so the output should look like this:
start | end | strand |
---|---|---|
108286 | 108287 | + |
734620 | 734621 | - |
761233 | 761234 | + |
The pseudocode might look like this:
if df["strand"] == "+":
    df["end"] = df["start"] + 1
else:
    df["start"] = df["end"] - 1
I imagine this might be best done with loc/iloc or numpy.where, but I can't seem to get it to work. As always, any help is appreciated!
You are correct, loc is the indexer you are looking for:
df.loc[df.strand=='+','end'] = df.loc[df.strand=='+','start']+1
df.loc[df.strand=='-','start'] = df.loc[df.strand=='-','end']-1
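As a minimal, self-contained sketch of the same loc approach, the snippet below rebuilds the sample frame from the question and stores each boolean mask in a variable (the names plus and minus are illustrative, not from the original answer). The pseudocode in the question fails because comparing a whole column to "+" yields a Series of booleans, which a plain if statement cannot evaluate; the masks select the matching rows instead.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "start": [108286, 734546, 761233],
    "end": [108361, 734621, 761309],
    "strand": ["+", "-", "+"],
})

# Build each boolean mask once and reuse it on both sides of the assignment
plus = df["strand"] == "+"
minus = df["strand"] == "-"

# Rows on the + strand: end becomes start + 1
df.loc[plus, "end"] = df.loc[plus, "start"] + 1
# Rows on the - strand: start becomes end - 1
df.loc[minus, "start"] = df.loc[minus, "end"] - 1

print(df)
```

Rows whose strand is neither + nor - match neither mask and are left unchanged.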
You could also use numpy.where:
import numpy as np
df[['start', 'end']] = np.where(df[['strand']]=='-', df[['end','end']]-[1,0], df[['start','start']]+[0,1])
Note that this assumes strand can have one of two values: + or -. If it can have any other values, we can use numpy.select instead.
Output:
start end strand
0 108286 108287 +
1 734620 734621 -
2 761233 761234 +
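For the case mentioned above where strand may hold other values, one possible numpy.select version is sketched below. The fourth row with strand "." is a hypothetical example added for illustration, and the sketch assumes such rows should be left unchanged, which the question does not specify.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "start": [108286, 734546, 761233, 800000],
    "end": [108361, 734621, 761309, 800050],
    "strand": ["+", "-", "+", "."],  # "." is a hypothetical third value
})

plus = df["strand"] == "+"
minus = df["strand"] == "-"

# Compute both new columns from the original values before assigning,
# so that neither update sees the other's result.
# default= keeps the existing value when no condition matches.
new_start = np.select([minus], [df["end"] - 1], default=df["start"])
new_end = np.select([plus], [df["start"] + 1], default=df["end"])

df["start"] = new_start
df["end"] = new_end

print(df)
```

Further strand values with their own rules could be handled by appending more entries to the condition and choice lists.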