
Change column values depending on another column's value in pandas

I have a DataFrame in Python such as:

seqnames    start   end name    number  strand
     A       50     453   A      1        -
     B       30     322   A      2        -
     C       10     432   A      3        -
     D       36     344   A      4        +
     E       40     321   A      5        +
     F       78     234   A      6        -

and I would like to change the values in the start and end columns depending on the symbol in the strand column.

So for each row: if the strand is -, then do start+1 and end-2; if the strand is +, do nothing.

Here is what I should get:

seqnames    start   end name    number  strand
     A       51     451   A      1        -
     B       31     320   A      2        -
     C       11     430   A      3        -
     D       36     344   A      4        +
     E       40     321   A      5        +
     F       79     232   A      6        -
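The rule can be written row by row as a plain Python reference. This is a minimal sketch, assuming the sample data above; the helper name adjust is hypothetical. It is slow on large frames, but useful for checking the vectorized answers.

```python
import pandas as pd

# Sample frame from the question
df = pd.DataFrame({
    "seqnames": list("ABCDEF"),
    "start": [50, 30, 10, 36, 40, 78],
    "end": [453, 322, 432, 344, 321, 234],
    "name": ["A"] * 6,
    "number": [1, 2, 3, 4, 5, 6],
    "strand": ["-", "-", "-", "+", "+", "-"],
})


def adjust(start, end, strand):
    """Hypothetical helper: apply the rule to a single row."""
    if strand == "-":
        # '-' strand: start + 1, end - 2
        return start + 1, end - 2
    # '+' strand: leave the coordinates unchanged
    return start, end


# Row-by-row loop: explicit, but slow on large frames
pairs = [adjust(s, e, st) for s, e, st in zip(df["start"], df["end"], df["strand"])]
starts, ends = zip(*pairs)
df["start"] = list(starts)
df["end"] = list(ends)
print(df)
```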

Thank you for your help

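Use numpy.where:

import numpy as np

# Recent pandas no longer allows [:, None] on a Series, so convert to NumPy first
mask = df['strand'].eq('-').to_numpy()[:, None]
df[['start','end']] = np.where(mask,
                               np.column_stack((df['start']+1, df['end']-2)),
                               df[['start','end']].to_numpy())
print(df)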

  seqnames  start  end name  number strand
0        A     51  451    A       1      -
1        B     31  320    A       2      -
2        C     11  430    A       3      -
3        D     36  344    A       4      +
4        E     40  321    A       5      +
5        F     79  232    A       6      -
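The [:, None] step adds a second axis so the length-n mask broadcasts against the n-by-2 block of start/end values. A small sketch of the shapes involved, assuming the question's data is typed in directly as a NumPy array:

```python
import numpy as np
import pandas as pd

strand = pd.Series(["-", "-", "-", "+", "+", "-"])
mask = strand.eq("-").to_numpy()  # shape (6,)
mask_2d = mask[:, None]           # shape (6, 1), one boolean per row

# start/end pairs from the question, shape (6, 2)
values = np.array([[50, 453], [30, 322], [10, 432], [36, 344], [40, 321], [78, 234]])

# (6, 1) mask broadcasts across both columns; [1, -2] broadcasts across rows
result = np.where(mask_2d, values + [1, -2], values)
print(mask.shape, mask_2d.shape, result.shape)  # (6,) (6, 1) (6, 2)
print(result.tolist())
```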

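Use Series.mask (assign the result back; inplace=True on a selected column does not reliably update df under pandas Copy-on-Write):

m = df['strand'] == '-'
df['start'] = df['start'].mask(m, df['start'] + 1)
df['end'] = df['end'].mask(m, df['end'] - 2)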

print(df)

  seqnames  start  end name  number strand
0        A     51  451    A       1      -
1        B     31  320    A       2      -
2        C     11  430    A       3      -
3        D     36  344    A       4      +
4        E     40  321    A       5      +
5        F     79  232    A       6      -
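Series.mask replaces values where the condition is True; Series.where is its mirror image and keeps values where the condition is True. A short sketch showing that the two agree once the condition is inverted, using the question's start column:

```python
import pandas as pd

start = pd.Series([50, 30, 10, 36, 40, 78])
is_minus = pd.Series(["-", "-", "-", "+", "+", "-"]) == "-"

# mask: replace where the condition is True
via_mask = start.mask(is_minus, start + 1)
# where: keep where the condition is True, so the condition is inverted
via_where = start.where(~is_minus, start + 1)

print(via_mask.equals(via_where))  # True
print(via_mask.tolist())           # [51, 31, 11, 36, 40, 79]
```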

You can also use DataFrame.apply together with DataFrame.where (apply with axis=1 works row by row, so it is slower on large frames):

df[['start','end']] = (df[['start','end']]
                       .apply(lambda x: pd.Series({'start': x['start'] + 1, 'end': x['end'] - 2}), axis=1)
                       .where(df['strand'] == '-', df[['start','end']]))

print(df)
  seqnames  start  end name  number strand
0        A     51  451    A       1      -
1        B     31  320    A       2      -
2        C     11  430    A       3      -
3        D     36  344    A       4      +
4        E     40  321    A       5      +
5        F     79  232    A       6      -
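The same where/mask idea works without apply: shift every row at once, then keep the shifted values only on the - strand. This sketch builds the boolean condition as an explicit frame with the same labels as the start/end block, which avoids relying on how pandas broadcasts a Series condition; the names cols, is_minus and cond are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "seqnames": list("ABCDEF"),
    "start": [50, 30, 10, 36, 40, 78],
    "end": [453, 322, 432, 344, 321, 234],
    "name": ["A"] * 6,
    "number": [1, 2, 3, 4, 5, 6],
    "strand": ["-", "-", "-", "+", "+", "-"],
})

cols = ["start", "end"]
is_minus = df["strand"] == "-"

# Boolean frame with columns 'start' and 'end', one copy of the row mask each
cond = pd.concat([is_minus] * len(cols), axis=1, keys=cols)

# Vectorized shift of all rows; mask swaps it in only where cond is True
df[cols] = df[cols].mask(cond, df[cols] + [1, -2])
print(df)
```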

Use DataFrame.loc:

df.loc[ df['strand'] == '-', ['start', 'end']] += [1, -2]
print (df)
  seqnames  start  end name  number strand
0        A     51  451    A       1      -
1        B     31  320    A       2      -
2        C     11  430    A       3      -
3        D     36  344    A       4      +
4        E     40  321    A       5      +
5        F     79  232    A       6      -
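In the loc answer, the list [1, -2] has one entry per selected column, so pandas adds 1 to start and -2 to end, and only the rows picked by the mask are written back. A sketch that checks the + strand rows are left untouched:

```python
import pandas as pd

df = pd.DataFrame({
    "seqnames": list("ABCDEF"),
    "start": [50, 30, 10, 36, 40, 78],
    "end": [453, 322, 432, 344, 321, 234],
    "name": ["A"] * 6,
    "number": [1, 2, 3, 4, 5, 6],
    "strand": ["-", "-", "-", "+", "+", "-"],
})

cols = ["start", "end"]
m = df["strand"] == "-"

# The list is matched to the selected columns: +1 for start, -2 for end
print(df.loc[m, cols] + [1, -2])

before = df.loc[~m, cols].copy()
df.loc[m, cols] += [1, -2]

# '+' strand rows were not selected, so they are unchanged
print(df.loc[~m, cols].equals(before))  # True
```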

Or use numpy.where to build the amounts to add or subtract:

m = df['strand'] == '-'
df['start'] = df['start'] + np.where(m, 1, 0)
df['end'] = df['end'] - np.where(m, 2, 0)
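Both columns can also be handled in one numpy.where call by broadcasting a column-shaped mask against per-column offsets. A minimal sketch, assuming the question's data; offsets is an illustrative name:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "seqnames": list("ABCDEF"),
    "start": [50, 30, 10, 36, 40, 78],
    "end": [453, 322, 432, 344, 321, 234],
    "name": ["A"] * 6,
    "number": [1, 2, 3, 4, 5, 6],
    "strand": ["-", "-", "-", "+", "+", "-"],
})

m = df["strand"] == "-"

# (6, 1) mask against (2,) offsets gives a (6, 2) array: [1, -2] or [0, 0] per row
offsets = np.where(m.to_numpy()[:, None], [1, -2], [0, 0])
df[["start", "end"]] = df[["start", "end"]] + offsets

print(offsets.tolist())
print(df)
```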

Or convert the mask to integers (True becomes 1, False becomes 0) and, for the end column, multiply by 2:

m = df['strand'] == '-'
df['start'] = df['start'] + m.astype(int)
df['end'] = df['end'] - m.astype(int) * 2

print (df)
  seqnames  start  end name  number strand
0        A     51  451    A       1      -
1        B     31  320    A       2      -
2        C     11  430    A       3      -
3        D     36  344    A       4      +
4        E     40  321    A       5      +
5        F     79  232    A       6      -
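The integer-mask trick generalizes to both columns with an outer product: each row's 0 or 1 is multiplied by the per-column change [1, -2]. A sketch under the same sample data; deltas is an illustrative name:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "seqnames": list("ABCDEF"),
    "start": [50, 30, 10, 36, 40, 78],
    "end": [453, 322, 432, 344, 321, 234],
    "name": ["A"] * 6,
    "number": [1, 2, 3, 4, 5, 6],
    "strand": ["-", "-", "-", "+", "+", "-"],
})

m = df["strand"] == "-"

# Row i of deltas is m[i] * [1, -2], i.e. [1, -2] on '-' rows and [0, 0] otherwise
deltas = np.outer(m.astype(int), [1, -2])
df[["start", "end"]] = df[["start", "end"]] + deltas
print(df)
```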

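Another option is to assign through loc one column at a time:

m = df['strand'] == '-'
df.loc[m, 'start'] = df.loc[m, 'start'] + 1
df.loc[m, 'end'] = df.loc[m, 'end'] - 2

Here loc selects only the rows whose strand is -, so the + rows are left as they are.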
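To confirm that a column-by-column loc update reproduces the expected table from the question, the result can be compared with pandas' testing helper. A minimal sketch; expected is built by hand from the question's desired output:

```python
import pandas as pd

df = pd.DataFrame({
    "seqnames": list("ABCDEF"),
    "start": [50, 30, 10, 36, 40, 78],
    "end": [453, 322, 432, 344, 321, 234],
    "name": ["A"] * 6,
    "number": [1, 2, 3, 4, 5, 6],
    "strand": ["-", "-", "-", "+", "+", "-"],
})

# Expected result, copied from the question's desired output
expected = df.copy()
expected["start"] = [51, 31, 11, 36, 40, 79]
expected["end"] = [451, 320, 430, 344, 321, 232]

m = df["strand"] == "-"
df.loc[m, "start"] = df.loc[m, "start"] + 1
df.loc[m, "end"] = df.loc[m, "end"] - 2

# Raises AssertionError if any value, dtype, or label differs
pd.testing.assert_frame_equal(df, expected)
```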

