How to add new columns to the dataframe based on the column values?
I have a dataframe as follows:
import pandas as pd

data = {'CHROM': ['chr1', 'chr2', 'chr1', 'chr3', 'chr1'],
        'POS': [939570, 3411794, 1043223, 22511093, 24454031],
        'REF': ['T', 'T', 'CCT', 'CTT', 'CT'],
        'ALT': ['TCCCTGGAGGACC', 'C', 'C', 'CT', 'CTT'],
        'Len_REF': [1, 1, 3, 3, 2], 'Len_ALT': [13, 1, 1, 2, 3]
        }
df1 = pd.DataFrame(data)
It looks as follows:
  CHROM       POS  REF            ALT  Len_REF  Len_ALT
0  chr1    939570    T  TCCCTGGAGGACC        1       13
1  chr2   3411794    T              C        1        1
2  chr1   1043223  CCT              C        3        1
3  chr3  22511093  CTT             CT        3        2
4  chr1  24454031   CT            CTT        2        3
I want to add new columns to the dataframe based on the column values, so that it looks as follows:
Positions            Allele        Combined
1:939570-939570      CCCTGGAGGACC  1:939570-939570:CCCTGGAGGACC
2:3411794-3411794    C             2:3411794-3411794:C
1:1043223-1043225    -             1:1043223-1043225:-
3:22511093-22511095  -             3:22511093-22511095:-
1:24454031-24454032  T             1:24454031-24454032:T
df1['Positions'] is generated from the values in CHROM and POS, taking into account the change between REF and ALT.
df1['Allele'] is built from REF and ALT.
Positions column: remove the non-numeric characters from the CHROM column with the pattern \D+ and str.replace, then build the rest of the string as desired.
Allele column: you can take each row's ALT and slice it dynamically by that row's Len_REF value, so that only the part of ALT beyond the length of REF remains. Make sure to pass axis=1:
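df2 = df1.copy()  # assumption: df2 is a copy of the question's df1
# Strip non-digits from CHROM, then append POS and the end coordinate (POS + Len_REF - 1)
df2['Positions'] = (df2['CHROM'].str.replace(r'\D+', '', regex=True).astype(str)
                    + ':' + df2['POS'].astype(str)
                    + '-' + (df2['POS'] + df2['Len_REF'] - 1).astype(str))
# Slice ALT from Len_REF onward, row by row; an empty result becomes '-'
df2['Allele'] = df2.apply(lambda x: x['ALT'][x['Len_REF']:], axis=1).replace('', '-')
df2['Combined'] = df2['Positions'] + ':' + df2['Allele']
df2.iloc[:, -3:]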
Out[1]:
             Positions        Allele                      Combined
0      1:939570-939570  CCCTGGAGGACC  1:939570-939570:CCCTGGAGGACC
1    2:3411794-3411794             -           2:3411794-3411794:-
2    1:1043223-1043225             -           1:1043223-1043225:-
3  3:22511093-22511095             -         3:22511093-22511095:-
4  1:24454031-24454032             T         1:24454031-24454032:T
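For reference, the same steps can be written as a self-contained script. This is only a sketch under a few assumptions: it uses a fresh frame named df (a name chosen here for illustration), it derives the REF length with .str.len() rather than relying on the precomputed Len_REF column, and it swaps the row-wise apply for a list comprehension. It also follows the convention of the answer above, so row 1 gets '-' rather than the 'C' shown in the question's desired output.

```python
import pandas as pd

data = {
    'CHROM': ['chr1', 'chr2', 'chr1', 'chr3', 'chr1'],
    'POS': [939570, 3411794, 1043223, 22511093, 24454031],
    'REF': ['T', 'T', 'CCT', 'CTT', 'CT'],
    'ALT': ['TCCCTGGAGGACC', 'C', 'C', 'CT', 'CTT'],
}
df = pd.DataFrame(data)  # 'df' is an illustrative name, not from the original post

# Chromosome number: drop every non-digit character ('chr1' -> '1').
chrom = df['CHROM'].str.replace(r'\D+', '', regex=True)

# End coordinate: POS + len(REF) - 1.
end = df['POS'] + df['REF'].str.len() - 1

df['Positions'] = chrom + ':' + df['POS'].astype(str) + '-' + end.astype(str)

# Allele: the part of ALT beyond len(REF); an empty remainder becomes '-'.
df['Allele'] = [alt[len(ref):] or '-' for ref, alt in zip(df['REF'], df['ALT'])]

df['Combined'] = df['Positions'] + ':' + df['Allele']

print(df[['Positions', 'Allele', 'Combined']])
```

The list comprehension avoids building a Series for every row, which apply with axis=1 does, so it tends to be faster on large frames; the result should be the same as with apply.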