
How to add new columns to the dataframe based on the column values?

I have a dataframe as follows:

import pandas as pd

data = {'CHROM':['chr1', 'chr2', 'chr1', 'chr3', 'chr1'],
        'POS':[939570,3411794,1043223,22511093,24454031],
        'REF':['T', 'T', 'CCT', 'CTT', 'CT'],
        'ALT':['TCCCTGGAGGACC', 'C', 'C', 'CT', 'CTT'],
        'Len_REF':[1,1,3,3,2], 'Len_ALT':[13,1,1,2,3]
       }
df1 = pd.DataFrame(data)

df1 looks as follows:

    CHROM   POS     REF  ALT            Len_REF   Len_ALT
0   chr1    939570   T   TCCCTGGAGGACC    1         13
1   chr2    3411794  T   C                1          1
2   chr1    1043223  CCT C                3          1
3   chr3    22511093 CTT CT               3          2
4   chr1    24454031 CT  CTT              2          3

I want to add new columns to the dataframe based on the column values, so that it looks as follows:

Positions             Allele         Combined
1:939570-939570       CCCTGGAGGACC   1:939570-939570:CCCTGGAGGACC
2:3411794-3411794     C              2:3411794-3411794:C
1:1043223-1043225     -              1:1043223-1043225:-
3:22511093-22511095   -              3:22511093-22511095:-
1:24454031-24454032   T              1:24454031-24454032:T

df1['Positions'] is generated from the values in CHROM and POS, taking the change between REF and ALT into account.

df1['Allele'] is built from REF and ALT.

  1. Positions column: remove the non-numeric characters from the CHROM column with the \D+ pattern and str.replace, then build the rest of the string as desired.
  2. Allele column: go through ALT and Len_REF row-wise and slice ALT dynamically from the Len_REF value. Make sure to pass axis=1.
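
To see what the two steps do on single values before applying them to the whole frame, here is a minimal plain-Python sketch. The helper names chrom_number and allele_suffix are made up for illustration and are not part of pandas.

```python
import re

def chrom_number(chrom):
    # Drop every run of non-digit characters, e.g. 'chr1' -> '1'.
    return re.sub(r'\D+', '', chrom)

def allele_suffix(alt, len_ref):
    # Keep what is left of ALT after its first len_ref characters;
    # an empty remainder is reported as '-'.
    return alt[len_ref:] or '-'

print(chrom_number('chr1'))               # 1
print(allele_suffix('TCCCTGGAGGACC', 1))  # CCCTGGAGGACC
print(allele_suffix('C', 3))              # -
print(allele_suffix('CTT', 2))            # T
```

Note that under this rule a non-numeric chromosome name such as chrX would be reduced to an empty string, so it would need separate handling.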

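# Work on a copy of the question's df1 (df2 was not defined in the original snippet)
df2 = df1.copy()
# Strip the non-digit prefix from CHROM; regex=True is needed because the default became regex=False in pandas 2.0
df2['Positions'] = (df2['CHROM'].str.replace(r'\D+', '', regex=True)
                    + ':' + df2['POS'].astype(str)
                    + '-' + (df2['POS'] + df2['Len_REF'] - 1).astype(str))
# Slice ALT after its first Len_REF characters; empty results become '-'
df2['Allele'] = df2.apply(lambda x: x['ALT'][x['Len_REF']:], axis=1).replace('', '-')
df2['Combined'] = df2['Positions'] + ':' + df2['Allele']
df2.iloc[:, -3:]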

Out[1]: 
             Positions        Allele                      Combined
0      1:939570-939570  CCCTGGAGGACC  1:939570-939570:CCCTGGAGGACC
1    2:3411794-3411794             -           2:3411794-3411794:-
2    1:1043223-1043225             -           1:1043223-1043225:-
3  3:22511093-22511095             -         3:22511093-22511095:-
4  1:24454031-24454032             T         1:24454031-24454032:T
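
If the row-wise apply becomes slow on a larger frame, the slicing can also be done with a list comprehension over the two columns. The following is a sketch of that alternative, not the method used in the answer. The function name add_variant_columns is hypothetical, and the sketch assumes pandas is installed and that Len_REF holds integers.

```python
import pandas as pd

def add_variant_columns(df):
    """Return a copy of df with Positions, Allele and Combined columns added."""
    out = df.copy()
    # 'chr1' -> '1': remove every run of non-digit characters.
    chrom = out['CHROM'].str.replace(r'\D+', '', regex=True)
    # End coordinate is the start plus the reference length, minus one.
    end = out['POS'] + out['Len_REF'] - 1
    out['Positions'] = chrom + ':' + out['POS'].astype(str) + '-' + end.astype(str)
    # Slice ALT after its first Len_REF characters; an empty result becomes '-'.
    out['Allele'] = [alt[n:] or '-' for alt, n in zip(out['ALT'], out['Len_REF'])]
    out['Combined'] = out['Positions'] + ':' + out['Allele']
    return out

df1 = pd.DataFrame({
    'CHROM': ['chr1', 'chr2', 'chr1', 'chr3', 'chr1'],
    'POS': [939570, 3411794, 1043223, 22511093, 24454031],
    'REF': ['T', 'T', 'CCT', 'CTT', 'CT'],
    'ALT': ['TCCCTGGAGGACC', 'C', 'C', 'CT', 'CTT'],
    'Len_REF': [1, 1, 3, 3, 2],
    'Len_ALT': [13, 1, 1, 2, 3],
})

result = add_variant_columns(df1)
print(result[['Positions', 'Allele', 'Combined']])
```

The function returns a new frame and leaves df1 unchanged, which makes it easier to compare the result against the original columns.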


