Replace different values in one column, according to the row information in another column

I am working with genomic data, and I have a data frame whose first three rows are shown in the table below:

Chrom |   POS    |     ID      | REF | ALT | HapA | HapB |
----------------------------------------------------------
 22   | 16495833 | rs116911124 |  A  |  C  |   1  |  0   |
 22   | 19873357 | rs116378360 |  T  |  A  |   0  |  1   |
 22   | 21416404 | rs117982183 |  T  |  T  |   0  |  .   |

I would like to replace the values "0", "1" and "." in the "HapA" and "HapB" columns according to the REF and ALT columns, for every row in the data frame. For example:

a) for the first row, I want to replace the "1" in the HapA column with the "C" from the ALT column, and the "0" in the HapB column with the "A" from the REF column.

b) for the second row, replace the "0" with the "T" from the "REF" column and the "1" with the "A" from the "ALT" column.

c) And finally, replace every "." with NA.

I think this could be achieved using "if else" or with data.table.

Thank you very much.

I think if_else(), recode(), or case_when() could all work for this. Here I've used mutate_at() to apply the function to both HapA and HapB. If a value in those columns is not equal to 1, 0, or ".", the function returns that value unchanged as a character string.

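library(dplyr)

# `.` only exists inside a pipe, so refer to df directly; "." must be a quoted string.
mutate_at(df, vars(HapA, HapB),
    function(x) {case_when(x == 1 ~ df$ALT,
                     x == 0 ~ df$REF,
                     x == "." ~ NA_character_,
                     TRUE ~ as.character(x)) } )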
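
In dplyr 1.0.0 and later, mutate_at() is superseded by across(). The following is a minimal sketch of the same idea in that style, not the original answer's code. It assumes a small stand-in data frame (built here just for illustration) in which HapA and HapB are both character columns.

```r
library(dplyr)

# Stand-in for the question's table (assumption: HapA/HapB stored as character).
df <- data.frame(
  REF  = c("A", "T", "T"),
  ALT  = c("C", "A", "T"),
  HapA = c("1", "0", "0"),
  HapB = c("0", "1", "."),
  stringsAsFactors = FALSE
)

decoded <- df %>%
  mutate(across(
    c(HapA, HapB),
    ~ case_when(
      .x == "1" ~ ALT,              # "1" takes the ALT allele of the same row
      .x == "0" ~ REF,              # "0" takes the REF allele of the same row
      .x == "." ~ NA_character_,    # "." becomes a missing value
      TRUE      ~ as.character(.x)  # anything else is kept as a string
    )
  ))

decoded
```

Inside across(), REF and ALT resolve to the columns of the data frame being mutated, so no reference to `df` or `.` is needed.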

It's a bit unclear what you want exactly, since you don't specify what should happen to the 0 in the third row of the HapA column, but given what you said, this is a dplyr solution:

library(dplyr)

df <- read.table(text = "
'Chrom'     'POS'      'ID'       'REF'  'ALT' 'HapA' 'HapB'
22     16495833   'rs116911124'    'A'     'C'      1     0  
22     19873357   'rs116378360'    'T'     'A'      0     1  
22     21416404   'rs117982183'    'T'     'T'      0     .", header = TRUE, stringsAsFactors = FALSE)

df %>%
  mutate(HapA = ifelse(HapA == 1, ALT, ifelse(HapA == 0, REF, NA)),
         HapB = ifelse(HapB == 1, ALT, ifelse(HapB == 0, REF, NA)))

##   Chrom      POS          ID REF ALT HapA HapB
## 1    22 16495833 rs116911124   A   C    C    A
## 2    22 19873357 rs116378360   T   A    T    A
## 3    22 21416404 rs117982183   T   T    T <NA>

There wasn't really a question, but I'm going to guess what it was:

How can I replace the values of HapA and HapB following these rules:

  1. If "0", then replace with the value of REF.
  2. If "1", then replace with the value of ALT.
  3. If ".", then replace with NA.

Note that I'm also assuming HapA and HapB are character columns, since "." can't be a numeric value.
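
To make the three rules concrete, here is a minimal base R sketch. decode_hap() is a hypothetical helper written for this example and does not come from any package. The small data frame is a stand-in for the question's table. Unlike the case_when() answer, this sketch also turns any unexpected code into NA, which is an assumption about the desired behaviour.

```r
# decode_hap() is a hypothetical helper, written only for illustration.
decode_hap <- function(hap, ref, alt) {
  hap <- as.character(hap)                # tolerate integer-coded columns
  out <- rep(NA_character_, length(hap))  # rule 3: "." (or anything else) -> NA
  is_ref <- which(hap == "0")             # which() skips NA comparisons
  is_alt <- which(hap == "1")
  out[is_ref] <- ref[is_ref]              # rule 1: "0" -> REF of the same row
  out[is_alt] <- alt[is_alt]              # rule 2: "1" -> ALT of the same row
  out
}

# Stand-in for the question's table.
df <- data.frame(
  REF  = c("A", "T", "T"),
  ALT  = c("C", "A", "T"),
  HapA = c("1", "0", "0"),
  HapB = c("0", "1", "."),
  stringsAsFactors = FALSE
)

df$HapA <- decode_hap(df$HapA, df$REF, df$ALT)
df$HapB <- decode_hap(df$HapB, df$REF, df$ALT)
df
```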

If this is the right interpretation, there's no need for fancy tricks. This is an "if-else" problem. Here's a solution using data.table, which I think is common in genomic analysis. First I'll create the example dataset:

library(data.table)

dt <- fread(
  header = TRUE,
  colClasses = c(
    Chrom = "character",
    POS   = "integer",
    ID    = "character",
    REF   = "character",
    ALT   = "character",
    HapA  = "character",
    HapB  = "character"
  ),
  input = "
Chrom  POS        ID               REF     ALT      HapA HapB
22     16495833   'rs116911124'    'A'     'C'      1     0  
22     19873357   'rs116378360'    'T'     'A'      0     1  
22     21416404   'rs117982183'    'T'     'T'      0     ."
)
dt
#    Chrom      POS            ID REF ALT HapA HapB
# 1:    22 16495833 'rs116911124' 'A' 'C'    1    0
# 2:    22 19873357 'rs116378360' 'T' 'A'    0    1
# 3:    22 21416404 'rs117982183' 'T' 'T'    0    .

That was the long part. Here's the short part.

dt[HapA == "0", HapA := REF]
dt[HapA == "1", HapA := ALT]
dt[HapA == ".", HapA := NA]
dt[HapB == "0", HapB := REF]
dt[HapB == "1", HapB := ALT]
dt[HapB == ".", HapB := NA]
dt
#    Chrom      POS            ID REF ALT HapA HapB
# 1:    22 16495833 'rs116911124' 'A' 'C'  'C'  'A'
# 2:    22 19873357 'rs116378360' 'T' 'A'  'T'  'A'
# 3:    22 21416404 'rs117982183' 'T' 'T'  'T'   NA

I strongly suggest writing this out in a simple way, like the above. It's short, has little repetition, and is easily understood at a glance. However, if you want to generalize this to many columns, you would have to write a lot of repetitive lines. So here's a loop version:

replaced_columns <- c("HapA", "HapB")  # Switch these out for any
source_columns   <- c("REF", "ALT")    # number of columns

for (rr in replaced_columns) {
  for (source_i in seq_along(source_columns)) {
    target_rows <- which(dt[[rr]] == source_i - 1)
    dt[
      target_rows,
      (rr) := .SD,
      .SDcols = source_columns[source_i]
    ]
  }
}

dt
#    Chrom      POS            ID REF ALT HapA HapB
# 1:    22 16495833 'rs116911124' 'A' 'C'  'C'  'A'
# 2:    22 19873357 'rs116378360' 'T' 'A'  'T'  'A'
# 3:    22 21416404 'rs117982183' 'T' 'T'  'T'    .
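
Note that the loop leaves "." in place, as the last row of the output shows. One possible alternative that covers all three rules in a single pass is sketched below. It assumes data.table 1.13.0 or later (for fcase()) and uses a small stand-in table without the quote characters. The name hap_columns is introduced here only for illustration.

```r
library(data.table)

# Stand-in for the example table (assumption: Hap columns are character).
dt <- data.table(
  REF  = c("A", "T", "T"),
  ALT  = c("C", "A", "T"),
  HapA = c("1", "0", "0"),
  HapB = c("0", "1", ".")
)

hap_columns <- c("HapA", "HapB")  # hypothetical name; list any number of columns

dt[, (hap_columns) := lapply(.SD, function(hap) {
  fcase(
    hap == "0", REF,   # "0" -> REF of the same row
    hap == "1", ALT    # "1" -> ALT of the same row
  )                    # no condition matched (e.g. ".") -> NA by default
}), .SDcols = hap_columns]

dt
```

Because every value is decided from the original code in one step, a replaced allele can never be mistaken for a code in a later pass.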
