Combining columns of a df into a single column based on values of another column in R?

I am trying to combine several columns of my df into a single column, based on the values of a helper column in the same df. Here is my df:

  STA_AA INT_AA1 INT_AA2 END_AA POSITION Step
1      A       S       C   <NA>      221    1
2      A       G       C   <NA>      221    2
3      A       S       C      C      221    3
4      A       S       S      C      221    4
5      A       A       S      C      221    1
6      A       A       G      C      221    2

What I want to do is create a new column, named AA, that contains characters. Specifically, this is what I want:

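  • If the Step col contains 1, put the character from STA_AA in the same row into AA
  • If the Step col contains 2, put the character from INT_AA1 in the same row into AA
  • If the Step col contains 3, put the character from INT_AA2 in the same row into AA
  • If the Step col contains 4, put the character from END_AA in the same row into AA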

So the output should look like this:

  POSITION Step AA
1      221    1  A
2      221    2  G
3      221    3  C
4      221    4  C
5      221    1  A
6      221    2  A


My current solution uses a loop:

library(tidyverse)
library(foreach)
df <- df %>% mutate(AA = NA)
foreach(j = 1:nrow(df)) %do% {
  # Copy the value from the column that corresponds to this row's Step
  if (df$Step[j] == 1) {
    df$AA[j] <- df$STA_AA[j]
  }
  if (df$Step[j] == 2) {
    df$AA[j] <- df$INT_AA1[j]
  }
  if (df$Step[j] == 3) {
    df$AA[j] <- df$INT_AA2[j]
  }
  if (df$Step[j] == 4) {
    df$AA[j] <- df$END_AA[j]
  }
}
df <- df %>% select(-STA_AA, -INT_AA1, -INT_AA2, -END_AA)

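My question is: does anyone have a shorter solution for the AA column than my loop? Ideally, it would combine the columns in some way to produce the desired output, rather than iterating over them and saving specific values into a new column.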

Here is the dput() output for df and for the desired output op:

df <- structure(list(STA_AA = c("A", "A", "A", "A", "A", "A"), INT_AA1 = c("S", 
"G", "S", "S", "A", "A"), INT_AA2 = c("C", "C", "C", "S", "S", 
"G"), END_AA = c(NA, NA, "C", "C", "C", "C"), POSITION = c(221L, 
221L, 221L, 221L, 221L, 221L), Step = c(1, 2, 3, 4, 1, 2), AA = c("A", 
"G", "C", "C", "A", "A")), row.names = c(NA, 6L), class = "data.frame")

op <- structure(list(POSITION = c(221L, 221L, 221L, 221L, 221L, 221L
), Step = c(1, 2, 3, 4, 1, 2), AA = c("A", "G", "C", "C", "A", 
"A")), row.names = c(NA, 6L), class = "data.frame")

Relevant packages I am using: tidyverse, foreach


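One option with dplyr is to work row by row and use c_across() to pick the column whose position equals Step:

library(dplyr)
df |> 
   rowwise() |> 
   # Step is used as a column position: 1 = STA_AA, 2 = INT_AA1, 3 = INT_AA2, 4 = END_AA
   mutate(END_AA = c_across(as.integer(Step))) |> 
   ungroup() |>
   select(POSITION, Step , AA = END_AA)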
Output:
# A tibble: 6 × 3
  POSITION  Step AA   
     <int> <dbl> <chr>
1      221     1 A    
2      221     2 G    
3      221     3 C    
4      221     4 C    
5      221     1 A    
6      221     2 A    
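
Another possibility, shown here only as a minimal sketch in base R, is to index a character matrix with a two-column (row, column) matrix so that each row picks exactly one cell. The names aa_cols, aa_matrix, idx and result are made up for this illustration, and the sketch assumes that Step always holds a whole number from 1 to 4 that matches the order of the columns in aa_cols.

```r
# Rebuild the example data from the question (without the precomputed AA column).
df <- data.frame(
  STA_AA   = c("A", "A", "A", "A", "A", "A"),
  INT_AA1  = c("S", "G", "S", "S", "A", "A"),
  INT_AA2  = c("C", "C", "C", "S", "S", "G"),
  END_AA   = c(NA, NA, "C", "C", "C", "C"),
  POSITION = rep(221L, 6),
  Step     = c(1, 2, 3, 4, 1, 2)
)

# Assumption: a Step value of k refers to the k-th name in this vector.
aa_cols <- c("STA_AA", "INT_AA1", "INT_AA2", "END_AA")

# Turn the candidate columns into a character matrix.
aa_matrix <- as.matrix(df[aa_cols])

# Each row of idx is a (row number, column number) pair, one pair per row of df.
idx <- cbind(seq_len(nrow(df)), df$Step)

# Matrix indexing extracts one cell per (row, column) pair, with no explicit loop.
df$AA <- unname(aa_matrix[idx])

# Keep only the columns shown in the desired output.
result <- df[c("POSITION", "Step", "AA")]
print(result)
```

This avoids both the loop and rowwise(), so it should scale better to many rows. If Step could contain values outside 1 to 4, or NA, the index would need to be validated first. With dplyr, spelling out the four cases in case_when() is a more explicit alternative.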


