Combining columns of a df into a single column based on values of another column in R?

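I am trying to combine several columns of my df into a single column, based on the values of an auxiliary column in the same df. Here is my df: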

  STA_AA INT_AA1 INT_AA2 END_AA POSITION Step
1      A       S       C   <NA>      221    1
2      A       G       C   <NA>      221    2
3      A       S       C      C      221    3
4      A       S       S      C      221    4
5      A       A       S      C      221    1
6      A       A       G      C      221    2

What I want to do is create a new column, titled AA, that contains characters. Specifically, this is what I want:

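  • If the Step column contains 1, put the character from STA_AA in the same row into AA
  • If the Step column contains 2, put the character from INT_AA1 in the same row into AA
  • If the Step column contains 3, put the character from INT_AA2 in the same row into AA
  • If the Step column contains 4, put the character from END_AA in the same row into AA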

This is what my ideal final output looks like:

  POSITION Step AA
1      221    1  A
2      221    2  G
3      221    3  C
4      221    4  C
5      221    1  A
6      221    2  A

Here is my current solution, which uses a foreach loop:

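library(dplyr)    # mutate(), select(), %>%
library(foreach)  # foreach(), %do%

# Start with an empty AA column, then fill it row by row
df <- df %>% mutate(AA = NA)
foreach(j = 1:nrow(df)) %do% {
  if (df$Step[j] == 1) {
    df$AA[j] <- df$STA_AA[j]
  }
  if (df$Step[j] == 2) {
    df$AA[j] <- df$INT_AA1[j]
  }
  if (df$Step[j] == 3) {
    df$AA[j] <- df$INT_AA2[j]
  }
  if (df$Step[j] == 4) {
    df$AA[j] <- df$END_AA[j]
  }
}
# Drop the source columns once AA is filled
df <- df %>% select(-STA_AA, -INT_AA1, -INT_AA2, -END_AA)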

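My question is: does anyone have a shorter way to build the AA column than my loop? Ideally it would combine the columns in some way to produce the desired output, rather than iterating over the rows and saving specific values to a new column.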

Here are my input and my desired output:

#INPUT
df <- structure(list(STA_AA = c("A", "A", "A", "A", "A", "A"), INT_AA1 = c("S", 
"G", "S", "S", "A", "A"), INT_AA2 = c("C", "C", "C", "S", "S", 
"G"), END_AA = c(NA, NA, "C", "C", "C", "C"), POSITION = c(221L, 
221L, 221L, 221L, 221L, 221L), Step = c(1, 2, 3, 4, 1, 2), AA = c("A", 
"G", "C", "C", "A", "A")), row.names = c(NA, 6L), class = "data.frame")

#OUTPUT
op <- structure(list(POSITION = c(221L, 221L, 221L, 221L, 221L, 221L
), Step = c(1, 2, 3, 4, 1, 2), AA = c("A", "G", "C", "C", "A", 
"A")), row.names = c(NA, 6L), class = "data.frame")

Relevant packages I am using: tidyverse, foreach

My session info:

R version 4.1.3 (2022-03-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.7

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] bit_4.0.4          reshape2_1.4.4     summarytools_1.0.1 doParallel_1.0.17  iterators_1.0.14   foreach_1.5.2     
 [7] forcats_0.5.1      stringr_1.4.0      dplyr_1.0.9        purrr_0.3.4        readr_2.1.2        tidyr_1.2.0       
[13] tibble_3.1.7       ggplot2_3.3.6      tidyverse_1.3.2   

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.9          lubridate_1.8.0     assertthat_0.2.1    digest_0.6.29       utf8_1.2.2         
 [6] plyr_1.8.7          R6_2.5.1            cellranger_1.1.0    backports_1.4.1     reprex_2.0.1       
[11] httr_1.4.3          pillar_1.8.0        rlang_1.0.4         googlesheets4_1.0.0 readxl_1.4.0       
[16] rstudioapi_0.13     magick_2.7.3        checkmate_2.1.0     labeling_0.4.2      googledrive_2.0.0  
[21] pander_0.6.5        munsell_0.5.0       broom_1.0.0         compiler_4.1.3      modelr_0.1.8       
[26] pkgconfig_2.0.3     base64enc_0.1-3     tcltk_4.1.3         htmltools_0.5.3     tidyselect_1.1.2   
[31] codetools_0.2-18    matrixStats_0.62.0  fansi_1.0.3         crayon_1.5.1        tzdb_0.3.0         
[36] dbplyr_2.2.1        withr_2.5.0         MASS_7.3-58         grid_4.1.3          jsonlite_1.8.0     
[41] gtable_0.3.0        lifecycle_1.0.1     DBI_1.1.3           magrittr_2.0.3      scales_1.2.0       
[46] vroom_1.5.7         cli_3.3.0           stringi_1.7.8       farver_2.1.1        pryr_0.1.5         
[51] fs_1.5.2            xml2_1.3.3          rapportools_1.1     ellipsis_0.3.2      generics_0.1.3     
[56] vctrs_0.4.1         tools_4.1.3         bit64_4.0.5         glue_1.6.2          hms_1.1.1          
[61] pkgload_1.3.0       fastmap_1.1.0       colorspace_2.0-3    gargle_1.2.0        rvest_1.0.2        
[66] haven_2.5.0        

Try this:

df |> 
   rowwise() |> 
   mutate(END_AA = c_across(as.integer(Step))) |> 
   ungroup() |>
   select(POSITION, Step , AA = END_AA)
  • Output
# A tibble: 6 × 3
  POSITION  Step AA   
     <int> <dbl> <chr>
1      221     1 A    
2      221     2 G    
3      221     3 C    
4      221     4 C    
5      221     1 A    
6      221     2 A    
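
For comparison, here is a minimal sketch of two vectorized alternatives that avoid both the loop and rowwise(). It is not part of the answer above. It assumes that Step only takes the values 1 to 4 and that these map to STA_AA, INT_AA1, INT_AA2 and END_AA in that order, as in the question. The names res_case, res_base and aa_cols are made up for illustration, and the input is rebuilt without the pre-filled AA column.

```r
library(dplyr)

# Input data from the question, without the pre-filled AA column
df <- data.frame(
  STA_AA   = c("A", "A", "A", "A", "A", "A"),
  INT_AA1  = c("S", "G", "S", "S", "A", "A"),
  INT_AA2  = c("C", "C", "C", "S", "S", "G"),
  END_AA   = c(NA, NA, "C", "C", "C", "C"),
  POSITION = rep(221L, 6),
  Step     = c(1, 2, 3, 4, 1, 2)
)

# Option 1: case_when() names the source column for each Step value
res_case <- df %>%
  mutate(AA = case_when(
    Step == 1 ~ STA_AA,
    Step == 2 ~ INT_AA1,
    Step == 3 ~ INT_AA2,
    Step == 4 ~ END_AA
  )) %>%
  select(POSITION, Step, AA)

# Option 2: base R matrix indexing with (row, column) pairs,
# where Step is used as the column index into the four AA columns
aa_cols <- c("STA_AA", "INT_AA1", "INT_AA2", "END_AA")  # assumed order
res_base <- df[c("POSITION", "Step")]
res_base$AA <- as.matrix(df[aa_cols])[cbind(seq_len(nrow(df)), df$Step)]

print(res_case)
print(res_base)
```

The case_when() version refers to the columns by name, so it should keep working if the columns are reordered. The matrix-indexing version, like selecting by position, depends on the order given in aa_cols.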
