Combining columns of a df into a single column based on values of another column in R?
I am trying to combine multiple columns of my df into a single column, based on the values of a helper column in the same df. Here is my df:
  STA_AA INT_AA1 INT_AA2 END_AA POSITION Step
1      A       S       C   <NA>      221    1
2      A       G       C   <NA>      221    2
3      A       S       C      C      221    3
4      A       S       S      C      221    4
5      A       A       S      C      221    1
6      A       A       G      C      221    2
What I want to do is create a new column, titled AA, that contains characters. Specifically, this is what I want:
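If the Step col contains 1, put the character from STA_AA in the same row into AA
If the Step col contains 2, put the character from INT_AA1 in the same row into AA
If the Step col contains 3, put the character from INT_AA2 in the same row into AA
If the Step col contains 4, put the character from END_AA in the same row into AA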
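One way to read these rules as code, mirroring the four branches of the loop further down (Step 4 reads from END_AA there), is a single dplyr::case_when call. This is only a sketch: the data frame is rebuilt by hand from the printout above, and res is a name introduced here for illustration.

```r
library(dplyr)

# Hand-built copy of the six rows printed above (an assumption for this sketch).
df <- data.frame(
  STA_AA   = c("A", "A", "A", "A", "A", "A"),
  INT_AA1  = c("S", "G", "S", "S", "A", "A"),
  INT_AA2  = c("C", "C", "C", "S", "S", "G"),
  END_AA   = c(NA, NA, "C", "C", "C", "C"),
  POSITION = 221L,
  Step     = c(1, 2, 3, 4, 1, 2),
  stringsAsFactors = FALSE
)

# One vectorised rule per Step value; rows matching no rule would get NA.
res <- df %>%
  mutate(AA = case_when(
    Step == 1 ~ STA_AA,
    Step == 2 ~ INT_AA1,
    Step == 3 ~ INT_AA2,
    Step == 4 ~ END_AA
  )) %>%
  select(POSITION, Step, AA)

print(res)
```

Each rule names its source column explicitly, so the result does not depend on the order of the columns in df.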
This is what my ideal final output looks like:
  POSITION Step AA
1      221    1  A
2      221    2  G
3      221    3  C
4      221    4  C
5      221    1  A
6      221    2  A
Here is my current solution, which uses a foreach loop:
df <- df %>% mutate(AA = NA)
foreach(j = 1:nrow(df)) %do% {
  if (df$Step[j] == 1) {
    df$AA[j] <- df$STA_AA[j]
  }
  if (df$Step[j] == 2) {
    df$AA[j] <- df$INT_AA1[j]
  }
  if (df$Step[j] == 3) {
    df$AA[j] <- df$INT_AA2[j]
  }
  if (df$Step[j] == 4) {
    df$AA[j] <- df$END_AA[j]
  }
}
df <- df %>% select(-STA_AA, -INT_AA1, -INT_AA2, -END_AA)
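My question is: does anyone have a shorter way to build the AA column than my loop solution? Ideally, it would somehow combine the columns to produce the desired output, rather than iterating over the rows and saving specific values to a new column.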
Here are my input and desired output:
#INPUT
df <- structure(list(STA_AA = c("A", "A", "A", "A", "A", "A"), INT_AA1 = c("S",
"G", "S", "S", "A", "A"), INT_AA2 = c("C", "C", "C", "S", "S",
"G"), END_AA = c(NA, NA, "C", "C", "C", "C"), POSITION = c(221L,
221L, 221L, 221L, 221L, 221L), Step = c(1, 2, 3, 4, 1, 2), AA = c("A",
"G", "C", "C", "A", "A")), row.names = c(NA, 6L), class = "data.frame")
#OUTPUT
op <- structure(list(POSITION = c(221L, 221L, 221L, 221L, 221L, 221L
), Step = c(1, 2, 3, 4, 1, 2), AA = c("A", "G", "C", "C", "A",
"A")), row.names = c(NA, 6L), class = "data.frame")
Relevant packages I am using: tidyverse, foreach
My session info:
R version 4.1.3 (2022-03-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.7
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] bit_4.0.4 reshape2_1.4.4 summarytools_1.0.1 doParallel_1.0.17 iterators_1.0.14 foreach_1.5.2
[7] forcats_0.5.1 stringr_1.4.0 dplyr_1.0.9 purrr_0.3.4 readr_2.1.2 tidyr_1.2.0
[13] tibble_3.1.7 ggplot2_3.3.6 tidyverse_1.3.2
loaded via a namespace (and not attached):
[1] Rcpp_1.0.9 lubridate_1.8.0 assertthat_0.2.1 digest_0.6.29 utf8_1.2.2
[6] plyr_1.8.7 R6_2.5.1 cellranger_1.1.0 backports_1.4.1 reprex_2.0.1
[11] httr_1.4.3 pillar_1.8.0 rlang_1.0.4 googlesheets4_1.0.0 readxl_1.4.0
[16] rstudioapi_0.13 magick_2.7.3 checkmate_2.1.0 labeling_0.4.2 googledrive_2.0.0
[21] pander_0.6.5 munsell_0.5.0 broom_1.0.0 compiler_4.1.3 modelr_0.1.8
[26] pkgconfig_2.0.3 base64enc_0.1-3 tcltk_4.1.3 htmltools_0.5.3 tidyselect_1.1.2
[31] codetools_0.2-18 matrixStats_0.62.0 fansi_1.0.3 crayon_1.5.1 tzdb_0.3.0
[36] dbplyr_2.2.1 withr_2.5.0 MASS_7.3-58 grid_4.1.3 jsonlite_1.8.0
[41] gtable_0.3.0 lifecycle_1.0.1 DBI_1.1.3 magrittr_2.0.3 scales_1.2.0
[46] vroom_1.5.7 cli_3.3.0 stringi_1.7.8 farver_2.1.1 pryr_0.1.5
[51] fs_1.5.2 xml2_1.3.3 rapportools_1.1 ellipsis_0.3.2 generics_0.1.3
[56] vctrs_0.4.1 tools_4.1.3 bit64_4.0.5 glue_1.6.2 hms_1.1.1
[61] pkgload_1.3.0 fastmap_1.1.0 colorspace_2.0-3 gargle_1.2.0 rvest_1.0.2
[66] haven_2.5.0
Try this:
df |>
  rowwise() |>
  mutate(END_AA = c_across(as.integer(Step))) |>
  ungroup() |>
  select(POSITION, Step, AA = END_AA)
# A tibble: 6 × 3
  POSITION  Step AA
     <int> <dbl> <chr>
1      221     1 A
2      221     2 G
3      221     3 C
4      221     4 C
5      221     1 A
6      221     2 A
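The code above appears to select the source column by position, so it relies on STA_AA, INT_AA1, INT_AA2 and END_AA being columns 1 to 4 of df. A base R alternative, offered as a sketch rather than as part of the original answer, names those columns explicitly and uses matrix indexing to pick one cell per row without a loop or rowwise(). The names cols, idx and op are introduced here for illustration, and the data frame is rebuilt from the rows shown in the question.

```r
# Hand-built copy of the question's data (an assumption for this sketch).
df <- data.frame(
  STA_AA   = c("A", "A", "A", "A", "A", "A"),
  INT_AA1  = c("S", "G", "S", "S", "A", "A"),
  INT_AA2  = c("C", "C", "C", "S", "S", "G"),
  END_AA   = c(NA, NA, "C", "C", "C", "C"),
  POSITION = 221L,
  Step     = c(1, 2, 3, 4, 1, 2),
  stringsAsFactors = FALSE
)

# Source columns, listed in the order that matches Step = 1, 2, 3, 4.
cols <- c("STA_AA", "INT_AA1", "INT_AA2", "END_AA")

# Two-column index matrix: (row number, column number) for each row of df.
idx <- cbind(seq_len(nrow(df)), df$Step)

# Indexing a matrix with a two-column matrix returns one cell per index row.
op <- df[c("POSITION", "Step")]
op$AA <- as.matrix(df[cols])[idx]

print(op)
```

This assumes every Step value is a whole number between 1 and 4; other values would need to be handled before indexing.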