R Merge Data Frame Column and Recode
I have 2 R data frames, shown below:
Data frame 1:
identifier | ef_posterior | position_no | classification |
---|---|---|---|
11111 | 0.260 | 1 | yes |
11111 | 0.0822 | 2 | yes |
11111 | 0.00797 | 3 | yes |
11111 | 0.04 | 4 | no |
11111 | 0.245 | 5 | yes |
11111 | 0.432 | 6 | yes |
11112 | 0.342 | 1 | maybe |
11112 | 0.453 | 2 | yes |
11112 | 0.0032 | 3 | yes |
11112 | 0.241 | 5 | no |
11112 | 0.0422 | 6 | yes |
11112 | 0.311 | 4 | no |
Data frame 2:
study_identifier | %LVEF |
---|---|
11111 | 62 |
11112 | 76 |
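study_identifier and identifier are the same thing (just different column names). I would also like to recode classification so that yes = 0, no = 1, maybe = 2.
I want to merge these two data frames and reshape the result into something like this: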
identifier | pos_1 | pos_1_class | pos_2 | pos_2_class | pos_3 | pos_3_class | pos_4 | pos_4_class | pos_5 | pos_5_class | pos_6 | pos_6_class | %LVEF |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
11111 | 0.260 | 0 | 0.0822 | 0 | 0.00797 | 0 | 0.04 | 1 | 0.245 | 0 | 0.432 | 0 | 62 |
11112 | 0.342 | 2 | 0.453 | 0 | 0.0032 | 0 | 0.311 | 1 | 0.241 | 1 | 0.0422 | 0 | 76 |
df1 %>% mutate(position_no = paste0("position_", position_no)) %>%
  pivot_wider(id_cols = identifier, names_from = position_no, values_from = ef_posterior) %>%
  left_join(df2 %>% mutate(study_identifier = as.numeric(as.character(study_identifier))), by = c("identifier" = "study_identifier"))
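This is the code I have at the moment, but I don't know where to put the code for the classification column.
How would I go about doing this? Any help would be much appreciated!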
You can recode easily with dplyr and case_when:
# Codes follow the question: yes = 0, no = 1, maybe = 2
df1 %>% mutate(
  classification =
    case_when(classification == "yes" ~ 0,
              classification == "no" ~ 1,
              classification == "maybe" ~ 2)
)
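As an alternative to case_when, the same recode can be written as a lookup in a named vector, which keeps the mapping in one place. This is only a minimal base R sketch; the names codes and recoded and the short example vector are made up for illustration and do not come from the original post. Any label that is missing from the lookup comes back as NA.

```r
# Lookup table: names are the original labels, values are the new codes
# (yes = 0, no = 1, maybe = 2, as requested in the question).
# `codes`, `classification` and `recoded` are illustrative names.
codes <- c(yes = 0, no = 1, maybe = 2)

# A small stand-in for df1$classification
classification <- c("yes", "no", "maybe", "yes")

# Index the lookup vector by label; unname() drops the label names
recoded <- unname(codes[classification])
recoded
```

Inside a pipeline the same idea would read mutate(classification = unname(codes[classification])).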
I would solve it the following way:
library(tidyverse)
df1 <- data.frame(
  stringsAsFactors = FALSE,
  identifier = c(11111L,11111L,11111L,11111L,
                 11111L,11111L,11112L,11112L,11112L,11112L,11112L,
                 11112L),
  ef_posterior = c(0.26,0.0822,0.00797,0.04,
                   0.245,0.432,0.342,0.453,0.0032,0.241,0.0422,0.311),
  position_no = c(1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 5L, 6L, 4L),
  classification = c("yes","yes","yes","no",
                     "yes","yes","maybe","yes","yes","no","yes","no")
)
df2 <- data.frame(
  check.names = FALSE,
  study_identifier = c(11111L, 11112L),
  `%LVEF` = c(62L, 76L)
)
# Recode first (yes = 0, no = 1, maybe = 2, as in the question),
# then widen both value columns and join the %LVEF column
df1 %>% mutate(
  classification =
    case_when(classification == "yes" ~ 0,
              classification == "no" ~ 1,
              classification == "maybe" ~ 2)
) %>%
  pivot_wider(
    id_cols = c(identifier), names_from = c(position_no), values_from = c(classification,ef_posterior)) %>%
  left_join(df2, by = c("identifier" = "study_identifier"))
#> # A tibble: 2 x 14
#>   identifier classification_1 classification_2 classification_3 classification_4
#>        <int>            <dbl>            <dbl>            <dbl>            <dbl>
#> 1      11111                0                0                0                1
#> 2      11112                2                0                0                1
#> # … with 9 more variables: classification_5 <dbl>, classification_6 <dbl>,
#> #   ef_posterior_1 <dbl>, ef_posterior_2 <dbl>, ef_posterior_3 <dbl>,
#> #   ef_posterior_4 <dbl>, ef_posterior_5 <dbl>, ef_posterior_6 <dbl>,
#> #   `%LVEF` <int>
Created on 2021-04-12 by the reprex package (v0.3.0)
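The approach above gives columns named classification_1 and ef_posterior_1 with all classification columns grouped first, whereas the question asks for pos_1, pos_1_class, pos_2, pos_2_class and so on. One possible way to get that layout is sketched below. It assumes tidyr 1.2.0 or later for the names_vary argument, and the object name wide and the two rename patterns are illustrative choices of mine, not part of the original answer.

```r
library(dplyr)
library(tidyr)

# Same example data as in the question
df1 <- data.frame(
  identifier = rep(c(11111L, 11112L), each = 6),
  ef_posterior = c(0.26, 0.0822, 0.00797, 0.04, 0.245, 0.432,
                   0.342, 0.453, 0.0032, 0.241, 0.0422, 0.311),
  position_no = c(1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 5L, 6L, 4L),
  classification = c("yes", "yes", "yes", "no", "yes", "yes",
                     "maybe", "yes", "yes", "no", "yes", "no")
)
df2 <- data.frame(
  check.names = FALSE,
  study_identifier = c(11111L, 11112L),
  `%LVEF` = c(62L, 76L)
)

# `wide` is an illustrative name for the reshaped result
wide <- df1 %>%
  mutate(classification = case_when(
    classification == "yes" ~ 0,
    classification == "no" ~ 1,
    classification == "maybe" ~ 2
  )) %>%
  pivot_wider(
    id_cols = identifier,
    names_from = position_no,
    values_from = c(ef_posterior, classification),
    names_sort = TRUE,      # order the columns by position number
    names_vary = "slowest"  # keep each position's value and class side by side
  ) %>%
  # Turn the default names into the ones requested in the question
  rename_with(~ sub("^ef_posterior_([0-9]+)$", "pos_\\1", .x)) %>%
  rename_with(~ sub("^classification_([0-9]+)$", "pos_\\1_class", .x)) %>%
  left_join(df2, by = c("identifier" = "study_identifier"))

names(wide)
```

Renaming after the pivot keeps the pivot_wider call simple; on an older tidyr without names_vary, the same column order could instead be set afterwards with select() or relocate().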