R: merge data frame columns and recode

Question

I have 2 R data frames, as shown below:

Data frame 1:

identifier	ef_posterior	position_no	classification
11111	0.260	1	yes
11111	0.0822	2	yes
11111	0.00797	3	yes
11111	0.04	4	no
11111	0.245	5	yes
11111	0.432	6	yes
11112	0.342	1	maybe
11112	0.453	2	yes
11112	0.0032	3	yes
11112	0.241	5	no
11112	0.0422	6	yes
11112	0.311	4	no

Data frame 2:

study_identifier	%LVEF
11111	62
11112	76

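I would like to merge these two data frames and reshape the result into one row per identifier. study_identifier and identifier are the same thing (just different column names). I also want to recode classification so that yes = 0, no = 1, maybe = 2. The result should look like this: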

identifier	pos_1	pos_1_class	pos_2	pos_2_class	pos_3	pos_3_class	pos_4	pos_4_class	pos_5	pos_5_class	pos_6	pos_6_class	%LVEF
11111	0.260	0	0.0822	0	0.00797	0	0.04	1	0.245	0	0.432	0	62
11112	0.342	2	0.453	0	0.0032	0	0.311	1	0.241	1	0.0422	0	76

df1 %>% mutate(position_no = paste0("position_", position_no)) %>%
  pivot_wider(id_cols = identifier, names_from = position_no, values_from = ef_posterior) %>%
  left_join(df2 %>% mutate(study_identifier = as.numeric(as.character(study_identifier))), by = c("identifier" = "study_identifier"))

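This is the code I have so far, but I don't know where to put the code for the classification column.

How would I go about doing this? Any help would be greatly appreciated!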

Answer 1

You can easily recode with dplyr and case_when:

# Codes follow the question: yes = 0, no = 1, maybe = 2
df1 %>% mutate(
  classification = 
    case_when( classification == "yes" ~ 0,
               classification == "no" ~ 1,
               classification == "maybe" ~ 2)
)
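
For a plain label-to-code mapping like this, a named vector can serve as a lookup table instead of case_when. The following is only a minimal base R sketch using the coding the question asks for (yes = 0, no = 1, maybe = 2); the names class_codes and recoded are hypothetical and chosen for illustration.

```r
# Hypothetical lookup table; codes follow the question (yes = 0, no = 1, maybe = 2).
class_codes <- c(yes = 0, no = 1, maybe = 2)

# A few example labels, as they appear in the classification column.
classification <- c("yes", "no", "maybe", "yes")

# Index the named vector by label; unname() drops the label names from the result.
recoded <- unname(class_codes[classification])
recoded
#> [1] 0 1 2 0
```

Labels that are missing from the lookup come back as NA. If classification were a factor rather than a character vector, it would need as.character() first, because indexing by a factor uses its integer codes.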

I would solve it as follows:

library(tidyverse)
df1 <- data.frame(
  stringsAsFactors = FALSE,
  identifier = c(11111L,11111L,11111L,11111L,
                 11111L,11111L,11112L,11112L,11112L,11112L,11112L,
                 11112L),
  ef_posterior = c(0.26,0.0822,0.00797,0.04,
                   0.245,0.432,0.342,0.453,0.0032,0.241,0.0422,0.311),
  position_no = c(1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 5L, 6L, 4L),
  classification = c("yes","yes","yes","no",
                     "yes","yes","maybe","yes","yes","no","yes","no")
)

df2 <- data.frame(
  check.names = FALSE,
  study_identifier = c(11111L, 11112L),
  `%LVEF` = c(62L, 76L)
)

# Recode as requested in the question: yes = 0, no = 1, maybe = 2
df1 %>% mutate(
  classification = 
    case_when( classification == "yes" ~ 0,
               classification == "no" ~ 1,
               classification == "maybe" ~ 2)
) %>% 
  pivot_wider(
    id_cols = c(identifier), names_from = c(position_no), values_from = c(classification,ef_posterior)) %>% 
  left_join(df2, by = c("identifier" = "study_identifier"))
#> # A tibble: 2 x 14
#>   identifier classification_1 classification_2 classification_3 classification_4
#>        <int>            <dbl>            <dbl>            <dbl>            <dbl>
#> 1      11111                0                0                0                1
#> 2      11112                2                0                0                1
#> # … with 9 more variables: classification_5 <dbl>, classification_6 <dbl>,
#> #   ef_posterior_1 <dbl>, ef_posterior_2 <dbl>, ef_posterior_3 <dbl>,
#> #   ef_posterior_4 <dbl>, ef_posterior_5 <dbl>, ef_posterior_6 <dbl>,
#> #   `%LVEF` <int>

Created on 2021-04-12 by the reprex package (v0.3.0)
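
The pivot above names its columns classification_1, ..., ef_posterior_1, ..., whereas the question asks for pos_1, pos_1_class, pos_2, pos_2_class and so on, interleaved. The following is a minimal sketch of one way to get that layout. It assumes tidyr 1.2.0 or later, since the names_vary argument is newer than the versions current when this answer was written. The intermediate column class and the result name wide are hypothetical names chosen for illustration.

```r
library(dplyr)
library(tidyr)

# Same example data as in the question.
df1 <- data.frame(
  identifier = rep(c(11111L, 11112L), each = 6),
  ef_posterior = c(0.26, 0.0822, 0.00797, 0.04, 0.245, 0.432,
                   0.342, 0.453, 0.0032, 0.241, 0.0422, 0.311),
  position_no = c(1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 5L, 6L, 4L),
  classification = c("yes", "yes", "yes", "no", "yes", "yes",
                     "maybe", "yes", "yes", "no", "yes", "no")
)
df2 <- data.frame(
  check.names = FALSE,
  study_identifier = c(11111L, 11112L),
  `%LVEF` = c(62L, 76L)
)

wide <- df1 %>%
  # Recode into a new column (yes = 0, no = 1, maybe = 2, as in the question).
  mutate(class = case_when(classification == "yes" ~ 0,
                           classification == "no" ~ 1,
                           classification == "maybe" ~ 2)) %>%
  pivot_wider(
    id_cols = identifier,
    names_from = position_no,
    values_from = c(ef_posterior, class),
    names_glue = "pos_{position_no}_{.value}", # e.g. pos_1_ef_posterior, pos_1_class
    names_sort = TRUE,                         # order columns by position number
    names_vary = "slowest"                     # keep each position's two columns adjacent
  ) %>%
  # Drop the "_ef_posterior" suffix so the value columns are just pos_1, pos_2, ...
  rename_with(~ sub("_ef_posterior$", "", .x)) %>%
  left_join(df2, by = c("identifier" = "study_identifier"))

names(wide)
```

On older tidyr versions without names_vary, the same column order could be obtained by reordering afterwards, for example with select().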

Answer 1
1 accepted 2021-04-12 07:28:51