How do you aggregate rows to a factor variable with three levels?

I have a dataset where some participants have multiple rows, and I need to aggregate the data so that every participant has only one row. The dataset contains different variable types (e.g., factors, dates, age). I wrote code that works and looks like this:

example4 <- SMARTdata_50j_diagc_2016  %>% 
  group_by( Patient_Id ) %>%  
  summarise( Groep = first( Groep ),
             Ziekenhuis_Nr = first( Ziekenhuis_Nr ),
             Ziekenhuistype = first( Ziekenhuistype ),
             aantalDBC = n(),
             aantalVervolg = sum( as.numeric( Zorgtype_Code ) ),
             Leeftijd = mean( Lft_patient_openenDBC ),
             MRI_nee_ja = max( ifelse( MRI_nee_ja == 0, 0, 1 ) ),
             aantalMRI = sum( MRI_Aantal ),
             Artroscopie_nee_ja = max( ifelse( Artroscopie_nee_jaz_jam == 0, 0, 1 ) ),
             aantalArtroscopie = sum( Artroscopie_aantal ),
             overigDBC = mean( Aantal_overigeDBC_bijopenen ),
             DBC_open = min( open_DBC ), 
             DBC_sluiten = max( sluiten_DBC ) ) %>% 
  as.data.frame()

This code gives me a single row for each participant. However, I have one more variable that I need to include in the new data frame, and I do not know how to do that. The variable is called 'Diagnose_Code' and is a factor with two levels, namely 0 (standing for 1801) and 1 (standing for 1805).

Among the participants that have multiple rows in the original data frame, some have both a 0 and a 1 for that variable. In my new data frame, I want 'Diagnose_Code' to have three levels: 0 if all rows of that participant are 0, 1 if all rows of that participant are 1, and 2 if the rows of that participant contain both a 0 and a 1.

I do not know how to make this work. I struggled a bit with ifelse, but without success. Does anyone know how I can make this work in my code? Thank you in advance!

Using a toy dataset, this can be achieved like so:

library(dplyr)

df <- data.frame(
  id = rep(1:3, each = 3),
  diagnosis_code = c(rep(1,3), rep(0, 3), c(1, 0, 1)),
  stringsAsFactors = FALSE
)
df %>% 
  group_by(id) %>% 
  summarise(diagnosis_code = case_when(
    all(diagnosis_code == 1) ~ 1,
    all(diagnosis_code == 0) ~ 0,
    TRUE ~ 2
  ))
#> # A tibble: 3 x 2
#>      id diagnosis_code
#>   <int>          <dbl>
#> 1     1              1
#> 2     2              0
#> 3     3              2

Created on 2020-03-29 by the reprex package (v0.3.0)
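The toy result above is a plain numeric column, while the question asks for a factor with three levels inside a larger summarise() call. The following is a minimal sketch of how the same case_when() could be slotted into such a pipeline. The data frame `toy` and its values are made up for illustration; only the column names Patient_Id, Diagnose_Code and MRI_Aantal are taken from the question, and Diagnose_Code is assumed to be a factor with levels "0" and "1".

```r
library(dplyr)

# Hypothetical toy data; Diagnose_Code is a factor, as described in the question
toy <- data.frame(
  Patient_Id    = rep(1:3, each = 3),
  Diagnose_Code = factor(c(1, 1, 1, 0, 0, 0, 1, 0, 1), levels = c(0, 1)),
  MRI_Aantal    = c(0, 1, 2, 0, 0, 0, 1, 1, 0)
)

result <- toy %>%
  group_by(Patient_Id) %>%
  summarise(
    aantalDBC = n(),
    aantalMRI = sum(MRI_Aantal),
    # Compare against the level labels, so this works on a factor column
    Diagnose_Code = case_when(
      all(Diagnose_Code == "1") ~ 1,
      all(Diagnose_Code == "0") ~ 0,
      TRUE ~ 2
    )
  ) %>%
  # Turn the numeric summary into a factor with the three requested levels
  mutate(Diagnose_Code = factor(Diagnose_Code, levels = c(0, 1, 2))) %>%
  as.data.frame()

print(result)
```

Putting the Diagnose_Code line last in summarise() is deliberate: once a summary reuses the name of an input column, later expressions in the same call see the summarised value rather than the original rows.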

Using ifelse should work:

df %>%
  group_by(id) %>%
  # 2 if the group mixes 0 and 1; otherwise 1 if all values are 1, else 0
  summarise(diag = ifelse(max(diag) != min(diag), 2,
                          ifelse(max(diag) == 1, 1, 0)))

# A tibble: 3 x 2
     id  diag
  <dbl> <dbl>
1     1     2
2     2     1
3     3     0

Data:

df <- data.frame(id=c(1,1,1,2,2,2,3,3,3), diag=c(1,0,0,1,1,1,0,0,0))
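One caveat with the max/min approach: in the question, the variable is a factor, and max() and min() are not defined for unordered factors, so they raise an error. A minimal sketch of a workaround, assuming the factor levels are the labels "0" and "1", is to convert via as.character() first (as.numeric() directly on a factor returns the internal level codes 1 and 2, not 0 and 1). The names `toy` and `diag_num` are illustrative only.

```r
library(dplyr)

# Same values as the data above, but with diag stored as a factor
toy <- data.frame(
  id   = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  diag = factor(c(1, 0, 0, 1, 1, 1, 0, 0, 0), levels = c(0, 1))
)

res <- toy %>%
  # Recover the numeric values 0/1 from the factor labels
  mutate(diag_num = as.numeric(as.character(diag))) %>%
  group_by(id) %>%
  # Mixed group -> 2; otherwise the single shared value (0 or 1)
  summarise(diag = ifelse(max(diag_num) != min(diag_num), 2, max(diag_num)))

print(res)
```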
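
Another option, using n_distinct inside case_when:

# df here stands for the original data with Patient_Id and Diagnose_Code
df %>% 
  group_by(Patient_Id) %>% 
  summarise(Diagnose_Code = case_when(n_distinct(Diagnose_Code) == 2 ~ 2,  # both 0 and 1 present
                                      all(Diagnose_Code == 1) ~ 1,         # only 1s
                                      TRUE ~ 0 ))                          # only 0s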
