
How do I combine multiple rows into one using R?

I have a large dataset that contains data about patients. Some patients have multiple rows and I want to combine these rows, so that each patient has one row.

I have about 20 different variables. Some must stay the same when the rows are combined (e.g., a patient with 4 rows who is in group 1 should still be in group 1 afterwards), but others have to meet a condition (e.g., if a patient had surgery in one or more of the rows, the combined value should become 'yes'; otherwise it should become 'no').

I have tried searching for the answer, but I am confused. I tried using plyr, but it seems that this package is not recommended because it becomes slow with very large datasets. I have found some information about dplyr, but I do not understand how I should use it.

For example, I have the following dataset (my apologies for the presentation; I am new to Stack Overflow):

| Patient_Id | Group | Age | Gender | surgery y/n | no of surgeries |
|---|---|---|---|---|---|
| 1 | 1 | 63 | F | no | 0 |
| 1 | 1 | 63 | F | no | 0 |
| 1 | 1 | 64 | F | yes | 1 |
| 2 | 0 | 60 | M | yes | 2 |
| 3 | 1 | 65 | M | no | 0 |
| 4 | 0 | 60 | F | no | 0 |
| 4 | 0 | 61 | F | yes | 1 |
| 4 | 0 | 62 | F | yes | 1 |

And I want to make a dataframe like this

| Patient_Id | Group | Age | Gender | surgery y/n | no of surgeries |
|---|---|---|---|---|---|
| 1 | 1 | 63,33 | F | yes | 1 |
| 2 | 0 | 60 | M | yes | 2 |
| 3 | 1 | 65 | M | no | 0 |
| 4 | 0 | 61 | F | yes | 2 |

Does anyone know what function would be best to use? Or how to start? Thank you in advance!

Data in dput format.

df1 <-
structure(list(Patient_Id = c(1, 1, 1, 2, 3, 4, 4, 4), 
Group = c(1, 1, 1, 0, 1, 0, 0, 0), Age = c(63, 63, 64, 
60, 65, 60, 61, 62), Gender = c("F", "F", "F", "M", 
"M", "F", "F", "F"), `surgery y/n` = c("no", "no", "yes", 
"yes", "no", "no", "yes", "yes"), `no of surgeries` = c(0L, 
0L, 1L, 2L, 0L, 0L, 1L, 1L)), row.names = c(NA, -8L), 
class = "data.frame")


df2 <-
structure(list(Patient_Id = c(1, 2, 3, 4), 
Group = c(1, 0, 1, 0), Age = c("63,33", 
"60", "65", "61"), Gender = c("F", "M", 
"M", "F"), `surgery y/n` = c("yes", "yes", 
"no", "yes"), `no of surgeries` = c(1, 2, 
0, 2)), row.names = c(NA, -4L), 
class = "data.frame")

The structure of my dataframe is as follows:

str(SMARTdata_50j_diagc_2016)
'data.frame': 458794 obs. of 20 variables:
 $ Groep                      : Factor w/ 2 levels "0","1": 2 2 2 2 2 1 2 2 2 2 ...
 $ Ziekenhuis_Nr              : Factor w/ 13 levels "1","10","11",..: 2 8 4 11 3 7 10 9 13 6 ...
 $ Ziekenhuistype             : Factor w/ 3 levels "0","1","2": 2 2 2 2 1 1 2 1 2 3 ...
 $ Patient_Id                 : num 85550 101414 239946 291650 140558 ...
 $ DBC_Id                     : num 181394 230887 448945 524873 251352 ...
 $ Diagnose_Code              : Factor w/ 5 levels "0","1","2","3",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Zorgtype_Code              : Factor w/ 2 levels "0","1": 2 2 2 1 2 2 2 1 1 2 ...
 $ Lft_patient_openenDBC      : num 50 80 66 60 67 64 54 71 70 76 ...
 $ Geslacht                   : Factor w/ 2 levels "0","1": 1 1 2 2 2 1 1 1 2 1 ...
 $ MRI_nee_ja                 : Factor w/ 2 levels "0","1": 1 1 1 2 1 1 1 1 1 1 ...
 $ MRI_Aantal                 : num 0 0 0 1 0 0 0 0 0 0 ...
 $ Artroscopie_nee_jaz_jam    : Factor w/ 3 levels "0","1","2": 1 1 1 3 1 1 1 1 1 1 ...
 $ Artroscopie_aantal         : num 0 0 0 1 0 0 0 0 0 0 ...
 $ Jaar_openen_DBC            : num 2016 2017 2018 2017 2017 ...
 $ Mnd_openen_DBC             : num 12 5 6 2 5 8 10 11 1 1 ...
 $ Jaar_sluiten_DBC           : num 2017 2017 2018 2017 2017 ...
 $ Mnd_sluiten_DBC            : num 4 9 10 4 9 12 2 3 4 5 ...
 $ Aantal_overigeDBC_bijopenen: num 1 1 2 1 0 0 1 0 0 0 ...
 $ open_DBC                   : 'yearmon' num Dec 2016 May 2017 Jun 2018 Feb 2017 ...
 $ sluiten_DBC                : 'yearmon' num Apr 2017 Sep 2017 Oct 2018 Apr 2017 ...

Your question is straightforward. One way to do it with the dplyr package is:

library(dplyr)

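df1 %>% 
  group_by(Patient_Id) %>%                      # one group per patient
  summarise(Group = first(Group),               # constant within a patient
            Age = mean(Age),                    # average age over the rows
            Gender = first(Gender),
            # total count; the next expression sees this summed value
            `no of surgeries` = sum(`no of surgeries`),
            # 'yes' when the patient's total is above zero
            `surgery y/n` = ifelse(`no of surgeries` == 0, 'no', 'yes'))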

which gives,

# A tibble: 4 x 6
  Patient_Id Group   Age Gender `no of surgeries` `surgery y/n`
       <dbl> <dbl> <dbl> <chr>              <int> <chr>
1          1     1  63.3 F                      1 yes
2          2     0  60   M                      2 yes
3          3     1  65   M                      0 no
4          4     0  61   F                      2 yes
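
The approach above derives the yes/no flag from the summed count. If the flag and the count can disagree in your real data, a variant that tests the flag column directly with `any()` may be safer. The sketch below is only an illustration: it rebuilds `df1` so that it runs on its own, stores the output in a name chosen here (`result`), and assumes dplyr 1.0.0 or later for `across()` and the `.groups` argument.

```r
library(dplyr)

# Example data from the question (same values as df1 above)
df1 <- data.frame(
  Patient_Id = c(1, 1, 1, 2, 3, 4, 4, 4),
  Group = c(1, 1, 1, 0, 1, 0, 0, 0),
  Age = c(63, 63, 64, 60, 65, 60, 61, 62),
  Gender = c("F", "F", "F", "M", "M", "F", "F", "F"),
  `surgery y/n` = c("no", "no", "yes", "yes", "no", "no", "yes", "yes"),
  `no of surgeries` = c(0L, 0L, 1L, 2L, 0L, 0L, 1L, 1L),
  check.names = FALSE  # keep the column names with spaces as they are
)

# `result` is a name introduced for this sketch
result <- df1 %>%
  group_by(Patient_Id) %>%
  summarise(
    # columns assumed constant within a patient: keep the first value
    across(c(Group, Gender), first),
    Age = mean(Age),
    # "yes" if any row of the patient says "yes", tested on the flag itself
    `surgery y/n` = if_else(any(`surgery y/n` == "yes"), "yes", "no"),
    `no of surgeries` = sum(`no of surgeries`),
    .groups = "drop"
  )

result
```

With about 20 variables, `across()` lets you list all the "stay the same" columns in one place instead of writing `first()` once per column. Note that the columns come out in the order they are listed inside `summarise()`, so `Gender` appears before `Age` here.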
