
Transpose / reshape dataframe from long to wide format

I have a data frame in the following long format:

library(data.table)
studentInfo <- data.frame(University = c("A", "B", "C", "B", "A", "D"),
                          StudentID  = c("S1", "S1", "S2", "S2", "S3", "S3"),
                          Subject    = c("Maths", "Science", "English", "Maths", "History", "English"))

studentInfo <- as.data.table(studentInfo)  # convert to a data.table without adding a row-name column



    University   StudentID     Subject
1   A            S1            Maths
2   B            S1            Science
3   C            S2            English
4   B            S2            Maths
5   A            S3            History
6   D            S3            English

When I run

dcast(studentInfo, StudentID ~ Subject, value.var = "Subject")

I get the following:

 StudentID English History Maths Science
1:        S1    <NA>    <NA> Maths Science
2:        S2 English    <NA> Maths    <NA>
3:        S3 English History  <NA>    <NA>


I would like to get the following:

    University  StudentID   S1     S3     S1      S2      S2      S3

1   A           S1          Maths                   
5   A           S3                 History              
2   B           S1                       Science            
4   B           S2                                Maths     
3   C           S2                                        English       
6   D           S3                                                English

I am new to coding in R. I am preparing a dataset for a heatmap/Oncoprint. I have tried `dcast` from reshape2 and `spread` from tidyr, but I was not able to get the format I need for the next step of my workflow.

Thanks

You can create a row-number column, append a counter to each repeated StudentID so that the new column names are unique, and then pivot the data to wide format.

library(dplyr)
library(tidyr)

studentInfo %>%
    mutate(row = row_number()) %>%                                      # one unique id per input row
    group_by(StudentID) %>%
    mutate(StudentID = paste(StudentID, row_number(), sep = "_")) %>%  # S1 -> S1_1, S1_2, ...
    ungroup() %>%
    pivot_wider(names_from = StudentID, values_from = Subject) %>%
    select(-row)

# A tibble: 6 x 7
#  University S1_1  S1_2    S2_1    S2_2  S3_1    S3_2   
#  <chr>      <chr> <chr>   <chr>   <chr> <chr>   <chr>  
#1 A          Maths NA      NA      NA    NA      NA     
#2 B          NA    Science NA      NA    NA      NA     
#3 C          NA    NA      English NA    NA      NA     
#4 B          NA    NA      NA      Maths NA      NA     
#5 A          NA    NA      NA      NA    History NA     
#6 D          NA    NA      NA      NA    NA      English
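The desired output in the question also keeps the StudentID column, sorts rows by University, and shows blank cells rather than NA. Below is a minimal sketch of a variant that does all three. It assumes dplyr >= 1.0 (for `across()`) and that empty strings are acceptable for the downstream heatmap step. The `key` column name is hypothetical and only holds the suffixed names, so the original StudentID survives as an identifier column.

```r
library(dplyr)
library(tidyr)

studentInfo <- data.frame(
  University = c("A", "B", "C", "B", "A", "D"),
  StudentID  = c("S1", "S1", "S2", "S2", "S3", "S3"),
  Subject    = c("Maths", "Science", "English", "Maths", "History", "English"),
  stringsAsFactors = FALSE
)

wide <- studentInfo %>%
  mutate(row = row_number()) %>%                                # unique id per input row
  group_by(StudentID) %>%
  mutate(key = paste(StudentID, row_number(), sep = "_")) %>%   # hypothetical helper column: S1_1, S1_2, ...
  ungroup() %>%
  pivot_wider(names_from = key, values_from = Subject) %>%      # University, StudentID, row stay as id columns
  arrange(University, StudentID) %>%                            # row order used in the desired output
  select(-row) %>%
  mutate(across(-c(University, StudentID), ~ replace_na(.x, "")))  # NA -> blank cell

wide
```

Because `key` rather than StudentID is used as `names_from`, pivot_wider keeps StudentID among its default id columns, which is what lets it appear next to University in the result.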

It is not advisable to have a data frame with duplicate column names, which is why this answer adds a `_1`/`_2` suffix to each repeated StudentID.
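A small illustration of why, using made-up values: base R's `data.frame()` de-duplicates names by default (`check.names = TRUE` adjusts them with `make.names`), and if you force duplicates with `check.names = FALSE`, name-based access such as `$` quietly returns only the first matching column. The object names `deduped` and `dup` are just for this example.

```r
# Default: duplicate names are made unique automatically
deduped <- data.frame(S1 = "Maths", S1 = "Science", stringsAsFactors = FALSE)
names(deduped)  # -> "S1" "S1.1"

# Forcing duplicate names is possible...
dup <- data.frame(S1 = "Maths", S1 = "Science",
                  check.names = FALSE, stringsAsFactors = FALSE)
names(dup)      # -> "S1" "S1"

# ...but selecting by name only ever reaches the first "S1" column
dup$S1          # -> "Maths"
```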

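Alternatively, try data.table's `dcast` with both University and StudentID on the left-hand side of the formula: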

dcast(studentInfo, University + StudentID ~ StudentID, value.var = 'Subject')
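If you would rather avoid both tidyverse and data.table, here is a base R sketch. It assumes, like the first answer, that repeated StudentIDs should become unique `_1`/`_2` column names rather than duplicated ones, and that blanks are wanted instead of NA. Since every input row gets its own column, each Subject falls on the diagonal of a character matrix. `occurrence`, `col_names`, `wide` and `result` are illustrative names.

```r
studentInfo <- data.frame(
  University = c("A", "B", "C", "B", "A", "D"),
  StudentID  = c("S1", "S1", "S2", "S2", "S3", "S3"),
  Subject    = c("Maths", "Science", "English", "Maths", "History", "English"),
  stringsAsFactors = FALSE
)

# Number repeated StudentIDs within each student: 1, 2, ...
occurrence <- ave(seq_along(studentInfo$StudentID), studentInfo$StudentID,
                  FUN = seq_along)
col_names <- paste(studentInfo$StudentID, occurrence, sep = "_")

# One column per input row; each Subject lands on the diagonal, other cells stay blank
wide <- matrix("", nrow = nrow(studentInfo), ncol = nrow(studentInfo),
               dimnames = list(NULL, col_names))
diag(wide) <- studentInfo$Subject

# Put the identifier columns back and sort rows by University, then StudentID
result <- data.frame(studentInfo[c("University", "StudentID")], wide,
                     stringsAsFactors = FALSE, check.names = FALSE)
result <- result[order(result$University, result$StudentID), ]
result
```

The original row names (1, 5, 2, 4, 3, 6) survive the sort, matching the row labels in the question's desired output.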

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM