I have a data frame in long format, like this:
library(data.table)
studentInfo <- data.frame(
  University = c("A", "B", "C", "B", "A", "D"),
  StudentID  = c("S1", "S1", "S2", "S2", "S3", "S3"),
  Subject    = c("Maths", "Science", "English", "Maths", "History", "English")
)
studentInfo <- as.data.table(studentInfo)
  University StudentID Subject
1          A        S1   Maths
2          B        S1 Science
3          C        S2 English
4          B        S2   Maths
5          A        S3 History
6          D        S3 English
dcast(studentInfo, StudentID ~ Subject, value.var = "Subject")
I get the following:
   StudentID English History Maths Science
1:        S1    <NA>    <NA> Maths Science
2:        S2 English    <NA> Maths    <NA>
3:        S3 English History  <NA>    <NA>
I would like to get the following:
  University StudentID S1    S3      S1      S2    S2      S3
1          A        S1 Maths
5          A        S3       History
2          B        S1               Science
4          B        S2                       Maths
3          C        S2                             English
6          D        S3                                     English
I am new to coding in R. I am preparing a dataset to plot a heatmap/OncoPrint. I have tried dcast (from reshape2) and spread, but could not get the format I need for the next step of my workflow.
Thanks
You can create a column with the row number and then reshape the data to wide format.
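library(dplyr)
studentInfo %>%
  # unique row id, so every original row stays a separate row after pivoting
  mutate(row = row_number()) %>%
  group_by(StudentID) %>%
  # number repeated IDs within each student: S1_1, S1_2, ...
  mutate(StudentID = paste(StudentID, row_number(), sep = "_")) %>%
  tidyr::pivot_wider(names_from = StudentID, values_from = Subject) %>%
  select(-row)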
# A tibble: 6 x 7
#  University S1_1  S1_2    S2_1    S2_2  S3_1    S3_2
#  <chr>      <chr> <chr>   <chr>   <chr> <chr>   <chr>
#1 A          Maths NA      NA      NA    NA      NA
#2 B          NA    Science NA      NA    NA      NA
#3 C          NA    NA      English NA    NA      NA
#4 B          NA    NA      NA      Maths NA      NA
#5 A          NA    NA      NA      NA    History NA
#6 D          NA    NA      NA      NA    NA      English
Note that it is not advisable to have a data frame with duplicate column names, which is why the columns above get a numeric suffix.
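To see why duplicate names are awkward, here is a small sketch (the object names dup and fixed are made up for illustration and are not part of the question's data). With check.names = FALSE a data frame can hold two columns called S1, but $ only reaches the first of them; with the default settings data.frame() renames the second column instead.

```r
# Hypothetical two-column example, not the question's data
dup <- data.frame(S1 = "Maths", S1 = "Science", check.names = FALSE)
names(dup)   # both columns are called "S1"
dup$S1       # only the first "S1" column is reachable this way

# With the default check.names = TRUE, R makes the names unique
fixed <- data.frame(S1 = "Maths", S1 = "Science")
names(fixed)
```

Unique names such as S1_1 and S1_2 avoid this ambiguity in later steps.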
Try this:
dcast(studentInfo, University + StudentID ~ StudentID, value.var = 'Subject')
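As a rough, self-contained sketch of what this call does (assuming the data.table version of dcast; the name wide is only for illustration): University and StudentID are kept as identifier columns, and the result gets one column per distinct StudentID rather than the repeated headers shown in the question.

```r
library(data.table)

studentInfo <- data.table(
  University = c("A", "B", "C", "B", "A", "D"),
  StudentID  = c("S1", "S1", "S2", "S2", "S3", "S3"),
  Subject    = c("Maths", "Science", "English", "Maths", "History", "English")
)

# One row per University/StudentID pair, one column per distinct StudentID
wide <- dcast(studentInfo, University + StudentID ~ StudentID, value.var = "Subject")
names(wide)
wide
```

Cells with no matching subject are left as NA, so this may need a fill value before plotting.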