
Transpose / reshape dataframe from long to wide format

I have a data frame in the following long format:

library(data.table)

studentInfo <- data.frame(
    University = c("A", "B", "C", "B", "A", "D"),
    StudentID  = c("S1", "S1", "S2", "S2", "S3", "S3"),
    Subject    = c("Maths", "Science", "English", "Maths", "History", "English")
)

# keep.rownames expects a logical value, not the string "FALSE"
studentInfo <- data.table(studentInfo, keep.rownames = FALSE)



    University   StudentID     Subject
1   A            S1            Maths
2   B            S1            Science
3   C            S2            English
4   B            S2            Maths
5   A            S3            History
6   D            S3            English

dcast(studentInfo, StudentID ~ Subject, value.var = "Subject")

This gives me:

 StudentID English History Maths Science
1:        S1    <NA>    <NA> Maths Science
2:        S2 English    <NA> Maths    <NA>
3:        S3 English History  <NA>    <NA>


What I would like to get instead is:

    University  StudentID   S1     S3     S1      S2      S2      S3

1   A           S1          Maths                   
5   A           S3                 History              
2   B           S1                       Science            
4   B           S2                                Maths     
3   C           S2                                        English       
6   D           S3                                                English

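I am new to R. I am preparing a dataset to run a Heatmap/Oncoprint. I have tried dcast from reshape2 and the spread function, but could not get the format required for the next step of my workflow.

Thanks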

You can create a column of row numbers and then reshape the data to wide format.

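library(dplyr)
library(tidyr)

studentInfo %>%
    # row keeps every original row on its own line after reshaping
    mutate(row = row_number()) %>%
    group_by(StudentID) %>%
    # build the new column label in a helper column, since a grouping
    # variable cannot be modified inside mutate()
    mutate(col = paste(StudentID, row_number(), sep = "_")) %>%
    ungroup() %>%
    select(-StudentID) %>%
    pivot_wider(names_from = col, values_from = Subject) %>%
    select(-row)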

# A tibble: 6 x 7
#  University S1_1  S1_2    S2_1    S2_2  S3_1    S3_2   
#  <chr>      <chr> <chr>   <chr>   <chr> <chr>   <chr>  
#1 A          Maths NA      NA      NA    NA      NA     
#2 B          NA    Science NA      NA    NA      NA     
#3 C          NA    NA      English NA    NA      NA     
#4 B          NA    NA      NA      Maths NA      NA     
#5 A          NA    NA      NA      NA    History NA     
#6 D          NA    NA      NA      NA    NA      English
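
For readers who would rather stay in data.table, the same row-number idea can probably be sketched as below. This is only an illustration, not part of the answer above: the names `dt`, `col` and `wide` are made up for the example, and it assumes a data.table version that provides `rowid()`.

```r
library(data.table)

# Same data as in the question, rebuilt here so the sketch is self-contained
dt <- data.table(
  University = c("A", "B", "C", "B", "A", "D"),
  StudentID  = c("S1", "S1", "S2", "S2", "S3", "S3"),
  Subject    = c("Maths", "Science", "English", "Maths", "History", "English")
)

# row: original row number, so each input row stays on its own output row
dt[, row := .I]

# col (hypothetical helper): StudentID plus its occurrence count, e.g. "S1_2"
dt[, col := paste(StudentID, rowid(StudentID), sep = "_")]

# Spread Subject across the unique labels, then drop the helper row column
wide <- dcast(dt, row + University ~ col, value.var = "Subject")
wide[, row := NULL]

print(wide)
```

The result should have one row per original row and one column per `StudentID` occurrence, with `NA` wherever a student has no entry on that row.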

Having several columns with the same name in a data frame is not recommended.
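
A small base R sketch of why duplicate names are awkward, using a made-up two-column data frame `dup` (not the data from the question): name-based access only ever reaches the first matching column, and `make.unique()` is one way to disambiguate.

```r
# check.names = FALSE lets data.frame() keep two columns both named "S1"
dup <- data.frame(
  S1 = c("Maths", NA),
  S1 = c(NA, "Science"),
  check.names = FALSE
)

print(names(dup))  # both columns are called "S1"
print(dup$S1)      # $ returns only the first column with that name

# make.unique() adds a suffix to repeated names so each column is addressable
names(dup) <- make.unique(names(dup), sep = "_")
print(names(dup))
```

After renaming, the second column can be reached by its own name, which is the same effect the `S1_1`, `S1_2` style labels achieve.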

Try this:

dcast(studentInfo, University + StudentID ~ StudentID, value.var = 'Subject')

