
R subsetting with dplyr

I am having trouble subsetting and rearranging a dataset in R. I have a dataset that looks like this:

Student   Skill    Correct
64525     10       1
64525     10       1
70363     10       0
70363     10       1
70363     10       1
64525     15       0
70363     15       0
70363     15       1

I need to create a new dataset for each skill, with one row per student and one column per observation (the value of Correct). Like this:

Skill: 10

Student   Obs1 Obs2 Obs3 
64525     1    1    NA        
70363     0    1    1



Skill: 15

Student   Obs1 Obs2 
64525     0    NA           
70363     0    1    

Notice that the number of columns in each skill dataset can vary, depending on the number of observations for each student. Notice also that a value is NA when there is no such observation in the dataset (a student can attempt a skill a different number of times than other students).

I think this might be a job for the dplyr package, but I am not sure.

I really appreciate the help of the community!!

Here's a possible data.table implementation

library(data.table) # V 1.10.0
res <- setDT(df)[, .(.(dcast(.SD, Student ~ rowid(Student)))), by = Skill]

Which will result in a data.table of data.tables

res
#    Skill           V1
# 1:    10 <data.table>
# 2:    15 <data.table>

Which can be subset by the Skill column

res[Skill == 10, V1]
# [[1]]
#    Student 1 2  3
# 1:   64525 1 1 NA
# 2:   70363 0 1  1

Or in order to see the whole column

res[, V1]
# [[1]]
#    Student 1 2  3
# 1:   64525 1 1 NA
# 2:   70363 0 1  1
# 
# [[2]]
#    Student 1  2
# 1:   64525 0 NA
# 2:   70363 0  1
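
The reshaping above depends on `rowid(Student)` to number each student's attempts in order of appearance. Those numbers become the column names in the wide result. The following is a minimal sketch of that step in isolation, followed by one way to turn the list column into a list named by skill. The `by_skill` name, the explicit `value.var`, and the `setNames()` call are additions for illustration and not part of the original answer.

```r
library(data.table)

# Same data as in the question, rebuilt here so the sketch is self-contained
df <- data.frame(
  Student = c(64525, 64525, 70363, 70363, 70363, 64525, 70363, 70363),
  Skill   = c(10, 10, 10, 10, 10, 15, 15, 15),
  Correct = c(1, 1, 0, 1, 1, 0, 0, 1)
)

# rowid() counts repeated values in order of appearance
rowid(c(64525, 64525, 70363, 70363, 70363))
# [1] 1 2 1 2 3

# Same reshaping as above, with value.var spelled out to avoid the guess message
res <- setDT(df)[, .(.(dcast(.SD, Student ~ rowid(Student),
                             value.var = "Correct"))), by = Skill]

# Name each nested table by its skill (illustrative addition)
by_skill <- setNames(res$V1, res$Skill)
by_skill[["15"]]
```

With a named list, each skill's table can be looked up by its name directly, without filtering `res` first.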

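Here is a base R approach: split the data by skill, then pad each student's vector of results with NAs so the vectors can be bound into a matrix.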

xy <- read.table(text = "Student   Skill    Correct
64525     10       1
64525     10       1
70363     10       0
70363     10       1
70363     10       1
64525     15       0
70363     15       0
70363     15       1", header = TRUE)


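# first split by skill and work on each element
# (lapply rather than sapply, so the result stays a list even when
# every skill happens to produce a matrix of the same size)
lapply(split(xy, xy$Skill), FUN = function(x) {

  # extract the Correct column for each student, always as a list
  out <- lapply(split(x, x$Student), FUN = "[[", "Correct")

  # pad the shorter vectors with NAs at the end
  out <- mapply(out, max(lengths(out)), FUN = function(m, a) {
    c(m, rep(NA, times = (a - length(m))))
  }, SIMPLIFY = FALSE)

  # bind the padded vectors into one matrix, one row per student
  do.call(rbind, out)
})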

$`10`
      [,1] [,2] [,3]
64525    1    1   NA
70363    0    1    1

$`15`
      [,1] [,2]
64525    0   NA
70363    0    1
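
Since the question asks about dplyr, here is a hedged sketch of the same idea using dplyr together with tidyr. It assumes tidyr 1.0.0 or later, where `pivot_wider()` is available. The names `numbered` and `wide_by_skill` and the `Obs` column are illustrative choices, not taken from the answers above.

```r
library(dplyr)
library(tidyr)

# Same data as in the question
df <- data.frame(
  Student = c(64525, 64525, 70363, 70363, 70363, 64525, 70363, 70363),
  Skill   = c(10, 10, 10, 10, 10, 15, 15, 15),
  Correct = c(1, 1, 0, 1, 1, 0, 0, 1)
)

# Number each student's attempts within a skill: Obs1, Obs2, ...
numbered <- df %>%
  group_by(Skill, Student) %>%
  mutate(Obs = paste0("Obs", row_number())) %>%
  ungroup()

# One wide table per skill; attempts a student never made become NA
wide_by_skill <- lapply(split(numbered, numbered$Skill), function(d) {
  d %>%
    select(-Skill) %>%
    pivot_wider(names_from = Obs, values_from = Correct)
})

wide_by_skill[["10"]]
```

Each list element is a tibble with one row per student, and the number of `Obs` columns varies by skill, as in the desired output from the question.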

