Rearranging a data frame?

Question

I'm looking at parts of a book. For a certain range of pages I have a metric and each book has a category. I have a data frame similar to:

file    value    pages   category
a.pdf   17       A       green
b.pdf   18       A       red
a.pdf   22       B       green
...

Each file always has the same category, regardless of the pages or value. Hence, a.pdf will always be green, so some of this data is redundant. What I would like is to reshape the data into something like:

file    pages_A    pages_B    pages_C  category
a.pdf   17         22         7        green
b.pdf   18         11         43       red

...

What is the most elegant way to do this? I've tried merging the subsets together and deleting columns:

out = merge(subset(long, pages=="A"), subset(long, pages=="B"), by=c("file","category"), all=TRUE)
out = merge(out, subset(long, pages=="C"), by=c("file","category"), all=TRUE)

but this seems long-winded, especially if I have more than three page ranges to reshape (which will happen soon).

Thanks, Ed

Answer 1

If temp is your data set:

library(reshape2)
# value.var names the column that fills the new wide columns
dcast(temp, file + category ~ pages, value.var = "value")

##    file category  A  B  C
## 1 a.pdf    green 17 22  7
## 2 b.pdf      red 18 11 43
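
If you would rather avoid an extra package, here is a minimal sketch of the same reshaping with base R's reshape(). The data frame is rebuilt from the values shown in the question, and the renaming to pages_A, pages_B, ... is only my guess at the layout you asked for, so adjust it as needed.

```r
# Example data rebuilt from the values in the question (assumed complete)
temp <- data.frame(
  file     = rep(c("a.pdf", "b.pdf"), times = 3),
  value    = c(17, 18, 22, 11, 7, 43),
  pages    = rep(c("A", "B", "C"), each = 2),
  category = rep(c("green", "red"), times = 3),
  stringsAsFactors = FALSE
)

# One row per file/category, one column per level of pages
wide <- reshape(temp,
                idvar     = c("file", "category"),
                timevar   = "pages",
                v.names   = "value",
                direction = "wide",
                sep       = "_")

# reshape() names the new columns value_A, value_B, ...; rename to pages_*
names(wide) <- sub("^value_", "pages_", names(wide))
rownames(wide) <- NULL

# Put category last, as in the layout requested in the question
wide <- wide[, c("file", grep("^pages_", names(wide), value = TRUE), "category")]
print(wide)
```

This works for any number of distinct pages values, since reshape() creates one column per value it finds.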

Using data.table, this may be faster (I didn't benchmark it):

library(data.table)
dcast.data.table(setDT(temp), file + category ~ pages, value.var = "value")

##     file category  A  B  C
## 1: a.pdf    green 17 22  7
## 2: b.pdf      red 18 11 43
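
For comparison, the merge-based attempt from the question can also be generalized so that it no longer needs one line per page range. This is only a sketch in base R, assuming the data frame is called long as in the question and has exactly the four columns shown there; the pages_ prefix is my own naming choice.

```r
# Example data rebuilt from the values in the question (assumed complete)
long <- data.frame(
  file     = rep(c("a.pdf", "b.pdf"), times = 3),
  value    = c(17, 18, 22, 11, 7, 43),
  pages    = rep(c("A", "B", "C"), each = 2),
  category = rep(c("green", "red"), times = 3),
  stringsAsFactors = FALSE
)

# Split into one piece per pages value and rename value to pages_<level>
pieces <- lapply(split(long, long$pages), function(d) {
  piece <- d[, c("file", "category", "value")]
  names(piece)[3] <- paste0("pages_", d$pages[1])
  piece
})

# Merge all pieces pairwise on the shared key columns
out <- Reduce(
  function(x, y) merge(x, y, by = c("file", "category"), all = TRUE),
  pieces
)
print(out)
```

The dcast versions above are shorter and are what I would use in practice; this just shows that the original idea scales if you let Reduce() do the repetition.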

solution1
4 ACCEPTED 2014-06-09 10:03:02