I'm looking at parts of books. For each range of pages I have a metric, and each book has a category. I have a data frame similar to:
file value pages category
a.pdf 17 A green
b.pdf 18 A red
a.pdf 22 B green
...
Each file will have the same category regardless of the pages or value. Hence, a.pdf will always be green, so some of this data is redundant. What I would like is to reformat the data to something like:
file pages_A pages_B pages_C category
a.pdf 17 22 7 green
b.pdf 18 11 43 red
...
What is the most elegant way to do this? I've tried merging the subsets together and deleting columns:
out = merge(subset(long, pages=="A"), subset(long, pages=="B"), by=c("file","category"), all=TRUE)
out = merge(out, subset(long, pages=="C"), by=c("file","category"), all=TRUE)
but this seems long-winded, especially if I have more than three page ranges to reorder (which will happen soon).
Thanks, Ed
If temp is your data set:
library(reshape2)
dcast(temp, file + category ~ pages, value.var = "value")
## file category A B C
## 1 a.pdf green 17 22 7
## 2 b.pdf red 18 11 43
Using data.table, it could be faster (I didn't benchmark it, though):
library(data.table)
dcast.data.table(setDT(temp), file + category ~ pages, value.var = "value")
## file category A B C
## 1: a.pdf green 17 22 7
## 2: b.pdf red 18 11 43
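If you'd rather avoid extra packages, or want the pages_A / pages_B column names from the question, base R's reshape() can do the same long-to-wide step. This is only a sketch: the temp data frame below is a hypothetical reconstruction from the two tables in the question (the long-format rows for B and C are inferred from the desired output), and the renaming step assumes the metric column is called value.

```r
# Hypothetical sample data, reconstructed from the tables in the question
temp <- data.frame(
  file     = c("a.pdf", "b.pdf", "a.pdf", "b.pdf", "a.pdf", "b.pdf"),
  value    = c(17, 18, 22, 11, 7, 43),
  pages    = c("A", "A", "B", "B", "C", "C"),
  category = c("green", "red", "green", "red", "green", "red"),
  stringsAsFactors = FALSE
)

# One row per file/category, one value column per level of pages
wide <- reshape(
  temp,
  idvar     = c("file", "category"),
  timevar   = "pages",
  v.names   = "value",
  direction = "wide",
  sep       = "_"
)

# reshape() names the new columns value_A, value_B, ...; rename them to pages_*
names(wide) <- sub("^value_", "pages_", names(wide))
rownames(wide) <- NULL
wide
```

Since the new columns come from whatever levels appear in pages, this should keep working when more page ranges are added, without writing an extra merge() per level.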