
Rearranging a data frame?

I'm looking at parts of books. For each range of pages I have a metric, and each book has a category. I have a data frame similar to:

file    value    pages   category
a.pdf   17       A       green
b.pdf   18       A       red
a.pdf   22       B       green
...

Each file will be the same category regardless of the pages or value. Hence, a.pdf will always be green, so some of this data is redundant. What I would like is to reshape the data into something like:

file    pages_A    pages_B    pages_C  category
a.pdf   17         22         7        green
b.pdf   18         11         43       red

...

What is the most elegant way to do this? I've tried merging the subsets together and deleting columns:

out = merge(subset(long, pages=="A"), subset(long, pages=="B"), by=c("file","category"), all=T)
out = merge(out, subset(long, pages=="C"), by=c("file","category"), all=T)

but this seems long-winded, especially if I have more than three pages values to reorder (which will happen soon).

Thanks, Ed

If temp is your data set, you can use dcast from reshape2:

library(reshape2)
dcast(temp, file + category ~ pages)

##    file category  A  B  C
## 1 a.pdf    green 17 22  7
## 2 b.pdf      red 18 11 43
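
The result above names the new columns A, B and C, while the question asked for pages_A, pages_B, pages_C with category last. A minimal base-R sketch of that tidy-up, assuming the reshaped result is stored in a data frame called wide (a hypothetical name; it is rebuilt by hand here so the snippet runs on its own):

```r
# Hand-built copy of the dcast() output shown above (assumption: stored as `wide`).
wide <- data.frame(
  file     = c("a.pdf", "b.pdf"),
  category = c("green", "red"),
  A        = c(17, 18),
  B        = c(22, 11),
  C        = c(7, 43),
  stringsAsFactors = FALSE
)

id_cols   <- c("file", "category")
page_cols <- setdiff(names(wide), id_cols)   # every non-id column is a pages level

# Prefix the pages columns, then move category to the end.
new_names <- paste0("pages_", page_cols)
names(wide)[match(page_cols, names(wide))] <- new_names
wide <- wide[, c("file", new_names, "category")]
```

Because page_cols is computed from the column names, the same lines should keep working when more than three pages levels appear.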

With data.table this might be faster, though I didn't benchmark it:

library(data.table)
dcast.data.table(setDT(temp), file + category ~ pages)

##     file category  A  B  C
## 1: a.pdf    green 17 22  7
## 2: b.pdf      red 18 11 43
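
If neither package is available, base R's reshape() can produce the same wide layout. This is only a sketch: the example data frame temp is reconstructed from the two tables in the question, and since the question elides some rows, the row order here is an assumption.

```r
# Example data rebuilt from the question's long and wide tables (row order assumed).
temp <- data.frame(
  file     = c("a.pdf", "b.pdf", "a.pdf", "b.pdf", "a.pdf", "b.pdf"),
  value    = c(17, 18, 22, 11, 7, 43),
  pages    = c("A", "A", "B", "B", "C", "C"),
  category = c("green", "red", "green", "red", "green", "red"),
  stringsAsFactors = FALSE
)

# One row per file/category pair; one value_<pages> column per level of pages.
wide <- reshape(
  temp,
  idvar     = c("file", "category"),
  timevar   = "pages",
  v.names   = "value",
  direction = "wide",
  sep       = "_"
)
rownames(wide) <- NULL   # drop the row names carried over from the long data
```

The new columns come out as value_A, value_B and value_C, because reshape() builds the names from v.names and the levels of the timevar column.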
