A simple reshaping question. I have the following data:
df <- data.frame(Product = c("A","A","A","B","B","C"), Ingredients = c("Chocolate","Vanilla","Berry","Chocolate","Berry2","Vanilla"))
df
Product Ingredients
1 A Chocolate
2 A Vanilla
3 A Berry
4 B Chocolate
5 B Berry2
6 C Vanilla
I want one row per product, with each of its ingredients in a separate column, for example:
df2
Product Ingredient_1 Ingredient_2 Ingredient_3
A Chocolate Vanilla Berry
B Chocolate Berry2 NULL
C Vanilla NULL NULL
This seems trivial enough, but when I tried reshape I kept getting counts rather than the actual values of "Ingredients". Any ideas?
Here's a possible solution using the data.table package:
library(data.table)
setDT(df)[, Ingredient := paste0("Ingredient_", seq_len(.N)), Product]
dcast(df, Product ~ Ingredient, value.var = "Ingredients")
# Product Ingredient_1 Ingredient_2 Ingredient_3
# 1: A Chocolate Vanilla Berry
# 2: B Chocolate Berry2 NA
# 3: C Vanilla NA NA
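As a variation on the above, here is a minimal sketch that skips the helper column by numbering the rows inside the dcast formula. It assumes a data.table version that provides rowid() with a prefix argument (1.9.8 or later, as far as I know). The name wide is just illustrative.

```r
library(data.table)

# Same example data as in the question
df <- data.frame(
  Product = c("A", "A", "A", "B", "B", "C"),
  Ingredients = c("Chocolate", "Vanilla", "Berry", "Chocolate", "Berry2", "Vanilla")
)

# rowid() numbers the rows within each Product;
# prefix turns 1, 2, 3 into Ingredient_1, Ingredient_2, Ingredient_3
wide <- dcast(
  as.data.table(df),
  Product ~ rowid(Product, prefix = "Ingredient_"),
  value.var = "Ingredients"
)
wide
```

Because as.data.table() makes a copy, df itself is left untouched, unlike the setDT() version.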
Alternatively, we could do this with the sexy dplyr/tidyr combination:
library(dplyr)
library(tidyr)
df %>%
  group_by(Product) %>%
  mutate(Ingredient = paste0("Ingredient_", row_number())) %>%
  spread(Ingredient, Ingredients)
# Source: local data frame [3 x 4]
#
# Product Ingredient_1 Ingredient_2 Ingredient_3
# 1 A Chocolate Vanilla Berry
# 2 B Chocolate Berry2 NA
# 3 C Vanilla NA NA
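In newer tidyr releases spread() is superseded by pivot_wider(). A sketch of the same pipeline with that function, assuming tidyr 1.0.0 or later is installed (wide is just an illustrative name):

```r
library(dplyr)
library(tidyr)

# Same example data as in the question
df <- data.frame(
  Product = c("A", "A", "A", "B", "B", "C"),
  Ingredients = c("Chocolate", "Vanilla", "Berry", "Chocolate", "Berry2", "Vanilla")
)

wide <- df %>%
  group_by(Product) %>%
  # Number the ingredients within each product to build the new column names
  mutate(Ingredient = paste0("Ingredient_", row_number())) %>%
  ungroup() %>%
  # One column per generated name, filled with the ingredient values
  pivot_wider(names_from = Ingredient, values_from = Ingredients)
wide
```

Products with fewer ingredients get NA in the remaining columns, as with spread().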
In the spirit of sharing alternatives, here are two more:
Option 1: split the column and use stri_list2matrix to create your wide form.
library(stringi)
x <- with(df, split(Ingredients, Product))
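# byrow = TRUE puts each product's ingredients in one row (the default fills by column)
data.frame(Product = names(x), stri_list2matrix(x, byrow = TRUE))
# Product X1 X2 X3
# 1 A Chocolate Vanilla Berry
# 2 B Chocolate Berry2 <NA>
# 3 C Vanilla <NA> <NA>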
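For comparison, a base R sketch of the same split-and-pad idea without stringi. It assumes the Ingredients column is character rather than factor (hence the explicit stringsAsFactors = FALSE); the names n, m and wide are only for illustration.

```r
# Same example data as in the question, kept as character vectors
df <- data.frame(
  Product = c("A", "A", "A", "B", "B", "C"),
  Ingredients = c("Chocolate", "Vanilla", "Berry", "Chocolate", "Berry2", "Vanilla"),
  stringsAsFactors = FALSE
)

# One character vector of ingredients per product
x <- split(df$Ingredients, df$Product)

# Pad every vector with NA up to the longest one, then put products in rows
n <- max(lengths(x))
m <- t(sapply(x, `length<-`, n))

wide <- data.frame(Product = rownames(m), m, row.names = NULL)
wide
```

Here `length<-` extends each vector to length n, filling the new slots with NA, so sapply() can simplify the result to a matrix.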
Option 2: Use getanID from my "splitstackshape" package to generate an ".id" column, then dcast it. The "data.table" package is loaded with "splitstackshape", so you can directly call dcast.data.table to do the reshaping.
library(splitstackshape)
dcast.data.table(getanID(df, "Product"),
Product ~ .id, value.var = "Ingredients")
# Product 1 2 3
# 1: A Chocolate Vanilla Berry
# 2: B Chocolate Berry2 NA
# 3: C Vanilla NA NA
With base R reshape
df$Count <- ave(rep(1, nrow(df)), df$Product, FUN = cumsum)
reshape(df, idvar = "Product", timevar = "Count", direction = "wide", sep = "_")
# Product Ingredients_1 Ingredients_2 Ingredients_3
#1 A Chocolate Vanilla Berry
#4 B Chocolate Berry2 <NA>
#6 C Vanilla <NA> <NA>
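A slightly expanded sketch of the base R approach, in case you want the column names from the question (Ingredient_1 rather than Ingredients_1) and clean row names. The counter is built with seq_along instead of cumsum, which should give the same numbering; the renaming step is optional, and wide is just an illustrative name.

```r
# Same example data as in the question
df <- data.frame(
  Product = c("A", "A", "A", "B", "B", "C"),
  Ingredients = c("Chocolate", "Vanilla", "Berry", "Chocolate", "Berry2", "Vanilla"),
  stringsAsFactors = FALSE
)

# Running index of each ingredient within its product: 1, 2, 3, 1, 2, 1
df$Count <- ave(seq_along(df$Product), df$Product, FUN = seq_along)

# Spread the ingredients into one column per index value
wide <- reshape(df, idvar = "Product", timevar = "Count",
                direction = "wide", sep = "_")

# Optional tidy-up: match the naming in the question and reset row names
names(wide) <- sub("^Ingredients_", "Ingredient_", names(wide))
rownames(wide) <- NULL
wide
```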