Reshaping a data frame in R
A simple reshape. I have the following data:
df<-data.frame(Product=c("A","A","A","B","B","C"), Ingredients=c("Chocolate","Vanilla","Berry","Chocolate","Berry2","Vanilla"))
df
  Product Ingredients
1       A   Chocolate
2       A     Vanilla
3       A       Berry
4       B   Chocolate
5       B      Berry2
6       C     Vanilla
I want one row per Product, with that product's "Ingredients" values spread across columns, for example:
df2
Product Ingredient_1 Ingredient_2 Ingredient_3
A       Chocolate    Vanilla      Berry
B       Chocolate    Berry2       NULL
C       Vanilla      NULL         NULL
It seems trivial. I tried reshape, but I keep getting counts (rather than the actual values of "Ingredients"). Any ideas?
Here is a possible solution using the data.table package:
library(data.table)
setDT(df)[, Ingredient := paste0("Ingredient_", seq_len(.N)), Product]
dcast(df, Product ~ Ingredient, value.var = "Ingredients")
#    Product Ingredient_1 Ingredient_2 Ingredient_3
# 1:       A    Chocolate      Vanilla        Berry
# 2:       B    Chocolate       Berry2           NA
# 3:       C      Vanilla           NA           NA
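The helper column can also be built inside the formula itself. The following is a minimal sketch, assuming a data.table version that provides rowid() and accepts expressions in the dcast formula; the names dt and wide are illustrative only.

```r
library(data.table)

df <- data.frame(
  Product     = c("A", "A", "A", "B", "B", "C"),
  Ingredients = c("Chocolate", "Vanilla", "Berry", "Chocolate", "Berry2", "Vanilla"),
  stringsAsFactors = FALSE
)

# as.data.table() makes a copy, so the original df is left untouched
dt <- as.data.table(df)

# rowid(Product) numbers the rows within each Product (1, 2, 3, ...);
# prefix = "Ingredient_" turns those numbers into the new column names
wide <- dcast(dt, Product ~ rowid(Product, prefix = "Ingredient_"),
              value.var = "Ingredients")
wide
```

Unlike the setDT() call above, this variant does not add a column to df by reference, which may matter if df is reused afterwards.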
Alternatively, we can do this with the sexy dplyr/tidyr combination:
library(dplyr)
library(tidyr)
df %>%
  group_by(Product) %>%
  mutate(Ingredient = paste0("Ingredient_", row_number())) %>%
  spread(Ingredient, Ingredients)
# Source: local data frame [3 x 4]
#
#   Product Ingredient_1 Ingredient_2 Ingredient_3
# 1       A    Chocolate      Vanilla        Berry
# 2       B    Chocolate       Berry2           NA
# 3       C      Vanilla           NA           NA
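In current tidyr releases, spread() is documented as superseded by pivot_wider(). Here is a minimal sketch of the same reshape, assuming tidyr 1.0.0 or later; the name wide is illustrative only.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  Product     = c("A", "A", "A", "B", "B", "C"),
  Ingredients = c("Chocolate", "Vanilla", "Berry", "Chocolate", "Berry2", "Vanilla"),
  stringsAsFactors = FALSE
)

wide <- df %>%
  group_by(Product) %>%
  # number the ingredients within each product: Ingredient_1, Ingredient_2, ...
  mutate(Ingredient = paste0("Ingredient_", row_number())) %>%
  ungroup() %>%
  # one column per position; positions a product lacks are filled with NA
  pivot_wider(names_from = Ingredient, values_from = Ingredients)

wide
```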
In the spirit of sharing alternatives, here are two more:
Option 1: split the column, then use stri_list2matrix to create the wide form.
library(stringi)
x <- with(df, split(Ingredients, Product))
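# byrow = TRUE puts each product on its own row; the default fills by column,
# which would mix ingredients from different products in the same row
data.frame(Product = names(x), stri_list2matrix(x, byrow = TRUE))
#   Product        X1      X2    X3
# 1       A Chocolate Vanilla Berry
# 2       B Chocolate  Berry2  <NA>
# 3       C   Vanilla    <NA>  <NA>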
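For readers who would rather avoid the stringi dependency, the same split-and-pad idea can be sketched in base R. This is only an illustration of the technique, not how stri_list2matrix is implemented; the names padded, m and wide are made up for the example.

```r
df <- data.frame(
  Product     = c("A", "A", "A", "B", "B", "C"),
  Ingredients = c("Chocolate", "Vanilla", "Berry", "Chocolate", "Berry2", "Vanilla"),
  stringsAsFactors = FALSE
)

# one character vector of ingredients per product
x <- split(df$Ingredients, df$Product)
n <- max(lengths(x))

# `length<-` extends each vector to length n, padding with NA
padded <- lapply(x, `length<-`, n)

# stack the padded vectors as rows, one row per product
m <- do.call(rbind, padded)
colnames(m) <- paste0("Ingredient_", seq_len(n))

wide <- data.frame(Product = rownames(m), m,
                   row.names = NULL, stringsAsFactors = FALSE)
wide
```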
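Option 2: use getanID from my "splitstackshape" package to generate an ".id" column, then dcast it. The "data.table" package is loaded along with "splitstackshape", so you can call dcast.data.table directly to do the reshaping.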
library(splitstackshape)
dcast.data.table(getanID(df, "Product"),
                 Product ~ .id, value.var = "Ingredients")
#    Product         1       2     3
# 1:       A Chocolate Vanilla Berry
# 2:       B Chocolate  Berry2    NA
# 3:       C   Vanilla      NA    NA
Base R, using reshape:
df$Count <- ave(rep(1, nrow(df)), df$Product, FUN = cumsum)
reshape(df, idvar = "Product", timevar = "Count", direction = "wide", sep = "_")
#   Product Ingredients_1 Ingredients_2 Ingredients_3
# 1       A     Chocolate       Vanilla         Berry
# 4       B     Chocolate        Berry2          <NA>
# 6       C       Vanilla          <NA>          <NA>
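The Count column is simply a running index within each Product, and reshape uses it as the "time" variable. As a small variation, the index can be generated with seq_along instead of a cumulative sum over ones. This is a sketch only, and the name wide is illustrative.

```r
df <- data.frame(
  Product     = c("A", "A", "A", "B", "B", "C"),
  Ingredients = c("Chocolate", "Vanilla", "Berry", "Chocolate", "Berry2", "Vanilla"),
  stringsAsFactors = FALSE
)

# seq_along() applied within each Product yields 1, 2, 3, ... per group
df$Count <- ave(seq_along(df$Product), df$Product, FUN = seq_along)

# Count supplies the suffix of each wide column: Ingredients_1, Ingredients_2, ...
wide <- reshape(df, idvar = "Product", timevar = "Count",
                direction = "wide", sep = "_")
wide
```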