Reshaping a data frame in R
A simple reshaping question. I have the following data:
df<-data.frame(Product=c("A","A","A","B","B","C"), Ingredients=c("Chocolate","Vanilla","Berry","Chocolate","Berry2","Vanilla"))
df
Product Ingredients
1 A Chocolate
2 A Vanilla
3 A Berry
4 B Chocolate
5 B Berry2
6 C Vanilla
I would like to create one column for each ingredient of a product, for example:
df2
Product Ingredient_1 Ingredient_2 Ingredient_3
A Chocolate Vanilla Berry
B Chocolate Berry2 NULL
C Vanilla NULL NULL
This seems trivial, and I tried reshape, but I keep getting counts (rather than the actual values of "Ingredients"). Any ideas?
Here is a possible solution using the data.table package:
library(data.table)
setDT(df)[, Ingredient := paste0("Ingredient_", seq_len(.N)), Product]
dcast(df, Product ~ Ingredient, value.var = "Ingredients")
# Product Ingredient_1 Ingredient_2 Ingredient_3
# 1: A Chocolate Vanilla Berry
# 2: B Chocolate Berry2 NA
# 3: C Vanilla NA NA
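The question mentions getting counts instead of values. The attempted code is not shown, so the following is only a guess at the cause: when the cast has no within-product index, several rows land in the same cell and dcast has to aggregate them. The sketch below contrasts that with the indexed version; the names dt, counts and wide are made up for illustration, and rowid() is data.table's helper for numbering rows within a group.

```r
library(data.table)

dt <- data.table(
  Product = c("A", "A", "A", "B", "B", "C"),
  Ingredients = c("Chocolate", "Vanilla", "Berry", "Chocolate", "Berry2", "Vanilla")
)

# No within-product index: several rows fall into each cell, so dcast must
# aggregate. length is what dcast falls back to when no function is given;
# it is passed explicitly here to make the behaviour visible.
counts <- dcast(dt, Product ~ ., value.var = "Ingredients", fun.aggregate = length)

# With a per-product position, every (Product, position) cell holds one value,
# so the actual ingredient names are kept.
dt[, Ingredient := paste0("Ingredient_", rowid(Product))]
wide <- dcast(dt, Product ~ Ingredient, value.var = "Ingredients")

print(counts)
print(wide)
```

The only difference between the two calls is the extra Ingredient column on the right-hand side of the formula.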
Alternatively, we could use the neat dplyr/tidyr combination:
library(dplyr)
library(tidyr)
df %>%
  group_by(Product) %>%
  mutate(Ingredient = paste0("Ingredient_", row_number())) %>%
  spread(Ingredient, Ingredients)
# Source: local data frame [3 x 4]
#
# Product Ingredient_1 Ingredient_2 Ingredient_3
# 1 A Chocolate Vanilla Berry
# 2 B Chocolate Berry2 NA
# 3 C Vanilla NA NA
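Since tidyr 1.0.0, spread() has been superseded by pivot_wider(). A sketch of the same pipeline with the newer function, assuming a recent tidyr is installed (the result name wide is made up for illustration):

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  Product = c("A", "A", "A", "B", "B", "C"),
  Ingredients = c("Chocolate", "Vanilla", "Berry", "Chocolate", "Berry2", "Vanilla")
)

wide <- df %>%
  group_by(Product) %>%
  # Number the ingredients within each product: Ingredient_1, Ingredient_2, ...
  mutate(Ingredient = paste0("Ingredient_", row_number())) %>%
  ungroup() %>%
  # One column per position; missing positions are filled with NA
  pivot_wider(names_from = Ingredient, values_from = Ingredients)

print(wide)
```

spread() still works, so this is a matter of preference rather than a required change.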
In the spirit of sharing alternatives, here are two more:
Option 1: split the column, then use stri_list2matrix to create the wide form.
library(stringi)
x <- with(df, split(Ingredients, Product))
# byrow = TRUE puts each product in its own row; the default fills by column
data.frame(Product = names(x), stri_list2matrix(x, byrow = TRUE))
# Product X1 X2 X3
# 1 A Chocolate Vanilla Berry
# 2 B Chocolate Berry2 <NA>
# 3 C Vanilla <NA> <NA>
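If the stringi dependency is unwanted, the same split-and-pad idea can be sketched in base R. This is only an illustration, not part of the original answer; the names x, n, padded and wide are made up, and the approach assumes at least one product has more than one ingredient (otherwise sapply returns a plain vector rather than a matrix).

```r
df <- data.frame(
  Product = c("A", "A", "A", "B", "B", "C"),
  Ingredients = c("Chocolate", "Vanilla", "Berry", "Chocolate", "Berry2", "Vanilla"),
  stringsAsFactors = FALSE
)

# One character vector of ingredients per product
x <- split(df$Ingredients, df$Product)

# Length of the longest ingredient list
n <- max(lengths(x))

# `length<-` pads each shorter vector with NA up to length n;
# sapply gives one column per product, so transpose to one row per product
padded <- t(sapply(x, `length<-`, n))

wide <- data.frame(Product = names(x), padded, row.names = NULL)
names(wide)[-1] <- paste0("Ingredient_", seq_len(n))

print(wide)
```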
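Option 2: use getanID from my "splitstackshape" package to generate an ".id" column, then dcast it. The "data.table" package is loaded along with "splitstackshape", so you can call dcast.data.table directly to do the reshaping.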
library(splitstackshape)
dcast.data.table(getanID(df, "Product"),
                 Product ~ .id, value.var = "Ingredients")
# Product 1 2 3
# 1: A Chocolate Vanilla Berry
# 2: B Chocolate Berry2 NA
# 3: C Vanilla NA NA
Using base R reshape:
df$Count<-ave(rep(1,nrow(df)),df$Product,FUN=cumsum)
reshape(df,idvar="Product",timevar="Count",direction="wide",sep="_")
# Product Ingredients_1 Ingredients_2 Ingredients_3
#1 A Chocolate Vanilla Berry
#4 B Chocolate Berry2 <NA>
#6 C Vanilla <NA> <NA>