Reshaping a data frame

Question

A simple reshape. I have the following data:

df <- data.frame(Product = c("A", "A", "A", "B", "B", "C"),
                 Ingredients = c("Chocolate", "Vanilla", "Berry", "Chocolate", "Berry2", "Vanilla"))
df
  Product Ingredients
1       A   Chocolate
2       A     Vanilla
3       A       Berry
4       B   Chocolate
5       B      Berry2
6       C     Vanilla

I want to create a column for each unique value of "Ingredients", i.e.:

df2
Product Ingredient_1 Ingredient_2 Ingredient_3
A       Chocolate       Vanilla        Berry
B       Chocolate       Berry2         NULL
C       Vanilla         NULL           NULL

This seems trivial. I tried reshape, but I keep getting counts instead of the actual values of "Ingredients". Any ideas?

Answer 1

Here is a possible solution using the data.table package. Note that setDT() converts df to a data.table in place.

library(data.table)
setDT(df)[, Ingredient := paste0("Ingredient_", seq_len(.N)), Product]
dcast(df, Product ~ Ingredient, value.var = "Ingredients")
#    Product Ingredient_1 Ingredient_2 Ingredient_3
# 1:       A    Chocolate      Vanilla        Berry
# 2:       B    Chocolate       Berry2           NA
# 3:       C      Vanilla           NA           NA
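
As a variation on the same idea, here is a minimal sketch that uses rowid(), available in recent data.table versions, to build the per-group counter. The object names dt and wide are introduced here for illustration. It builds its own data.table, so the original df is left untouched.

```r
library(data.table)

dt <- data.table(
  Product = c("A", "A", "A", "B", "B", "C"),
  Ingredients = c("Chocolate", "Vanilla", "Berry", "Chocolate", "Berry2", "Vanilla")
)

# rowid() numbers the rows within each Product; prefix turns 1, 2, 3
# into "Ingredient_1", "Ingredient_2", "Ingredient_3".
dt[, Ingredient := rowid(Product, prefix = "Ingredient_")]

# One row per Product, one column per counter value.
wide <- dcast(dt, Product ~ Ingredient, value.var = "Ingredients")
wide
```

Products with fewer ingredients are padded with NA, as in the output above.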

Alternatively, we can do this with the sexy dplyr/tidyr combination

library(dplyr)
library(tidyr)
df %>% 
  group_by(Product) %>%
  mutate(Ingredient = paste0("Ingredient_", row_number())) %>%
  spread(Ingredient, Ingredients)

# Source: local data frame [3 x 4]
# 
#   Product Ingredient_1 Ingredient_2 Ingredient_3
# 1       A    Chocolate      Vanilla        Berry
# 2       B    Chocolate       Berry2           NA
# 3       C      Vanilla           NA           NA
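
In tidyr 1.0.0 and later, spread() is superseded by pivot_wider(). The following is a sketch of the same pipeline with the newer function. The result name wide is introduced here, and stringsAsFactors = FALSE is set only so the example behaves the same on older R versions.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  Product = c("A", "A", "A", "B", "B", "C"),
  Ingredients = c("Chocolate", "Vanilla", "Berry", "Chocolate", "Berry2", "Vanilla"),
  stringsAsFactors = FALSE
)

wide <- df %>%
  group_by(Product) %>%
  # Number the ingredients within each product: Ingredient_1, Ingredient_2, ...
  mutate(Ingredient = paste0("Ingredient_", row_number())) %>%
  ungroup() %>%
  # Column names come from the counter, cell values from Ingredients.
  pivot_wider(names_from = Ingredient, values_from = Ingredients)

wide
```

The result has one row per Product, with NA where a product has fewer ingredients.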

Answer 2

In the spirit of sharing alternatives, here are two more:

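Option 1: split the column, then use stri_list2matrix to create the wide form. Set byrow = TRUE so that each product becomes a row; with the default (byrow = FALSE) each product fills a column and the values end up under the wrong product.

library(stringi)
x <- with(df, split(Ingredients, Product))
data.frame(Product = names(x), stri_list2matrix(x, byrow = TRUE))
#   Product        X1      X2    X3
# 1       A Chocolate Vanilla Berry
# 2       B Chocolate  Berry2  <NA>
# 3       C   Vanilla    <NA>  <NA>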
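
To show what the split-then-pad step does, here is a rough base R sketch of the same idea without stringi. It is not how stri_list2matrix is implemented; the names n, m and wide are introduced here for illustration.

```r
df <- data.frame(
  Product = c("A", "A", "A", "B", "B", "C"),
  Ingredients = c("Chocolate", "Vanilla", "Berry", "Chocolate", "Berry2", "Vanilla"),
  stringsAsFactors = FALSE
)

# One character vector of ingredients per product.
x <- split(df$Ingredients, df$Product)

# Pad every vector with NA up to the length of the longest one,
# then stack the padded vectors as rows of a matrix.
n <- max(lengths(x))
m <- do.call(rbind, lapply(x, `length<-`, n))

wide <- data.frame(Product = names(x), m, row.names = NULL,
                   stringsAsFactors = FALSE)
wide
```

Each product occupies one row, and shorter groups are filled with NA on the right.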

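Option 2: use getanID from my "splitstackshape" package to generate an ".id" column, then dcast it. The "data.table" package is loaded along with "splitstackshape", so you can call dcast.data.table directly to do the reshaping.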

library(splitstackshape)
dcast.data.table(getanID(df, "Product"), 
                 Product ~ .id, value.var = "Ingredients")
#    Product         1       2     3
# 1:       A Chocolate Vanilla Berry
# 2:       B Chocolate  Berry2    NA
# 3:       C   Vanilla      NA    NA

Answer 3

Base R reshape

df$Count <- ave(rep(1, nrow(df)), df$Product, FUN = cumsum)
reshape(df, idvar = "Product", timevar = "Count", direction = "wide", sep = "_")

#  Product Ingredients_1 Ingredients_2 Ingredients_3
#1       A     Chocolate       Vanilla         Berry
#4       B     Chocolate        Berry2          <NA>
#6       C       Vanilla          <NA>          <NA>
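
The Count column is a running row counter within each Product, which reshape() then uses as the "time" variable. A slightly more direct way to build it, as a sketch, is to let ave() apply seq_along() to each group. The result name wide is introduced here.

```r
df <- data.frame(
  Product = c("A", "A", "A", "B", "B", "C"),
  Ingredients = c("Chocolate", "Vanilla", "Berry", "Chocolate", "Berry2", "Vanilla"),
  stringsAsFactors = FALSE
)

# seq_along() restarts at 1 inside every Product group.
df$Count <- ave(seq_along(df$Product), df$Product, FUN = seq_along)

# Spread Ingredients into Ingredients_1, Ingredients_2, ... by Count.
wide <- reshape(df, idvar = "Product", timevar = "Count",
                direction = "wide", sep = "_")
wide
```

The row names of the result (1, 4, 6) are the original row numbers of the first row of each product.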

3 answers

Answer 1
2 Accepted 2015-02-03 20:56:47

Answer 2
2 2015-02-04 04:15:58

Answer 3
1 2015-02-03 21:41:58
