Create a new data frame using grouped values from another column in R

Question

I have a list of products and the clients who bought them, stored as a data frame:

client product
001 pants
001 shirt
001 pants
002 pants
002 shirt
002 shoes
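To experiment with this example, the sample purchases can be entered as an R data frame. This is a minimal sketch. Storing client as character is an assumption (the question does not say how the IDs are stored), chosen so that the leading zeros in 001 and 002 are kept.

```r
# Sample purchases from the question: one row per purchase.
# Assumption: client IDs are stored as character so "001" keeps its zeros.
df <- data.frame(
  client  = c("001", "001", "001", "002", "002", "002"),
  product = c("pants", "shirt", "pants", "pants", "shirt", "shoes"),
  stringsAsFactors = FALSE  # keep character columns on R < 4.0 too
)
df
```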

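I need to arrange the products into pairs (tuples) and add a third column with the number of clients who bought both products. The solution should be two separate tables: one counting unique clients, the other counting the total number of purchased pairs. For the example above, the results are:

Unique clients:

product1 product2 count
pants shirt 2
pants shoes 1
shirt shoes 1

Total purchased pairs:

product1 product2 count
pants shirt 3
pants shoes 1
shirt shoes 1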

I want to avoid duplicated information. For example, there is no need for a row "shirt pants 2".

Does anyone know how to do this?

Thanks!
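One way to treat "shirt pants" and "pants shirt" as the same pair is to put the two product names in alphabetical order before counting. A minimal sketch with base R's pmin() and pmax(), which also compare character vectors element-wise; the vectors first and second are hypothetical example data, not from the question.

```r
# Two hypothetical columns of product pairs, in arbitrary order
first  <- c("shirt", "pants", "shoes")
second <- c("pants", "shirt", "pants")

# pmin/pmax compare the strings element-wise, so every pair is written in
# alphabetical order: "shirt pants" and "pants shirt" become the same key
key <- paste(pmin(first, second), pmax(first, second))
print(key)
table(key)
```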

Answer 1

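This may be neither the most efficient nor the most elegant approach, but it does what you need. Assuming your initial column names are "client" and "product":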

library(stringr)
Count.Sales <- function(df){
  # Every pair of purchase rows, each row encoded as "<client><product>"
  df3 <- as.data.frame(t(combn(paste0(df$client, df$product), 2)),
                       stringsAsFactors = FALSE)
  # Keep only pairs where both purchases were made by the same client.
  # Subsetting directly (not through table()) keeps repeated pairs.
  same <- str_extract(df3$V1, '[[:digit:]]+') == str_extract(df3$V2, '[[:digit:]]+')
  new.df <- df3[same, ]
  new.df$customer <- str_extract(new.df$V1, '[[:digit:]]+')
  # Strip the client ID, leaving only the product names
  new.df$V1 <- sub('[[:digit:]]+', '', new.df$V1)
  new.df$V2 <- sub('[[:digit:]]+', '', new.df$V2)
  # Drop pairs of the same product, e.g. "pants pants"
  new.df <- new.df[new.df$V1 != new.df$V2, ]
  # Sort each pair alphabetically so "shirt pants" is counted as "pants shirt"
  new.df$pairs <- paste(new.df$V1, new.df$V2, sep = ' ')
  new.df$pairs <- vapply(new.df$pairs,
                         function(i) paste(sort(strsplit(i, ' ')[[1]]), collapse = ' '),
                         character(1), USE.NAMES = FALSE)
  # One row per (pair, customer) combination that actually occurs
  t4 <- data.frame(with(new.df, table(pairs, customer)))
  t4 <- t4[t4$Freq != 0, ]
  per_customer <- as.data.frame(table(t4$pairs))
  # Every same-client pair of purchases, repeats included
  total <- as.data.frame(table(new.df$pairs))
  ls1 <- list(per_customer, total)
  names(ls1) <- c('Unique.Customer', 'Total')
  return(ls1)
}
Count.Sales(df)
#$Unique.Customer
#         Var1 Freq
#1 pants shirt    2
#2 pants shoes    1
#3 shirt shoes    1
#
#$Total
#         Var1 Freq
#1 pants shirt    3
#2 pants shoes    1
#3 shirt shoes    1
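For comparison, here is a sketch of a different technique (not the answer's method): split the purchases by client with split(), form the product pairs within each client with combn(), and then count them with aggregate(). It assumes the columns are named client and product, as in the question, and that at least one client bought two different products; count_pairs is a hypothetical helper name. Working per client avoids building the cross-client pairs that combn() over the whole table produces and then discards.

```r
# Sketch: count product pairs per client with split() + combn()
count_pairs <- function(df) {
  # One character vector of purchased products per client
  by_client <- split(as.character(df$product), df$client)

  pairs <- do.call(rbind, lapply(names(by_client), function(cl) {
    p <- by_client[[cl]]
    if (length(p) < 2) return(NULL)   # a single purchase forms no pair
    m <- combn(p, 2)                  # 2 x k matrix of purchase pairs
    a <- pmin(m[1, ], m[2, ])         # alphabetical order, so that
    b <- pmax(m[1, ], m[2, ])         # "shirt pants" == "pants shirt"
    keep <- a != b                    # drop pairs like "pants pants"
    if (!any(keep)) return(NULL)
    data.frame(client = cl, product1 = a[keep], product2 = b[keep],
               stringsAsFactors = FALSE)
  }))
  # Assumes pairs is non-empty; aggregate() fails on an empty input

  # Total: every same-client pair of purchases, repeats included
  total <- aggregate(client ~ product1 + product2, data = pairs, FUN = length)
  names(total)[3] <- "count"

  # Unique customers: each (client, pair) combination counted once
  uniq <- unique(pairs)
  unique_customers <- aggregate(client ~ product1 + product2, data = uniq,
                                FUN = length)
  names(unique_customers)[3] <- "count"

  list(Unique.Customer = unique_customers, Total = total)
}

df <- data.frame(client  = c("001", "001", "001", "002", "002", "002"),
                 product = c("pants", "shirt", "pants", "pants", "shirt", "shoes"),
                 stringsAsFactors = FALSE)
res <- count_pairs(df)
res
```

Returning product1 and product2 as separate columns matches the layout requested in the question, and the count column holds the number of rows in each group.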

Solution 1 | 1 | Accepted | 2016-03-27 13:54:21