Create a new data frame in R using grouped values from another column

Question

I have a list of products and the clients who bought them, in the form of a data frame:

client product
001 pants
001 shirt
001 pants
002 pants
002 shirt
002 shoes

I need to arrange the products into pairs (tuples) and add a third column with the number of clients who bought both products. The solution would be two different tables: one counting unique clients and another counting the total number of purchased pairs. For the example above, the outcome would be (unique clients first, then totals):

product1 product2 count
pants shirt 2
pants shoes 1
shirt shoes 1

product1 product2 count
pants shirt 3
pants shoes 1
shirt shoes 1

I would like to avoid duplicated information. For example, a row 'shirt pants 2' would not be needed.

Does anyone know how to do this?

Thanks!

Answer 1

This is probably neither the most efficient nor the most elegant way to do it, but it does what you need. Given that your initial column names are 'client' and 'product':

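library(stringr)

# Example data from the question; client is kept as character so "001" is preserved
df <- data.frame(client = c("001", "001", "001", "002", "002", "002"),
                 product = c("pants", "shirt", "pants", "pants", "shirt", "shoes"),
                 stringsAsFactors = FALSE)

Count.Sales <- function(df){
  # Build every pair of rows, labelling each row as "<client><product>"
  # (this relies on client ids being digits and product names containing no digits)
  df3 <- as.data.frame(t(combn(paste0(df$client, df$product), 2)))
  # Keep only pairs where both rows belong to the same client, then tabulate them
  df4 <- as.data.frame(table(df3[str_extract(df3$V1, '[[:digit:]]+') == str_extract(df3$V2, '[[:digit:]]+'),]))
  df4 <- subset(df4, df4$Freq > 0)
  # Recover the client id, then strip it from the two product columns
  df4$customer <- str_extract(df4$V1, '[[:digit:]]+')
  df4[, !(colnames(df4) %in% c("Freq","customer"))] <- apply(df4[, !(colnames(df4) %in% c("Freq","customer"))], 2, function(i) sub('[[:digit:]]+', '', i))
  new.df <- within(df4, rm(Freq))
  new.df[] <- lapply(new.df, as.character)
  # Drop pairs made of the same product twice (e.g. pants/pants)
  r1 <- apply(new.df[,-3], 1, function(i) any(i[-1] != i[1]))
  new.df <- new.df[r1,]
  # Sort the products within each pair so 'shirt pants' and 'pants shirt' are counted together
  new.df$pairs <- do.call(paste, c(new.df[,-3], ' '))
  new.df$pairs <- vapply(new.df$pairs, function(i) paste(sort(strsplit(i, ' ')[[1]]), collapse=' '), ' ')
  # Count each pair once per client, then count all remaining rows
  t4 <- data.frame(with(new.df, table(pairs, customer)))
  t4 <- t4[t4$Freq != 0,]
  per_customer <- as.data.frame(table(t4$pairs))
  total <- as.data.frame(table(new.df$pairs))
  ls1 <- list(per_customer, total)
  names(ls1) <- c('Unique.Customer', 'Total')
  return(ls1)
}
Count.Sales(df)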
#$Unique.Customer
#          Var1 Freq
#1  pants shirt    2
#2  pants shoes    1
#3  shirt shoes    1
#
#$Total
#          Var1 Freq
#1  pants shirt    3
#2  pants shoes    1
#3  shirt shoes    1
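
For comparison, here is a shorter base R sketch of the same idea. It is not the code from the answer above, and it rests on two assumptions: that "total" means the number of purchase-row pairs per client (rows of product1 times rows of product2, summed over clients), and that the data contains at least two distinct products. The helper name count_pairs and the columns unique_clients and total are made up for illustration. On the question's data it gives 2, 1, 1 for unique clients and 3, 1, 1 for totals.

```r
# Example data from the question; client is kept as character so "001" is preserved.
df <- data.frame(
  client  = c("001", "001", "001", "002", "002", "002"),
  product = c("pants", "shirt", "pants", "pants", "shirt", "shoes"),
  stringsAsFactors = FALSE
)

# count_pairs is a hypothetical helper name, not part of the original answer.
count_pairs <- function(df) {
  # Purchase counts: one row per client, one column per product.
  n <- unclass(table(df$client, df$product))
  stopifnot(ncol(n) >= 2)  # assumption: at least two distinct products
  # All unordered product pairs; table() sorts the product names, so each
  # pair appears once, in alphabetical order (no 'shirt pants' duplicate).
  pairs <- t(combn(colnames(n), 2))
  # Per-client purchase counts for the first and second product of each pair.
  a <- n[, pairs[, 1], drop = FALSE]
  b <- n[, pairs[, 2], drop = FALSE]
  out <- data.frame(
    product1 = pairs[, 1],
    product2 = pairs[, 2],
    # Clients who bought both products at least once.
    unique_clients = unname(colSums(a > 0 & b > 0)),
    # Assumed meaning of "total": row pairs within a client, summed over clients.
    total = unname(colSums(a * b)),
    stringsAsFactors = FALSE
  )
  # Drop pairs that no client bought together.
  out <- out[out$unique_clients > 0, ]
  rownames(out) <- NULL
  out
}

res <- count_pairs(df)
print(res)
```

Working from the client-by-product count matrix avoids pasting client ids onto product names, so it should also cope with product names that contain digits.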

Answer 1 (accepted), 2016-03-27 13:54:21

使用 R 中另一列的分组值创建一个新的数据框

问题描述

1 个解决方案

解决方案1 1 已采纳 2016-03-27 13:54:21

解决方案1
1 已采纳 2016-03-27 13:54:21