Creating a new data frame with the counts by the grouped values of another column in R
I have a list of products and the clients who bought them, in the form of a data frame:
client product
001 pants
001 shirt
001 pants
002 pants
002 shirt
002 shoes
I need to group each client's products into pairs (tuples) and add a third column with the number of clients who bought both products. The solution would be two different tables: one counting unique clients and another counting the total number of times each tuple was bought.
For the previous example, the outcome would be:
Unique clients:
product1 product2 count
pants shirt 2
pants shoes 1
shirt shoes 1
Total bought tuples:
product1 product2 count
pants shirt 3
pants shoes 1
shirt shoes 1
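To make the two tables concrete, here is a base-R sketch (an alternative written for illustration, not the method used in the answer below). It assumes the data frame is called `df` with character columns `client` and `product`. It builds one row per pair of same-client purchase rows, then counts either every row (total tuples) or only distinct (client, pair) rows (unique clients).

```r
# Example data from the question; client is kept as character so "001" keeps its zeros
df <- data.frame(client  = c("001", "001", "001", "002", "002", "002"),
                 product = c("pants", "shirt", "pants", "pants", "shirt", "shoes"),
                 stringsAsFactors = FALSE)

# One row per pair of purchase rows of the same client, skipping pairs of a
# product with itself; product1/product2 are put in alphabetical order
s <- split(df$product, df$client)
pairs <- do.call(rbind, lapply(names(s), function(cl) {
  prods <- s[[cl]]
  if (length(prods) < 2) return(NULL)
  m <- t(combn(prods, 2))
  m <- m[m[, 1] != m[, 2], , drop = FALSE]
  if (nrow(m) == 0) return(NULL)
  data.frame(client   = cl,
             product1 = pmin(m[, 1], m[, 2]),
             product2 = pmax(m[, 1], m[, 2]),
             stringsAsFactors = FALSE)
}))

# Total bought tuples: every same-client pair counts
total <- aggregate(client ~ product1 + product2, data = pairs, FUN = length)
names(total)[3] <- "count"

# Unique clients: drop repeated (client, pair) rows before counting
unique_clients <- aggregate(client ~ product1 + product2,
                            data = unique(pairs), FUN = length)
names(unique_clients)[3] <- "count"
```

Client 001 bought pants twice, so the pair "pants shirt" appears twice for that client; `unique()` is what turns the total of 3 into the unique-client count of 2.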
I would like to avoid duplicated information. For example, a row 'shirt pants 2' would not be needed.
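One way to avoid rows like 'shirt pants' next to 'pants shirt' is to give every pair a canonical order before counting. A minimal sketch, assuming the two products of each pair are already in two character vectors (`p1` and `p2` are hypothetical names): `pmin()` and `pmax()` compare strings elementwise, so the alphabetically smaller name always comes first. String comparison follows the locale's collation, which for simple lowercase names like these is plain alphabetical order.

```r
# Two orientations of the same pairs, as they might come out of combn()
p1 <- c("shirt", "pants", "shoes")
p2 <- c("pants", "shirt", "pants")

# Canonical key: smaller name first, so both orientations map to the same string
key <- paste(pmin(p1, p2), pmax(p1, p2))
key  # "pants shirt" "pants shirt" "pants shoes"
```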
Would someone know how to do this?
Thanks!
This is probably neither the most efficient nor the most elegant way to do it, but it does what you need. Given that your initial column names are 'client' and 'product':
library(stringr)

# Assumes client ids are digits and product names contain no digits
Count.Sales <- function(df){
  # All pairs of rows; each product is tagged with its client id, e.g. "001pants"
  df3 <- as.data.frame(t(combn(paste0(df$client, df$product), 2)),
                       stringsAsFactors = FALSE)
  # Keep only pairs whose two rows belong to the same client
  new.df <- df3[str_extract(df3$V1, '[[:digit:]]+') == str_extract(df3$V2, '[[:digit:]]+'), ]
  new.df$customer <- str_extract(new.df$V1, '[[:digit:]]+')
  # Strip the client id again, leaving only the product names
  new.df$V1 <- sub('[[:digit:]]+', '', new.df$V1)
  new.df$V2 <- sub('[[:digit:]]+', '', new.df$V2)
  # Drop pairs of a product with itself, e.g. 'pants pants'
  new.df <- new.df[new.df$V1 != new.df$V2, ]
  # Put each pair in alphabetical order so 'shirt pants' and 'pants shirt' coincide
  new.df$pairs <- paste(pmin(new.df$V1, new.df$V2), pmax(new.df$V1, new.df$V2))
  # Count each pair at most once per customer
  t4 <- data.frame(with(new.df, table(pairs, customer)))
  t4 <- t4[t4$Freq != 0, ]
  per_customer <- as.data.frame(table(t4$pairs))
  # Count every occurrence of each pair
  total <- as.data.frame(table(new.df$pairs))
  ls1 <- list(per_customer, total)
  names(ls1) <- c('Unique.Customer', 'Total')
  return(ls1)
}
Count.Sales(df)
#$Unique.Customer
# Var1 Freq
#1 pants shirt 2
#2 pants shoes 1
#3 shirt shoes 1
#
#$Total
# Var1 Freq
#1 pants shirt 3
#2 pants shoes 1
#3 shirt shoes 1
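The `t4` step above keeps one row per (pair, customer) combination with a non-zero count. Both results can also be read off a single `table(pairs, customer)`: the row sums give the totals, and counting the non-zero cells in each row gives the number of distinct customers. In this sketch the `pairs` and `customer` vectors are hand-written stand-ins for what the function builds internally from the question's data.

```r
# Stand-ins for new.df$pairs and new.df$customer for the question's example
pairs    <- c("pants shirt", "pants shirt", "pants shirt", "pants shoes", "shirt shoes")
customer <- c("001",         "001",         "002",         "002",         "002")

tab <- table(pairs, customer)

# Total: how often each pair occurs across all customers
total <- rowSums(tab)

# Unique customers: in how many customer columns the pair is non-zero
per_customer <- rowSums(tab > 0)
```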