
Creating a new data frame with the counts by the grouped values of another column in R

I have a list of products and the clients who bought them, in the form of a data frame:

client product
001 pants
001 shirt
001 pants
002 pants
002 shirt
002 shoes
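
To make the example reproducible, the table above could be built roughly like this. The name `df` and the choice to store `client` as character (so the leading zeros survive) are assumptions, not something stated in the question.

```r
# Rebuild the example table; client IDs are kept as character
# so that "001" does not collapse to the number 1.
df <- data.frame(
  client  = c("001", "001", "001", "002", "002", "002"),
  product = c("pants", "shirt", "pants", "pants", "shirt", "shoes"),
  stringsAsFactors = FALSE
)
```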

I need to arrange the products into pairs (tuples) and add a third column with the number of clients who bought both products. The solution would be two different tables: one counting unique clients per pair, and another counting the total number of bought pairs. For the previous example, the outcome would be:

product1 product2 count
pants shirt 2
pants shoes 1
shirt shoes 1

product1 product2 count
pants shirt 3
pants shoes 1
shirt shoes 1

I would like to avoid duplicated information. For example, a row 'shirt pants 2' would not be needed.

Does anyone know how to do this?

Thanks!

This is probably not the most efficient way to do it, nor the most elegant, but it does what you need. Given that your initial column names are 'client' and 'product':

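library(stringr)

# Note: this relies on client IDs being digits and product names containing no digits.
Count.Sales <- function(df){
  # All pairs of "<client><product>" labels, one pair per row
  df3 <- as.data.frame(t(combn(paste0(df$client, df$product), 2)))
  # Keep only pairs where both purchases belong to the same client, then tabulate them
  df4 <- as.data.frame(table(df3[str_extract(df3$V1, '[[:digit:]]+') == str_extract(df3$V2, '[[:digit:]]+'),]))
  df4 <- subset(df4, df4$Freq > 0)
  df4$customer <- str_extract(df4$V1, '[[:digit:]]+')
  # Strip the client digits so that only the product names remain
  df4[, !(colnames(df4) %in% c("Freq","customer"))] <- apply(df4[, !(colnames(df4) %in% c("Freq","customer"))], 2, function(i) sub('[[:digit:]]+', '', i))
  new.df <- within(df4, rm(Freq))
  new.df[] <- lapply(new.df, as.character)
  # Drop pairs made of the same product twice (e.g. 'pants pants')
  r1 <- apply(new.df[,-3], 1, function(i) any(i[-1] != i[1]))
  new.df <- new.df[r1,]
  # Sort each pair alphabetically so 'shirt pants' and 'pants shirt' share one label
  new.df$pairs <- do.call(paste, c(new.df[,-3], sep = ' '))
  new.df$pairs <- vapply(new.df$pairs, function(i) paste(sort(strsplit(i, ' ')[[1]]), collapse=' '), ' ')
  # Number of distinct clients per pair
  t4 <- data.frame(with(new.df, table(pairs, customer)))
  t4 <- t4[t4$Freq != 0,]
  per_customer <- as.data.frame(table(t4$pairs))
  # Total number of occurrences per pair
  total <- as.data.frame(table(new.df$pairs))
  ls1 <- list(per_customer, total)
  names(ls1) <- c('Unique.Customer', 'Total')
  return(ls1)
}
Count.Sales(df)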
#$Unique.Customer
#          Var1 Freq
#1  pants shirt    2
#2  pants shoes    1
#3  shirt shoes    1
#
#$Total
#          Var1 Freq
#1  pants shirt    3
#2  pants shoes    1
#3  shirt shoes    1
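
As a possible alternative, here is a minimal base-R sketch that splits the purchases by client instead of encoding the client into a string and pulling it back out with a regular expression. It is not the method used above: the function name `count_pairs` and the output column names are made up for illustration, and it assumes at least one client bought two different products. "Total" is read the same way as in the example, so every pair of purchase rows by the same client counts once.

```r
# Example data from the question (client kept as character).
df <- data.frame(
  client  = c("001", "001", "001", "002", "002", "002"),
  product = c("pants", "shirt", "pants", "pants", "shirt", "shoes"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: returns the two count tables as a named list.
count_pairs <- function(df) {
  by_client <- split(as.character(df$product), df$client)

  # One row per pair of purchase rows made by the same client.
  pairs <- do.call(rbind, lapply(names(by_client), function(cl) {
    p <- by_client[[cl]]
    if (length(p) < 2) return(NULL)            # a single purchase forms no pair
    m <- combn(p, 2)                           # 2 x n matrix of product pairs
    m <- m[, m[1, ] != m[2, ], drop = FALSE]   # drop same-product pairs
    # Order each pair alphabetically so 'shirt pants' becomes 'pants shirt'.
    data.frame(client   = rep(cl, ncol(m)),
               product1 = pmin(m[1, ], m[2, ]),
               product2 = pmax(m[1, ], m[2, ]),
               stringsAsFactors = FALSE)
  }))

  # Count the rows of d for each (product1, product2) combination.
  count_rows <- function(d) {
    d$count <- 1L
    out <- aggregate(count ~ product1 + product2, data = d, FUN = sum)
    out <- out[order(out$product1, out$product2), ]
    rownames(out) <- NULL
    out
  }

  list(
    Unique.Customer = count_rows(unique(pairs)),  # each client counted once per pair
    Total           = count_rows(pairs)           # every pair occurrence counted
  )
}

res <- count_pairs(df)
res$Unique.Customer
res$Total
```

On the example data this should reproduce the two tables from the question. Because the client is never parsed out of a string, product names that contain digits should not cause problems here.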


