Comparing one column's values against all other columns in R

Question

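I'm very new to R, and I have a question that is probably very simple for the experts here.

Suppose I have a table "sales" containing 4 customer IDs (123-126) and 4 products (A, B, C, D).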

ID  A   B   C   D
123 0   1   1   0
124 1   1   0   0
125 1   1   0   1
126 0   0   0   1

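I want to compute the overlap between products. For A, the number of IDs that have both A and B is 2. Similarly, the overlap between A and C is 0, and the overlap between A and D is 1. Here is my code for the A and B overlap: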

overlap <- sales [which(sales [,"A"] == 1 & sales [,"B"] == 1 ),]
countAB <- count(overlap,"ID")

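I want to repeat this calculation for all 4 products: A against B, C, and D; B against A, C, and D; and so on. How can I change the code to do this?

I'd like the final output to be the number of IDs for each two-product combination. This is a product affinity exercise: for a given product, I want to find which other product is most often sold with it. For example, for A the top co-sold product would be B, then D, then C. Some sorting would need to be added to the code to achieve this.

Thanks for your help!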

Answer 1

Here is a possible solution:

sales <- 
read.csv(text=
"ID,A,B,C,D
123,0,1,1,0
124,1,1,0,0
125,1,1,0,1
126,0,0,0,1")

# get product names
prods <- colnames(sales)[-1]
# generate all products pairs (and transpose the matrix for convenience)
combs <- t(combn(prods,2))

# turn the combs into a data.frame with column P1,P2
res <- as.data.frame(combs)
colnames(res) <- c('P1','P2')  

# for each combination row :
# - subset sales selecting only the products in the row
# - count the number of rows summing to 2 (if sum=2 the 2 products have been sold together)
#   N.B.: length(which(logical_condition)) can be implemented with sum(logical_condition) 
#         since TRUE and FALSE are automatically coerced to 1 and 0
# finally add the resulting vector to the newly created data.frame
res$count <- apply(combs,1,function(comb){sum(rowSums(sales[,comb])==2)})

> res
  P1 P2 count
1  A  B     2
2  A  C     0
3  A  D     1
4  B  C     1
5  B  D     1
6  C  D     0

Answer 2

# x1 is your data frame
x1 <- structure(list(ID = 123:126, A = c(0L, 1L, 1L, 0L), B = c(1L,
1L, 1L, 0L), C = c(1L, 0L, 0L, 0L), D = c(0L, 0L, 1L, 1L)), .Names = c("ID",
"A", "B", "C", "D"), class = "data.frame", row.names = c(NA,
-4L))
# get all pairwise combinations of the column names except the first ("ID")
k1 <- combn(colnames(x1[,-1]), 2)
# create two lists a1 and a2 (first and second product of each pair) so that we can iterate over them
a1 <- as.list(k1[seq(1, length(k1), 2)])
a2 <- as.list(k1[seq(2, length(k1), 2)])
# your own function, applied with varying i and j
mapply(function(i, j) length(x1[which(x1[,i] == 1 & x1[,j] == 1), 1]), a1, a2)
# [1] 2 0 1 1 1 0
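
As a more compact variant of the same idea, `combn()` can apply a function to each pair directly through its `FUN` argument, which avoids building the two index lists. This is only a sketch on the sample data; the `prods` and `counts` names and the "A-B" style labels are illustrative choices, not part of the answer above.

```r
# Sample data from the question
x1 <- data.frame(ID = 123:126,
                 A = c(0L, 1L, 1L, 0L),
                 B = c(1L, 1L, 1L, 0L),
                 C = c(1L, 0L, 0L, 0L),
                 D = c(0L, 0L, 1L, 1L))

prods <- colnames(x1)[-1]

# For each pair p = c(first, second), count the rows where both columns equal 1
counts <- combn(prods, 2, FUN = function(p) {
  sum(x1[[p[1]]] == 1 & x1[[p[2]]] == 1)
})

# Label each count with its pair, e.g. "A-B"
names(counts) <- combn(prods, 2, FUN = paste, collapse = "-")
counts
```

The pairs come out in the same order as in the `mapply()` version, so the counts should line up one to one.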

Answer 3

You can use matrix multiplication:

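library(reshape)  # assumed: the X1/X2 column names below match reshape's melt(); reshape2 names them Var1/Var2
m <- as.matrix(d[-1])
# crossprod(m,m) is t(m) %*% m: entry [i, j] counts the IDs that have both product i and product j
z <- melt(crossprod(m,m))
# keep each unordered pair once (drop the diagonal and the mirrored half)
z[as.integer(z$X1) < as.integer(z$X2),]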
#    X1 X2 value
# 5   A  B     2
# 9   A  C     0
# 10  B  C     1
# 13  A  D     1
# 14  B  D     1
# 15  C  D     0

where d is your data frame:

d <- structure(list(ID = 123:126, A = c(0L, 1L, 1L, 0L), B = c(1L, 1L, 1L, 0L), C = c(1L, 0L, 0L, 0L), D = c(0L, 0L, 1L, 1L)), .Names = c("ID", "A", "B", "C", "D"), class = "data.frame", row.names = c(NA, -4L))

[Update]

To compute the product affinity, you can do:

z2 <- subset(z,X1!=X2)
do.call(rbind,lapply(split(z2,z2$X1),function(d) d[which.max(d$value),]))
#   X1 X2 value
# A  A  B     2
# B  B  A     2
# C  C  B     1
# D  D  A     1
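
The snippet above returns only the single top partner for each product. Since the question also asks for the full ordering (for A: B, then D, then C), here is a hedged base-R sketch that ranks every other product by overlap. The names `co` and `ranking` are illustrative, and it does not need `melt()`.

```r
# Sample data from the question
d <- data.frame(ID = 123:126,
                A = c(0L, 1L, 1L, 0L),
                B = c(1L, 1L, 1L, 0L),
                C = c(1L, 0L, 0L, 0L),
                D = c(0L, 0L, 1L, 1L))

# Co-occurrence matrix: entry [i, j] counts the IDs that have both i and j
co <- crossprod(as.matrix(d[-1]))

# For each product, take its row without the diagonal entry
# and sort the remaining products by descending overlap
ranking <- lapply(colnames(co), function(p) {
  others <- co[p, colnames(co) != p]
  others[order(others, decreasing = TRUE)]
})
names(ranking) <- colnames(co)

ranking$A
```

Each element of `ranking` is a named vector, so `names(ranking$A)` gives the partner products of A in order of affinity.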

Answer 4

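You might want to take a look at the arules package. It does exactly what you are asking for. It provides the infrastructure for representing, manipulating, and analyzing transaction data and patterns (frequent itemsets and association rules). It also provides interfaces to C implementations of the association mining algorithms Apriori and Eclat by C. Borgelt.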
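
A minimal sketch of how the sample data might be fed into arules, assuming the package is installed. It turns the 0/1 columns into a logical matrix, coerces that to a `transactions` object, and calls `crossTable()` to get the pairwise co-occurrence counts. The variable names `m`, `trans`, and `ct` are illustrative.

```r
library(arules)

# Sample data from the question
d <- data.frame(ID = 123:126,
                A = c(0L, 1L, 1L, 0L),
                B = c(1L, 1L, 1L, 0L),
                C = c(1L, 0L, 0L, 0L),
                D = c(0L, 0L, 1L, 1L))

# Logical item matrix: one row per customer ID, one column per product
m <- as.matrix(d[-1]) == 1
rownames(m) <- d$ID

# Coerce to a transactions object (each ID becomes one transaction)
trans <- as(m, "transactions")

# Pairwise co-occurrence counts; the diagonal holds the per-product counts
ct <- crossTable(trans, measure = "count")
ct
```

For a table this small the matrix-multiplication approach is simpler; arules is more likely to pay off with many products or when association rules are wanted.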

4 answers

Answer 1: score 2, 2015-02-20 22:28:37
Answer 2: score 2, accepted, 2015-02-20 22:34:10
Answer 3: score 2, 2015-02-20 22:42:20
Answer 4: score 1, 2015-02-20 22:20:59
