简体   繁体   中英

In R, compare one column value to all other columns

I'm very new to R and I have a question which might be very simple for experts here.

Let's say i have a table "sales", which includes 4 customer IDs (123-126) and 4 products (A,B,C,D).

ID  A   B   C   D
123 0   1   1   0
124 1   1   0   0
125 1   1   0   1
126 0   0   0   1

I want to calculate the overlaps between products. So for A, the number of IDs that have both A and B will be 2. Similarly, the overlap between A and C will be 0 and that between A and D will be 1. Here is my code for A and B overlap:

overlap <- sales [which(sales [,"A"] == 1 & sales [,"B"] == 1 ),]
countAB <- count(overlap,"ID")

I want to repeat this calculation for all 4 products,so A overlaps with B,C,D and B overlaps with A,C,D, etc...How can i change the code to accomplish this?

I want the final output to be the number of IDs for each two-product combination. It's product affinity exercise and i want to find out for one product, which product sold the most with it. For example, for A, the most sold products with it will be B, followed by D, then C. Some sorting needs to be added to the code to get to this i think.

Thanks for your help!

Here's a possible solution :

sales <- 
read.csv(text=
"ID,A,B,C,D
123,0,1,1,0
124,1,1,0,0
125,1,1,0,1
126,0,0,0,1")

# get product names
prods <- colnames(sales)[-1]
# generate all products pairs (and transpose the matrix for convenience)
combs <- t(combn(prods,2))

# turn the combs into a data.frame with column P1,P2
res <- as.data.frame(combs)
colnames(res) <- c('P1','P2')  

# for each combination row :
# - subset sales selecting only the products in the row
# - count the number of rows summing to 2 (if sum=2 the 2 products have been sold together)
#   N.B.: length(which(logical_condition)) can be implemented with sum(logical_condition) 
#         since TRUE and FALSE are automatically coerced to 1 and 0
# finally add the resulting vector to the newly created data.frame
res$count <- apply(combs,1,function(comb){sum(rowSums(sales[,comb])==2)})

> res
  P1 P2 count
1  A  B     2
2  A  C     0
3  A  D     1
4  B  C     1
5  B  D     1
6  C  D     0
    #x1 is your dataframe
x1<-structure(list(ID = 123:126, A = c(0L, 1L, 1L, 0L), B = c(1L, 
1L, 1L, 0L), C = c(1L, 0L, 0L, 0L), D = c(0L, 0L, 1L, 1L)), .Names = c("ID", 
"A", "B", "C", "D"), class = "data.frame", row.names = c(NA, 
-4L))
#get the combination of all colnames but the first ("ID")
    k1<-combn(colnames(x1[,-1]),2)
#create two lists a1 and a2 so that we can iterate over each element 
    a1<-as.list(k1[seq(1,length(k1),2)])
    a2<-as.list(k1[seq(2,length(k1),2)])
# your own functions with varying i and j
     mapply(function(i,j) length(x1[which(x1[,i] == 1 & x1 [,j] == 1 ),1]),a1,a2)
    [1] 2 0 1 1 1 0

You can use matrix multiplication:

m <- as.matrix(d[-1])
z <- melt(crossprod(m,m))
z[as.integer(z$X1) < as.integer(z$X2),]
#    X1 X2 value
# 5   A  B     2
# 9   A  C     0
# 10  B  C     1
# 13  A  D     1
# 14  B  D     1
# 15  C  D     0

where d is your data frame:

d <- structure(list(ID = 123:126, A = c(0L, 1L, 1L, 0L), B = c(1L, 1L, 1L, 0L), C = c(1L, 0L, 0L, 0L), D = c(0L, 0L, 1L, 1L)), .Names = c("ID", "A", "B", "C", "D"), class = "data.frame", row.names = c(NA, -4L))

[Update]

To calculate the product affinity, you can do:

z2 <- subset(z,X1!=X2)
do.call(rbind,lapply(split(z2,z2$X1),function(d) d[which.max(d$value),]))
#   X1 X2 value
# A  A  B     2
# B  B  A     2
# C  C  B     1
# D  D  A     1

You might want to take a look at the arules package. It does exactly what you are looking for. Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules). Also provides interfaces to C implementations of the association mining algorithms Apriori and Eclat by C. Borgelt.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM