Jaccard Similarity between strings using a for loop in R

Question

I am trying to compute the Jaccard similarity between each pair of names in two large vectors of names (see below for a small example) and to store the results in a matrix. My function just returns NULL. What am I doing wrong?

library(dplyr)

df = data.frame(matrix(NA, ncol=3, nrow=3))
df = df %>%
    mutate_if(is.logical, as.numeric)

names(df) = c("A.J. Doyle", "A.J. Graham", "A.J. Porter")
draft_names = names(df) 
row.names(df) = c("A.J. Feeley", "A.J. McCarron", "Aaron Brooks")
quarterback_names = row.names(df)

library(stringdist)

jaccard_similarity = function(d){
  for (i in 1:nrow(d)){
    for(j in 1:ncol(d)){
      d[i,j] = stringdist(quarterback_names[i], draft_names[j], method ='jaccard', q=2)
    }
  }
}

df = jaccard_similarity(df)

Answer 1

You are not returning anything after the for loops. Use return(d) at the end of the function.
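To see why the original function gives NULL, it may help to recall that an R function returns the value of its last evaluated expression, and a for loop itself evaluates to NULL. The sketch below uses two toy functions (fill_no_return and fill_with_return are made-up names for illustration, not part of the question) with the same structure as jaccard_similarity:

```r
# Toy functions (hypothetical names) mirroring the structure in the question.
fill_no_return <- function(v) {
  for (i in seq_along(v)) {
    v[i] <- i * 10   # modifies the function's local copy only
  }
  # the for loop is the last expression, and a for loop evaluates to NULL
}

fill_with_return <- function(v) {
  for (i in seq_along(v)) {
    v[i] <- i * 10
  }
  v   # last expression: the modified copy is returned
}

x <- c(0, 0, 0)
is.null(fill_no_return(x))   # TRUE
fill_with_return(x)          # 10 20 30
x                            # still 0 0 0; the caller's vector is untouched
```

The last line also shows that the argument is modified as a copy, so the result has to be returned and assigned by the caller.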

This problem is also a classic use case for outer, which applies a vectorized function to every pair of elements from two vectors:

outer(quarterback_names,draft_names,FUN=stringdist,method="jaccard",q=2)
          [,1]      [,2]      [,3]
[1,] 0.6428571 0.7500000 0.7500000
[2,] 0.7647059 0.7777778 0.7777778
[3,] 1.0000000 1.0000000 1.0000000
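The matrix returned by outer carries no row or column names. A minimal sketch of one way to label it, assuming the two name vectors from the question (jaccard_dist is just an illustrative variable name):

```r
library(stringdist)

quarterback_names <- c("A.J. Feeley", "A.J. McCarron", "Aaron Brooks")
draft_names <- c("A.J. Doyle", "A.J. Graham", "A.J. Porter")

jaccard_dist <- outer(quarterback_names, draft_names,
                      FUN = stringdist, method = "jaccard", q = 2)

# Attach the names so rows and columns can be looked up by label.
dimnames(jaccard_dist) <- list(quarterback_names, draft_names)

jaccard_dist["A.J. Feeley", "A.J. Doyle"]   # about 0.643, as in the output above
```

With the labels in place, a single pair can be looked up by name instead of by position.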

Answer 2

You need to return your changed dataframe:

jaccard_similarity = function(d){
  for (i in 1:nrow(d)){
    for(j in 1:ncol(d)){
      d[i,j] = stringdist(quarterback_names[i], draft_names[j], method ='jaccard', q=2)
    }
  }
  return(d)
  # ^^^ the missing return
}

Afterwards jaccard_similarity(df) yields

              A.J. Doyle A.J. Graham A.J. Porter
A.J. Feeley    0.6428571   0.7500000   0.7500000
A.J. McCarron  0.7647059   0.7777778   0.7777778
Aaron Brooks   1.0000000   1.0000000   1.0000000
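Note that these values behave like distances rather than similarities: pairs with no bigrams in common get 1, and a similarity would be 1 minus the value. As a rough check of where the numbers come from, here is a base R sketch that computes the Jaccard distance on sets of bigrams by hand. The helpers qgram_set and jaccard_distance are hypothetical names written for this illustration, and this is not how stringdist is implemented internally:

```r
# Hypothetical helper: the set of distinct q-grams of a single string.
qgram_set <- function(s, q = 2) {
  n <- nchar(s)
  if (n < q) return(character(0))
  unique(substring(s, 1:(n - q + 1), q:n))
}

# Jaccard distance = 1 - |intersection| / |union| of the two q-gram sets.
jaccard_distance <- function(a, b, q = 2) {
  A <- qgram_set(a, q)
  B <- qgram_set(b, q)
  1 - length(intersect(A, B)) / length(union(A, B))
}

jaccard_distance("A.J. Feeley", "A.J. Doyle")    # 9/14, about 0.6428571
jaccard_distance("Aaron Brooks", "A.J. Doyle")   # 1: no shared bigrams
```

"A.J. Feeley" has 10 distinct bigrams and "A.J. Doyle" has 9; they share 5 ("A.", ".J", "J.", ". " and "le"), so the union has 14 elements and the distance is 1 - 5/14.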

Answer 3

Reason: there is no explicit return.

You can add a print call inside the loop, as below, to trace the values while debugging:

jaccard_similarity = function(d){
  for (i in 1:nrow(d)){
    for(j in 1:ncol(d)){
      d[i,j] = stringdist(quarterback_names[i], draft_names[j], method ='jaccard', q=2)
      print(d[i,j])
    }
  }
  return(d)
}

Output:

[1] 0.6428571
[1] 0.75
[1] 0.75
[1] 0.7647059
[1] 0.7777778
[1] 0.7777778
[1] 1
[1] 1
[1] 1

You can simply call jaccard_similarity(df) to get the values.

output <- jaccard_similarity(df)

              A.J. Doyle A.J. Graham A.J. Porter
A.J. Feeley    0.6428571   0.7500000   0.7500000
A.J. McCarron  0.7647059   0.7777778   0.7777778
Aaron Brooks   1.0000000   1.0000000   1.0000000

Assign the output to a new variable rather than overwriting the existing df.
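A related option is to pass the two name vectors in as arguments instead of reading them from the global environment, so the function does not depend on an empty data frame being prepared first. The following is only a sketch under that assumption; jaccard_matrix is a hypothetical name, and the loop is kept to stay close to the original code:

```r
library(stringdist)

# Hypothetical variant: the name vectors are arguments, not globals.
jaccard_matrix <- function(row_names, col_names, q = 2) {
  # Preallocate a numeric matrix labelled with the names themselves.
  m <- matrix(NA_real_,
              nrow = length(row_names), ncol = length(col_names),
              dimnames = list(row_names, col_names))
  for (i in seq_along(row_names)) {
    for (j in seq_along(col_names)) {
      m[i, j] <- stringdist(row_names[i], col_names[j],
                            method = "jaccard", q = q)
    }
  }
  m   # last expression, so this is the return value
}

quarterback_names <- c("A.J. Feeley", "A.J. McCarron", "Aaron Brooks")
draft_names <- c("A.J. Doyle", "A.J. Graham", "A.J. Porter")

# Stored under a new name, leaving any existing df untouched.
jaccard_dist <- jaccard_matrix(quarterback_names, draft_names)
```

Using seq_along instead of 1:nrow(d) also avoids iterating over c(1, 0) when one of the vectors is empty.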

3 answers

solution1
3 2018-03-26 19:43:57

solution2
2 ACCEPTED 2018-03-26 19:41:19

solution3
0 2018-03-26 19:50:21
