R: Comparing text similarity between neighboring strings

Question

I am trying to compare texts in a column to measure their similarity in terms of adjacent letters: how many swaps of two adjacent letters are needed to make the two texts the same.

Example: JANE-JNAE (1: AN/NA), MARY-MART (0), CLERA-LCREA (2: CL/LC and ER/RE)

I have tried the stringdist methods, but they do not solve my problem.

Since I am new to R, I could not write efficient code to show here:

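substition <- function(text1, text2){

  # Identical strings need no further comparison
  if(text1 == text2){
    return(TRUE)
  }

  # Strings of different length cannot differ by swaps alone
  if(nchar(text1) != nchar(text2)){
    return(FALSE)
  }

  # Split each string into single characters
  vec1 <- strsplit(text1, split="")[[1]]
  vec2 <- strsplit(text2, split="")[[1]]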

(can't go on)

But to illustrate:

The data looks something like this:

df$NO  df$names
1      JANE
2      MARY
3      CLERA
4      JNAE
5      LCREA
6      MART

and the desired output is:

df$NO  df$names df$substition
1      JANE     1
2      MARY     0
3      CLERA    2
4      JNAE     1
5      LCREA    2
6      MART     0

Answer 1

You can use the Levenshtein distance ( https://en.wikipedia.org/wiki/Levenshtein_distance ) between strings. The distance gives the minimal number of insertions, deletions and substitutions needed to transform one string into another. In R, the base function adist() computes it.
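As a rough check against the names in the question, here is a minimal sketch using base R's adist(). The pairing of names follows the question's example; the variable names a, b and d are only illustrative. Note that under plain Levenshtein distance a swap of two adjacent letters counts as two edits and a single substitution (MARY-MART) counts as one, so these numbers will not match the desired substition column directly.

```r
# Names paired as in the question's example
a <- c("JANE", "MARY", "CLERA")
b <- c("JNAE", "MART", "LCREA")

# adist() returns the full 3x3 matrix of distances;
# the diagonal holds the distance for each a[i]-b[i] pair
d <- diag(adist(a, b))
names(d) <- paste(a, b, sep = "-")
d
```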

Usage

adist(
  c("lazy", "lasso", "lassie"),
  c("lazy", "lazier", "laser")
)

Returns a 3x3 matrix of distances:

##      [,1] [,2] [,3]
## [1,]    0    3    3
## [2,]    3    4    2
## [3,]    4    3    3
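
Since Levenshtein distance does not count an adjacent swap as a single operation, the counting rule from the question may be easier to write directly. The following is only a sketch in base R, not a tested solution: the helper name count_adjacent_swaps is hypothetical, and it rests on two assumptions that the question does not state explicitly: strings of different length score 0, and each name is scored by its largest swap count against any other name in the column.

```r
# Count positions where two adjacent letters of `a` appear swapped in `b`.
# Assumption: strings of different length are not comparable and score 0.
count_adjacent_swaps <- function(a, b) {
  if (nchar(a) != nchar(b)) return(0L)
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  n <- length(x)
  swaps <- 0L
  i <- 1L
  while (i < n) {
    # A swap at i: the letters differ here and are exchanged with position i + 1
    if (x[i] != y[i] && x[i] == y[i + 1] && x[i + 1] == y[i]) {
      swaps <- swaps + 1L
      i <- i + 2L  # skip the partner letter so it is not counted twice
    } else {
      i <- i + 1L
    }
  }
  swaps
}

df <- data.frame(
  NO = 1:6,
  names = c("JANE", "MARY", "CLERA", "JNAE", "LCREA", "MART"),
  stringsAsFactors = FALSE
)

# Assumption: score each name by its best match among all other names
df$substition <- sapply(seq_along(df$names), function(i) {
  others <- df$names[-i]
  max(sapply(others, function(o) count_adjacent_swaps(df$names[i], o)))
})
df
```

Comparing every name with every other name is quadratic in the number of rows, which is fine for a small column but may need a smarter grouping (for example by string length) on large data.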
