Implementing a string similarity algorithm: Levenshtein distance, a form of edit distance

2022-05-05

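The Levenshtein distance is one kind of edit distance. It is the minimum number of single-character edits needed to turn one string into another. The allowed edits are substituting one character for another, inserting a character, and deleting a character. A typical use is comparing one string against many candidates to find the closest match.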
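
This minimum is usually computed with dynamic programming. As a sketch, write $a$ and $b$ for the two strings (symbols chosen here for illustration, not taken from the original post) and $d_{i,j}$ for the distance between the first $i$ characters of $a$ and the first $j$ characters of $b$. The standard recurrence is then:

```latex
\begin{aligned}
d_{i,0} &= i, \qquad d_{0,j} = j, \\
d_{i,j} &= \min
\begin{cases}
d_{i-1,j} + 1 & \text{(delete } a_i\text{)} \\
d_{i,j-1} + 1 & \text{(insert } b_j\text{)} \\
d_{i-1,j-1} + [a_i \neq b_j] & \text{(substitute, or keep if equal)}
\end{cases}
\end{aligned}
```

Here $[a_i \neq b_j]$ is 1 when the two characters differ and 0 when they match, and the final answer is $d_{|a|,|b|}$.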

For example, turning kitten into sitting takes three edits:

1. sitten (substitute k -> s)

2. sittin (substitute e -> i)

3. sitting (insert g at the end)
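
The three steps above can also be recovered mechanically. The following is a minimal sketch, not the original author's code: it fills the usual distance table and then walks back from the bottom-right corner to list one minimal sequence of edits. The function name `edit_script` and the tuple layout of each step are hypothetical choices made for this illustration.

```python
def edit_script(source, target):
    """Return (distance, ops), where ops is one minimal list of edit steps.

    Hypothetical helper for illustration; each step is a tuple
    (operation, old_char, new_char).
    """
    n, m = len(source), len(target)
    # d[i][j] = distance between source[:i] and target[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if source[i - 1] == target[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match

    # Walk back from the bottom-right corner, preferring the diagonal move.
    ops = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            cost = 0 if source[i - 1] == target[j - 1] else 1
            if d[i][j] == d[i - 1][j - 1] + cost:
                if cost:
                    ops.append(("substitute", source[i - 1], target[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and d[i][j] == d[i - 1][j] + 1:
            ops.append(("delete", source[i - 1], ""))
            i -= 1
        else:
            ops.append(("insert", "", target[j - 1]))
            j -= 1
    ops.reverse()
    return d[n][m], ops


if __name__ == "__main__":
    distance, steps = edit_script("kitten", "sitting")
    print(distance)
    for step in steps:
        print(step)
```

When several minimal edit sequences exist, this sketch returns only one of them; which one depends on the order in which the moves are tried during the walk back.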


Java implementation:

Full Java code download: https://download.csdn.net/download/qq_16220645/11382095

/**
 * String similarity based on edit distance (Levenshtein distance).
 */
public class test {

    /** Returns the Levenshtein distance between str and target, ignoring letter case. */
    public static int compare(String str, String target) {
        int strLength = str.length();
        int targetLength = target.length();
        if (strLength == 0) {
            return targetLength;
        }
        if (targetLength == 0) {
            return strLength;
        }
        // d[i][j] is the distance between the first i chars of str
        // and the first j chars of target
        int[][] d = new int[strLength + 1][targetLength + 1];
        // Initialize the first column and the first row
        for (int i = 0; i <= strLength; i++) {
            d[i][0] = i;
        }
        for (int j = 0; j <= targetLength; j++) {
            d[0][j] = j;
        }
        for (int i = 1; i <= strLength; i++) {
            char c1 = Character.toLowerCase(str.charAt(i - 1));
            for (int j = 1; j <= targetLength; j++) {
                char c2 = Character.toLowerCase(target.charAt(j - 1));
                // Compare case-insensitively. The original test (c1 + 32 == c2)
                // also matched unrelated pairs such as '@' and '`'.
                int temp = (c1 == c2) ? 0 : 1;
                d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + temp);
            }
        }
        return d[strLength][targetLength];
    }

    public static int min(int one, int two, int three) {
        return Math.min(Math.min(one, two), three);
    }

    /** Returns a similarity in [0, 1]; 1 means the strings are identical. */
    public static float getSimilarityRatio(String str, String target) {
        int max = Math.max(str.length(), target.length());
        if (max == 0) {
            return 1f; // two empty strings are identical; avoids dividing by zero
        }
        return 1 - (float) compare(str, target) / max;
    }

    public static void main(String[] args) {
        String a = "kitten";
        String b = "sitting";
        System.out.println(getSimilarityRatio(a, b));
    }
}

Python implementation:

Full Python code download: https://download.csdn.net/download/qq_16220645/11382107

def compare(str, target):
    """Return the Levenshtein distance between str and target (case-sensitive)."""
    n = len(str) + 1
    m = len(target) + 1
    # Build an n x m matrix; distance_matrix[i][j] is the distance between
    # the first i characters of str and the first j characters of target.
    # Empty inputs need no special case: the first row and column cover them.
    distance_matrix = [[0] * m for _ in range(n)]
    # Initialize the first column and the first row
    for i in range(n):
        distance_matrix[i][0] = i
    for j in range(m):
        distance_matrix[0][j] = j
    for i in range(1, n):
        for j in range(1, m):
            deletion = distance_matrix[i-1][j] + 1
            insertion = distance_matrix[i][j-1] + 1
            substitution = distance_matrix[i-1][j-1]
            if str[i-1] != target[j-1]:
                substitution += 1
            distance_matrix[i][j] = min(insertion, deletion, substitution)
    return distance_matrix[n-1][m-1]


def getCompareRatio(a, b):
    """Return a similarity in [0, 1]; 1 means the strings are identical."""
    resultValue = compare(a, b)
    maxValue = max(len(a), len(b))
    if maxValue == 0:
        return 1.0  # two empty strings are identical; avoids dividing by zero
    return 1 - resultValue / maxValue


if __name__ == '__main__':
    a = 'kitten'
    b = 'sitting'
    print(getCompareRatio(a, b))
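
The use case mentioned at the start, comparing one string against many to find the closest, can be sketched as follows. This is an illustrative variant rather than the author's code: `levenshtein` keeps only two rows of the table instead of the full matrix, and `closest_match` and the sample word list are hypothetical names and data chosen for this example.

```python
def levenshtein(source, target):
    """Levenshtein distance using two rows instead of the full matrix."""
    if len(source) < len(target):
        # The distance is symmetric, so swap to keep each row short.
        source, target = target, source
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]
        for j, t_char in enumerate(target, start=1):
            cost = 0 if s_char == t_char else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def closest_match(query, candidates):
    """Return the candidate nearest to query; the first one wins ties.

    Hypothetical helper for illustration.
    """
    return min(candidates, key=lambda c: levenshtein(query, c))


if __name__ == "__main__":
    words = ["sitting", "mitten", "kitchen", "bitten"]  # sample data
    print(closest_match("kitten", words))
```

Keeping two rows reduces memory from the full table to a single row of the shorter string's length, at the cost of no longer being able to trace back which edits were made.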

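Your encouragement is my biggest motivation for sharing technical posts! If you spot any mistakes, please point them out; I would be very grateful.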

 

