Is there a soundex function for Python?
Is there a soundex function for Python, and if not, how would you go about making a soundex code?
| Soundex code | Letters |
|---|---|
| 1 | B, F, P, V |
| 2 | C, G, J, K, Q, S, X, Z |
| 3 | D, T |
| 4 | L |
| 5 | M, N |
| 6 | R |
| Skip | A, E, I, O, U, H, W, Y |
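A natural first step toward writing your own is to turn the table into a lookup. This is a minimal sketch; `SOUNDEX_GROUPS`, `SOUNDEX_DIGITS` and `letter_code` are hypothetical names, not part of any library.

```python
from typing import Dict, Optional

# The letter groups from the table, keyed by Soundex digit.
SOUNDEX_GROUPS: Dict[str, str] = {
    "1": "BFPV",
    "2": "CGJKQSXZ",
    "3": "DT",
    "4": "L",
    "5": "MN",
    "6": "R",
}

# Invert the groups into a letter -> digit lookup. The skipped letters
# (A, E, I, O, U, H, W, Y) are simply absent from the dict.
SOUNDEX_DIGITS: Dict[str, str] = {
    letter: digit for digit, letters in SOUNDEX_GROUPS.items() for letter in letters
}


def letter_code(letter: str) -> Optional[str]:
    """Return the Soundex digit for one letter, or None if the letter is skipped."""
    return SOUNDEX_DIGITS.get(letter.upper())
```

For example, `letter_code("b")` gives `"1"` and `letter_code("A")` gives `None`.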
For example:

- Jackson = J250
- Washington = W252
- Clement = C455
- Ashcraft = A261
- Wu = W000
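The examples rely on two rules that the table does not state. A letter whose code equals the previous code is dropped (this includes the first letter's own code). Vowels and Y separate two equal codes, but H and W do not. That is why Ashcraft is A261: S and C are both 2 and only H lies between them. The sketch below traces these decisions letter by letter. It assumes American Soundex; `GROUPS`, `code_of` and `soundex_trace` are hypothetical names, and any non-letter character is treated like a vowel.

```python
from typing import List, Optional, Tuple

# Letter groups from the table; group index + 1 is the Soundex digit.
GROUPS = ("BFPV", "CGJKQSXZ", "DT", "L", "MN", "R")


def code_of(letter: str) -> Optional[str]:
    """Digit for a coded letter, or None for a skipped one."""
    for digit, letters in enumerate(GROUPS, start=1):
        if letter in letters:
            return str(digit)
    return None


def soundex_trace(name: str) -> Tuple[str, List[str]]:
    """Return the Soundex code plus a note explaining each letter's fate."""
    if not name:
        return "", []
    name = name.upper()
    notes = [f"{name[0]}: kept as first letter"]
    last = code_of(name[0])  # the first letter's code still blocks a repeat
    digits: List[str] = []
    for letter in name[1:]:
        code = code_of(letter)
        if code is None:
            if letter in "HW":
                notes.append(f"{letter}: skipped, does not separate codes")
            else:
                notes.append(f"{letter}: skipped, separates codes")
                last = None  # a following repeat of the code is written again
        elif code == last:
            notes.append(f"{letter}: {code} dropped (same as previous code)")
        else:
            notes.append(f"{letter}: {code}")
            digits.append(code)
            last = code
    # Keep the first letter plus three digits, padding with zeros.
    return (name[0] + "".join(digits) + "000")[:4], notes
```

Calling `soundex_trace("Ashcraft")` returns `"A261"` together with notes such as `"H: skipped, does not separate codes"` and `"C: 2 dropped (same as previous code)"`.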
Yes, you can use Fuzzy, a Python library that implements several phonetic algorithms.

```shell
sudo pip install fuzzy
```

```python
>>> import fuzzy
>>> soundex = fuzzy.Soundex(4)
>>> soundex("Jackson")
'J250'
>>> soundex("Washington")
'W252'
>>> soundex("Clement")
'C453'
>>> soundex("Ashcraft")
'A261'
>>> soundex("Wu")
'W000'
```

Note that Fuzzy returns 'C453' for "Clement", not the 'C455' given in the question.
You can use jellyfish:

```shell
sudo pip install jellyfish
```

```python
import jellyfish

print("Soundex\t\t=", jellyfish.soundex("Ala ma kaca"))
```

```
>Soundex = A452
#...
>Metaphone = AL M KK
>NYSIIS = AL
>Match rating codex = ALMKC
```
Use the `soundex()` function below directly, without installing any package!

Snippet taken from the Jellyfish package (`_jellyfish.py`).
Examples

```python
print(soundex('kent'))     # K530
print(soundex('Paul'))     # P400
print(soundex('amnesty'))  # A523
```
Code

```python
import unicodedata


def soundex(s):
    """Return the 4-character American Soundex code for s."""
    if not s:
        return ""

    # Decompose accented characters (e.g. "É" -> "E" + combining accent).
    s = unicodedata.normalize("NFKD", s)
    s = s.upper()

    replacements = (
        ("BFPV", "1"),
        ("CGJKQSXZ", "2"),
        ("DT", "3"),
        ("L", "4"),
        ("MN", "5"),
        ("R", "6"),
    )

    result = [s[0]]
    count = 1

    # find would-be replacement for first character
    for lset, sub in replacements:
        if s[0] in lset:
            last = sub
            break
    else:
        last = None

    for letter in s[1:]:
        for lset, sub in replacements:
            if letter in lset:
                # a code equal to the previous one is not written again
                if sub != last:
                    result.append(sub)
                    count += 1
                last = sub
                break
        else:
            if letter != "H" and letter != "W":
                # leave last alone if middle letter is H or W
                last = None
        if count == 4:
            break

    # pad with zeros up to first letter + three digits
    result += "0" * (4 - count)
    return "".join(result)
```
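If you prefer to avoid the nested loops, the same rules can be expressed with `str.maketrans` and `str.translate`. This is an alternative sketch, not Jellyfish's code; `_DIGITS` and `soundex_translate` are hypothetical names. One deliberate difference is that it drops every character that is not an ASCII letter (spaces, hyphens, combining accents), whereas the Jellyfish snippet treats such characters like vowels.

```python
import unicodedata

# Hypothetical translation table: vowels and Y become "0" (a separator),
# coded consonants become their digit, and H and W are deleted outright so
# they never separate two letters with the same code.
_DIGITS = str.maketrans(
    "AEIOUY" "BFPV" "CGJKQSXZ" "DT" "L" "MN" "R",
    "000000" "1111" "22222222" "33" "4" "55" "6",
    "HW",
)


def soundex_translate(name: str) -> str:
    """Sketch of American Soundex; characters other than A-Z are ignored."""
    # Decompose accents, uppercase, and keep only the letters A-Z.
    letters = "".join(
        c for c in unicodedata.normalize("NFKD", name).upper() if "A" <= c <= "Z"
    )
    if not letters:
        return ""
    head = letters[0]
    # The first letter's own code ("0" if it is a vowel, Y, H or W) still
    # suppresses an identical code right after it, e.g. "Pfister" -> P236.
    codes = (head.translate(_DIGITS) or "0") + letters[1:].translate(_DIGITS)
    # Collapse runs of the same digit into one.
    collapsed = [d for i, d in enumerate(codes) if i == 0 or d != codes[i - 1]]
    # Drop the first letter's code, then the "0" separators; pad or truncate.
    tail = "".join(d for d in collapsed[1:] if d != "0")
    return (head + tail + "000")[:4]
```

Because H and W are deleted before collapsing, the two 2s in "Ashcraft" become adjacent and merge, while a vowel's "0" keeps equal codes apart until the separators are removed at the end.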