Python - Get matched string percentage along with the string
I want to match a string against certain keywords and get both the match percentage and the substring that was matched to each keyword. E.g., I have a list of keywords:
keywords = ['Projekt-Nr.:', 'Projektbezeichnung:', 'Anlagenklassifizierung:', 'Arbeiten / Gewerk:']
and some unknown text, e.g.:
s = "Projekthezeichnung: —_[H- Kloster Eig i Krankenhaus"
I want my keywords to be searched for in this string so that it returns the partially matched substring.
'Projektbezeichnung:' should match 'Projekthezeichnung:' with over 95% accuracy (I am already using cdifflib for that), but cdifflib doesn't return the substring that my keyword was matched with.
How can I get the unknown substring that my keyword was partially matched with?
Any help would be quite useful, thanks!
difflib's get_close_matches seems suitable:
from difflib import get_close_matches as gcm
keywords = ['Projekt-Nr.:', 'Projektbezeichnung:', 'Anlagenklassifizierung:', 'Arbeiten / Gewerk:']
unk_text = "Projekthezeichnung: —_[H- Kloster Eig i Krankenhaus"
words = unk_text.split()
result = [gcm(kw, words, n=len(words), cutoff=0.8) for kw in keywords]
# [[], ['Projekthezeichnung:'], [], []]
Each sublist of the result list contains "close" matches to the corresponding keyword.
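get_close_matches returns only the matching strings, not their scores, and splitting on whitespace means a keyword that itself contains spaces (such as 'Arbeiten / Gewerk:') is only ever compared against single words. A minimal sketch of one way to get both the substring and the percentage, assuming difflib's SequenceMatcher ratio is an acceptable similarity measure; best_match is a hypothetical helper name, not part of difflib:

```python
from difflib import SequenceMatcher


def best_match(keyword, text):
    """Return (substring, ratio) for the run of words in text closest to keyword.

    Hypothetical helper: slides a window over the text that is as many
    words wide as the keyword, so multi-word keywords can match too.
    """
    words = text.split()
    n = len(keyword.split())  # window width, in words
    best = ("", 0.0)
    for i in range(len(words) - n + 1):
        candidate = " ".join(words[i:i + n])
        ratio = SequenceMatcher(None, keyword, candidate).ratio()
        if ratio > best[1]:
            best = (candidate, ratio)
    return best


keywords = ['Projekt-Nr.:', 'Projektbezeichnung:', 'Anlagenklassifizierung:', 'Arbeiten / Gewerk:']
unk_text = "Projekthezeichnung: —_[H- Kloster Eig i Krankenhaus"

for kw in keywords:
    substring, ratio = best_match(kw, unk_text)
    print(f"{kw!r} -> {substring!r} ({ratio:.1%})")
```

Unlike the get_close_matches call above, this applies no cutoff: every keyword gets its best candidate, however poor, so in practice you would probably discard results whose ratio falls below a threshold such as 0.8.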