How to replace compound words in a string using a dictionary?
I have a dictionary whose key:value pairs map compound words to the expressions I want to replace them with in a text. For example, let's say:
terms_dict = {'digi conso': 'digi conso', 'digi': 'digi conso', 'digiconso': 'digi conso', '3xcb': '3xcb', '3x cb': '3xcb', 'legal entity identifier': 'legal entity identifier'}
My goal is to create a function replace_terms(text, dict) that takes a text and a dictionary like this one as parameters, and returns the text after replacing the compound words.
For instance, this script:
test_text = "i want a digi conso loan for digiconso"
print(replace_terms(test_text, terms_dict))
Should return:
"i want a digi conso loan for digi conso"
I have tried using .replace(), but for some reason it doesn't work properly, probably because the terms to replace are composed of multiple words.
I also tried this:
import re

def replace_terms(text, terms_dict):
    if len(terms_dict) > 0:
        words_in = [k for k in terms_dict.keys() if k in text]  # ex: words_in = [digi conso, digi, digiconso]
        if len(words_in) > 0:
            for w in words_in:
                pattern = r"\b" + w + r"\b"
                text = re.sub(pattern, terms_dict[w], text)
    return text
But when applied to my text, this function returns "i want a digi conso conso loan for digi conso": the word conso gets doubled, and I can see why (the words_in list is built by going through the dictionary keys, and the text is not altered when a key is appended to the list, so the key digi is still applied to the digi conso that is already in the text).
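To make the doubling concrete, here is a minimal trace of the three substitutions the loop performs, one per key in words_in. The names step1 to step3 are only illustrative and are not part of the function above.

```python
import re

text = "i want a digi conso loan for digiconso"

# Step 1: 'digi conso' -> 'digi conso' leaves the text unchanged.
step1 = re.sub(r"\bdigi conso\b", "digi conso", text)

# Step 2: r"\bdigi\b" matches the 'digi' inside 'digi conso'
# (but not inside 'digiconso', where no word boundary follows 'digi').
step2 = re.sub(r"\bdigi\b", "digi conso", step1)

# Step 3: 'digiconso' -> 'digi conso'.
step3 = re.sub(r"\bdigiconso\b", "digi conso", step2)

print(step2)  # i want a digi conso conso loan for digiconso
print(step3)  # i want a digi conso conso loan for digi conso
```

The extra conso appears in step 2, because each re.sub call rescans text that an earlier key has already handled.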
Is there an efficient way to do this?
Thanks a lot!
This should do it.
terms_dict = {'digiconso': 'digi conso', '3xcb': '3xcb', '3x cb': '3xcb', 'legal entity identifier': 'legal entity identifier'}
test_text = "i want a digi conso loan for digiconso"

def replace_terms(txt, dct):
    dct = tuple(dct.items())
    for x, y in dct:
        txt = txt.replace(x, y, 1)
    return txt

print(replace_terms(test_text, terms_dict))
First I get the dict pairs and put them in an easier form (a tuple). Then I iterate and replace!
Output:
i want a digi conso loan for digi conso
You had too many extra replacement entries that you did not need, so I dropped them from the dictionary. I also made it replace only the first occurrence of each term, but you can change that.
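As a minimal sketch of changing that limit: the third argument of str.replace is the maximum number of replacements, and -1 means every occurrence. The function name replace_terms_all, its count parameter, and the sample sentence are made up for this illustration.

```python
def replace_terms_all(txt, dct, count=-1):
    # count is passed straight to str.replace; -1 replaces every occurrence.
    for old, new in dct.items():
        txt = txt.replace(old, new, count)
    return txt

terms = {'digiconso': 'digi conso', '3x cb': '3xcb'}
text = "digiconso and 3x cb, then digiconso again"

print(replace_terms_all(text, terms))     # digi conso and 3xcb, then digi conso again
print(replace_terms_all(text, terms, 1))  # digi conso and 3xcb, then digiconso again
```

Note that plain str.replace does not check word boundaries, so this only behaves well when no key is a substring of another key or of a replacement value.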
A rather quick and wonky way of doing this:
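def replace_terms(text, terms):
    # Each entry is [index of first occurrence in text, term].
    replacement_list = []
    check = True
    for term in terms:
        if term in text:
            # Iterate over a copy, since entries may be removed inside the loop.
            for r in list(replacement_list):
                if r[0] == text.index(term):
                    # Two terms start at the same position: keep the longer one.
                    if len(term) > len(r[1]):
                        replacement_list.remove(r)
                    else:
                        check = False
            if check:
                replacement_list.append([text.index(term), term])
            else:
                check = True
    for r in replacement_list:
        text = text.replace(r[1], terms[r[1]])
    return text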
Usage:
terms_dict = {
    "digi conso": "digi conso",
    "digi": "digi conso",
    "digiconso": "digi conso",
    "3xcb": "3xcb",
    "3x cb": "3xcb",
    "legal entity identifier": "legal entity identifier"
}
test_text = "i want a digi conso loan for digiconso"
print(replace_terms(test_text, terms_dict))
Result:
i want a digi conso loan for digi conso
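Another option, shown here only as a sketch and not taken from the code above, is to build one regular expression from all the keys and substitute in a single pass. Sorting the keys longest first lets digi conso win over digi at the same position, and because re.sub never rescans text it has already replaced, the doubled conso cannot occur. The function name replace_terms_single_pass is an assumption made for this example.

```python
import re

def replace_terms_single_pass(text, terms):
    if not terms:
        return text
    # Longest keys first, so 'digi conso' is tried before 'digi'.
    keys = sorted(terms, key=len, reverse=True)
    # One alternation of all escaped keys, wrapped in word boundaries.
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(k) for k in keys) + r")\b")
    # Each match is looked up in the dictionary and replaced exactly once.
    return pattern.sub(lambda m: terms[m.group(0)], text)

terms_dict = {
    "digi conso": "digi conso",
    "digi": "digi conso",
    "digiconso": "digi conso",
    "3xcb": "3xcb",
    "3x cb": "3xcb",
    "legal entity identifier": "legal entity identifier",
}
test_text = "i want a digi conso loan for digiconso"
print(replace_terms_single_pass(test_text, terms_dict))
# i want a digi conso loan for digi conso
```

This version replaces every occurrence and matches case-sensitively; both are choices of the sketch rather than requirements stated in the question.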