Numbering a certain word in a text
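I want to add reference numbers (digits) to certain words in a text.
With the code below I do get some correct output. However, it does not work when the same word also appears with an adjective in front of it, or when a word has a suffix.
The two edge cases I can think of are these: the same word appearing both with and without an adjective, and a word in the text that carries a suffix but should still match the word in the dictionary.
I tried this: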
import re
text = "This is a first sample and this is a second sample."
words_to_number = {"first sample": 1, "second sample": 2, "sample": 3}
for keyword, number in words_to_number.items():
    pattern = r"\b"+keyword+r"\b"
    text = re.sub(pattern, keyword+" ("+str(number)+")", text)
print(text)
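I got: This is a first sample (3) (1) and this is a second sample (3) (2).
instead of: This is a first sample (1) and this is a second sample (2).
The problem here is that you put the keyword back in place after matching it, so a later keyword that is contained in an earlier one (here "sample" inside "first sample") can still match it.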
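To see this concretely, here is a small trace of the question's loop. The `steps` list is added only for illustration; it records the text after each substitution:

```python
import re

text = "This is a first sample and this is a second sample."
words_to_number = {"first sample": 1, "second sample": 2, "sample": 3}

steps = []  # illustration only: the text after each keyword has been processed
for keyword, number in words_to_number.items():
    text = re.sub(rf"\b{keyword}\b", f"{keyword} ({number})", text)
    steps.append(text)

for step in steps:
    print(step)
# This is a first sample (1) and this is a second sample.
# This is a first sample (1) and this is a second sample (2).
# This is a first sample (3) (1) and this is a second sample (3) (2).
```

The third pass is the one that does the damage: by then both longer phrases are back in the text, so "sample" matches inside each of them.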
Consider what happens when you don't put the matched keyword back:
import re
text = "This is a first sample and this is a second sample."
words_to_number = {"first sample": 1, "second sample": 2, "sample": 3}
for keyword, number in words_to_number.items():
    pattern = rf"\b{keyword}\b"
    text = re.sub(pattern, f"({number})", text)
print(text) # This is a (1) and this is a (2).
To fix this, you can use the numbers as placeholders and put each keyword back in a second for loop:
import re
text = "This is a first sample and this is a second sample."
words_to_number = {"first sample": 1, "second sample": 2, "sample": 3}
for keyword, number in words_to_number.items():
    pattern = rf"\b{keyword}\b"
    text = re.sub(pattern, f"({number})", text)
print(text) # This is a (1) and this is a (2).
for keyword, number in words_to_number.items():
    pattern = rf"\({number}\)"
    text = re.sub(pattern, f"{keyword} ({number})", text)
print(text) # This is a first sample (1) and this is a second sample (2).
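One caveat with the placeholder idea: if the input already contains something like "(1)", the second loop rewrites that as well, and the first loop still depends on longer keywords coming first in the dictionary. Below is a minimal sketch of a variant that tries to avoid both issues. The helper name `number_keywords` and the NUL-delimited placeholder are assumptions of this sketch, not part of the answer above, and it assumes the text itself contains no NUL characters:

```python
import re

def number_keywords(text, words_to_number):
    """Hypothetical helper: two-pass numbering with collision-resistant placeholders."""
    # Process longer keywords first so "sample" cannot break up "first sample".
    items = sorted(words_to_number.items(), key=lambda kv: len(kv[0]), reverse=True)
    # Pass 1: swap each keyword for a placeholder that normal text should not contain.
    for keyword, number in items:
        pattern = rf"\b{re.escape(keyword)}\b"
        text = re.sub(pattern, f"\x00{number}\x00", text)
    # Pass 2: expand each placeholder into "keyword (number)".
    for keyword, number in items:
        text = text.replace(f"\x00{number}\x00", f"{keyword} ({number})")
    return text

words_to_number = {"first sample": 1, "second sample": 2, "sample": 3}
print(number_keywords("This is a first sample and this is a second sample.", words_to_number))
# This is a first sample (1) and this is a second sample (2).
```

`re.escape` is there so that a keyword containing regex metacharacters (a dot or parenthesis, say) is matched literally.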
As a single statement: build one regex by joining the individual patterns with |, and use the callback option of re.sub.
import re
text = "This is a first sample and this is a second sample."
words_to_number = {"first sample": 1, "second sample": 2, "sample": 3}
regex = r"|".join("({})".format(k) for k in words_to_number)
text_new = re.sub(regex, lambda m: r"{} ({})".format(
    m.group(), words_to_number[m.group()]), text)
print(text_new) # This is a first sample (1) and this is a second sample (2).
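The single-regex version has no word boundaries, so "sample" would also match inside "samples", and the outcome depends on the order of the alternatives when one keyword is the start of another (for example "sample" and "sample set"). A hedged sketch of a slightly stricter variant follows; the function name `number_keywords_single_pass` is hypothetical, and sorting by length is a choice of this sketch rather than of the answer above:

```python
import re

def number_keywords_single_pass(text, words_to_number):
    """Hypothetical helper: one alternation regex plus a re.sub callback."""
    # Longest keywords first, so the alternation prefers "sample set" over "sample".
    keywords = sorted(words_to_number, key=len, reverse=True)
    # Escape each keyword and require word boundaries around the whole alternation.
    regex = re.compile(r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b")
    # The callback looks up the number for whichever keyword matched.
    return regex.sub(lambda m: f"{m.group()} ({words_to_number[m.group()]})", text)

words_to_number = {"first sample": 1, "second sample": 2, "sample": 3}
print(number_keywords_single_pass("This is a first sample and this is a second sample.", words_to_number))
# This is a first sample (1) and this is a second sample (2).
```

Because the text is scanned only once, a number inserted for one keyword is never rescanned for another, which is what caused the doubled "(3) (1)" in the question.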
Personally, I would drop regex entirely and use Fractalism's approach:
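text = "This is a first sample and this is a second sample."
words_to_number = {"first sample": 1, "second sample": 2, "sample": 3}
# First pass: replace each keyword with its bare number as a placeholder.
# Note: this assumes the text contains no digits of its own.
for word, number in words_to_number.items():
    text = text.replace(word, str(number))
# Second pass: expand each number back into "keyword (number)".
for word, number in words_to_number.items():
    text = text.replace(str(number), f"{word} ({number})")
print(text) # This is a first sample (1) and this is a second sample (2).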
In this case regex seems like overkill, since you are only matching predefined strings with no other patterns.