Regex: replace a substring with multiple word forms

Some human languages are distinguished by a rich morphology and a developed system of grammatical genders. For instance, in Slavic languages almost every adjective in the singular has three different forms, one for each gender (masculine, feminine and neuter):

white <-> белый (m), белая (f), белое (n)     

In some cases it would be useful to get a list of possible word forms while using regular expressions for substring replacement.

Now I'm curious whether Python (or any other scripting language) allows something like the following (WARNING: the snippets below are Python-like pseudocode, not working Python code):

# I would like to handle Russian genders like this:
>>> re.sub(r"Бел.", r"Бел[ый|ая|ое]", "Бел. Берель")
["Белый Берель", "Белая Берель", "Белое Берель"]

# A very artificial example for those who prefer Latin:
>>> re.sub(r"Go.", r"Go[ld|lden]", "Go. Ochre")
["Gold Ochre", "Golden Ochre"] 

So can I use regular expressions to get a list of matching combinations of words?

No, but you can iterate over the list of suffixes with a list comprehension like this (for Python 2):

>>> import re
>>> suffixes = u'ый|ая|ое'.split('|')
>>> suffixes
[u'\u044b\u0439', u'\u0430\u044f', u'\u043e\u0435']
>>> replacements = [re.sub(u"Бел.", u"Бел%s" % s, u"Бел. Берель") for s in suffixes]
>>> replacements
[u'\u0411\u0435\u043b\u044b\u0439 \u0411\u0435\u0440\u0435\u043b\u044c', u'\u0411\u0435\u043b\u0430\u044f \u0411\u0435\u0440\u0435\u043b\u044c', u'\u0411\u0435\u043b\u043e\u0435 \u0411\u0435\u0440\u0435\u043b\u044c']
>>> for s in replacements:
...     print s
... 
Белый Берель
Белая Берель
Белое Берель

It's somewhat clearer in Python 3:

>>> import re
>>> suffixes = 'ый|ая|ое'.split('|')
>>> suffixes
['ый', 'ая', 'ое']
>>> replacements = [re.sub("Бел.", "Бел%s" % s, "Бел. Берель") for s in suffixes]
>>> replacements
['Белый Берель', 'Белая Берель', 'Белое Берель']

Thanks for your Latin example!

I'd use a loop over all possible combinations:

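import re

text = "Go. Ochre"  # renamed from `str` to avoid shadowing the built-in

# Escape the dot so it matches a literal period, not any character
find_list = [r'Go\.']
replace_list = ['Gold', 'Golden']

for value in find_list:
    for item in replace_list:
        print(re.sub(value, item, text))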

I'm not sure if that's really what you want, and I don't know much about the efficiency of this method.

But it's very readable code that is simple to maintain, and it could easily be written as a reusable general-purpose function.
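
One possible shape for such a reusable function, as a rough sketch for Python 3. The name `iter_variants` and its parameters are assumptions for illustration, not part of the original answer. Compiling each pattern once is a small nod to the efficiency concern, though for a handful of word forms the difference is unlikely to matter.

```python
import re


def iter_variants(text, patterns, replacements):
    """Yield text with each pattern replaced by each replacement in turn."""
    for pattern in patterns:
        # Compile once per pattern rather than once per replacement.
        compiled = re.compile(pattern)
        for replacement in replacements:
            # The replacement is a regex template here, so backslashes
            # in it would be interpreted by the re module.
            yield compiled.sub(replacement, text)


for variant in iter_variants("Go. Ochre", [r"Go\."], ["Gold", "Golden"]):
    print(variant)
# Gold Ochre
# Golden Ochre
```

Because it is a generator, the results can be consumed lazily or collected with `list(...)`.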


For this particular example, though, it would be better to just use replace() on the strings; there is no need for a regex.
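
A minimal sketch of that regex-free variant, assuming the abbreviation is a fixed literal string (the variable names are illustrative only):

```python
text = "Go. Ochre"
forms = ["Gold", "Golden"]

# str.replace treats "Go." literally, so the dot needs no escaping.
variants = [text.replace("Go.", form) for form in forms]
print(variants)
# ['Gold Ochre', 'Golden Ochre']
```

Note that `str.replace` substitutes every occurrence unless its optional third `count` argument limits it.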
