Add a space after a word if it's at the beginning of a string or after one or more spaces, and at the same time it must be at the end or before \n
import re
line = "treinta y un" #example 1
line = "veinti un " #example 2
line = "un" #example 3
line = "un " #example 4
line = "uno" #example 5
line = "treinta yun" #example 6
line = "treinta y unghhg" #example 7
re_for_identificate_1 = "(?<!^)un"
re_for_identificate_2 = " un"
line = re.sub(re_for_identificate_2, " un ", line)
line = re.sub(re_for_identificate_1, "un ", line)
print(repr(line))
How can I get these outputs from these inputs?
"treinta y un " #for example 1
"veinti un " #for example 2
"un " #for example 3
"un " #for example 4
"uno" #for example 5
"treinta yun" #for example 6
"treinta y unghhg" #for example 7
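Note that for examples 4, 5, 6 and 7 the regex should not make any changes: either a space is already placed after the word, or, as in "uno", the word "un" is not at the end of the sentence, or, as in "treinta yun", the substring "un" is not preceded by one or more spaces.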
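I'm not sure you need a regex. The following code seems to do what you want.
It performs three checks: the item is a string, its last two characters are "un", and its last whitespace-separated word is "un".
Here I've wrapped the logic in a list comprehension for demonstration.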
lines = ["treinta y un", "veinti un ", "un", "un ",
"uno", "treinta yun", "treinta y unghhg"]
result = [ line+" " if (isinstance(line, str)
and (line[-2:] == "un")
and (line.split()[-1] == "un"))
else line
for line in lines ]
for line in result:
print(f"'{line}'")
Output:
'treinta y un '
'veinti un '
'un '
'un '
'uno'
'treinta yun'
'treinta y unghhg'
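If you want to use a regex, you can use \bun$, which checks that the last complete word in the string is un and that nothing follows it in the string. If that is the case, a space is added at the end of the string: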
import re
lines = ["treinta y un", "veinti un ", "un", "un ",
"uno", "treinta yun", "treinta y unghhg"]
result = [re.sub(r'\bun$', 'un ', line) for line in lines]
print(result)
Output:
[
'treinta y un ',
'veinti un ',
'un ',
'un ',
'uno',
'treinta yun',
'treinta y unghhg'
]
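One caveat worth illustrating: \b is a word boundary, not a whitespace boundary, so it also matches when "un" follows punctuation. The sketch below compares \bun$ with a lookbehind that requires whitespace or the start of the string on the left; the inputs "#un" and "y-un" are made-up examples that are not in the question.

```python
import re

# "#un" and "y-un" are hypothetical inputs, added only to show the difference.
samples = ["treinta y un", "#un", "y-un"]

# \b matches between any non-word character and "u", so "#un" and "y-un" match.
word_boundary = [re.sub(r"\bun$", "un ", s) for s in samples]

# (?<!\S) only allows whitespace or the start of the string before "un".
whitespace_boundary = [re.sub(r"(?<!\S)un$", "un ", s) for s in samples]

print(word_boundary)        # ['treinta y un ', '#un ', 'y-un ']
print(whitespace_boundary)  # ['treinta y un ', '#un', 'y-un']
```

If the input never contains punctuation directly before "un", both patterns give the same result.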
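If you assign line = several times in your code, you overwrite it each time.
Using (?<!^)un asserts that the start of the string is not directly to the left, so it cannot match "un" at the very beginning of a string.
If you also want to exclude matches such as #un, you can instead use (?<!\S) to assert a whitespace boundary to the left.
To make sure the pattern is at the end of the string, you can use the anchor $.
The code example uses single lines, but if you want to do the replacement across multiple lines, you have to pass the multiline flag re.MULTILINE to re.sub.
Example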
import re
pattern = r"(?<!\S)un$"
lines = ["treinta y un", "veinti un ", "un", "un ",
"uno", "treinta yun", "treinta y unghhg", "#un"]
print([re.sub(pattern, 'un ', line) for line in lines])
Output
[
'treinta y un ',
'veinti un ',
'un ',
'un ',
'uno',
'treinta yun',
'treinta y unghhg',
'#un'
]
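For the multiline case mentioned above, here is a minimal sketch. It assumes the input is one string with lines separated by \n; the variable names text, single and multi are illustrative only.

```python
import re

pattern = r"(?<!\S)un$"

# Assumed input: several of the example lines joined into one string.
text = "treinta y un\nveinti un \nuno\nun"

# Without the flag, $ only matches at the very end of the whole string.
single = re.sub(pattern, "un ", text)

# With re.MULTILINE, $ also matches right before each newline.
multi = re.sub(pattern, "un ", text, flags=re.MULTILINE)

print(repr(single))  # 'treinta y un\nveinti un \nuno\nun '
print(repr(multi))   # 'treinta y un \nveinti un \nuno\nun '
```

The lookbehind (?<!\S) still works across lines because the preceding \n counts as whitespace.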