Padding ASCII characters with spaces in a mixed Unicode-ASCII string
Given a mixed string of Unicode and ASCII characters, e.g.:
它看灵魂塑Nike造得和学问同等重要。
The goal is to pad the ASCII substrings with spaces, i.e.:
它看灵魂塑 Nike 造得和学问同等重要。
I've tried using the ([^[:ascii:]]) regex; it looks fine at matching the substrings, e.g. https://regex101.com/r/FVHhU1/1
But in code, the substitution with ' \1 ' is not achieving the desired output.
>>> import re
>>> patt = re.compile('([^[:ascii:]])')
>>> s = u'它看灵魂塑Nike造得和学问同等重要。'
>>> print (patt.sub(' \1 ', s))
它看灵魂塑Nike造得和学问同等重要。
How can I pad ASCII characters with spaces in a mixed Unicode-ASCII string?
The pattern should be:
([\x00-\x7f]+)
So you can use:
patt = re.compile('([\x00-\x7f]+)')
patt.sub(r' \1 ',s)
This generates:
>>> print(patt.sub(r' \1 ',s))
它看灵魂塑 Nike 造得和学问同等重要。
ASCII is defined as the range of characters with hex codes between 00 and 7f. So we define such a range as [\x00-\x7f], use + to denote one or more, and replace the matching group with r' \1 ' to add a space on each side. Note the r prefix: in a regular (non-raw) string literal, '\1' is an octal escape for the character with code 1 rather than a backreference to group 1.
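Putting the pieces together, here is a minimal sketch of the approach described above. The names ASCII_RUN and pad_ascii are made up for this illustration and are not part of the original answer. It also checks the raw-string point: without the r prefix, ' \1 ' contains the control character \x01, not a group reference. As a side note, Python's re module does not support POSIX bracket classes such as [:ascii:], which is likely why the pattern from the question matched nothing even though it looked fine on regex101.

```python
import re

# Hypothetical name for this sketch: one or more consecutive ASCII
# characters (code points 0x00 through 0x7f), captured as group 1.
ASCII_RUN = re.compile(r'([\x00-\x7f]+)')


def pad_ascii(text):
    """Surround each run of ASCII characters with a single space on each side."""
    # The raw string keeps \1 as a backreference to group 1.
    return ASCII_RUN.sub(r' \1 ', text)


s = '它看灵魂塑Nike造得和学问同等重要。'
print(pad_ascii(s))  # 它看灵魂塑 Nike 造得和学问同等重要。

# Without the r prefix, '\1' is an octal escape for chr(1).
print(' \1 ' == ' \x01 ')  # True
```

Because the character class covers all of ASCII, spaces and punctuation that are already in the text count as part of a run too, so a string that is entirely ASCII simply comes back with one leading and one trailing space.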