
Padding ASCII characters with spaces in a mixed Unicode-ASCII string

Given a mixed string of Unicode and ASCII characters, e.g.:

它看灵魂塑Nike造得和学问同等重要。

The goal is to pad the ASCII substrings with spaces, i.e.:

它看灵魂塑 Nike 造得和学问同等重要。

I've tried using the ([^[:ascii:]]) regex, and it looks fine at matching the substrings, e.g. https://regex101.com/r/FVHhU1/1

But in code, the substitution with ' \1 ' does not produce the desired output.

>>> import re
>>> patt = re.compile('([^[:ascii:]])')
>>> s = u'它看灵魂塑Nike造得和学问同等重要。'
>>> print (patt.sub(' \1 ', s))
它看灵魂塑Nike造得和学问同等重要。

How can I pad ASCII characters with spaces in a mixed Unicode-ASCII string?

The pattern should be:

([\x00-\x7f]+)

So you can use:

patt = re.compile(r'([\x00-\x7f]+)')
patt.sub(r' \1 ', s)

This generates:

>>> print(patt.sub(r' \1 ',s))
它看灵魂塑 Nike 造得和学问同等重要。

ASCII is defined as the range of characters with hex codes between 00 and 7f. So we define that range as [\x00-\x7f], use + to denote one or more, and replace the matched group with r' \1 ' to add a space on each side of it.
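For reference, here is a minimal self-contained sketch of the approach described above. The names ASCII_RUN and pad_ascii are hypothetical and chosen only for illustration. It also hints at two likely reasons the original attempt printed the string unchanged: Python's re module does not support POSIX classes such as [:ascii:], and without the r prefix the replacement ' \1 ' contains the control character \x01 rather than a backreference to group 1.

```python
import re

# Runs of one or more ASCII characters (code points 0x00 through 0x7f).
# ASCII_RUN and pad_ascii are illustrative names, not from the original post.
ASCII_RUN = re.compile(r'([\x00-\x7f]+)')


def pad_ascii(text):
    """Surround each run of ASCII characters with one space on each side."""
    # The raw string keeps \1 as a backreference to the captured run.
    return ASCII_RUN.sub(r' \1 ', text)


s = '它看灵魂塑Nike造得和学问同等重要。'
print(pad_ascii(s))  # 它看灵魂塑 Nike 造得和学问同等重要。

# In a non-raw string literal, \1 is an octal escape for the character
# U+0001, so ' \1 ' is not a backreference at all.
print(' \1 ' == ' \x01 ')  # True
```

Note that this pads every ASCII run unconditionally, so a run at the very start or end of the string also gets a leading or trailing space (for example, pad_ascii('Nike造') gives ' Nike 造'). Call .strip() on the result if that is unwanted.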

