
Python regex: inserting a space between punctuation and letters

I assume the best way to do this is with a regex, but I do not know how to write it. I am trying to parse a string and put a space between letters and punctuation only, while keeping consecutive punctuation marks together. For example, if I have the string

"yes!!!"

I want to end up with

"yes", "!!!"

If I have the string

"!!!N00bs"

I want to end up with

"!!!", "N00bs"

Is this possible, and what is the best way to do it? Right now I am parsing the string one character at a time, which is a clumsy way of doing it.

Thanks for the help.
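Since the desired results are really lists of tokens, one option not taken from any of the answers is re.findall with an alternation: a run of word characters (\w+, which also covers the digits in N00bs) or a run of characters that are neither word characters nor whitespace ([^\w\s]+). This is a sketch; tokenize is a hypothetical helper name, and note that \w also treats the underscore as a word character.

```python
import re

# A run of word characters, or a run of "punctuation" (anything that is
# neither a word character nor whitespace). Whitespace is skipped entirely.
TOKEN_RE = re.compile(r'\w+|[^\w\s]+')

def tokenize(s):
    """Split s into word runs and punctuation runs (hypothetical helper)."""
    return TOKEN_RE.findall(s)

print(tokenize('yes!!!'))    # ['yes', '!!!']
print(tokenize('!!!N00bs'))  # ['!!!', 'N00bs']

# Joining the tokens gives the space-separated form instead of a list:
print(' '.join(tokenize('yes!!!')))  # yes !!!
```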

Something like this:

import re

# Insert a space wherever a letter is immediately followed by , . or !
txt = re.sub(r'([a-zA-Z])([,.!])', r'\1 \2', '!!!this, .is, .a .test!!!')

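You can swap the order of the groups to handle the other direction (punctuation followed by a letter):

# Insert a space wherever , . or ! is immediately followed by a letter
txt = re.sub(r'([,.!])([a-zA-Z])', r'\1 \2', txt)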
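The character classes above only cover ASCII letters and the three marks , . !. A hedged generalization, assuming "letter" should mean any word character (\w: letters, digits, underscore) and "punctuation" anything that is neither a word character nor whitespace ([^\w\s]), keeps the same two-pass structure. space_punct is a hypothetical name.

```python
import re

def space_punct(s):
    """Insert a space at every word/punctuation boundary (hypothetical helper)."""
    # Pass 1: a word character followed by punctuation
    s = re.sub(r'(\w)([^\w\s])', r'\1 \2', s)
    # Pass 2: punctuation followed by a word character
    s = re.sub(r'([^\w\s])(\w)', r'\1 \2', s)
    return s

print(space_punct('yes!!!'))         # yes !!!
print(space_punct('!!!N00bs'))       # !!! N00bs
print(space_punct('wait...what?!'))  # wait ... what ?!
```

Because neither class matches whitespace, runs like "..." or "?!" stay together and existing spaces are left alone.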

You can probably make it work with a single regex as well.
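One way to do both directions in a single pattern, sketched here with the same character classes as the two calls above, is to match the empty position between the two characters using a lookbehind and a lookahead, then substitute a space for that zero-width match. BOUNDARY_RE is a hypothetical name.

```python
import re

# Zero-width match at either kind of boundary:
#   a letter on the left and , . ! on the right, or
#   , . ! on the left and a letter on the right.
# Lookarounds consume no characters, so replacing the empty match with ' '
# simply inserts a space at that position.
BOUNDARY_RE = re.compile(r'(?<=[a-zA-Z])(?=[,.!])|(?<=[,.!])(?=[a-zA-Z])')

txt = BOUNDARY_RE.sub(' ', '!!!this, .is, .a .test!!!')
print(txt)  # !!! this , . is , . a . test !!!
```

This gives the same result as running the two re.sub calls one after the other.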

If you just want to add a space, maybe use str.replace()?

# Put a space before every '!' (so 'yes!!!' becomes 'yes ! ! !')
x = x.replace('!', ' !')

You may need more replace() calls to remove the spaces that end up between consecutive punctuation marks.
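If the goal is to avoid regex altogether, chained replace() calls get awkward once several punctuation marks are involved. A hedged alternative, not from the answer above, is to walk the string once with itertools.groupby, grouping consecutive characters by a simple class: whitespace, word character (alphanumeric or underscore), or anything else treated as punctuation. char_class and split_runs are hypothetical names.

```python
from itertools import groupby

def char_class(c):
    """Classify a single character (hypothetical helper)."""
    if c.isspace():
        return 'space'
    if c.isalnum() or c == '_':
        return 'word'
    return 'punct'

def split_runs(s):
    """Group consecutive characters of the same class, dropping whitespace runs."""
    return [''.join(run) for cls, run in groupby(s, key=char_class)
            if cls != 'space']

print(split_runs('yes!!!'))              # ['yes', '!!!']
print(split_runs('!!!N00bs'))            # ['!!!', 'N00bs']
print(' '.join(split_runs('!!!N00bs')))  # !!! N00bs
```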

I'd use:

(.+)\b(.+)

It works for both yes!!! and !!!N00bs.

Explanation:

The regular expression:

(?-imsx:(.+)\b(.+))

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    .+                       any character except \n (1 or more times
                             (matching the most amount possible))
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
----------------------------------------------------------------------
  (                        group and capture to \2:
----------------------------------------------------------------------
    .+                       any character except \n (1 or more times
                             (matching the most amount possible))
----------------------------------------------------------------------
  )                        end of \2
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
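In Python the pattern can be applied with re.match and the two groups read back. Because the first (.+) is greedy, the engine backtracks only as far as the last word boundary that still leaves at least one character for the second group, so this splits cleanly only when the string has a single letter/punctuation transition. The third input below ('a!b', chosen for illustration) shows what happens otherwise.

```python
import re

# The pattern from this answer: greedy (.+), a word boundary, then (.+).
SPLIT_RE = re.compile(r'(.+)\b(.+)')

for s in ('yes!!!', '!!!N00bs', 'a!b'):
    m = SPLIT_RE.match(s)
    print(m.groups())
# ('yes', '!!!')
# ('!!!', 'N00bs')
# ('a!', 'b')   <- only the last boundary is used
```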

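Disclaimer: technical posts on this site are shared under the CC BY-SA 4.0 license. If you republish this content, please credit this site's URL or the original source. For any questions, contact: yoyou2525@163.com.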

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM