How to separate punctuation from text in Python

Question

I want to separate groups of punctuation from the text with spaces.

my_text = "!where??and!!or$$then:)"

I want to have ! where ?? and !! or $$ then :) as a result.

I wanted something like in JavaScript, where you can use $1 to refer to the matched string. What I have tried so far:

my_matches = re.findall('[!"\$%&\'()*+,\-.\/:;=#@?\[\\\]^_`{|}~]*', my_text)

Here my_matches is empty, so I had to delete \\\ from the expression:

my_matches = re.findall('[!"\$%&\'()*+,\-.\/:;=#@?\^_`{|}~]*', my_text)

I get this result:

['!', '', '', '', '', '', '??', '', '', '', '!!', '', '', '$$', '', '', '', '',
':)', '']

So I deleted all the redundant entries like this:

my_matches_distinct = list(set(my_matches))

And I have a better result:

['', '??', ':)', '$$', '!', '!!']

Then I replace every match with itself surrounded by spaces:

for match in my_matches:
    if match != '':
        my_text = re.sub(match, ' ' + match + ' ', my_text)

And of course it's not working! I tried casting the match to a string, but that does not work either. When I put the string to replace in directly, it does work, though.

But I think I'm not doing it right, because I will have problems with '!' and '!!', right?

Thanks :)

Answer 1

It is recommended to use raw string literals when defining a regex pattern. Also, do not escape arbitrary symbols inside a character class: only \ must always be escaped, and the others can be positioned so that they need no escaping. Your regex can also match an empty string, and it does, because of the * quantifier; replace it with +. Finally, if you want to pad these symbols with spaces, use re.sub directly.
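To make the problems in the question's attempt concrete, here is a small sketch. It assumes a shortened character class (PUNCT is a hypothetical name, and the class is cut down for readability) and shows three things: that * yields empty matches while + does not, that a match such as '??' cannot be passed to re.sub as a pattern unless it is escaped with re.escape, and that replacing '!' on its own also breaks up a '!!' run.

```python
import re

my_text = "!where??and!!or$$then:)"
PUNCT = r'[!?$:)]'  # shortened class, for illustration only

# `*` also matches the empty string at non-punctuation positions.
star_matches = re.findall(PUNCT + '*', my_text)
# `+` requires at least one character, so only real runs are returned.
plus_matches = re.findall(PUNCT + '+', my_text)
print(plus_matches)

# A match such as '??' is not a valid pattern on its own,
# because `?` is a quantifier with nothing to repeat.
try:
    re.sub('??', ' ?? ', my_text)
    raw_pattern_ok = True
except re.error:
    raw_pattern_ok = False

# re.escape turns the match into a pattern that matches it literally.
escaped = re.sub(re.escape('??'), ' ?? ', my_text)
print(escaped)

# Replacing '!' by itself also splits the '!!' run apart.
split_run = re.sub(re.escape('!'), ' ! ', '!!')
```

Even with re.escape, the per-match loop would process the same text several times, which is why a single re.sub over whole runs is the simpler route.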

import re
my_text = "!where??and!!or$$then:)"
print(re.sub(r'[]!"$%&\'()*+,./:;=#@?[\\^_`{|}~-]+', r' \g<0> ', my_text).strip())

See the Python demo.

Details: The []!"$%&\'()*+,./:;=#@?[\\^_`{|}~-]+ pattern matches one or more consecutive symbols from the set. Only \ is escaped for the regex here, since - is placed at the end of the class and ] at the start; the \' is there only to keep the quote inside the single-quoted Python literal. The replacement inserts a space, the whole match (\g<0> is the backreference to the whole match), and another space. .strip() then removes leading and trailing whitespace once the regex has finished processing the string.
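The class above is written out by hand. As an alternative sketch, not taken from the answer, the class can be built from the standard library's string.punctuation and passed through re.escape, so no manual escaping or ordering is needed. The names PUNCT_RUN and space_out_punctuation are hypothetical, and string.punctuation covers a slightly different set of symbols than the hand-written class (it also contains < and >, for example).

```python
import re
import string

# Character class built from string.punctuation; re.escape makes every
# symbol literal, so no manual escaping or ordering tricks are needed.
PUNCT_RUN = re.compile('[' + re.escape(string.punctuation) + ']+')

def space_out_punctuation(text):
    """Surround each run of punctuation with single spaces."""
    padded = PUNCT_RUN.sub(r' \g<0> ', text)
    # split()/join() collapses the doubled spaces that padding can create
    # and also trims the ends, so .strip() is not needed.
    return ' '.join(padded.split())

print(space_out_punctuation("!where??and!!or$$then:)"))
```

Note that the split()/join() step also collapses any whitespace already present in the input (tabs, newlines, repeated spaces) into single spaces; if the original spacing matters, the .strip() approach from the answer is the safer choice.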

Answer 2

Use the sub() method of the re library. You can do this as follows:

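import re
# Renamed from str so that the built-in type is not shadowed.
text = '!where??and!!or$$then:)'
# ? and $ are added to the class so that ?? and $$ are separated too.
print(re.sub(r'([!?$@#%\^&\*\(\):;"\',\./\\]+)', r' \1 ', text).strip())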
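The \1 in the replacement plays the role that $1 plays in JavaScript. As a hedged sketch, here are three equivalent ways to refer to the matched text in re.sub: a numbered group, the whole-match reference \g<0>, and a replacement function. RUN is a hypothetical name for a shortened character class, and pad is an illustrative helper.

```python
import re

text = "!where??and!!or$$then:)"
RUN = r'[!?$:)]+'  # shortened class, for illustration only

# 1. Capture group plus \1, the closest analogue of JavaScript's $1.
with_group = re.sub('(' + RUN + ')', r' \1 ', text).strip()

# 2. \g<0> refers to the whole match, so no capture group is needed.
whole_match = re.sub(RUN, r' \g<0> ', text).strip()

# 3. A function receives the match object and returns the replacement.
def pad(match):
    return ' ' + match.group(0) + ' '

with_function = re.sub(RUN, pad, text).strip()

print(with_group)
```

The function form is the most flexible of the three, since the replacement can depend on what was matched.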

I hope this code solves your problem. If you are comfortable with regex, the pattern itself is not a big deal; it is just a matter of using the right function.

Hope this helps! Please comment if you have any questions. :)

References:

Python re library

Answer 1: score 1, accepted, 2016-11-30 08:55:55

Answer 2: score 0, 2016-11-30 09:07:09
