
Python: how to separate punctuation from text

I want to separate groups of punctuation from the text with spaces.

my_text = "!where??and!!or$$then:)"

I want to get "! where ?? and !! or $$ then :)" as the result.

I wanted something like in JavaScript, where you can use $1 in the replacement to refer to the matched string. What I have tried so far:

my_matches = re.findall('[!"\$%&\'()*+,\-.\/:;=#@?\[\\\]^_`{|}~]*', my_text)

Here my_matches is empty, so I had to delete the escaped [, \ and ] characters from the expression:

my_matches = re.findall('[!"\$%&\'()*+,\-.\/:;=#@?\^_`{|}~]*', my_text)

I get this result:

['!', '', '', '', '', '', '??', '', '', '', '!!', '', '', '$$', '', '', '', '',
':)', '']

So I removed the duplicate entries like this:

my_matches_distinct = list(set(my_matches))

And I got a better result:

['', '??', ':)', '$$', '!', '!!']

Then I replace every match with itself surrounded by spaces:

for match in my_matches:
    if match != '':
        my_text = re.sub(match, ' ' + match + ' ', my_text)

And of course it's not working! I tried to cast the match to a string, but that doesn't work either. When I put the string to replace in directly, though, it works.

But I think I'm not doing it right anyway, because I will have problems with '!' and '!!', right?
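A minimal sketch (assuming Python 3) of why the loop misbehaves: each match is passed to re.sub as a pattern, so '??' is not even a valid regex. re.escape() turns a match into a literal pattern, but substituting the distinct matches one at a time still cannot tell a lone '!' from the '!!' run. The helper pad_each is hypothetical, for illustration only.

```python
import re

my_text = "!where??and!!or$$then:)"

# A raw match used as a pattern can be invalid: '?' is a quantifier
# with nothing to repeat, so compiling '??' raises re.error.
try:
    re.compile("??")
    raw_pattern_ok = True
except re.error:
    raw_pattern_ok = False

def pad_each(text, matches):
    """Hypothetical helper: pad each distinct match with spaces, one after another."""
    for match in matches:
        if match:
            # re.escape() makes the match literal ('??' becomes '\?\?');
            # \g<0> re-inserts whatever was matched.
            text = re.sub(re.escape(match), r" \g<0> ", text)
    return text

print(raw_pattern_ok)                    # False
print(repr(pad_each(my_text, ["??"])))   # '!where ?? and!!or$$then:)'
# Padding '!' also hits both characters of the '!!' run and splits it:
print(repr(pad_each(my_text, ["!"])))    # ' ! where??and !  ! or$$then:)'
```

This is why one re.sub call with a + quantifier, which consumes each whole run of symbols in a single match, is the simpler route.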

Thanks :)

It is recommended to use raw string literals when defining a regex pattern. Also, do not escape arbitrary symbols inside a character class: only \ must always be escaped, and the other symbols can be positioned so that they need no escaping. Your regex also matches an empty string (and it does, at every position without a symbol) because of the * quantifier; use the + quantifier instead. Finally, to rewrite the string rather than just collect the matches, use re.sub directly:

import re
my_text = "!where??and!!or$$then:)"
print(re.sub(r'[]!"$%&\'()*+,./:;=#@?[\\^_`{|}~-]+', r' \g<0> ', my_text).strip())


Details: []!"$%&'()*+,./:;=#@?[\\^_`{|}~-]+ matches one or more consecutive symbols from the set. Only the backslash is escaped inside the class: - is placed at the end and ] at the start, so neither needs escaping (the \' in the code is only there so the quote does not end the Python string literal). The replacement inserts a space, the whole match (\g<0> is the backreference to the whole match) and another space. Finally, .strip() removes the leading/trailing whitespace once the regex has processed the string.
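The quantifier advice can be checked directly with re.findall, reusing the character class from the answer above (a sketch; the names punct, star_matches, plus_matches and padded are illustrative). The same class is then used with \g<0>, which is Python's counterpart of JavaScript's $& (the whole match); \1 or \g<1> plays the role of $1 for a capture group.

```python
import re

my_text = "!where??and!!or$$then:)"
# The character class from the answer above, without a quantifier.
punct = r'[]!"$%&\'()*+,./:;=#@?[\\^_`{|}~-]'

# With *, the pattern also matches the empty string between symbols.
star_matches = re.findall(punct + "*", my_text)
# With +, only non-empty runs of symbols are returned.
plus_matches = re.findall(punct + "+", my_text)

print('' in star_matches)  # True
print(plus_matches)        # ['!', '??', '!!', '$$', ':)']

# \g<0> in the replacement stands for the whole match.
padded = re.sub(punct + "+", r" \g<0> ", my_text).strip()
print(padded)              # ! where ?? and !! or $$ then :)
```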

Use the sub() function from the re module. You can do it like this:

import re
text = '!where??and!!or$$then:)'
# ? and $ must be in the class, otherwise '??' and '$$' are left untouched
print(re.sub(r'([!?$@#%\^&\*\(\):;"\',\./\\]+)', r' \1 ', text).strip())

I hope this code solves your problem. If you are comfortable with regex, the pattern itself is not a big deal; the key is to use the right function.
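Hand-written classes are easy to get wrong (for example, a class that omits ? and $ leaves '??' and '$$' untouched in this input). As an alternative sketch, not part of either answer, the class can be built from string.punctuation (ASCII punctuation only) with re.escape, so every symbol is treated literally. The name separate_punctuation is hypothetical.

```python
import re
import string

# Every ASCII punctuation character, escaped so it is literal inside [...].
PUNCT_RUN = re.compile("[" + re.escape(string.punctuation) + "]+")

def separate_punctuation(text):
    """Hypothetical helper: surround each run of punctuation with single spaces."""
    padded = PUNCT_RUN.sub(r" \g<0> ", text)
    # Collapse runs of whitespace (including any already in the text) and trim the ends.
    return " ".join(padded.split())

print(separate_punctuation("!where??and!!or$$then:)"))
# ! where ?? and !! or $$ then :)
```

Note that non-ASCII punctuation (for example "…" or "¿") is not in string.punctuation, so it would be left in place.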

Hope this helps! Please comment if you have any queries. :)


References:

Python re library

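Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original post's URL. For any questions, contact yoyou2525@163.com.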

 
© 2020-2024 STACKOOM.COM