Python: how to separate punctuation from text
I want to separate groups of punctuation from the text with spaces.
my_text = "!where??and!!or$$then:)"
I want to get
! where ?? and !! or $$ then :)
as a result.
I wanted something like in JavaScript, where you can use $1 to get your matching string. What I have tried so far:
my_matches = re.findall('[!"\$%&\'()*+,\-.\/:;=#@?\[\\\]^_`{|}~]*', my_text)
Here my_matches is empty, so I had to delete the \[\\\] part from the expression:
my_matches = re.findall('[!"\$%&\'()*+,\-.\/:;=#@?\^_`{|}~]*', my_text)
I get this result:
['!', '', '', '', '', '', '??', '', '', '', '!!', '', '', '$$', '', '', '', '',
':)', '']
So I delete all the redundant entries like this:
my_matches_distinct = list(set(my_matches))
And I get a better result:
['', '??', ':)', '$$', '!', '!!']
Then I replace every match with itself surrounded by spaces:
for match in my_matches:
    if match != '':
        my_text = re.sub(match, ' ' + match + ' ', my_text)
And of course it's not working! I tried to cast the match to a string, but that's not working either. When I put the string to replace in directly, it works though.
But I think I'm not doing it right, because I will have problems with '!' and '!!', right?
Thanks :)
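One likely reason the loop above fails: re.sub compiles its first argument as a regex, so a match such as '??' is not a valid pattern on its own. The sketch below only illustrates that point, assuming re.escape is an acceptable way to treat the match as literal text; the names failed and escaped are made up for the illustration.

```python
import re

my_text = "!where??and!!or$$then:)"

# re.sub() compiles its first argument as a regex; '??' alone is a
# quantifier with nothing to repeat, so compiling it raises re.error.
try:
    re.sub("??", " ?? ", my_text)
    failed = False
except re.error:
    failed = True

# re.escape() turns the literal text into a pattern that matches it verbatim.
escaped = re.sub(re.escape("??"), " ?? ", my_text)

print(failed)
print(escaped)
```

Even with escaping, replacing '!' and then '!!' in separate passes would touch the same characters twice, which is why a single substitution over one pattern is the safer route.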
It is recommended to use raw string literals when defining a regex pattern. Besides, do not escape arbitrary symbols inside a character class: only \ must always be escaped, and the others can be placed so that they do not need escaping. Also, your regex matches an empty string because of the * quantifier; replace it with the + quantifier. Finally, if you want to put spaces around these symbols in your string, use re.sub directly:
import re
my_text = "!where??and!!or$$then:)"
print(re.sub(r'[]!"$%&\'()*+,./:;=#@?[\\^_`{|}~-]+', r' \g<0> ', my_text).strip())
See the Python demo.
Details: The []!"$%&\'()*+,./:;=#@?[\\^_`{|}~-]+ pattern matches one or more symbols from the set. Note that only \ is escaped here, since - is placed at the end and ] at the start of the class; the \' is there only to keep the quote inside the single-quoted Python literal. The replacement inserts a space, the whole match (\g<0> is the backreference to the whole match), and another space. .strip() then removes the leading/trailing whitespace after the regex finishes processing the string.
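The same idea can be wrapped in a small helper. This is a minimal sketch, not the answer's exact pattern: it assumes that string.punctuation (all ASCII punctuation, which also includes < and >) is the set you want, and the names PUNCT_RUN and space_out_punctuation are hypothetical.

```python
import re
import string

# Assumption: build the character class from string.punctuation rather than
# typing it by hand; re.escape() handles ], \, ^ and - inside the class.
PUNCT_RUN = re.compile("[" + re.escape(string.punctuation) + "]+")


def space_out_punctuation(text):
    """Surround each run of ASCII punctuation with single spaces."""
    # \g<0> refers to the whole match, as in the answer above.
    return PUNCT_RUN.sub(r" \g<0> ", text).strip()


result = space_out_punctuation("!where??and!!or$$then:)")
print(result)
```

Compiling the pattern once keeps the call site short if the helper is applied to many strings.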
Use the sub() function from the re library. You can do this as follows:
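import re
text = '!where??and!!or$$then:)'  # renamed from str to avoid shadowing the built-in
# ? and $ are included in the class so that ?? and $$ are separated too
print(re.sub(r'([!?$@#%^&*():;"\',./\\]+)', r' \1 ', text).strip())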
I hope this code solves your problem. If you are familiar with regex, the regex part is not a big deal; it is just a matter of using the right function.
Hope this helps! Please comment if you have any queries. :)
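On the $1 point from the question: in Python the replacement string refers to a captured group as \1 or \g<1>, and re.sub also accepts a function that receives the match object. A short sketch of the three forms, using a shortened, hypothetical character class that only covers this one input:

```python
import re

text = "!where??and!!or$$then:)"
# Hypothetical, shortened class: just the punctuation present in this input.
pattern = r"([!?$:)]+)"

by_number = re.sub(pattern, r" \1 ", text).strip()      # numbered group reference
by_g_syntax = re.sub(pattern, r" \g<1> ", text).strip()  # unambiguous \g<1> form
by_function = re.sub(pattern, lambda m: " " + m.group(1) + " ", text).strip()

print(by_number)
```

The \g<1> form is handy when the reference is followed by a digit, and the function form when the replacement needs logic beyond plain text.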