Can RegEx match all non-word characters except punctuation?

Question

For sentences like:

sent = ("This i$s a s[[]ample sentence.\nAnd another <<one>>."
        "\nMoreover, it is 'filtered'!")

I would like to get:

"This is a sample sentence. And another one. Moreover, it is filtered."

Thus, I thought using re.sub should be the way to go. However, RegEx doesn't work as expected (like it pretty much always does^^).

My idea was to use \W to match every non-word character and then exclude [.,;!?] to keep the punctuation. The last RegEx I've tried was:

re.sub(r"(\W[^\.\,\;\?\!])", "", sent)

Unfortunately, [^\.\,\;\?\!] matches any character that is not one of [.,;!?], instead of simply saying 'do not match these characters literally'.

How can I exclude these characters from the match?

Answer 1

The \W needs to be integrated into the negated character class. \W is the same as [^\w], so you'll end up with [^\w.,;!?]. You should repeat this character class to match contiguous occurrences in a single step: [^\w.,;!?]+
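To make the difference concrete, here is a minimal sketch comparing the pattern from the question with the merged character class. It assumes the sample is one string with no extra whitespace from the line wrap; the variable names broken and merged are only illustrative.

```python
import re

sent = ("This i$s a s[[]ample sentence.\nAnd another <<one>>."
        "\nMoreover, it is 'filtered'!")

# Pattern from the question: \W followed by ONE more character that is
# not punctuation. Every match consumes two characters, so wanted
# letters that follow a non-word character are removed as well.
broken = re.sub(r"(\W[^\.\,\;\?\!])", "", sent)

# Merged class: a run of characters that are neither word characters
# nor one of the punctuation marks to keep.
merged = re.sub(r"[^\w.,;!?]+", "", sent)

print(broken)
print(merged)  # Thisisasamplesentence.Andanotherone.Moreover,itisfiltered!
```

Note that the merged class still strips spaces and newlines, because they are not word characters either.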

It seems you also want to keep spaces, so you should add them to your character class.
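A short sketch of the character class with a space added, under the same assumption that the sample is a single string without wrap whitespace (kept_spaces is an illustrative name):

```python
import re

sent = ("This i$s a s[[]ample sentence.\nAnd another <<one>>."
        "\nMoreover, it is 'filtered'!")

# The space inside the class protects spaces from removal.
# Newlines are still stripped, so the sentences run together.
kept_spaces = re.sub(r"[^\w.,;!? ]+", "", sent)

print(kept_spaces)
# This is a sample sentence.And another one.Moreover, it is filtered!
```

The words are now separated correctly, but the newlines that divided the sentences are gone, which is why they need separate handling.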

Reading deeper into your question, you also want to replace newlines with a space and "!" with ".". This makes it a multiple-step solution. First filter out anything unwanted with [^\w.,;!? \n]+, then in a next step replace \n with a space and "!" with ".".
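The steps above can be sketched as follows. This is one possible way to do the second step; it uses str.replace for the literal substitutions, though re.sub would work too. The names step1 and result are illustrative, and the sample is assumed to be a single string without wrap whitespace.

```python
import re

sent = ("This i$s a s[[]ample sentence.\nAnd another <<one>>."
        "\nMoreover, it is 'filtered'!")

# Step 1: drop everything that is not a word character, kept
# punctuation, a space, or a newline.
step1 = re.sub(r"[^\w.,;!? \n]+", "", sent)

# Step 2: turn newlines into spaces and "!" into ".".
result = step1.replace("\n", " ").replace("!", ".")

print(result)
# This is a sample sentence. And another one. Moreover, it is filtered.
```

The order matters only in that the newline has to survive step 1 so it can become a space in step 2.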

Solution 1
2 Accepted 2016-12-24 08:27:47
