Regular expression that matches some symbols but excludes others

Question

I have a paragraph, and I want to use a regular expression to extract all the words in it.

a bdag agasg it's the cookies for dogs',don't you think so? the word 'wow' in english means.you hey b 097  dag final

I have tried several regexes with re.findall(regX, str) and found one that matches most of the words.

regX = "[ ,\.\?]?([a-z]+'?[a-z]?)[ ,\.\?]?"

['a', 'bdag', 'agasg', "it's", 'the', 'cookies', 'for', "dogs'", "don't", 'you', 'think', 'so', 'the', 'word', "wow'", 'in', 'english', 'means', 'you', 'hey', 'b', 'dag', 'final']

All of them are fine except **wow'**.
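For reference, the attempt above can be reproduced with a short script. This is only a sketch: the names `text`, `reg_x`, and `words` are placeholders, and the pattern is written as a raw string so the backslashes reach the regex engine unchanged.

```python
import re

text = ("a bdag agasg it's the cookies for dogs',don't you think so? "
        "the word 'wow' in english means.you hey b 097  dag final")

# Same pattern as in the question, written as a raw string.
reg_x = r"[ ,\.\?]?([a-z]+'?[a-z]?)[ ,\.\?]?"

# With a single capture group, re.findall returns that group's text for each match.
words = re.findall(reg_x, text)
print(words)

# The optional apostrophe ('?) inside the group is what lets "wow" pick up
# the closing quote mark.
print("wow'" in words)  # True
```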

I wonder whether a regular expression can express the rule "this character can be a comma, space, period, etc., but can't be an apostrophe".

Can someone advise?

Answer 1

Try:

[ ,\.\?']?([a-z]*('\w)?)[\' ,\.\?]?

This adds a second capture group, so you'll need to select only group 1.
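A minimal sketch of selecting group 1 in Python, assuming the sample paragraph from the question; `text`, `pattern`, and `words` are illustrative names. Because the pattern has two groups, `re.findall` would return a tuple per match, so the sketch uses `re.finditer` and reads `group(1)` instead. Since `[a-z]*` can match nothing, empty captures are filtered out.

```python
import re

text = ("a bdag agasg it's the cookies for dogs',don't you think so? "
        "the word 'wow' in english means.you hey b 097  dag final")

# Pattern from this answer, written as a raw string.
pattern = r"[ ,\.\?']?([a-z]*('\w)?)[\' ,\.\?]?"

# finditer yields match objects, so group 1 can be picked directly.
# Empty captures (where [a-z]* matched nothing) are dropped.
words = [m.group(1) for m in re.finditer(pattern, text) if m.group(1)]
print(words)
```

With this pattern an apostrophe is kept only when a word character follows it, so a trailing quote mark is consumed outside group 1.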

Answer 2

I didn't fully understand what you want the output to be, but try this:

[ ,\.\?]?(["']?[a-z]+["']?[a-z]?)[ ,\.\?]?

This regex lets you capture the ' and " characters within the text.
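A rough sketch of how a pattern of this shape behaves on the sample paragraph. It assumes the character class is meant to hold just the two quote characters, listed explicitly here; `text`, `pattern`, and `words` are illustrative names.

```python
import re

text = ("a bdag agasg it's the cookies for dogs',don't you think so? "
        "the word 'wow' in english means.you hey b 097  dag final")

# Assumed variant: an optional quote before the word and an optional quote
# (plus at most one more letter) after it, all inside the capture group.
pattern = r"""[ ,\.\?]?(["']?[a-z]+["']?[a-z]?)[ ,\.\?]?"""

# With a single capture group, re.findall returns that group's text per match.
words = re.findall(pattern, text)
print(words)

# Quote marks around a word are kept as part of the captured text.
print("'wow'" in words)  # True
```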

If this still isn't what you wanted, please let me know so I can update my answer.

Answer 1
0 2019-03-21 09:38:44

Answer 2
0 2019-03-21 14:35:28
