Python regular expression matching - split on punctuation, but ignore certain words

Question

Suppose I have the following sentence:

Hi, my name is Dr. Who. I'm in love with fish-fingers and custard !!

I'm trying to capture the punctuation (except the apostrophe and hyphen) using regular expressions, but I also want to ignore certain words. For example, I'm ignoring Dr., so I don't want to capture the . in the word Dr.

Ideally, the regex should capture the text between the parentheses:

Hi(, )my( )name( )is( )Dr.( )Who(. )I'm( )in( )love( )with( )fish-fingers( )and( )custard( !!)

Note that I have a Python list containing the words I want to ignore, such as "Dr.". I'm also using string.punctuation to get the list of punctuation characters to use in the regex. I've tried a negative lookahead, but it still catches the "." in Dr. Any help appreciated!
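A negative lookahead placed at the "." can only inspect the text after the dot, so it never sees the "Dr" in front of it; the check has to look behind instead. The sketch below is one way to do that, assuming every entry in the ignore list is an abbreviation ending in "." (the list contents and the names IGNORED, OTHER, GUARDS and PATTERN are illustrative, not from the question). A "." counts as a separator only when it is not directly preceded by an ignored stem; the other separator characters come from string.punctuation minus the apostrophe and hyphen.

```python
import re
import string

# Hypothetical ignore list; this sketch assumes every entry is an
# abbreviation ending in "." (like "Dr.").
IGNORED = ["Dr.", "Prof."]

# Separator characters other than ".": string.punctuation minus the
# apostrophe, the hyphen and the dot (the dot is handled separately).
# re.escape keeps ] \ ^ safe inside the character class.
OTHER = re.escape("".join(c for c in string.punctuation if c not in "'-."))

# One negative lookbehind per ignored word. Python lookbehinds must be
# fixed-width, so "(?<!Dr|Prof)" would be rejected; chaining them works.
GUARDS = "".join(rf"(?<!\b{re.escape(w[:-1])})" for w in IGNORED)

# A separator run: whitespace or punctuation, or a "." that is not
# directly preceded by an ignored stem such as "Dr".
PATTERN = re.compile(rf"(?:[\s{OTHER}]|{GUARDS}\.)+")

text = "Hi, my name is Dr. Who. I'm in love with fish-fingers and custard !!"
print(PATTERN.findall(text))
```

For the example sentence this yields the separators listed in parentheses above, and the period in "Dr." is left alone, while the one after "Who" is captured.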

Answer 1

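You can first throw away all your stop words (like "Dr.") and then all letters (and digits).

import re

text = "Hi, my name is Dr. Who. I'm in love with fish-fingers and custard !!"
# Remove whole stop words; a character class like [Dr.|Prof.] would instead delete
# every single D, r, ., |, P, o and f, including the period after "Who"
tmp = re.sub(r'\b(?:Dr|Prof)\.', '', text)
# Then remove letters and digits, leaving only whitespace and punctuation
print(re.sub(r'[a-zA-Z0-9]+', '', tmp))

Would that work?

It prints:

,     . '    -   !!

That is the text between the parentheses in your question, run together. The apostrophe from I'm and the hyphen from fish-fingers are still in it, so add ' and - to the second character class if they should disappear too.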
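If the stop words live in a Python list, as in the question, the first step can build its pattern from that list instead of hard-coding it. This is a sketch under a few assumptions: stop_words, tmp and rest are illustrative names, each entry is assumed to start with a letter (so \b anchors it to a word start), and the apostrophe and hyphen are treated as word characters, as the question asks.

```python
import re

# Hypothetical stop-word list, standing in for the list mentioned in the question
stop_words = ["Dr.", "Prof."]

text = "Hi, my name is Dr. Who. I'm in love with fish-fingers and custard !!"

# re.escape turns "Dr." into "Dr\." so the dot is literal; longest entries go
# first so a shorter word cannot pre-empt a longer one sharing its prefix.
stop_re = "|".join(re.escape(w) for w in sorted(stop_words, key=len, reverse=True))
tmp = re.sub(rf"\b(?:{stop_re})", "", text)

# Then drop words: letters, digits, and the apostrophe/hyphen that the
# question treats as part of a word ("I'm", "fish-fingers").
rest = re.sub(r"[A-Za-z0-9'-]+", "", tmp)
print(repr(rest))
```

Printing repr makes the remaining spaces visible; only the comma, the period after "Who" and the "!!" survive as punctuation.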


Answer 1: score 0, accepted, 2019-01-16 21:19:29
