
Python Matching Complex String with Regex

I have the following test string:

test_str = "`It isn't directed at all,' said the White Rabbit;"

I currently use re.sub to filter out the punctuation so that I can process the words myself.

My current regex is re.sub(r"[^A-Za-z0-9'\s]", '', test_str)

The output from above is:

['It', "isn't", 'directed', 'at', "all'", 'said', 'the', 'White', 'Rabbit']

The error is at all', which is supposed to be stored as just all.

How can I keep words that contain an apostrophe (such as isn't) and also ignore an apostrophe that comes after a punctuation mark? In this case, all,'.

Try the following:

import re
test_str = "`It isn't directed at all,' said the White Rabbit;"
a = re.sub(r"[^A-Za-z0-9'\s]", '', test_str)
a = re.sub(r"'[ ]", ' ', a)
print(a)
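To connect this answer back to the word list shown in the question, here is a minimal sketch that prints the intermediate string after each substitution and then splits on whitespace. The names step1, step2 and words are illustrative only, and the final split() is an assumption about how the list in the question was produced.

```python
import re

test_str = "`It isn't directed at all,' said the White Rabbit;"

# Step 1: drop everything except letters, digits, apostrophes and whitespace.
step1 = re.sub(r"[^A-Za-z0-9'\s]", '', test_str)
print(step1)

# Step 2: replace an apostrophe that is followed by a space with a single space.
step2 = re.sub(r"'[ ]", ' ', step1)
print(step2)

# Assumed final step: split on whitespace to get the list of words.
words = step2.split()
print(words)
```

Note that the second substitution only catches an apostrophe followed by a space, so an apostrophe at the very end of the string would survive.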

Try using this regular expression:

# Remove any punctuation mark that is not directly followed by a word character.
print(re.sub(r'[!"#$%&\'()*+,\-./:;<=>?@\[\]^_`{|}~](?!\w)', '', test_str))

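Output (the leading backtick stays, because the lookahead only removes punctuation that is not directly followed by a word character):

`It isn't directed at all said the White Rabbit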
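A possible variant, not taken from the answer above: a lookbehind can be combined with the lookahead so that a punctuation mark is kept only when it has a word character on both sides. The sketch below builds the character class from string.punctuation with re.escape; the names punct, pattern and cleaned are illustrative. Unlike the class written out by hand above, string.punctuation also contains the backslash.

```python
import re
import string

test_str = "`It isn't directed at all,' said the White Rabbit;"

# Character class covering every ASCII punctuation mark.
punct = "[" + re.escape(string.punctuation) + "]"

# Remove a punctuation mark unless it has a word character on BOTH sides,
# so the apostrophe in "isn't" survives but a leading backtick does not.
pattern = re.compile(rf"(?<!\w){punct}|{punct}(?!\w)")

cleaned = pattern.sub("", test_str)
print(cleaned)
print(cleaned.split())
```

With this pattern the opening backtick is removed as well, since nothing word-like precedes it.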

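Here are some other solutions. Each one handles only the stray apostrophe, so the remaining punctuation still has to be stripped separately: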

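re.sub(r"'[^\w]", ' ', test_str)  # apostrophe plus the following non-word character -> one space
re.sub(r"'[\s]", ' ', test_str)   # apostrophe plus the following whitespace character -> one space
re.sub(r"'(?!\w)", '', test_str)  # apostrophe not followed by a word character
re.sub(r"'(?=\s)", '', test_str)  # apostrophe followed by whitespace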
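Another way to reach the word list the question asks for, offered here as a sketch rather than as one of the original answers, is to match the words directly with re.findall instead of deleting punctuation. The pattern and the names word_re and words are assumptions for illustration; it treats a word as a run of ASCII letters or digits that may contain apostrophes only between such runs.

```python
import re

test_str = "`It isn't directed at all,' said the White Rabbit;"

# A word is a run of letters/digits, optionally followed by one or more
# groups of an apostrophe plus more letters/digits (e.g. "isn't").
word_re = re.compile(r"[A-Za-z0-9]+(?:'[A-Za-z0-9]+)*")

words = word_re.findall(test_str)
print(words)
```

Because an apostrophe is matched only when letters or digits follow it, the trailing apostrophe in all,' is never part of a word and no second cleanup pass is needed.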
