Regular expression in Python needs to retain special characters

Question

Below is my unclean text string:

text = 'this/r/n/r/nis a non-U.S disclosures/n/n/r/r analysis agreements disclaimer./r/n/n/nPlease keep it confidential'

Below is the regexp I'm using:

 ' '.join(re.findall(r'\b(\w+)\b', text))

My output is:

'this is a non US disclosures analysis agreements disclaimer. Please keep it confidential'

My expected output is:

 'this is a non-U.S disclosures analysis agreements disclaimer. Please keep it confidential'

I need to retain the special characters, and there should be exactly one space between words. Can anyone help me alter my regexp?

Answer 1

Hope this works for you!

import re

# Named `text` rather than `str` to avoid shadowing the built-in type.
text = 'this/r/n/r/nis a non-US disclosures/n/n/r/r analysis agreements disclaimer./r/n/n/nPlease keep it confidential'

# Replace each "/" and the character after it (if any) with a space.
val = re.sub(r'(/.?)', " ", text)
# Collapse every run of whitespace into a single space.
val1 = re.sub(r'\s+', " ", val)
print(val1)
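The pattern `/.?` removes any slash together with whatever character follows it, so a string such as 'a/b' would lose the 'b'. A narrower variant, sketched below under the assumption that the only markers to strip are the literal two-character sequences "/r" and "/n", might look like this. The function name `normalize_markers` is hypothetical and used only for illustration.

```python
import re

def normalize_markers(raw: str) -> str:
    """Replace literal "/r" and "/n" markers with spaces, then collapse whitespace."""
    # Assumption: only "/r" and "/n" count as markers; other slashes are kept.
    spaced = re.sub(r'/[rn]', ' ', raw)
    # Collapse each run of whitespace to one space and trim both ends.
    return re.sub(r'\s+', ' ', spaced).strip()

text = 'this/r/n/r/nis a non-U.S disclosures/n/n/r/r analysis agreements disclaimer./r/n/n/nPlease keep it confidential'
cleaned = normalize_markers(text)
print(cleaned)
```

This still mangles ordinary text in which a slash is followed by "r" or "n" (for example 'and/now'), so it only fits input where such sequences are always markers.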

Answer 2

Use a more specific word boundary than \b. The $ that marks the end of the string can't be placed inside square brackets, so you have to make the alternation explicit, as in $|\n|\r| followed by a space, and the ?= is a non-consuming lookahead, much like \b. It is also safer here to use a non-greedy, non-empty accumulator (the + sign makes it non-empty and the question mark makes it non-greedy):

re.findall(r'[^\n\r ]+?(?=$|\n|\r| )', text)

['this', 'is', 'a', 'non-U.S', 'disclosures', 'analysis', 'agreements', 'disclaimer.', 'Please', 'keep', 'it', 'confidential']
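The lookahead pattern splits on real carriage-return and newline characters, not on the literal "/r" and "/n" text shown in the question. The sketch below assumes the input really contains "\r" and "\n" control characters (the `text` value here is rewritten accordingly and is an assumption, not the asker's exact string). It joins the tokens with single spaces and compares the result with the simpler `str.split()` idiom, which splits on any run of whitespace.

```python
import re

# Assumption: separators are real control characters, not literal "/r" and "/n".
text = 'this\r\n\r\nis a non-U.S disclosures\n\n\r\r analysis agreements disclaimer.\r\n\n\nPlease keep it confidential'

# Answer 2's pattern: shortest non-empty run of non-separator characters
# that is followed by the end of the string, a newline, a carriage return, or a space.
tokens = re.findall(r'[^\n\r ]+?(?=$|\n|\r| )', text)
joined = ' '.join(tokens)

# Alternative without a regex: split() with no arguments drops all whitespace runs.
simple = ' '.join(text.split())

print(joined)
```

Both approaches leave punctuation such as "-" and "." untouched because they define a word as anything between separators rather than as a run of `\w` characters.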

2 answers

Answer 1
1 2018-02-02 05:23:27

Answer 2
0 2018-01-29 11:00:01
