Regex to strip all punctuation except '

I'd like to strip all punctuation from a block of text that I import, except for the apostrophe ('), such as the ' in doesn't.

I currently have:

$words = preg_replace('/[^a-z]+/i', '', $words);

This strips all the punctuation, but I'm unsure how to keep the '.

How can I achieve this?

Try this:

preg_replace( '/[^\w\']+|\'(?!\w)|(?<!\w)\'/', '', $words )

This should remove all non-word characters, along with any apostrophe that is not inside a word.

Untested so far; please let me know if it works.
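As a rough check of the pattern above, here is a minimal sketch run against a made-up sample string (the input and the variable names $clean and $spaced are assumptions, not part of the original answer). It also shows that the pattern removes whitespace, because a space is neither a word character nor an apostrophe. The second call, which adds \s to the negated class, is one possible variation if spaces should be kept.

```php
<?php
// Made-up sample input for illustration.
$words = "Don't stop, 'quoted' words... it's fine!";

// Pattern from the answer: drops every run of characters that are neither
// word characters nor apostrophes, plus any apostrophe that lacks a word
// character on either side.
$clean = preg_replace('/[^\w\']+|\'(?!\w)|(?<!\w)\'/', '', $words);
echo $clean, "\n";   // Don'tstopquotedwordsit'sfine

// Assumed variation: adding \s to the negated class keeps whitespace.
$spaced = preg_replace('/[^\w\s\']+|\'(?!\w)|(?<!\w)\'/', '', $words);
echo $spaced, "\n";  // Don't stop quoted words it's fine
```

Apostrophes inside words (Don't, it's) survive in both cases, while the quotes around 'quoted' are removed.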

Update

To remove numbers too, just use this regex:

/[^\w\']+|\'(?!\w)|(?<!\w)\'|\d+/

I just added \d+, so runs of digits also match and are removed.
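To illustrate the difference the extra alternative makes, here is a small sketch comparing the two patterns on a made-up input (the sample string and the variable names are assumptions for illustration only).

```php
<?php
$words = "It's 2024, isn't it?";  // made-up sample input

// Without \d+: digits count as word characters, so they are kept.
$withDigits = preg_replace('/[^\w\']+|\'(?!\w)|(?<!\w)\'/', '', $words);
echo $withDigits, "\n";     // It's2024isn'tit

// With \d+: runs of digits are matched by the last alternative and removed.
$withoutDigits = preg_replace('/[^\w\']+|\'(?!\w)|(?<!\w)\'|\d+/', '', $words);
echo $withoutDigits, "\n";  // It'sisn'tit
```

Note that \w also covers the underscore, so underscores survive either pattern.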

To remove punctuation characters using a Unicode property, do:

 preg_replace('/\p{Punctuation}+/u', '', $words);

or

 preg_replace('/\p{P}+/u', '', $words);

To remove all punctuation except the single quote:

 preg_replace("/[^\P{P}']+/u", '', $words);

Have a look here.
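A minimal sketch of the two Unicode-property variants side by side, using the short \p{P} form and a made-up input that includes non-ASCII quotation marks (the sample string and the variable names $all and $kept are assumptions for illustration):

```php
<?php
// Made-up sample input; the file is assumed to be saved as UTF-8.
$words = "Hello, world! It's «fine»; isn't it?";

// \p{P} matches any Unicode punctuation character, apostrophe included.
$all = preg_replace('/\p{P}+/u', '', $words);
echo $all, "\n";   // Hello world Its fine isnt it

// [^\P{P}'] reads as "not a non-punctuation character and not an apostrophe",
// i.e. any punctuation character except the apostrophe.
$kept = preg_replace('/[^\P{P}\']+/u', '', $words);
echo $kept, "\n";  // Hello world It's fine isn't it
```

Symbols such as $ or + belong to the \p{S} category rather than \p{P}, so neither pattern removes them.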

You can use

(?!')\p{P}

to match any punctuation except an apostrophe. For example:

preg_replace('/(?!\')\p{P}/u', '', $str);
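Here is a short sketch of the lookahead approach on a made-up string (the input and the variable names are assumptions). PHP's preg_replace has no g modifier because it replaces every match by default; the optional fifth argument, used here only for illustration, reports how many replacements were made.

```php
<?php
$str = "Wait... don't go!";  // made-up sample input

// (?!') refuses to match at an apostrophe; \p{P} then matches one
// punctuation character. $count receives the number of replacements.
$result = preg_replace('/(?!\')\p{P}/u', '', $str, -1, $count);
echo $result, "\n";  // Wait don't go
echo $count, "\n";   // 4
```

The three dots and the exclamation mark are each replaced individually, while the apostrophe in don't is left alone.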

Another option, using a POSIX character class:

/(?!'\b)[[:punct:]] ?/

This matches any punctuation character, optionally followed by a space, unless it is an apostrophe followed by a word character (the \b after the apostrophe is a word boundary, which implies a word character comes next).

See http://rubular.com/r/VJ0J5c25vc
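The linked demo runs the pattern in Ruby. As a sketch only, and assuming the same pattern is passed unchanged to PHP's preg_replace (where, without the u modifier, [[:punct:]] covers ASCII punctuation), it could be tried like this on a made-up input:

```php
<?php
// Made-up sample input with space-separated punctuation.
$text = "It's a test - isn't it ?";

// Skip an apostrophe that is directly followed by a word character;
// otherwise remove one punctuation character and one optional space.
$result = preg_replace('/(?!\'\b)[[:punct:]] ?/', '', $text);
echo "[", $result, "]\n";  // [It's a test isn't it ]
```

Because the optional space after each punctuation character is consumed too, input like "Hello, world" would come out as "Helloworld", so this variant suits text where punctuation is set off by spaces.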
