
Regex to remove characters and supplied words

I need a regex that allows only alphanumeric characters (plus hyphens, as in the example below) and also removes certain whole words.

Example:

Input string: this-is-johny-bravo's-grand-dad

Result string: johny-bravos-dad

Words/characters to replace with an empty string: this,is,',grand

Here is what I have so far:

var input = "this-is-johny-bravo's-grand-dad";
var regex = new Regex(@"([^a-z0-9\-][\b(this|is|grand)\b]?)");
var result = regex.Replace(input, "");

The result no longer has the apostrophe, but unfortunately it still includes the rejected whole words.

You also need to add the character class to the alternation:

new Regex(@"\b(this|is|grand)\b-?|[^a-z0-9-]");
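A minimal sketch of how this pattern could be built from a supplied word list rather than hard-coded. The class name `SlugCleaner` and the method `Clean` are hypothetical names chosen for illustration, and the sketch assumes lowercase input, as in the question (uppercase letters would be stripped by `[^a-z0-9-]`).

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SlugCleaner
{
    // Hypothetical helper: removes the supplied whole words (each with one
    // optional trailing hyphen) and any character outside a-z, 0-9 and '-'.
    public static string Clean(string input, params string[] words)
    {
        const string disallowed = "[^a-z0-9-]";

        // With no words there is nothing to alternate on; only strip characters.
        if (words.Length == 0)
            return Regex.Replace(input, disallowed, "");

        // Escape each word so regex metacharacters in it are taken literally.
        string alternation = string.Join("|", words.Select(w => Regex.Escape(w)));
        string pattern = @"\b(?:" + alternation + @")\b-?|" + disallowed;
        return Regex.Replace(input, pattern, "");
    }

    public static void Main()
    {
        Console.WriteLine(Clean("this-is-johny-bravo's-grand-dad", "this", "is", "grand"));
        // prints: johny-bravos-dad
    }
}
```

Because of the `\b` anchors, a word is only removed when it stands alone: `Clean("a-thistle", "this")` leaves the input unchanged.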

Your expression is too complicated. Try:

\b(this|is|grand|')\b-?

Also, and this is the root cause of your problem: character classes are not for alternation. The class [\b(this|is|grand)\b] matches a single character and is equivalent to [()adghinrst|] (plus the backspace character, because in .NET \b means backspace inside a character class).

Thinking about it, you probably want this:
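The point about character classes can be checked directly. The sketch below (the class name `CharacterClassDemo` is only illustrative) tests the bracketed expression on its own and then runs the pattern from the question. It suggests why the words survive, and also that the `s` after the apostrophe gets swallowed by the optional class.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CharacterClassDemo
{
    // The bracketed part of the original attempt, anchored to test it alone.
    public const string ClassOnly = @"^[\b(this|is|grand)\b]$";

    // The full pattern from the question, unchanged.
    public const string OriginalPattern = @"([^a-z0-9\-][\b(this|is|grand)\b]?)";

    public static void Main()
    {
        // A character class matches exactly one character from its set.
        Console.WriteLine(Regex.IsMatch("t", ClassOnly));    // True
        Console.WriteLine(Regex.IsMatch("|", ClassOnly));    // True: '|' is literal here
        Console.WriteLine(Regex.IsMatch("this", ClassOnly)); // False: four characters

        // Only the apostrophe is outside a-z, 0-9 and '-'; the optional class
        // then consumes the following 's', so "'s" is removed as a pair.
        string input = "this-is-johny-bravo's-grand-dad";
        Console.WriteLine(Regex.Replace(input, OriginalPattern, ""));
        // prints: this-is-johny-bravo-grand-dad
    }
}
```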

(\b(this|is|grand)\b|[^a-z0-9-])-?

Break-down:

(                          # group 1
    \b(this|is|grand)\b    #   any of these words
    |                      #   or 
    [^a-z0-9-]             #   any character except one of these
)                          # end group 1
-?                         # optional dash at the end
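The commented layout above can be used almost verbatim in C#, since `RegexOptions.IgnorePatternWhitespace` makes the engine skip unescaped whitespace and treat `#` to end of line as a comment. A small sketch, with `CommentedPatternDemo` as a purely illustrative name:

```csharp
using System;
using System.Text.RegularExpressions;

public static class CommentedPatternDemo
{
    // Same pattern as the break-down, kept readable inside a verbatim string.
    private const string Pattern = @"
        (                          # group 1
            \b(this|is|grand)\b    #   any of these words
            |                      #   or
            [^a-z0-9-]             #   any character except one of these
        )                          # end group 1
        -?                         # optional dash at the end
    ";

    public static readonly Regex Cleaner =
        new Regex(Pattern, RegexOptions.IgnorePatternWhitespace);

    public static void Main()
    {
        string input = "this-is-johny-bravo's-grand-dad";
        Console.WriteLine(Cleaner.Replace(input, ""));
        // prints: johny-bravos-dad
    }
}
```

Keeping the comments in the pattern itself is a design choice: the explanation stays next to the expression it describes instead of drifting out of sync in a separate note.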

