Regex select text excluding certain characters

I have text like this:

MyText.xyz;
MyText.abc + "ss";

I want to capture the text that comes after MyText. but stop before any ; or whitespace, i.e. I want output like:

MyText.xyz
MyText.abc

I created this regex to achieve that: MyText.(.*(?=\;))

However, the text this regex currently matches is MyText.xyz and MyText.abc + "ss", i.e. the second result is incorrect.

You can fix it with a negated character class:

MyText\.[^\s;]+
        ^^^^^^^ 

See the regex demo.

Regex details

  • MyText\. - a literal MyText. substring (note that the . must be escaped to match a literal . char)
  • [^\s;]+ - a negated character class matching one or more chars other than whitespace (\s) and ;

In C#, use it as var pattern = @"MyText\.[^\s;]+";
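
For readers who want to try the pattern outside C#, here is a minimal sketch using Python's re module (chosen only for illustration; the names PATTERN, sample and matches are not from the original answer). The pattern itself is unchanged.

```python
import re

# Literal "MyText." followed by one or more characters that are
# neither whitespace nor a semicolon.
PATTERN = re.compile(r"MyText\.[^\s;]+")

# The two lines from the question.
sample = 'MyText.xyz;\nMyText.abc + "ss";'

matches = PATTERN.findall(sample)
print(matches)  # ['MyText.xyz', 'MyText.abc']
```

The match stops at the first whitespace or ; so the trailing + "ss" on the second line is never consumed.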

MyText\..+?\b(?<!;)
   ^  ^ ^  ^  ^
   |__|_|__|__|______ MyText : "MyText" literal
      |_|__|__|______ \.     : "." literal, escaped by "\"
        |__|__|______ .      : any character
           |__|______ ?\b    : non-greedy search up to boundary (\b)
              |______ (?<!;) : not ended by ';'

Test:

$ cat sample.txt
MyText.xyz;
MyText.abc + "ss";
MyText.uuu+"yyy";

$ grep -Po 'MyText\..+?\b(?<!;)' <sample.txt
MyText.xyz
MyText.abc
MyText.uuu

Note: this is based on @Wiktor Stribiżew's solution, with a lookbehind added.
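
The grep test above can be reproduced in other engines that support lookbehind. The following is a rough sketch in Python (an assumption for illustration; the variable names are made up here) that runs both this pattern and the negated-character-class pattern on the same three sample lines, to show where they differ.

```python
import re

# Lazy match up to the first word boundary, not preceded by ';'.
lazy_boundary = re.compile(r"MyText\..+?\b(?<!;)")
# Negated character class: stop only at whitespace or ';'.
negated_class = re.compile(r"MyText\.[^\s;]+")

sample = 'MyText.xyz;\nMyText.abc + "ss";\nMyText.uuu+"yyy";'

boundary_matches = lazy_boundary.findall(sample)
class_matches = negated_class.findall(sample)

print(boundary_matches)  # ['MyText.xyz', 'MyText.abc', 'MyText.uuu']
print(class_matches)     # ['MyText.xyz', 'MyText.abc', 'MyText.uuu+"yyy"']
```

The word-boundary version stops at the first non-word character, so it cuts MyText.uuu+"yyy" at the +, while the character-class version keeps going until whitespace or ;. Which behaviour is wanted depends on the input: for MyText.a-b; the boundary version would return only MyText.a.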

You are using

MyText.(.*(?=\;))

The first mistake is the . after MyText; it should be \. if you want to match a literal dot.

The second half is also incorrect: .* matches as many non-line-break characters as possible, and the lookahead only requires that a ; follows, so the match runs up to the last ; on the line. That is why you got the results you did.
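
Both problems can be reproduced with a short sketch. Python's re module is used here purely as an illustration, and the names original, full_matches and dot_demo are hypothetical.

```python
import re

# The pattern from the question, unchanged.
original = re.compile(r"MyText.(.*(?=\;))")

sample = 'MyText.xyz;\nMyText.abc + "ss";'

# Greedy .* runs to the last ';' on each line, so the second
# match keeps the unwanted ' + "ss"' tail.
full_matches = [m.group(0) for m in original.finditer(sample)]
print(full_matches)  # ['MyText.xyz', 'MyText.abc + "ss"']

# The unescaped dot accepts any character after "MyText",
# not just a literal '.'.
dot_demo = original.search("MyTextXabc;").group(0)
print(dot_demo)  # MyTextXabc
```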

Try this regex instead:

MyText\.[^ ;]*

The [^ ;]* matches any run of characters that are not a space or a ; character. If you also don't want tabs or line breaks to match, you can use the following instead:

MyText\.[^\s;]*

\s matches any whitespace character.
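
The difference between the two character classes only shows up when the input contains whitespace other than a plain space. Here is a small sketch in Python (an illustrative choice; the variable names and the tab-separated sample are assumptions, and the dot is escaped as recommended in the first point above).

```python
import re

space_only = re.compile(r"MyText\.[^ ;]*")       # excludes space and ';'
any_whitespace = re.compile(r"MyText\.[^\s;]*")  # excludes all whitespace and ';'

# A tab, rather than a space, follows "abc".
sample = "MyText.abc\t+ 1;"

space_result = space_only.findall(sample)
whitespace_result = any_whitespace.findall(sample)

print(space_result)       # ['MyText.abc\t+']
print(whitespace_result)  # ['MyText.abc']
```

Because both patterns use * rather than +, they also match a bare MyText. with nothing after it; switch to + if at least one character is required.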
