Regex select text excluding certain characters
I have a text like this:
MyText.xyz;
MyText.abc + "ss";
I want to capture the text that comes after MyText., stopping before a ; or a space, i.e. I want output like:
MyText.xyz
MyText.abc
I have created this regex to achieve my goal:
MyText.(.*(?=\;))
But right now the regex matches MyText.xyz and MyText.abc + "ss", i.e. the second result is incorrect.
You can fix it using a negated character class:
MyText\.[^\s;]+
        ^^^^^^^
See the regex demo.
Regex details:
MyText\. - a literal MyText. substring (note that the . must be escaped to match a literal . char)
[^\s;]+ - a negated character class matching one or more chars other than whitespace (\s) and ;
Use it as var pattern = @"MyText\.[^\s;]+"; in C#.
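A minimal sketch of the same negated-character-class approach, assuming Python's re module instead of C# (the pattern MyText\.[^\s;]+ is used unchanged; the function name extract_after_mytext is hypothetical):

```python
import re

# Negated character class: after the literal "MyText.", take one or more
# characters that are neither whitespace nor ';'.
PATTERN = re.compile(r"MyText\.[^\s;]+")


def extract_after_mytext(text):
    """Hypothetical helper: return every 'MyText.<token>' match in text."""
    # The pattern has no capture groups, so findall returns whole matches.
    return PATTERN.findall(text)


sample = 'MyText.xyz;\nMyText.abc + "ss";\n'
print(extract_after_mytext(sample))  # ['MyText.xyz', 'MyText.abc']
```

Because the dot is escaped, a string such as MyTextXabc; produces no match.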
MyText\..+?\b(?<!;)
MyText  : "MyText" literal
\.      : "." literal, escaped by "\"
.       : any character
+?\b    : one or more of them, non-greedy, up to a word boundary (\b)
(?<!;)  : not ended by ';'
Test:
$ cat sample.txt
MyText.xyz;
MyText.abc + "ss";
MyText.uuu+"yyy";
$ grep -Po 'MyText\..+?\b(?<!;)' <sample.txt
MyText.xyz
MyText.abc
MyText.uuu
Note: this is based on @Wiktor Stribiżew's solution, with a lookbehind added.
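To see where the lazy word-boundary pattern and the negated-class pattern diverge, here is a small comparison on the sample lines, assuming Python's re module (both patterns are used exactly as written in the answers; the variable names are illustrative):

```python
import re

# Lazy match up to the first word boundary, not ending in ';'.
LAZY_BOUNDARY = re.compile(r"MyText\..+?\b(?<!;)")
# Negated class: stop only at whitespace or ';'.
NEGATED_CLASS = re.compile(r"MyText\.[^\s;]+")

lines = ['MyText.xyz;', 'MyText.abc + "ss";', 'MyText.uuu+"yyy";']

lazy_results = [LAZY_BOUNDARY.search(line).group() for line in lines]
negated_results = [NEGATED_CLASS.search(line).group() for line in lines]

print(lazy_results)     # ['MyText.xyz', 'MyText.abc', 'MyText.uuu']
print(negated_results)  # ['MyText.xyz', 'MyText.abc', 'MyText.uuu+"yyy"']
```

The lazy version stops at the first non-word character (here the +), while the negated class keeps going until whitespace or a ;, so the choice depends on whether tokens like uuu+"yyy" should be cut short.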
You are using
MyText.(.*(?=\;))
The first mistake is the . after MyText: it should be \. if you want to match a literal dot.
The second half is also incorrect: .* matches any number of non-line-break characters, spaces included, and because it is greedy it runs up to the last ; on the line (the lookahead only requires that a ; follows). That is why you got the results you did.
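A short sketch that reproduces both problems with the original pattern, assuming Python's re module (the pattern is copied verbatim; the variable names are illustrative):

```python
import re

# The pattern from the question, unchanged: unescaped '.', greedy '.*'.
ORIGINAL = re.compile(r"MyText.(.*(?=\;))")

m = ORIGINAL.search('MyText.abc + "ss";')
print(m.group(0))  # MyText.abc + "ss"  (greedy .* swallows the space)
print(m.group(1))  # abc + "ss"

# Greedy .* backtracks only as far as the LAST ';' on the line.
multi = ORIGINAL.search("MyText.a; b;")
print(multi.group(0))  # MyText.a; b

# The unescaped '.' accepts any character after "MyText".
loose = ORIGINAL.search("MyTextXabc;")
print(loose.group(0))  # MyTextXabc
```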
Try this regex instead:
MyText\.[^ ;]*
[^ ;]* matches zero or more characters that are not a space or a ;. If you also don't want tabs or line breaks to match, you can use the following instead:
MyText\.[^\s;]*
\s matches any whitespace character.
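A minimal Python comparison of the two character classes, assuming Python's re module and escaping the dot as the answer recommends (the sample strings are illustrative):

```python
import re

SPACE_ONLY = re.compile(r"MyText\.[^ ;]*")       # stops at a space or ';'
ANY_WHITESPACE = re.compile(r"MyText\.[^\s;]*")  # stops at any whitespace or ';'

text = "MyText.abc\t+ x;"  # a tab, not a space, follows "abc"
print(repr(SPACE_ONLY.search(text).group()))      # 'MyText.abc\t+'
print(repr(ANY_WHITESPACE.search(text).group()))  # 'MyText.abc'

# With '*' an empty tail is allowed, so a bare "MyText." still matches.
print(ANY_WHITESPACE.search("MyText.;").group())  # MyText.
```

Using + instead of * (as in the negated-class answer) would reject the empty case.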