
How to use a regex to find an exact match of a word while ignoring the same text with a prefix? (C#)

Hi, I've been working with DXF files and ran into some trouble with a regular expression. I have some text like this:

   BODY
   123
   abc
   GR-BODY
   attrib
   AcdbLine

I've written a regular expression that I thought should work, but clearly I still need some help with it.

Here is my code:

string[] tmp = Regex.Split(originalString, @"(3DFACE|3DSOLID|ACAD_PROXY_ENTITIY|ARC|ATTDEF|ATTRIB|BODY|CIRCLE|DIMENSION|ELLIPSE|HATCH|HELIX|IMAGE|INSERT|LEADER|LIGHT|LWPOLYLINE|MLINE|MLEADERSTYLE|MLEADER|MTEXT|OLEFRAME|OLE2FRAME|POINT|POLYLINE|RAY|REGION|SEQEND|SHAPE|SOLID|SPLINE|SUN|SURFACE|TABLE|TEXT|TOLERANCE|TRACE|UNDERLAY|VERTEX|VIEWPORT|WIPEOUT|XLINE|LINE)", RegexOptions.None);

I only want to match the standalone BODY line, but the BODY inside GR-BODY is matched as well. How can I exclude GR-BODY? Thanks.

EDIT 1: Sorry, I was looking at the wrong code earlier.

I want the output to look like this:

tmp[0] = BODY
tmp[1] = 123\nabc\nGR-BODY\nattrib\nAcdbLine

but my code only produces this:

tmp[0] = BODY
tmp[1] = 123\nabc\nGR-
tmp[2] = BODY\nattrib\nAcdbLine

That regex statement should work. Try using Regex.Matches to return a MatchCollection instead.

   MatchCollection mc = Regex.Matches(originalString, @"(3DFACE|3DSOLID|ACAD_PROXY_ENTITIY|ARC|ATTDEF|ATTRIB|BODY|CIRCLE|DIMENSION|ELLIPSE|HATCH|HELIX|IMAGE|INSERT|LEADER|LIGHT|LWPOLYLINE|MLINE|MLEADERSTYLE|MLEADER|MTEXT|OLEFRAME|OLE2FRAME|POINT|POLYLINE|RAY|REGION|SEQEND|SHAPE|SOLID|SPLINE|SUN|SURFACE|TABLE|TEXT|TOLERANCE|TRACE|UNDERLAY|VERTEX|VIEWPORT|WIPEOUT|XLINE|LINE)", RegexOptions.None);
   // Cast<Match>() and Select() require "using System.Linq;"
   string[] tmp = mc.Cast<Match>().Select(m => m.Value).ToArray();
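
Note that with the unanchored pattern, Regex.Matches still reports the BODY inside GR-BODY, so the pattern itself has to rule that occurrence out. One possible way, not taken from the answer above, is to add negative lookarounds. The following is a minimal sketch under some assumptions: the keyword list is shortened for illustration, the names DxfMatchDemo and FindEntities are hypothetical, and an entity name is assumed never to be directly preceded or followed by a letter, digit, underscore or hyphen.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class DxfMatchDemo
{
    // Shortened, illustrative keyword list (not the full list from the question).
    const string Keywords = "BODY|MLINE|XLINE|LINE";

    // (?<![\w-]) : the keyword must not be preceded by a letter, digit, '_' or '-'
    // (?![\w-])  : and must not be followed by one either
    static readonly Regex Entity = new Regex(
        @"(?<![\w-])(?:" + Keywords + @")(?![\w-])");

    // Hypothetical helper: returns every standalone keyword found in the text.
    public static string[] FindEntities(string text)
    {
        return Entity.Matches(text)
                     .Cast<Match>()
                     .Select(m => m.Value)
                     .ToArray();
    }

    static void Main()
    {
        string input = "BODY\n123\nabc\nGR-BODY\nattrib\nAcdbLine";
        // GR-BODY is skipped because of the hyphen before BODY;
        // AcdbLine is skipped because matching is case-sensitive.
        Console.WriteLine(string.Join(",", FindEntities(input)));
    }
}
```

For the sample text from the question this prints a single BODY. If hyphenated names such as GR-BODY should count as matches in some other context, the hyphen would have to be dropped from the two character classes.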

If each keyword always occupies a whole line, from the start of the line to its end, then tell the pattern so:

string[] tmp = Regex.Split(originalString, @"^(3DFACE|3DSOLID|ACAD_PROXY_ENTITIY|ARC|ATTDEF|ATTRIB|BODY|CIRCLE|DIMENSION|ELLIPSE|HATCH|HELIX|IMAGE|INSERT|LEADER|LIGHT|LWPOLYLINE|MLINE|MLEADERSTYLE|MLEADER|MTEXT|OLEFRAME|OLE2FRAME|POINT|POLYLINE|RAY|REGION|SEQEND|SHAPE|SOLID|SPLINE|SUN|SURFACE|TABLE|TEXT|TOLERANCE|TRACE|UNDERLAY|VERTEX|VIEWPORT|WIPEOUT|XLINE|LINE)$", RegexOptions.Multiline);

This should give you the output you expect.

^ matches the start of a line when the Multiline option is used.

$ matches the end of a line when the Multiline option is used.
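
The anchored split can be tried end to end on the sample text from the question. This is only a sketch under stated assumptions: the keyword list is shortened for illustration, and the names DxfSplitDemo and SplitOnEntities are hypothetical. Two details go beyond the answer. Regex.Split also returns the text captured by the parentheses, plus an empty leading piece when the input starts with a match, so the pieces are trimmed and empty ones dropped. And in .NET, $ under Multiline matches only just before \n, so an optional \r is added to cope with files that use Windows line endings.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class DxfSplitDemo
{
    // Shortened, illustrative keyword list (not the full list from the question).
    const string Keywords = "BODY|LWPOLYLINE|POLYLINE|MLINE|XLINE|LINE";

    // ^ and $ anchor the keyword to a whole line under RegexOptions.Multiline.
    // \r? tolerates CRLF line endings, because $ only matches before \n.
    static readonly Regex EntityLine = new Regex(
        @"^(" + Keywords + @")\r?$", RegexOptions.Multiline);

    // Hypothetical helper: splits on whole-line keywords, keeps the keyword
    // itself (it is a captured group), and drops line breaks and empty pieces
    // at the edges of each part.
    public static string[] SplitOnEntities(string text)
    {
        return EntityLine.Split(text)
                         .Select(part => part.Trim('\r', '\n'))
                         .Where(part => part.Length > 0)
                         .ToArray();
    }

    static void Main()
    {
        string input = "BODY\n123\nabc\nGR-BODY\nattrib\nAcdbLine";
        foreach (string part in SplitOnEntities(input))
        {
            // Show embedded line breaks as a visible \n for readability.
            Console.WriteLine("[" + part.Replace("\n", "\\n") + "]");
        }
    }
}
```

For the sample input this yields two parts: BODY, and the remaining lines with GR-BODY left intact, which is the shape asked for in the question.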
