Splitting a string with Regex using a release character and separators

Question

I need to parse an EDI file, where the separators are the + , : and ' signs and the escape (release) character is ? . You first split the data into segments:

var data = "NAD+UC+ABC2378::92++XYZ Corp.:Tel ?: ?+90 555 555 11 11:Mobile1?: ?+90 555 555 22 22:Mobile2?: ?+90 555 555 41 71+Duzce+Seferihisar / IZMIR++35460+TR";

var segments = data.Split('\'');

Then each segment is split into segment data elements by + , and the segment data elements are split into component data elements by : .

var dataElements = segments[0].Split('+');

The sample string above is not parsed correctly because of the release character. I have special code that deals with this, but I think it should all be doable using

Regex.Split(data, separator);

I am not familiar with regexes and have not found a way to do this so far. The best I have come up with is

string[] lines = Regex.Split(data, @"[^?]\+");

which drops the character before each + sign:

NA
U
ABC2378::9
+XYZ Corp.:Tel ?: ?+90 555 555 11 11:Mobile1?: ?+90 555 555 22 22:Mobile2?: ?+90 555 555 41 7
Duzc
Seferihisar / IZMI
+3546
TR

The correct result should be:

NAD
UC
ABC2378::92

XYZ Corp.:Tel ?: ?+90 555 555 11 11:Mobile1?: ?+90 555 555 22 22:Mobile2?: ?+90 555 555 41 71
Duzce
Seferihisar / IZMIR
35460
TR

So the question is: is this doable with Regex.Split, and what should the regex separator look like?

Answer 1

As I understand it, you want to split on plus signs + only if they are not preceded (escaped) by a question mark ? . This can be done with the following:

(?<!\?)\+

This matches a single + sign only if it is not immediately preceded by a question mark ? . Because the lookbehind does not consume anything, the character before the + stays in the result.
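As a quick check of the lookbehind pattern, here is a minimal sketch that runs it through Regex.Split. The class name ReleaseSplitDemo, the method name SplitElements, and the shortened sample string are made up for illustration and are not part of the original question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ReleaseSplitDemo
{
    // Hypothetical helper: split on '+' unless the character immediately
    // before it is the release character '?'.
    public static string[] SplitElements(string segment)
    {
        return Regex.Split(segment, @"(?<!\?)\+");
    }

    public static void Main()
    {
        // Shortened version of the question's sample data (an assumption for brevity).
        var data = "NAD+UC+ABC2378::92++XYZ Corp.:Tel ?: ?+90 555 555 11 11+Duzce";

        foreach (var element in SplitElements(data))
        {
            // Brackets make the empty element between "++" visible.
            Console.WriteLine("[" + element + "]");
        }
    }
}
```

The empty element between the two consecutive + signs is kept, and the ?+ inside the phone number is left alone. The release characters themselves are still present in the output, so a later step would have to strip them.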

Edit: The problem with the previous expression is that it doesn't handle situations like ??+ , ???+ or ????+ ; in other words, it doesn't handle cases where ? characters are used to escape themselves.

We can solve this by noticing that if an odd number of ? characters precede a + , the last one is definitely escaping the + , so we must not split. If an even number of ? characters precede the + , they cancel each other out in pairs and leave the + unescaped, so we should split on it.

From this observation, we need an expression that matches a + only if it is preceded by an even number (including zero) of question marks ? . Here it is:

(?<!(^|[^?])(\?\?)*\?)\+
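To illustrate the odd/even rule, here is a small sketch. It uses non-capturing groups, (?:...), in place of the plain groups in the pattern; that is a substitution on my part, intended to match the same positions while making sure Regex.Split cannot add captured text to the result. The pattern relies on .NET supporting variable-length lookbehind. The class name, method name, and test string are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapedReleaseDemo
{
    // '+' is a separator unless an odd-length run of '?' immediately precedes it.
    // Non-capturing groups are an assumption added for use with Split.
    private static readonly Regex Separator =
        new Regex(@"(?<!(?:^|[^?])(?:\?\?)*\?)\+");

    public static string[] SplitElements(string segment)
    {
        return Separator.Split(segment);
    }

    public static void Main()
    {
        // "A??+" : two '?' (even), so the '+' is a real separator.
        // "B?+"  : one '?' (odd), so the '+' is escaped.
        // "C???+": three '?' (odd), so the '+' is escaped.
        foreach (var element in SplitElements("A??+B?+C???+D+E"))
        {
            Console.WriteLine("[" + element + "]");
        }
        // Prints:
        // [A??]
        // [B?+C???+D]
        // [E]
    }
}
```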

Answer 2

string[] lines = Regex.Split(data, @"\+");

Would this meet the requirement?

Here is the edit to account for an escaping '?' before '+':

string[] lines = Regex.Split(data, @"(?<!\?)[\+]+");

The '+' at the end of the pattern makes it match multiple consecutive occurrences of the separator '+' as a single separator. If you want empty entries for consecutive separators instead:

string[] lines = Regex.Split(data, @"(?<!\?)[\+]");
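The difference between the two patterns shows up on the ++ in the question's data. The sketch below compares them on a fragment of that data; the class and method names are invented for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EmptyElementDemo
{
    // Treats a run of '+' as one separator, so empty elements disappear.
    public static string[] SplitCollapsing(string segment)
    {
        return Regex.Split(segment, @"(?<!\?)[\+]+");
    }

    // Treats each unescaped '+' as its own separator, so empty elements survive.
    public static string[] SplitKeepingEmpty(string segment)
    {
        return Regex.Split(segment, @"(?<!\?)[\+]");
    }

    public static void Main()
    {
        var segment = "ABC2378::92++XYZ Corp.";

        Console.WriteLine(SplitCollapsing(segment).Length);   // 2
        Console.WriteLine(SplitKeepingEmpty(segment).Length); // 3
    }
}
```

The expected result in the question keeps an empty entry between ABC2378::92 and XYZ Corp., which corresponds to the second variant.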


Answer 1
4 votes, accepted, 2013-08-26 12:09:38

Answer 2
1 vote, 2013-08-26 12:05:47
