Split string with regex using a release character and separators
I need to parse an EDI file, where the separators are the +, : and ' signs and the escape (release) character is ?. You first split into segments:
var data = "NAD+UC+ABC2378::92++XYZ Corp.:Tel ?: ?+90 555 555 11 11:Mobile1?: ?+90 555 555 22 22:Mobile2?: ?+90 555 555 41 71+Duzce+Seferihisar / IZMIR++35460+TR";
var segments = data.Split('\'');
Then each segment is split into segment data elements by +, and the segment data elements are split into component data elements by :.
var dataElements = segments[0].Split('+');
The sample string above is not parsed correctly because it uses the release character. I have special code dealing with this, but I think it should all be doable using
Regex.Split(data, separator);
I am not familiar with regexes and have not found a way to do this so far. The best I have come up with is
string[] lines = Regex.Split(data, @"[^?]\+");
which drops the character before each + sign, because that character is part of the match:
NA
U
ABC2378::9
+XYZ Corp.:Tel ?: ?+90 555 555 11 11:Mobile1?: ?+90 555 555 22 22:Mobile2?: ?+90 555 555 41 7
Duzc
Seferihisar / IZMI
+3546
TR
The correct result should be:
NAD
UC
ABC2378::92
XYZ Corp.:Tel ?: ?+90 555 555 11 11:Mobile1?: ?+90 555 555 22 22:Mobile2?: ?+90 555 555 41 71
Duzce
Seferihisar / IZMIR
35460
TR
So the question is: is this doable with Regex.Split, and what should the regex separator look like?
I can see that you want to split around plus signs + only if they are not preceded (escaped) by a question mark ?. This can be done using the following:
(?<!\?)\+
This matches a + sign only if it is not immediately preceded by a question mark ?. The lookbehind is zero-width, so the character before the + is not consumed.
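As a minimal sketch of how the lookbehind separator behaves with Regex.Split, the following wraps it in a helper and runs it on the sample string from the question. The class name LookbehindSplit and the method name SplitElements are made up for illustration and are not part of any library.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookbehindSplit
{
    // Hypothetical helper: split on '+' unless the character
    // immediately before it is the release character '?'.
    public static string[] SplitElements(string segment)
    {
        return Regex.Split(segment, @"(?<!\?)\+");
    }

    public static void Main()
    {
        var data = "NAD+UC+ABC2378::92++XYZ Corp.:Tel ?: ?+90 555 555 11 11:Mobile1?: ?+90 555 555 22 22:Mobile2?: ?+90 555 555 41 71+Duzce+Seferihisar / IZMIR++35460+TR";

        // Brackets make empty elements (from "++") visible in the output.
        foreach (var element in SplitElements(data))
        {
            Console.WriteLine("[" + element + "]");
        }
    }
}
```

Note that Regex.Split keeps the empty strings produced by adjacent separators such as ++, so the result contains empty elements at those positions.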
Edit: The problem with the previous expression is that it doesn't handle situations like ??+ or ???+ or ????+; in other words, it doesn't handle situations where ?s are used to escape themselves.
We can solve this problem by noticing that if there is an odd number of ? preceding a +, then the last one is definitely escaping the +, so we must not split; but if there is an even number of ? before a plus, then those cancel each other out, leaving the + unescaped, so we should split around it.
From the previous observation we should come up with an expression that matches a + only if it is preceded by an even number (including zero) of question marks ?, and here it is:
(?<!(^|[^?])(\?\?)*\?)\+
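To illustrate the odd/even rule, here is a small sketch that applies this kind of separator to a few short inputs. It is a variation rather than the exact expression above: the groups are written as non-capturing (?:...) so that Regex.Split cannot interleave captured text into the result. The names ReleaseAwareSplit and SplitElements are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ReleaseAwareSplit
{
    // Split on '+' unless it is preceded by an odd-length run of '?'.
    // (?:^|[^?]) anchors the start of the run, (?:\?\?)* consumes pairs,
    // and the final \? is the unpaired release character.
    private static readonly Regex Separator =
        new Regex(@"(?<!(?:^|[^?])(?:\?\?)*\?)\+");

    public static string[] SplitElements(string segment)
    {
        return Separator.Split(segment);
    }

    public static void Main()
    {
        string[] samples = { "A+B", "A?+B", "A??+B", "A???+B", "A????+B" };
        foreach (var sample in samples)
        {
            Console.WriteLine(sample + " -> " + SplitElements(sample).Length + " element(s)");
        }
    }
}
```

Inputs with zero or an even number of ? before the + should come back as two elements, and inputs with an odd number as one. The release characters themselves are left in the output, so unescaping would still be a separate step.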
string[] lines = Regex.Split(data, @"\+");
Would this meet the requirement?
Here is the edit that takes the escaping '?' before '+' into account:
string[] lines = Regex.Split(data, @"(?<!\?)[\+]+");
The '+' at the end of the pattern matches multiple consecutive occurrences of the separator '+' as a single separator. If you want an empty entry between consecutive separators instead, use:
string[] lines = Regex.Split(data, @"(?<!\?)[\+]");
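The difference between the two patterns can be seen on a short input with a doubled separator. This is only a sketch; the class and method names (ConsecutiveSeparators, SplitCollapsing, SplitKeepingEmpty) are invented for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ConsecutiveSeparators
{
    // Treats a run of unescaped '+' signs as one separator.
    public static string[] SplitCollapsing(string segment)
    {
        return Regex.Split(segment, @"(?<!\?)[\+]+");
    }

    // Treats every unescaped '+' as its own separator,
    // so "++" yields an empty entry between the two signs.
    public static string[] SplitKeepingEmpty(string segment)
    {
        return Regex.Split(segment, @"(?<!\?)[\+]");
    }

    public static void Main()
    {
        var sample = "Seferihisar / IZMIR++35460+TR";
        Console.WriteLine("collapsing:    " + SplitCollapsing(sample).Length + " elements");
        Console.WriteLine("keeping empty: " + SplitKeepingEmpty(sample).Length + " elements");
    }
}
```

Which variant fits depends on whether element positions matter. In EDI segments an empty element usually still occupies its position, so collapsing runs of + may shift the later elements.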