Creating string array using Regex.Split

Alright, I'm warning you in advance: my understanding of regular expressions is extremely limited. (I've tried my best to learn them over the years, but to be honest, I think they just frighten me.)

Let's say I have the following string:

string keyValues = "CustomerId=1||OrderId=12||UserId=a1dcd568-f129-419b-b51e-be2dbb67de0f";

This string represents key-value pairs, delimited by a user-defined string (in this case || ), e.g. key1=value1||key2=value2 . I am trying to extract the keys from this string and store them in an array. That array would look like this:

{"CustomerId", "OrderId", "UserId"}

The best option I can think of is to use regular expressions (if someone has a better solution, please share). Here's what I'm trying to do:

string delimiter = "||";
string[] keys = Regex.Split(keyValues, "=.*" + delimiter);

I may be wrong, but the way I understand it, that regular expression is supposed to find a string that starts with = and ends with delimiter , with any number of any characters in between. That would split the string at those positions, leaving me with the original keys. Instead, my keys array looks like this:

{"", "C", "u", "s", "t", "o", "m", "e", "r", "I", "d", "", "", ...}

As you can see, the =value|| part is stripped away. Can anyone tell me what I'm doing wrong?

EDIT

In my case, the delimiter || is a variable. I didn't mention this only because I thought I would be able to replace any references to || with delimiter . From the majority of the answers given, I now see that this is an important detail.

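| has special meaning in regular expressions ( patA|patB matches either patA or patB ), so the pattern =.*|| is read as an alternation whose last alternatives are empty, and those empty matches split the string between every character. Escape | .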

Using a non-greedy match ( .*? ):

string delimiter = "||";
string[] keys = Regex.Split(keyValues, @"=.*?" + Regex.Escape(delimiter));

This will give you {"CustomerId", "OrderId", "UserId=a1dcd568-f129-419b-b51e-be2dbb67de0f"} . The last value stays attached because no delimiter follows it.
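To make the escaping step concrete, here is a minimal sketch of what Regex.Escape does to the delimiter and what the resulting split returns. The class name EscapeDemo and the helper BuildPattern are made up for illustration; only Regex.Escape and Regex.Split come from .NET.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, not part of the original answer.
public static class EscapeDemo
{
    // Builds the non-greedy split pattern from an arbitrary delimiter.
    public static string BuildPattern(string delimiter)
    {
        // Regex.Escape turns "||" into \|\| so the bars are matched literally
        // instead of being treated as alternation.
        return "=.*?" + Regex.Escape(delimiter);
    }

    public static void Main()
    {
        string keyValues = "CustomerId=1||OrderId=12||UserId=a1dcd568-f129-419b-b51e-be2dbb67de0f";
        string pattern = BuildPattern("||");

        Console.WriteLine(pattern); // prints =.*?\|\|

        // Prints CustomerId, OrderId, and then the whole last pair,
        // because the last pair is not followed by a delimiter.
        foreach (string part in Regex.Split(keyValues, pattern))
        {
            Console.WriteLine(part);
        }
    }
}
```

Because the delimiter goes through Regex.Escape, the same helper should work for other user-defined delimiters that contain regex metacharacters.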

Regex.Matches with lookaround assertions is more appropriate:

string delimiter = "||";
string keyValues = "CustomerId=1||OrderId=12||UserId=a1dcd568-f129-419b-b51e-be2dbb67de0f";
string pattern = @"(?<=^|" + Regex.Escape(delimiter) + @")\w+(?==)";
var keys = Regex.Matches(keyValues, pattern);

BTW, use verbatim string literals ( @"verbatim string literal" ) when writing regular expressions.
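Regex.Matches returns a MatchCollection rather than a string[], so one more step is needed to get the array the question asks for. A possible sketch, assuming LINQ is available; the class name LookaroundKeys and the method ExtractKeys are hypothetical names chosen for this example.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical wrapper around the lookaround pattern shown above.
public static class LookaroundKeys
{
    public static string[] ExtractKeys(string keyValues, string delimiter)
    {
        // A key is a run of word characters that starts at the beginning of
        // the input or right after the delimiter, and is followed by '='.
        string pattern = @"(?<=^|" + Regex.Escape(delimiter) + @")\w+(?==)";

        return Regex.Matches(keyValues, pattern)
                    .Cast<Match>()          // MatchCollection -> IEnumerable<Match>
                    .Select(m => m.Value)   // keep only the matched text
                    .ToArray();
    }

    public static void Main()
    {
        string keyValues = "CustomerId=1||OrderId=12||UserId=a1dcd568-f129-419b-b51e-be2dbb67de0f";

        // Prints CustomerId, OrderId, UserId
        Console.WriteLine(string.Join(", ", ExtractKeys(keyValues, "||")));
    }
}
```

Note that \w+ only covers letters, digits, and underscores, so keys containing other characters would need a different character class.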


If you only care about the keys, why not use a match instead of a split:

@"[^=|]+(?==)"

If the key can't contain an equal sign = or a vertical bar | , then the above expression will match one or more characters that are not = or | and that are followed by an equal sign = , thus matching the keys.

In C#:

var input = "CustomerId=1||OrderId=12||UserId=a1dcd568-f129-419b-b51e-be2dbb67de0f";
var results = Regex.Matches(input, @"[^=|]+(?==)");
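The character class in this pattern hard-codes | , which matters given that the delimiter is a variable in the question. The sketch below collects the matches into an array and shows what happens with a different delimiter; CharClassKeys and ExtractKeys are illustrative names, and the ;; delimiter is an assumed example, not something from the question.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical wrapper around the pattern from this answer.
public static class CharClassKeys
{
    public static string[] ExtractKeys(string input)
    {
        return Regex.Matches(input, @"[^=|]+(?==)")
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    public static void Main()
    {
        // With "||" as the delimiter the keys come out as expected:
        // CustomerId, OrderId, UserId
        string input = "CustomerId=1||OrderId=12||UserId=a1dcd568-f129-419b-b51e-be2dbb67de0f";
        Console.WriteLine(string.Join(", ", ExtractKeys(input)));

        // With an assumed ";;" delimiter the value and delimiter leak into
        // the second match: a, 1;;b
        Console.WriteLine(string.Join(", ", ExtractKeys("a=1;;b=2")));
    }
}
```

So this pattern is a good fit when the delimiter is known to consist of | characters, and needs adapting otherwise.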

An alternative is to do this without a regular expression, as the string operations are pretty basic:

// Requires: using System; using System.Linq;
string[] keys =
  keyValues.Split(new string[]{"||"}, StringSplitOptions.None)
  .Select(s => s.Substring(0, s.IndexOf('='))).ToArray();

Keep regular expressions for the advanced string operations. :)

(When I tested the performance of this solution against a regular expression, it turned out to be about 40 times faster.)
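Since the delimiter is a variable in the question, the same idea can be wrapped in a method that takes it as a parameter. This is a sketch under a couple of assumptions that go beyond the answer: empty segments are dropped, and segments without an = are skipped rather than allowed to make Substring throw. SplitKeys and ExtractKeys are hypothetical names.

```csharp
using System;
using System.Linq;

// Hypothetical helper; a parameterised variant of the snippet above.
public static class SplitKeys
{
    public static string[] ExtractKeys(string keyValues, string delimiter)
    {
        return keyValues
            .Split(new[] { delimiter }, StringSplitOptions.RemoveEmptyEntries)
            // Assumption: a segment without '=' is ignored. Without this
            // guard, IndexOf returns -1 and Substring throws.
            .Where(pair => pair.IndexOf('=') >= 0)
            .Select(pair => pair.Substring(0, pair.IndexOf('=')))
            .ToArray();
    }

    public static void Main()
    {
        string keyValues = "CustomerId=1||OrderId=12||UserId=a1dcd568-f129-419b-b51e-be2dbb67de0f";

        // Prints CustomerId, OrderId, UserId
        Console.WriteLine(string.Join(", ", ExtractKeys(keyValues, "||")));
    }
}
```

No escaping is needed here, because string.Split treats the delimiter as plain text.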

Split on @"=[^|]*(?:\|\||$)"
If you need more assurance, use @"=[^=|]*(?:\|\||$)"

Edited to consume the end of the string, where no delimiter exists.
In C#, keep only the non-empty elements, because the match at the end of the string leaves an empty string as the last element.
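A sketch of this split-and-filter approach with a variable delimiter follows. It swaps the [^|]* part for a non-greedy .*? so that it does not depend on the delimiter being made of | characters; that substitution, along with the names SplitOnValues and ExtractKeys, is an assumption of this example rather than part of the answer.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper illustrating "split on the values, drop the blanks".
public static class SplitOnValues
{
    public static string[] ExtractKeys(string keyValues, string delimiter)
    {
        // '=' + the value (as few characters as possible) + either the
        // escaped delimiter or the end of the input.
        string pattern = "=.*?(?:" + Regex.Escape(delimiter) + "|$)";

        return Regex.Split(keyValues, pattern)
                    // The match at the end of the input leaves a trailing
                    // empty string, so filter empty elements out.
                    .Where(part => part.Length > 0)
                    .ToArray();
    }

    public static void Main()
    {
        string keyValues = "CustomerId=1||OrderId=12||UserId=a1dcd568-f129-419b-b51e-be2dbb67de0f";

        // Prints CustomerId, OrderId, UserId
        Console.WriteLine(string.Join(", ", ExtractKeys(keyValues, "||")));
    }
}
```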
