
Extract substring from string with Regex

Imagine that users are entering strings on several computers.

On one computer, the pattern in the configuration will extract some characters of that string, let's say positions 4 to 5. On another computer, the extraction pattern will return other characters, for instance the last 3 positions of the string.

These configurations (the Regex patterns) are different for each computer, and the administrator should be able to change them without having to change the source code.

Some examples:

         Original_String       Return_Value
User1 -  abcd78defg123         78
User2 -  abcd78defg123         78g1
User3 -  mm127788abcd          12
User4 -  123456pp12asd         ppsd

Can it be done with Regex? Thanks.

Why do you want to use regex for this? What is wrong with:

string foo = s.Substring(4,2);
string bar = s.Substring(s.Length-3,3);

(You can easily wrap those up to add some bounds-checking on the length.)
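As a minimal sketch of the bounds-checked wrapper mentioned above: the class `SafeExtract` and its methods `Slice` and `Last` are hypothetical names, not part of the original answer, and returning an empty string for out-of-range input is an assumption about the desired behaviour (throwing would be equally reasonable).

```csharp
using System;

public static class SafeExtract
{
    // Returns up to `length` characters starting at zero-based `start`,
    // clamping to the string bounds instead of throwing.
    public static string Slice(string s, int start, int length)
    {
        if (string.IsNullOrEmpty(s) || start >= s.Length || length <= 0)
            return string.Empty;
        if (start < 0) start = 0;
        return s.Substring(start, Math.Min(length, s.Length - start));
    }

    // Returns the last `count` characters, or the whole string if it is shorter.
    public static string Last(string s, int count)
    {
        if (string.IsNullOrEmpty(s) || count <= 0)
            return string.Empty;
        return count >= s.Length ? s : s.Substring(s.Length - count);
    }
}
```

With this, `SafeExtract.Slice("abcd78defg123", 4, 2)` gives `78` and `SafeExtract.Last("abcd78defg123", 3)` gives `123`, while a string that is too short yields a shorter (possibly empty) result rather than an exception.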

If you really want, you could wrap it up in a Func<string,string> to store somewhere, though I'm not sure I'd bother:

Func<string, string> get4and5 = s => s.Substring(4, 2);
Func<string, string> getLast3 = s => s.Substring(s.Length - 3, 3);
string value = "abcd78defg123";
string foo = getLast3(value);
string bar = get4and5(value);

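If you really want to use regex (the result is in capture group 1; note that this pattern skips three characters, so it captures the 4th and 5th characters counting from 1, whereas Substring(4, 2) above is zero-based):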

^...(..)

And:

.*(...)$
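For illustration, here is one way such a pattern could be applied in C# using `Regex.Match` and reading capture group 1. The class `RegexExtract` and method `FirstGroup` are hypothetical names introduced here, and returning an empty string when there is no match is an assumption.

```csharp
using System.Text.RegularExpressions;

public static class RegexExtract
{
    // Returns the text of capture group 1, or an empty string
    // when the pattern does not match the input.
    public static string FirstGroup(string input, string pattern)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Groups[1].Value : string.Empty;
    }
}
```

For the input `abcd78defg123`, the pattern `.*(...)$` yields `123`. The pattern `^...(..)` yields `d7` because it skips only three characters; to get `78`, as in the question's table, the pattern would need to skip four, for example `^.{4}(..)`.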

I'm not sure what you are hoping to gain by using RegEx. RegEx is used for pattern matching. If you want to extract based on position, just use Substring.

It seems to me that Regex really isn't the solution here. To return a section of a string beginning at position pos (starting at 0) and of length length, you simply call the Substring function like this:

string section = str.Substring(pos, length);

Grouping. You could match on /^.{3}(.{2})/ and then look at group $1, for example.

The question is why? Normal string handling, i.e. actual substring methods, is going to be faster and clearer in intent.

To have a regex capture values for further use you typically use (). Most regex flavors, including .NET, write capture groups as (); POSIX basic regular expressions write them as \( \). Square brackets [] denote a character class, not a capture group.

The example

User4 -  123456pp12asd         ppsd  

is the most interesting in that you have 2 separate capture areas here. Is there some default rule on how to join them together, or would you want to be able to specify how to build the result?

Perhaps something like

r/......(..)...(..)/\1\2/  for ppsd
r/......(..)...(..)/\2-\1/ for sd-pp

Do you want to run a regex to get the captures and handle them yourself, or do you want to run more advanced manipulation commands?
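One possible way to let the administrator specify both the captures and how they are joined is to store a pattern together with a result template, and apply them with .NET's `Match.Result`. This is only a sketch: the class `ConfiguredExtract` and method `Apply` are hypothetical names, and how the two strings are read from configuration is left out. Note that .NET templates write group references as `$1`, `$2` rather than `\1`, `\2`.

```csharp
using System.Text.RegularExpressions;

public static class ConfiguredExtract
{
    // `pattern` and `template` are assumed to come from per-machine configuration.
    // The template uses .NET substitution syntax ($1, $2, ...).
    // Returns an empty string when the pattern does not match.
    public static string Apply(string input, string pattern, string template)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Result(template) : string.Empty;
    }
}
```

With the pattern `^.{6}(..).{3}(..)`, the input `123456pp12asd` gives `ppsd` for the template `$1$2` and `sd-pp` for `$2-$1`. The other rows of the question's table can be expressed the same way, for example `^.{4}(..).{3}(..)` with `$1$2` for `78g1`, and `^.{2}(..)` with `$1` for `12`.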
