Find all but the first occurrence of a character with REGEX
I'm building a .NET application and I need to strip every character that is not a decimal digit from a string, except for the first '.'. Essentially, I'm cleaning user input to force a real-number result.
So far I've been using online RegEx tools to try to achieve this in a single pass, but I'm not getting very far.
I wish to accomplish this:
asd123.asd123.123.123 = 123.123123123
Unfortunately I've only managed to get to the stage where
asd123.asd123.123.123 = 123.123.123.123
by using this code.
System.Text.RegularExpressions.Regex.Replace(str, "[^\.|\d]*", "")
But I am stuck trying to remove all but the first decimal point.
Can this be done in a single pass?
Is there a better-way™?
This can be done in a single regex, at least in .NET, which supports unbounded repetition inside lookbehind assertions:
resultString = Regex.Replace(subjectString, @"(?<!^[^.]*)\.|[^\d.]", "");
Explanation:
(?<!^[^.]*)  # Either match (as long as there is at least one dot before it)
\.           # a dot
|            # or
[^\d.]       # any character except digits or dots.
(?<!^[^.]*) means: assert that it's impossible to match, ending at the current position, a string that starts at the beginning of the input string and consists solely of characters other than dots. This condition is true for all dots following the first one, so only those dots are matched and removed.
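To make the behaviour of the lookbehind concrete, here is a minimal sketch that wraps the pattern above in a helper and runs it on a few inputs. The class name `NumberCleaner` and the method `Clean` are made up for illustration; only the pattern itself comes from the answer.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; only the pattern is from the answer.
public static class NumberCleaner
{
    // Matches every dot that already has a dot somewhere before it,
    // or any character that is neither a digit nor a dot.
    private static readonly Regex Unwanted =
        new Regex(@"(?<!^[^.]*)\.|[^\d.]");

    public static string Clean(string input)
    {
        return Unwanted.Replace(input, "");
    }

    public static void Main()
    {
        Console.WriteLine(Clean("asd123.asd123.123.123")); // 123.123123123
        Console.WriteLine(Clean("abc42"));                 // 42
        Console.WriteLine(Clean("1.2.3"));                 // 1.23
    }
}
```

The first dot survives because everything before it is dot-free, so the negative lookbehind fails there; every later dot has a dot before it and is removed.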
I think this is better done without regular expressions.
using System;
using System.Text;

string str = "asd123.asd123.123.123";
StringBuilder sb = new StringBuilder();
bool dotFound = false;
foreach (var character in str)
{
    if (Char.IsDigit(character))
    {
        sb.Append(character);   // keep every digit
    }
    else if (character == '.' && !dotFound)
    {
        dotFound = true;        // keep only the first dot
        sb.Append(character);
    }
}
Console.WriteLine(sb.ToString()); // 123.123123123
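Since the question's goal is a real-number result, the loop can be wrapped in a method and its output handed to `decimal.TryParse`. This is only a sketch under assumptions: the names `InputParser`, `KeepDigitsAndFirstDot` and `TryParseCleaned` are invented here, and the choice of `decimal` with the invariant culture is mine, not the answer's.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical wrapper; names and the decimal/invariant-culture choice are assumptions.
public static class InputParser
{
    // Keeps digits and the first dot only, like the loop in the answer.
    public static string KeepDigitsAndFirstDot(string input)
    {
        var sb = new StringBuilder(input.Length);
        bool dotFound = false;
        foreach (char c in input)
        {
            if (char.IsDigit(c))
            {
                sb.Append(c);
            }
            else if (c == '.' && !dotFound)
            {
                dotFound = true;
                sb.Append(c);
            }
        }
        return sb.ToString();
    }

    // Returns false when the cleaned text is not a number, for example
    // when the input contained no digits at all.
    public static bool TryParseCleaned(string input, out decimal value)
    {
        return decimal.TryParse(
            KeepDigitsAndFirstDot(input),
            NumberStyles.AllowDecimalPoint,
            CultureInfo.InvariantCulture,
            out value);
    }

    public static void Main()
    {
        if (TryParseCleaned("asd123.asd123.123.123", out decimal number))
        {
            // Format with the invariant culture so the separator is always '.'.
            Console.WriteLine(number.ToString(CultureInfo.InvariantCulture)); // 123.123123123
        }
        Console.WriteLine(TryParseCleaned("no digits here", out _)); // False
    }
}
```

Parsing with an explicit culture matters here because the cleaning step hard-codes '.' as the decimal separator, while the machine's current culture may expect ','.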
Firstly, the regex you are currently using will leave any | characters untouched, because | is an ordinary character inside a character class. You only need [^.\\d]* (or [^.\d]* in a verbatim string), since . has no special meaning inside [].
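A small sketch shows the difference between the two character classes. The class and method names (`CharClassDemo`, `StripWithPipe`, `StripWithoutPipe`) and the sample input are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; the two patterns are the ones discussed above.
public static class CharClassDemo
{
    // Pattern from the question: '|' is inside the negated class, so it is kept.
    public static string StripWithPipe(string input)
    {
        return Regex.Replace(input, @"[^\.|\d]*", "");
    }

    // Corrected pattern: only dots and digits are kept.
    public static string StripWithoutPipe(string input)
    {
        return Regex.Replace(input, @"[^.\d]*", "");
    }

    public static void Main()
    {
        const string sample = "1|2.5a";
        Console.WriteLine(StripWithPipe(sample));    // 1|2.5
        Console.WriteLine(StripWithoutPipe(sample)); // 12.5
    }
}
```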
After this replace, you could try something like this:
Regex.Replace(str, @"(\d+\.\d+)[^\d].*", "$1");
Note that this keeps only the text up to the second . and discards everything after it, including any digits. You'd only need this second step if there is a . in the number at all.
Hope this helps.
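For comparison with the single-pass answers, here is a sketch of the two-step approach from this answer, using the .NET `$1` substitution syntax. The names `TwoStepDemo` and `Clean` are hypothetical. On the question's sample it returns 123.123 rather than the requested 123.123123123, because the second replace drops everything from the second dot onward.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo of the two-step approach; names are assumptions.
public static class TwoStepDemo
{
    public static string Clean(string input)
    {
        // Step 1: remove everything that is not a digit or a dot.
        string digitsAndDots = Regex.Replace(input, @"[^.\d]", "");

        // Step 2: keep "digits.digits" and drop the rest, starting at the
        // first non-digit that follows (in practice, the second dot).
        return Regex.Replace(digitsAndDots, @"(\d+\.\d+)[^\d].*", "$1");
    }

    public static void Main()
    {
        Console.WriteLine(Clean("asd123.asd123.123.123")); // 123.123
        Console.WriteLine(Clean("12.5abc"));               // 12.5
    }
}
```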