
What's the quickest way to extract a 5-digit number from a string in C#?

What's the quickest way to extract a 5-digit number from a string in C#?

I've got:

string.Join(null, System.Text.RegularExpressions.Regex.Split(expression, "[^\\d]"));

Any others?

The regex approach is probably the quickest to implement, but not the quickest to run. I compared a simple regex solution to the following manual search code and found that the manual search is ~2x-2.5x faster for large input strings and up to 4x faster for small strings:

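// Requires: using System; using System.Text.RegularExpressions;

// Manual scan: returns the first run of exactly five digits, or null.
// Unlike \d{5}, a run longer than five digits is skipped rather than truncated.
static string Search(string expression)
{
  int run = 0; // length of the current run of consecutive digits
  for (int i = 0; i < expression.Length; i++)
  {
    char c = expression[i];
    if (Char.IsDigit(c))
      run++;
    else if (run == 5)
      return expression.Substring(i - run, run);
    else
      run = 0;
  }
  // A five-digit run that ends the string never reaches the else-if branch above.
  return run == 5 ? expression.Substring(expression.Length - run) : null;
}

const string pattern = @"\d{5}";

// Calls the static Regex.Match overload with the pattern on every call.
static string NotCached(string expression)
{
  return Regex.Match(expression, pattern, RegexOptions.Compiled).Value;
}

// Builds the compiled Regex once and reuses it for every call.
static Regex regex = new Regex(pattern, RegexOptions.Compiled);
static string Cached(string expression)
{
  return regex.Match(expression).Value;
}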

Results for a ~50-char string with a 5-digit string in the middle, over 10^6 iterations, latency per call in microseconds (smaller is faster):

Simple search: 0.648396us

Cached Regex: 2.1414645us

Non-cached Regex: 3.070116us

Results for a ~40K string with a 5-digit string in the middle, over 10^4 iterations, latency per call in microseconds (smaller is faster):

Simple search: 423.801us

Cached Regex: 1155.3948us

Non-cached Regex: 1220.625us

A little surprising: I would have expected Regex, which is compiled to IL, to be comparable to the manual search, at least for very large strings.
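The answer does not show how the timings were taken. A minimal sketch of one way to measure latency per call, assuming a `Stopwatch`-based loop, is given here; the class name `MicroBenchmark` and the method `MicrosecondsPerCall` are hypothetical and not from the original post.

```csharp
using System;
using System.Diagnostics;

public static class MicroBenchmark
{
    // Hypothetical helper: returns the mean latency per call in microseconds.
    public static double MicrosecondsPerCall(Func<string, string> method, string input, int iterations)
    {
        // One untimed warm-up call so JIT compilation is not included in the measurement.
        method(input);

        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            method(input);
        }
        sw.Stop();

        // Convert total elapsed milliseconds to microseconds per call.
        return sw.Elapsed.TotalMilliseconds * 1000.0 / iterations;
    }
}
```

It could be called as `MicroBenchmark.MicrosecondsPerCall(Search, input, 1000000)` for each of the three methods. Numbers from a loop like this vary by machine and runtime, so they are only useful for relative comparison.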

Use a regular expression (\d{5}) to find each occurrence of the 5-digit number in the string, then call int.Parse or decimal.Parse on the match(es).

In the case where there is only one number in text:

int? value = null;
string pat = @"\d{5}";
Regex r = new Regex(pat);
Match m = r.Match(text);
if (m.Success)
{
   value = int.Parse(m.Value);
}
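When the text may contain several 5-digit numbers, the same idea extends to `Regex.Matches`. The sketch below is only an illustration: the class `FiveDigitMatches` and the method `ParseAll` are hypothetical names. It uses `[0-9]` rather than `\d` on the assumption that only ASCII digits are wanted, since `\d` in .NET also matches other Unicode decimal digits by default.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class FiveDigitMatches
{
    // Built once and reused; matches five consecutive ASCII digits.
    private static readonly Regex FiveDigits = new Regex("[0-9]{5}");

    // Hypothetical helper: parses every non-overlapping five-digit match in the text.
    public static List<int> ParseAll(string text)
    {
        var values = new List<int>();
        foreach (Match m in FiveDigits.Matches(text))
        {
            values.Add(int.Parse(m.Value));
        }
        return values;
    }
}
```

Note that matches do not overlap, so a ten-digit run yields two values, and leading zeros are lost once the match is parsed as an `int`.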

Do you mean convert a string to a number? Or find the first 5-digit string and then make it a number? Either way, you'll probably be using decimal.Parse or int.Parse.

I'm of the opinion that regular expressions are the wrong approach. A more efficient approach would be simply to walk through the string looking for a digit, then check whether the next 4 characters are all digits too. If they are, you've got your substring. It's not as robust, no, but it doesn't have the overhead either.
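The walk described above can be sketched roughly as follows. This is one possible reading, not the answerer's code: the names `FiveDigitScanner` and `FirstFiveDigits` are hypothetical, and the sketch assumes only ASCII digits count and that the first five digits of a longer run are an acceptable result.

```csharp
using System;

public static class FiveDigitScanner
{
    // Hypothetical helper: returns the first five consecutive ASCII digits, or null if none.
    public static string FirstFiveDigits(string input)
    {
        if (input == null) return null;

        int i = 0;
        while (i + 5 <= input.Length)
        {
            if (!IsAsciiDigit(input[i]))
            {
                i++;
                continue;
            }

            // Found a digit at i: count how many of the next four are digits as well.
            int j = 1;
            while (j < 5 && IsAsciiDigit(input[i + j])) j++;

            if (j == 5) return input.Substring(i, 5);

            // input[i + j] is not a digit, so no match can start at or before it.
            i += j + 1;
        }
        return null;
    }

    private static bool IsAsciiDigit(char c) => c >= '0' && c <= '9';
}
```

Skipping past the first non-digit after a failed attempt means each character is examined at most once.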

Don't use a regular expression at all. It's way more powerful than you need, and that power is likely to hit performance.

If you can give more details of what you need it to do, we can write the appropriate code... (Test cases would be ideal.)

If the numbers are mixed in with other characters, regular expressions are a good solution.

E.g.: ([0-9]{5})

will match asdfkki12345afdkjsdl, 12345adfaksk, or akdkfa12345
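A small sketch of that pattern in use is shown here, alongside a stricter variant. The class `FiveDigitPatterns`, its fields, and `FirstOrNull` are hypothetical names; the lookaround version is an addition that assumes "5-digit number" means a run not adjacent to any other digit, which the original pattern does not enforce.

```csharp
using System.Text.RegularExpressions;

public static class FiveDigitPatterns
{
    // The pattern from the answer: any five consecutive digits,
    // including the first five of a longer run.
    public static readonly Regex AnyFive = new Regex("([0-9]{5})");

    // Assumption: a stricter reading where the run must not touch another digit.
    public static readonly Regex ExactlyFive = new Regex("(?<![0-9])[0-9]{5}(?![0-9])");

    // Hypothetical helper: returns the first match, or null when there is none.
    public static string FirstOrNull(Regex regex, string input)
    {
        Match m = regex.Match(input);
        return m.Success ? m.Value : null;
    }
}
```

With these definitions, `AnyFive` finds `12345` in all three sample strings above, and it also matches the first five digits of `x1234567`, where `ExactlyFive` finds nothing.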

If you have a simple test case like "12345" or even "12345abcd", don't use regex at all. Regular expressions are not known for their speed.
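For that simple case, where the number sits at the very start of the string, a check of the first five characters is enough. The following is a minimal sketch under that assumption; `LeadingDigits` and `TryGetLeadingFive` are hypothetical names.

```csharp
public static class LeadingDigits
{
    // Hypothetical helper: succeeds when the first five characters are ASCII digits
    // and returns their numeric value through the out parameter.
    public static bool TryGetLeadingFive(string input, out int value)
    {
        value = 0;
        if (input == null || input.Length < 5) return false;

        int result = 0;
        for (int i = 0; i < 5; i++)
        {
            char c = input[i];
            if (c < '0' || c > '9') return false;
            // Accumulate the value digit by digit, avoiding a Substring and int.Parse.
            result = result * 10 + (c - '0');
        }

        value = result;
        return true;
    }
}
```

Accumulating the value in the loop avoids allocating a substring, at the cost of losing any leading zeros in the result.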

For most strings a brute force method is going to be quicker than a RegEx.

A fairly noddy example would be:

string strIWantNumFrom = "qweqwe23qeeq3eqqew9qwer0q";

int num = int.Parse(
    string.Join( null, (
        from c in strIWantNumFrom.ToCharArray()
        where c == '1' || c == '2' || c == '3' || c == '4' || c == '5' ||
            c == '6' || c == '7' || c == '8' || c == '9' || c == '0'
        select c.ToString()
    ).ToArray() ) );

No doubt there are much quicker ways, and lots of optimisations that depend on the exact format of your string.

This might be faster...

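public static string DigitsOnly(string inVal)
        {
            char[] newPhon = new char[inVal.Length];
            int i = 0;
            foreach (char c in inVal)
                // Inclusive range check so that '0' and '9' are kept as well.
                if (c >= '0' && c <= '9')
                    newPhon[i++] = c;
            // Build the string from the first i chars; char[].ToString() would return the type name.
            return new string(newPhon, 0, i);
        }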

If you want to limit it to at most five digits, then:

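public static string DigitsOnly(string inVal)
        {
            char[] newPhon = new char[inVal.Length];
            int i = 0;
            foreach (char c in inVal)
                // Inclusive range check; stop collecting once five digits have been stored.
                if (c >= '0' && c <= '9' && i < 5)
                    newPhon[i++] = c;
            // Build the string from the first i chars; char[].ToString() would return the type name.
            return new string(newPhon, 0, i);
        }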
