
Remove everything but letters from a string

I want to remove every character except letters from a given string, in an efficient way. Any suggestions?

var result = str.Where(c => char.IsLetter(c));

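I was very interested in @KirillPolishchuk's answer, so I ran a small benchmark in LINQPad using randomly built strings. Here is the complete code (I had to change my original code slightly, since it returned an IEnumerable rather than a string):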

void Main()
{
    TimeSpan elapsed;
    string result;

    elapsed = TheLINQWay(buildString(1000000), out result);
    Console.WriteLine("LINQ way: {0}", elapsed);

    elapsed = TheRegExWay(buildString(1000000), out result);
    Console.WriteLine("RegEx way: {0}", elapsed);
}

TimeSpan TheRegExWay(string s, out string result)
{
    Stopwatch stopw = new Stopwatch();

    stopw.Start();
    result = Regex.Replace(s, @"\P{L}", string.Empty);
    stopw.Stop();

    return stopw.Elapsed;
}

TimeSpan TheLINQWay(string s, out string result)
{
    Stopwatch stopw = new Stopwatch();

    stopw.Start();
    result = new string(s.Where(c => char.IsLetter(c)).ToArray());
    stopw.Stop();

    return stopw.Elapsed;
}

string buildString(int len)
{
    byte[] buffer = new byte[len];
    Random r = new Random((int)DateTime.Now.Ticks);

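    // Note: Encoding.ASCII decodes every byte above 127 as '?',
    // so roughly half of the generated characters are '?'.
    for(int i = 0; i < len; i++)
        buffer[i] = (byte)r.Next(256);

    return Encoding.ASCII.GetString(buffer);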
}

Here are the results:

LINQ way: 00:00:00.0150030
RegEx way: 00:00:00.2788130

One thing still needs to be said: as Servy pointed out in his comment, the shorter the string, the better the regular expression fares.

Use:

var result = Regex.Replace(input, @"\P{L}", string.Empty);
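As a minimal sketch of what this pattern does: `\P{L}` matches any character that is not in a Unicode letter category, so accented and non-Latin letters survive, whereas an ASCII-only class such as `[^a-zA-Z]` removes them. The class name `RegexDemo`, the two helper methods, and the sample input are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class RegexDemo
{
    // Keeps every Unicode letter (general category L*), drops everything else.
    public static string KeepUnicodeLetters(string input) =>
        Regex.Replace(input, @"\P{L}", string.Empty);

    // Keeps only the ASCII letters A-Z and a-z.
    public static string KeepAsciiLetters(string input) =>
        Regex.Replace(input, "[^a-zA-Z]", string.Empty);

    static void Main()
    {
        // "caf\u00E9" is "cafe" with an accented final letter (U+00E9).
        string sample = "caf\u00E9 42!";

        // The accented letter is kept by \P{L} but removed by [^a-zA-Z].
        Console.WriteLine(KeepUnicodeLetters(sample).Length); // 4
        Console.WriteLine(KeepAsciiLetters(sample));          // caf
    }
}
```

Which of the two is right depends on whether "letters" in the question means the 26 ASCII letters or any Unicode letter.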

The most efficient way I can think of:

string input = "ABCD 13 ~";

// at worst, all characters are alphabetical, so we have to accommodate for that
char[] output = new char[input.Length];

int numberOfAlphabeticals = 0;
for (int i = 0; i < input.Length; i++)
{
    char character = input[i];

    // based on ASCII; compare the char itself, because casting it to byte
    // truncates non-ASCII characters and can map them onto letter codes
    if ((character >= 'A' && character <= 'Z') || (character >= 'a' && character <= 'z'))
    {
        output[numberOfAlphabeticals] = character;
        ++numberOfAlphabeticals;
    }
}

string outputAsString = new string(output, 0, numberOfAlphabeticals);

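I think the fastest way (performance-wise) is to create a lookup array covering character codes up to 122, convert the chosen string to a byte array, and use a StringBuilder to build another string with the unwanted characters removed: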

private static char[] alphabet = {'\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', '\0', '\0', '\0', '\0', '\0', '\0', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z',};

Here is the removal function (I haven't compiled it, but it should give you the idea):

string RemoveNonAlpha(string value)
{
    byte[] asciiBytes = Encoding.ASCII.GetBytes(value);
    StringBuilder sb = new StringBuilder();
    for(int i = 0; i < asciiBytes.Length; i++)
    {
        if((asciiBytes[i] >= 65 && asciiBytes[i] <= 90) || (asciiBytes[i] >= 97 && asciiBytes[i] <= 122))
        {
            sb.Append(alphabet[asciiBytes[i]]);
        }
    }

    return sb.ToString();
}
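The function above still tests the ASCII ranges directly, so the lookup array only echoes the character back. A hedged sketch of a variant in which the table itself makes the decision is shown here; the `LookupFilter` class, the `bool[]` table, and `BuildTable` are assumptions for illustration, not part of the original answer.

```csharp
using System;
using System.Text;

static class LookupFilter
{
    // isAsciiLetter[c] is true only for 'A'-'Z' and 'a'-'z'.
    static readonly bool[] isAsciiLetter = BuildTable();

    static bool[] BuildTable()
    {
        var table = new bool[128];
        for (char c = 'A'; c <= 'Z'; c++) table[c] = true;
        for (char c = 'a'; c <= 'z'; c++) table[c] = true;
        return table;
    }

    public static string RemoveNonAlpha(string value)
    {
        // Pre-size the builder: the result is never longer than the input.
        var sb = new StringBuilder(value.Length);
        foreach (char c in value)
        {
            // Characters outside the table (code 128 and above) are dropped.
            if (c < 128 && isAsciiLetter[c])
                sb.Append(c);
        }
        return sb.ToString();
    }

    static void Main()
    {
        Console.WriteLine(RemoveNonAlpha("ABCD 13 ~")); // ABCD
    }
}
```

Working on the `char` values directly also avoids the intermediate byte array that `Encoding.ASCII.GetBytes` allocates.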

Update

Based on Nikola's answer, here is an improved version:

private static string RemoveNonAlpha(string value)
{
    // At worst every character is a letter, so size the buffer for the whole input.
    char[] output = new char[value.Length];
    int numAlpha = 0;
    for (int i = 0; i < value.Length; i++)
    {
        // Compare the char itself: casting to byte would truncate
        // non-ASCII characters and could map them onto letter codes.
        char c = value[i];
        if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'))
        {
            output[numAlpha] = c;
            numAlpha++;
        }
    }

    return new string(output, 0, numAlpha);
}
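If letters outside A-Z and a-z should be kept as well, the same preallocated-buffer technique can use `char.IsLetter` as its test. This is a sketch under that assumption; the `LetterFilter` class and `KeepLetters` method are illustrative names. Because `char.IsLetter(char)` looks at one UTF-16 code unit at a time, letters encoded as surrogate pairs are still dropped.

```csharp
using System;

static class LetterFilter
{
    // Single pass over the input, writing kept characters into a
    // buffer sized for the worst case (every character is a letter).
    public static string KeepLetters(string value)
    {
        char[] output = new char[value.Length];
        int count = 0;
        foreach (char c in value)
        {
            if (char.IsLetter(c))
                output[count++] = c;
        }
        return new string(output, 0, count);
    }

    static void Main()
    {
        Console.WriteLine(KeepLetters("ABCD 13 ~")); // ABCD
    }
}
```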

Here are the results, compared with the LINQ version:

The LINQ way 100: 6.7935
The fast way 100: 0.4648
The LINQ way 1000: 0.0442
The fast way 1000: 0.0134
The LINQ way 10000: 0.2078
The fast way 10000: 0.143
The LINQ way 100000: 2.0617
The fast way 100000: 1.3864
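The timing code behind these figures is not shown. A minimal sketch of how such a comparison could be set up follows, assuming a Stopwatch-based harness with one warm-up call so that JIT compilation is not counted in the measurement. `TimingSketch`, `Measure`, `BuildString`, the iteration count, and the fixed random seed are all hypothetical and are not the author's harness.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

static class TimingSketch
{
    public static string LinqWay(string s) =>
        new string(s.Where(c => char.IsLetter(c)).ToArray());

    public static string FastWay(string s)
    {
        char[] output = new char[s.Length];
        int n = 0;
        foreach (char c in s)
        {
            if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'))
                output[n++] = c;
        }
        return new string(output, 0, n);
    }

    // Builds a string of printable ASCII characters (codes 32 to 126).
    public static string BuildString(int len, Random r)
    {
        char[] buffer = new char[len];
        for (int i = 0; i < len; i++)
            buffer[i] = (char)r.Next(32, 127);
        return new string(buffer);
    }

    // Calls the filter once as a warm-up, then returns the average
    // time per call in milliseconds over the given number of runs.
    public static double Measure(Func<string, string> filter, string input, int iterations)
    {
        filter(input);
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            filter(input);
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds / iterations;
    }

    static void Main()
    {
        Random r = new Random(12345); // fixed seed so both methods see the same inputs
        foreach (int len in new[] { 100, 1000, 10000, 100000 })
        {
            string input = BuildString(len, r);
            Console.WriteLine("The LINQ way {0}: {1}", len, Measure(LinqWay, input, 100));
            Console.WriteLine("The fast way {0}: {1}", len, Measure(FastWay, input, 100));
        }
    }
}
```

On pure ASCII input both filters return the same string, so the comparison measures only the cost of the two approaches.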

Use

^\\W

as the input to the Replace method of the regular expression:

http://msdn.microsoft.com/en-us/library/xwewhkd1.aspx

