简体   繁体   中英

Remove chars from string

I have a string like

string Text = "012345678901234567890123456789";

and a List<int> with indexes

List<int> Indexes = new List<int>() { 2, 4, 7, 9, 15, 18, 23, 10, 1, 2, 15, 40 };

with following restrictions

  • there are duplicates within the list
  • the list is not sorted
  • there may be indexes > Text.length

what's the best way to remove characters from the text which are in the index list?

expected output:

035681234679012456789

Is there a more efficent way than

foreach (int index in Indexes
                        .OrderByDescending(x => x)
                        .Distinct()
                        .Where(x => x < Text.Length))
{
    Text = Text.Remove(index, 1);
}

Update : Here are the Benchmarks of the current answers ( string with 100.000 characters and List<int> with length 10.000:

Gallant: 3.322 ticks
Tim Schmelter: 8.602.576 ticks
Sergei Zinovyev: 9.002 ticks
rbaghbanli: 7.137 ticks
Jirí Tesil Tesarík: 72.580 ticks

Here's a more or less elegant LINQ way:

Text = new string(Text.Where((c, index) => !Indexes.Contains(index)).ToArray());

It uses the overload of Enumerable.Where that projects the index of the item in the sequence.

If you want the most efficient and not the most readable way and the text is really large you could use a HashSet<int> instead of the list which doesn't allow duplicates and a StringBuilder to create the new string:

var indexSet = new HashSet<int>(Indexes); // either create from the list(as shown here) or use it without your list
var textBuilder = new StringBuilder(Text.Length);

for(int i = 0; i < Text.Length; i++)
    if (!indexSet.Contains(i))
        textBuilder.Append(Text[i]);
Text = textBuilder.ToString();

Of course you could also use the HashSet<int> in the LINQ approach to make it more efficient.

This will work faster:

string Text = "012345678901234567890123456789";
List<int> Indexes = new List<int>() { 2, 4, 7, 9, 15, 18, 23, 10, 1, 2, 15, 40 };

HashSet<int> hashSet = new HashSet<int>(Indexes);

StringBuilder sb = new StringBuilder(Text.Length);
for (int i = 0; i < Text.Length; ++i)
{
    if (!hashSet.Contains(i))
    {
        sb.Append(Text[i]);
    }
}

string str = sb.ToString();

Yes, see code below (it will iterate only once over each sequence):

var map = new short[Text.Length];
foreach (var i in Indexes)
{
    if (i < text.Count)
        map[i] = 1;
}
Text = new string(Text.Where((c, i) => map[i] == 0).ToArray());

The following is making an assumption that your string contains a known set of characters. If you know for certain that, for example, Unicode character never appears in the string, you can use it as a placeholder to mark which characters to remove. This should be very fast in exchange for this limitation:

char temp = '\uFFF0';
StringBuilder sb = new StringBuilder(Text);
for (int i = 0; i < Indexes.Count; i++)
{
    if (Indexes[i] < sb.Length)
    {
        sb[Indexes[i]] = temp;
    }
}

Text = sb.Replace(temp.ToString(), null).ToString();

This appears to be anywhere between 3-4 times faster than building a HashSet like some other answers have suggested. http://ideone.com/mUILHg


If you cannot make the above assumption, you can build an array to contain this extra bit of data instead of using a unique character. This does two rounds of iteration (so it's a bit slower), but it is still O(n) efficiency (so it should typically be faster than putting Indexes into a hashmap before iterating).

bool[] exclude = new bool[Text.Length];
for (int i = 0; i < Indexes.Count; i++)
{
    if (Indexes[i] < exclude.Length)
    {
        exclude[Indexes[i]] = true;
    }
}
StringBuilder sb = new StringBuilder(Text.Length);
for (int i = 0; i < Text.Length; i++)
{
    if (!exclude[i])
    {
        sb.Append(Text[i]);
    }
}
Text = sb.ToString();

Quick benchmarks: http://ideone.com/3d2uPH

A modified solution using byte (may be replaced by boolean) array instead of hash table. PROS: linear complexity, CONS: needs extra memory for flag array.

string Text = "012345678901234567890123456789";
List<int> Indexes = new List<int>() { 2, 4, 7, 9, 15, 18, 23, 10, 1, 2, 15, 40 };
byte[] contains = new byte[Text.Length];
Indexes.ForEach(p=> {if ( p<Text.Length) contains[p]=1;});
var output = string.Concat(Enumerable.Range(0, Text.Length).Where(p => contains[p] != 1).Select(p => Text[p]));

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM