What is the most elegant way to get a set of items by index from a collection?

Given

IList<int> indexes;
ICollection<T> collection;

What is the most elegant way to extract all T in collection based on the indexes provided in indexes?

For example, if collection contained

"Brian", "Cleveland", "Joe", "Glenn", "Mort"

And indexes contained

1, 3

The return would be

"Cleveland", "Glenn"

Edit: Assume that indexes is always sorted ascending.

This assumes that the index sequence is a monotone ascending sequence of non-negative indices. The strategy is straightforward: for each index, bump up an enumerator on the collection to that point and yield the element.

public static IEnumerable<T> GetIndexedItems<T>(this IEnumerable<T> collection, IEnumerable<int> indices)
{
    int currentIndex = -1;
    using (var collectionEnum = collection.GetEnumerator())
    {
        foreach(int index in indices)
        {
            while (collectionEnum.MoveNext()) 
            {
                currentIndex += 1;
                if (currentIndex == index)
                {
                    yield return collectionEnum.Current;
                    break;
                }
            }
        }    
    }
}

Advantages of this solution over other solutions posted:

  • O(1) in extra storage -- some of these solutions are O(n) in space
  • O(n) in time -- some of these solutions are quadratic in time
  • works on any two sequences; does not require ICollection or IList.
  • only iterates the collection once; some solutions iterate the collection multiple times (to build a list out of it, for instance.)

Disadvantages:

  • harder to read
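
As a usage sketch with the data from the question: the method body below is the one shown above, while the class names IndexedItemsExtensions and Demo are assumptions added only to make the example compile on its own.

```csharp
using System;
using System.Collections.Generic;

static class IndexedItemsExtensions
{
    // Same single-pass approach as above: walk the collection once and
    // yield an element whenever its position matches the next wanted index.
    // Assumes indices are non-negative and sorted ascending.
    public static IEnumerable<T> GetIndexedItems<T>(this IEnumerable<T> collection, IEnumerable<int> indices)
    {
        int currentIndex = -1;
        using (var collectionEnum = collection.GetEnumerator())
        {
            foreach (int index in indices)
            {
                while (collectionEnum.MoveNext())
                {
                    currentIndex += 1;
                    if (currentIndex == index)
                    {
                        yield return collectionEnum.Current;
                        break;
                    }
                }
            }
        }
    }
}

static class Demo
{
    static void Main()
    {
        var names = new[] { "Brian", "Cleveland", "Joe", "Glenn", "Mort" };
        var indexes = new[] { 1, 3 };

        // Prints: Cleveland, Glenn
        Console.WriteLine(string.Join(", ", names.GetIndexedItems(indexes)));
    }
}
```

An index past the end of the collection simply yields nothing, because the inner loop runs out of elements before it finds a match.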

Here's a faster version:

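IEnumerable<T> ByIndices<T>(ICollection<T> data, IList<int> indices)
{
    // Nothing to look up; also avoids indexing into an empty list below.
    if (indices.Count == 0)
        yield break;

    int current = 0;
    foreach(var datum in data.Select((x, i) => new { Value = x, Index = i }))
    {
        // indices is assumed sorted ascending, so only the next wanted index needs checking.
        if(datum.Index == indices[current])
        {
            yield return datum.Value;
            // Stop as soon as every requested index has been produced.
            if(++current == indices.Count)
                yield break;
        }
    }
}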

Not sure how elegant this is, but here you go.

Since ICollection<> doesn't give you indexing, I just used IEnumerable<>, and since I didn't need the index on the IList<>, I used IEnumerable<> there too.

public static IEnumerable<T> IndexedLookup<T>(
    IEnumerable<int> indexes, IEnumerable<T> items)
{
    using (var indexesEnum = indexes.GetEnumerator())
    using (var itemsEnum = items.GetEnumerator())
    {
        int currentIndex = -1;
        while (indexesEnum.MoveNext())
        {
            while (currentIndex != indexesEnum.Current)
            {
                if (!itemsEnum.MoveNext())
                    yield break;
                currentIndex++;
            }

            yield return itemsEnum.Current;
        }
    }
}

EDIT: Just noticed my solution is similar to Eric's.

I would use an extension method:

public static IEnumerable<T> Filter<T>(this IEnumerable<T> pSeq, 
                                       params int [] pIndexes)
{
      return pSeq.Where((pArg, pId) => pIndexes.Contains(pId));
}

You could do it in an extension method:

static IEnumerable<T> Extract<T>(this ICollection<T> collection, IList<int> indexes)
{
   int index = 0;
   foreach(var item in collection)
   {
     if (indexes.Contains(index))
       yield return item;
     index++;
   }
}

Not elegant, but efficient - make sure indexes are sorted ...

ICollection<T> selected = new Collection<T>();
var indexesIndex = 0;
var collectionIndex = 0;
foreach( var item in collection )
{
    if( indexes[indexesIndex] != collectionIndex++ )
    {
        continue;
    }
    selected.Add( item );
    if( ++indexesIndex == indexes.Count )
    {
        break;
    }
}

As a proper answer:

var col = new []{"a","b","c"};
var ints = new []{0,2};
var set = new HashSet<int>(ints);

var result = col.Where((item,index) => set.Contains(index));

As usual with IList.Contains or Enumerable.Contains, don't do lookups in lists if you don't know how many indexes there will be in the collection, or you'll end up going O(n^2) the hard way. If you want to be on the safe side, use an intermediary Lookup/Dictionary/HashSet and test against that collection rather than the vanilla list (linear search is not good for you).
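
The HashSet advice above could be packaged as an extension method, roughly like this. The names IndexFilterExtensions and WhereIndexIn are hypothetical; only HashSet<int> and the indexed Where overload come from the answer itself.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class IndexFilterExtensions
{
    // Hypothetical helper name. Copies the indexes into a HashSet once, so each
    // per-element membership test is O(1) on average instead of a linear scan.
    public static IEnumerable<T> WhereIndexIn<T>(this IEnumerable<T> source, IEnumerable<int> indexes)
    {
        var set = new HashSet<int>(indexes);
        return source.Where((item, index) => set.Contains(index));
    }
}

static class HashSetDemo
{
    static void Main()
    {
        var col = new[] { "a", "b", "c" };

        // The indexes need not be sorted; results come back in collection order.
        // Prints: a, c
        Console.WriteLine(string.Join(", ", col.WhereIndexIn(new[] { 2, 0 })));
    }
}
```

The trade-off is O(k) extra space for k indexes, and the whole collection is still enumerated even after the last requested index has been found.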

Several good suggestions here already, I'll just throw in my two cents.

int counter = 0;
var x = collection
    .Where((item, index) => 
        counter < indices.Length && 
        index == indices[counter] && 
        ++counter != 0);

edit: yah, didn't think it through the first time around. the increment has to happen only when the other two conditions are satisfied..

I find this solution particularly elegant and a bit easier to follow.

Solution 1

   public static IEnumerable<T> GetIndexedItems2<T>(this IEnumerable<T> collection,    IEnumerable<int> indices) {

        int skipped = 0;
        foreach (int index in indices) {
            int offset = index - skipped;
            collection = collection.Skip(offset);
            skipped += offset;
            yield return collection.First();
        }
    }

This can be refactored further into a really simple implementation:

Solution 2

   public static IEnumerable<T> GetIndexedItems3<T>(this IEnumerable<T> collection, IEnumerable<int> indices) {
        foreach (int offset in indices.Distances()) {
            collection = collection.Skip(offset);
            yield return collection.First();
        }
    }

    public static IEnumerable<int> Distances(this IEnumerable<int> numbers) {
        int offset = 0;
        foreach (var number in numbers) {
            yield return number - offset;
            offset = number;
        }
    }

But we are not done

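Due to deferred execution, LINQ's Skip is way too slow here: every call to First() re-enumerates the collection from the start through the whole chain of Skip calls. Advancing a single enumerator avoids that: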

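   public static IEnumerable<T> GetIndexedItems4<T>(this IEnumerable<T> collection, IEnumerable<int> indices) {
        // Dispose the enumerator when iteration finishes or is abandoned.
        using (var rest = collection.GetEnumerator()) {
            foreach (int offset in indices.Distances()) {
                Skip(rest, offset);
                yield return rest.Current;
            }
        }
    }

    // Advances the enumerator by the given number of elements.
    static void Skip<T>(IEnumerator<T> enumerator, int skip) {
        while (skip > 0) {
            enumerator.MoveNext();
            skip--;
        }
    }

    // Yields how many MoveNext calls are needed to get from one index to the next.
    // Unlike the LINQ version, this starts at -1: a fresh enumerator sits before
    // the first element, so reaching index i takes i + 1 calls.
    static IEnumerable<int> Distances(this IEnumerable<int> numbers) {
        int offset = -1;
        foreach (var number in numbers) {
            yield return number - offset;
            offset = number;
        }
    }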
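
To make the cost of chaining Skip visible, here is a rough sketch that counts how many elements each approach pulls from the source. SkipCostDemo, Counted, ViaChainedSkip, ViaSingleEnumerator and CountPulls are hypothetical names made up for this illustration; the chained version mirrors Solution 1, and the single-enumerator version is a simplified stand-in rather than the exact code above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class SkipCostDemo
{
    // Number of elements handed out by Counted() since the last reset.
    public static int Pulled;

    // Wraps a sequence and counts every element that is actually enumerated.
    public static IEnumerable<T> Counted<T>(IEnumerable<T> source)
    {
        foreach (var item in source)
        {
            Pulled++;
            yield return item;
        }
    }

    // Chained-Skip approach, as in Solution 1: each First() starts over at the source.
    public static IEnumerable<T> ViaChainedSkip<T>(IEnumerable<T> collection, IEnumerable<int> indices)
    {
        int skipped = 0;
        foreach (int index in indices)
        {
            int offset = index - skipped;
            collection = collection.Skip(offset);
            skipped += offset;
            yield return collection.First();
        }
    }

    // One enumerator advanced across the collection a single time.
    public static IEnumerable<T> ViaSingleEnumerator<T>(IEnumerable<T> collection, IEnumerable<int> indices)
    {
        int position = -1;
        using (var e = collection.GetEnumerator())
        {
            foreach (int index in indices)
            {
                while (position < index)
                {
                    if (!e.MoveNext()) yield break;
                    position++;
                }
                yield return e.Current;
            }
        }
    }

    // Runs one lookup strategy over 1000 counted elements and reports the pull count.
    public static int CountPulls(Func<IEnumerable<int>, IEnumerable<int>, IEnumerable<int>> lookup)
    {
        Pulled = 0;
        lookup(Counted(Enumerable.Range(0, 1000)), new[] { 100, 500, 900 }).ToList();
        return Pulled;
    }

    static void Main()
    {
        Console.WriteLine("chained Skip pulls:      " + CountPulls((c, i) => ViaChainedSkip(c, i)));
        Console.WriteLine("single enumerator pulls: " + CountPulls((c, i) => ViaSingleEnumerator(c, i)));
    }
}
```

The chained version should report a clearly larger count, and the gap widens as more indices are requested.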

Benchmarking gives us similar performance to the solution by Eric.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Diagnostics;

namespace ConsoleApplication21 {

    static class LinqExtensions {

        public static IEnumerable<T> GetIndexedItemsEric<T>(this IEnumerable<T> collection, IEnumerable<int> indices) {
            int currentIndex = -1;
            using (var collectionEnum = collection.GetEnumerator()) {
                foreach (int index in indices) {
                    while (collectionEnum.MoveNext()) {
                        currentIndex += 1;
                        if (currentIndex == index) {
                            yield return collectionEnum.Current;
                            break;
                        }
                    }
                }
            }
        }

        public static IEnumerable<T> GetIndexedItemsSam<T>(this IEnumerable<T> collection, IEnumerable<int> indices) {
            // Dispose the enumerator when iteration finishes or is abandoned.
            using (var rest = collection.GetEnumerator()) {
                foreach (int offset in indices.Distances()) {
                    Skip(rest, offset);
                    yield return rest.Current;
                }
            }
        }

        // Advances the enumerator by the given number of elements.
        static void Skip<T>(this IEnumerator<T> enumerator, int skip) {
            while (skip > 0) {
                enumerator.MoveNext();
                skip--;
            }
        }

        // Starts at -1: a fresh enumerator sits before the first element,
        // so reaching index i takes i + 1 MoveNext calls.
        static IEnumerable<int> Distances(this IEnumerable<int> numbers) {
            int offset = -1;
            foreach (var number in numbers) {
                yield return number - offset;
                offset = number;
            }
        }
    } 

    class Program {

        static void TimeAction(string description, int iterations, Action func) {
            var watch = new Stopwatch();
            watch.Start();
            for (int i = 0; i < iterations; i++) {
                func(); 
            }
            watch.Stop();
            Console.Write(description);
            Console.WriteLine(" Time Elapsed {0} ms", watch.ElapsedMilliseconds);
        }

        static void Main(string[] args) {

            int max = 100000;
            int lookupCount = 1000;
            int iterations = 500;
            var rand = new Random();
            var array = Enumerable.Range(0, max).ToArray();
            var lookups = Enumerable.Range(0, lookupCount).Select(i => rand.Next(max - 1)).Distinct().OrderBy(_ => _).ToArray();

            // warmup 
            array.GetIndexedItemsEric(lookups).ToArray();
            array.GetIndexedItemsSam(lookups).ToArray();

            TimeAction("Eric's Solution", iterations, () => {
                array.GetIndexedItemsEric(lookups).ToArray();
            });

            TimeAction("Sam's Solution", iterations, () =>
            {
                array.GetIndexedItemsSam(lookups).ToArray();
            });

            Console.ReadKey();
        }
    }
}
Eric's Solution Time Elapsed 770 ms
Sam's Solution Time Elapsed 768 ms

I like linq.

    IList<T> list = collection.ToList<T>();

    var result = from i in indexes
                 select list[i];

    return result.ToList<T>();

As I understand it, an ICollection may not necessarily have any order, which is why there isn't an extremely elegant solution to access things by index. You may want to consider using a dictionary or list to store the data in the collection.

The best way I can think of is to iterate through the collection while keeping track of what index you are on. Then check if the indexes list contains that index. If so, return that element.

    public static IEnumerable<T> WhereIndexes<T>(this IEnumerable<T> collection, IEnumerable<int> indexes)
    {
        IList<T> l = new List<T>(collection);
        foreach (var index in indexes)
        {
            yield return l[index]; 
        }
    }

It seems that the most efficient way would be to use a Dictionary<int,T> instead of a Collection<T>. You can still keep a list of indexes you want to use in the IList<int>.
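
A minimal sketch of the dictionary idea, assuming the data can be keyed by position up front. DictionaryLookupDemo and Pick are hypothetical names; building the dictionary with Select and ToDictionary is just one possible way to populate it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class DictionaryLookupDemo
{
    // Hypothetical helper: fetches each requested index from a dictionary keyed by position.
    public static List<T> Pick<T>(IDictionary<int, T> byIndex, IEnumerable<int> indexes)
    {
        return indexes.Select(i => byIndex[i]).ToList();
    }

    static void Main()
    {
        var names = new[] { "Brian", "Cleveland", "Joe", "Glenn", "Mort" };

        // Key each element by its position in the original sequence.
        Dictionary<int, string> byIndex = names
            .Select((name, i) => new { name, i })
            .ToDictionary(x => x.i, x => x.name);

        // Prints: Cleveland, Glenn
        Console.WriteLine(string.Join(", ", Pick(byIndex, new[] { 1, 3 })));
    }
}
```

Each lookup is O(1) on average, but an index that is not a key throws KeyNotFoundException, so TryGetValue may be preferable when indexes can be out of range.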

Maybe I'm missing something, but what's wrong with just:

indexes.Select( (index => values[index]))
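
A sketch of that one-liner in context, under the assumption that values exposes an indexer (an array, List<T>, or anything implementing IList<T>). ICollection<T> itself has no indexer, so the collection from the question would first need to be copied, for example with ToList(). IndexerLookupDemo and SelectByIndex are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class IndexerLookupDemo
{
    // Projects each index straight through the list's indexer.
    public static IEnumerable<T> SelectByIndex<T>(IList<T> values, IEnumerable<int> indexes)
    {
        return indexes.Select(index => values[index]);
    }

    static void Main()
    {
        var values = new[] { "Brian", "Cleveland", "Joe", "Glenn", "Mort" };

        // Indexes may be unsorted or repeated; output follows the order of indexes.
        // Prints: Glenn, Cleveland, Cleveland
        Console.WriteLine(string.Join(", ", SelectByIndex(values, new[] { 3, 1, 1 })));
    }
}
```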
