
Best practice for parameter: IEnumerable vs. IList vs. IReadOnlyCollection

I get when one would return an IEnumerable from a method: when there's value in deferred execution. And returning a List or IList should pretty much be only when the result is going to be modified; otherwise I'd return an IReadOnlyCollection, so the caller knows that what they're getting isn't intended for modification (and this even lets the method reuse objects from other callers).

However, on the parameter input side, I'm a little less clear. I could take an IEnumerable, but what if I need to enumerate more than once?

The saying "Be conservative in what you send, be liberal in what you accept" suggests taking an IEnumerable is good, but I'm not really sure.

For example, if there are no elements in the following IEnumerable parameter, a significant amount of work can be saved in this method by checking .Any() first, which requires ToList() before that to avoid enumerating twice.

public IEnumerable<Data> RemoveHandledForDate(IEnumerable<Data> data, DateTime dateTime) {
   var dataList = data.ToList();

   if (!dataList.Any()) {
      return dataList;
   }

   var handledDataIds = new HashSet<int>(
      GetHandledDataForDate(dateTime) // Expensive database operation
         .Select(d => d.DataId)
   );

   return dataList.Where(d => !handledDataIds.Contains(d.DataId));
}

So I'm wondering what the best signature is here. One possibility is IList<Data> data, but accepting a list suggests that you plan to modify it, which is not correct: this method doesn't touch the original list, so IReadOnlyCollection<Data> seems better.

But IReadOnlyCollection forces callers to do ToList().AsReadOnly() every time, which gets a bit ugly, even with a custom extension method .AsReadOnlyCollection. And that's not being liberal in what is accepted.

What is best practice in this situation?

This method is not returning an IReadOnlyCollection because there may be value in the final Where using deferred execution, as the whole list is not required to be enumerated. However, the Select is required to be enumerated because the cost of doing .Contains would be horrible without the HashSet.

I don't have a problem with calling ToList; it just occurred to me that if I need a List to avoid multiple enumeration, why do I not just ask for one in the parameter? So the question here is, if I don't want an IEnumerable in my method, should I really accept one in order to be liberal (and ToList it myself), or should I put the burden on the caller to ToList().AsReadOnly()?

Further information for those unfamiliar with IEnumerables

The real problem here is not the cost of Any() vs. ToList(). I understand that enumerating the entire list costs more than doing Any(). However, assume the case that the caller will consume all items in the returned IEnumerable from the above method, and assume that the source IEnumerable<Data> data parameter comes from the result of this method:

public IEnumerable<Data> GetVeryExpensiveDataForDate(DateTime dateTime) {
    // This query is very expensive no matter how many rows are returned.
    // It costs 5 seconds on each `.GetEnumerator` call to get 1 value or 1000
    return MyDataProvider.Where(d => d.DataDate == dateTime);
}

Now if you do this:

var myData = GetVeryExpensiveDataForDate(todayDate);
var unhandledData = RemoveHandledForDate(myData, todayDate);
foreach (var data in unhandledData) {
   messageBus.Dispatch(data); // fully enumerate
}

And if RemoveHandledForDate does Any and does Where, you'll incur the 5 second cost twice, instead of once. This is why you should always take extreme pains to avoid enumerating an IEnumerable more than once. Do not rely on your knowledge that in fact it's harmless, because some future hapless developer may call your method some day with a newly implemented IEnumerable you never thought of, which has different characteristics.

The contract for an IEnumerable says that you can enumerate it. It does NOT promise anything about the performance characteristics of doing so more than once.

In fact, some IEnumerables are volatile and won't return any data upon a subsequent enumeration! Switching to one would be a totally breaking change if combined with multiple enumeration (and a very hard one to diagnose if the multiple enumeration was added later).

Don't do multiple enumeration of an IEnumerable.

If you accept an IEnumerable parameter, you are in effect promising to enumerate it exactly 0 or 1 times.
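To make the cost of a second enumeration concrete, here is a minimal sketch. The class and its `Source` method are hypothetical stand-ins for an expensive deferred query (they are not part of the code above); the counter records how often enumeration of the sequence actually begins. Calling `Any()` and then enumerating a `Where` over the same sequence starts it twice, while materializing with `ToList()` first starts it once.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MultipleEnumerationDemo
{
    // Hypothetical stand-in for an expensive deferred query.
    // Incremented each time enumeration of Source() actually begins.
    public static int Starts;

    public static IEnumerable<int> Source()
    {
        Starts++; // runs on the first MoveNext of each new enumerator, not at call time
        yield return 1;
        yield return 2;
        yield return 3;
    }

    // Any() followed by a full pass: two separate enumerations of the source.
    public static int StartsWithAnyThenWhere()
    {
        Starts = 0;
        IEnumerable<int> data = Source();
        if (data.Any())
        {
            foreach (var item in data.Where(n => n > 1)) { /* consume */ }
        }
        return Starts;
    }

    // Materialize once, then query the in-memory list as often as needed.
    public static int StartsWithToList()
    {
        Starts = 0;
        List<int> data = Source().ToList();
        if (data.Any())
        {
            foreach (var item in data.Where(n => n > 1)) { /* consume */ }
        }
        return Starts;
    }
}
```

With a source that costs 5 seconds per enumeration, the first variant would pay that price twice and the second only once.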

There are definitely ways around that which let you accept IEnumerable<T>, enumerate only once, and make sure you don't query the database multiple times. Solutions I can think of:

  • Instead of using Any and Where, you could use the enumerator directly. Call MoveNext instead of Any to see if there are any items in the collection, and manually iterate further after making your database query.
  • Use Lazy to initialize your HashSet.

The first one seems ugly; the second one might actually make a lot of sense:

public IEnumerable<Data> RemoveHandledForDate(IEnumerable<Data> data, DateTime dateTime)
{
    var ids = new Lazy<HashSet<int>>(
        () => new HashSet<int>(
            GetHandledDataForDate(dateTime) // Expensive database operation
                .Select(d => d.DataId)
        ));

    return data.Where(d => !ids.Value.Contains(d.DataId));
}
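For comparison, the first option (driving the enumerator by hand) might look roughly like the sketch below. It is only an illustration under a few assumptions: `Data` is reduced to a single `DataId` property, and the `getHandled` delegate is a hypothetical stand-in for `GetHandledDataForDate` so that the block compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed minimal shape of the Data type used in the question.
public class Data
{
    public int DataId { get; set; }
}

public static class EnumeratorApproach
{
    // getHandled is a hypothetical stand-in for GetHandledDataForDate.
    public static IEnumerable<Data> RemoveHandledForDate(
        IEnumerable<Data> data,
        DateTime dateTime,
        Func<DateTime, IEnumerable<Data>> getHandled)
    {
        using (IEnumerator<Data> e = data.GetEnumerator())
        {
            // Plays the role of Any(): an empty input never reaches the lookup.
            if (!e.MoveNext())
            {
                yield break;
            }

            var handledIds = new HashSet<int>(
                getHandled(dateTime).Select(d => d.DataId));

            // Continue with the same enumerator, starting from the item already read.
            do
            {
                if (!handledIds.Contains(e.Current.DataId))
                {
                    yield return e.Current;
                }
            } while (e.MoveNext());
        }
    }
}
```

Because this is an iterator method, nothing runs until the caller starts enumerating the result, and the input is enumerated once per enumeration of the result.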

You can take an IEnumerable<T> in the method, and use a CachedEnumerable similar to the one here to wrap it.

This class wraps an IEnumerable<T> and makes sure that it is only enumerated once. If you try to enumerate it again, it yields items from the cache.

Please note that such a wrapper does not read all items from the wrapped enumerable immediately. It only enumerates individual items from the wrapped enumerable as you enumerate individual items from the wrapper, and it caches the individual items along the way.

This means that if you call Any on the wrapper, only a single item will be enumerated from the wrapped enumerable, and then that item will be cached.

If you then use the enumerable again, it will first yield the first item from the cache, and then continue enumerating the original enumerator from where it left off.

You can do something like this to use it:

public IEnumerable<Data> RemoveHandledForDate(IEnumerable<Data> data, DateTime dateTime)
{
    var dataWrapper = new CachedEnumerable(data);
    ...
}

Notice here that the method itself is wrapping the parameter data. This way, you don't force consumers of your method to do anything.
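Since CachedEnumerable is not a framework type, here is a minimal sketch of what such a wrapper could look like. It is not the linked implementation, just an illustration of the behaviour described above, and it makes simplifying assumptions: it is generic, single-threaded only, and it disposes the source enumerator only once the source has been read to the end.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Minimal, single-threaded sketch of a caching wrapper (an assumption, not the linked class).
public sealed class CachedEnumerable<T> : IEnumerable<T>
{
    private readonly IEnumerable<T> _source;
    private readonly List<T> _cache = new List<T>();
    private IEnumerator<T> _enumerator; // created lazily, shared by all passes
    private bool _exhausted;

    public CachedEnumerable(IEnumerable<T> source)
    {
        _source = source ?? throw new ArgumentNullException(nameof(source));
    }

    public IEnumerator<T> GetEnumerator()
    {
        int index = 0;
        while (true)
        {
            // Serve items that an earlier pass already pulled from the source.
            if (index < _cache.Count)
            {
                yield return _cache[index++];
                continue;
            }

            if (_exhausted)
            {
                yield break;
            }

            // Cache is used up: pull exactly one more item from the source.
            if (_enumerator == null)
            {
                _enumerator = _source.GetEnumerator();
            }

            if (_enumerator.MoveNext())
            {
                _cache.Add(_enumerator.Current);
            }
            else
            {
                _exhausted = true;
                _enumerator.Dispose();
            }
        }
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
```

With this generic sketch, the wrapping line would be written as `new CachedEnumerable<Data>(data)`. A production version would also want thread safety and a way to dispose the source enumerator when the wrapper is abandoned early.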

IReadOnlyCollection<T> adds to IEnumerable<T> a Count property and the corresponding promise that there is no deferred execution. It would be the appropriate parameter to ask for, if the parameter is where you want to tackle this problem.

However, I suggest asking for IEnumerable<T>, and calling ToList() in the implementation itself instead.

Observation: Both approaches have the drawback that the multiple enumeration may at some point be refactored away, rendering the parameter change or ToList() call redundant, which we may overlook. I do not think this can be avoided.

The case does speak for calling ToList() in the method body: Since the multiple enumeration is an implementation detail, avoiding it should be an implementation detail as well. This way, we avoid affecting the API. We also avoid changing the API back if the multiple enumeration ever gets refactored away. We also avoid propagating the requirement through a chain of methods that otherwise might all decide to ask for an IReadOnlyCollection<T>, only because of our multiple enumeration.

If you are concerned with the overhead of creating extra lists (when the output already is a list or so), ReSharper suggests the following approach:

param = param as IList<SomeType> ?? param.ToList();

Of course, we can do even better, because we only need to protect against deferred execution, so there is no need for a full-blown IList<T>:

param = param as IReadOnlyCollection<SomeType> ?? param.ToList();
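Wrapped into a reusable helper, that one-liner could look like the sketch below. The helper name `Materialize` and its containing class are made up for illustration. Arguments such as `List<T>` or arrays, which already implement `IReadOnlyCollection<T>`, are passed through untouched, and anything else is copied into a list exactly once.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class EnumerableGuards
{
    // Hypothetical helper: returns the argument itself when it already is a
    // collection with a known Count, otherwise copies it into a list once.
    public static IReadOnlyCollection<T> Materialize<T>(IEnumerable<T> source)
    {
        return source as IReadOnlyCollection<T> ?? source.ToList();
    }
}
```

A method taking `IEnumerable<Data> data` could then start with `var items = EnumerableGuards.Materialize(data);` and enumerate `items` as often as it needs. Note that the cast only checks which interface the object exposes; it cannot prove that an unusual implementation is not lazy internally.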

I don't think this can be solved just by changing the input types. If you want to allow more general structures than List<T> or IList<T> then you have to decide if/how to handle these possible edge cases.

Either plan for the worst case and spend a little time/memory creating a concrete data structure, or plan for the best case and risk the occasional query getting executed twice.

You might consider documenting that the method enumerates the collection multiple times so that the caller can decide if they want to pass in an "expensive" query, or hydrate the query before calling the method.

I would argue that IEnumerable<T> is a good option for an argument type. It is a simple, generic and easy-to-provide structure. There is nothing inherent about the IEnumerable contract that implies that one should only ever iterate it once.

In general, the performance cost for testing .Any() probably isn't high but, of course, cannot be guaranteed to be so. In the circumstances you describe, it could obviously be the case that iterating the first element has considerable overhead, but this is by no means universal.

Changing the parameter type to something like IReadOnlyCollection<T> or IReadOnlyList<T> is an option, but probably only a good one in the circumstance that some or all of the properties/methods provided by that interface are required.

If you don't need that functionality and instead want to guarantee that your method only iterates the IEnumerable once, you can do so by calling .ToList() or by turning it into some other appropriate type of collection, but that is an implementation detail of the method itself. If the contract that you are designing requires "something which can be iterated" then IEnumerable<T> is a very appropriate choice.

Your method has the power to make guarantees about how many times any collection will be iterated; you don't need to expose that detail beyond the boundaries of your method.

By contrast, if you do choose to repeatedly enumerate an IEnumerable<T> inside your method then you must also take into consideration every eventuality which could be a result of that choice, for instance potentially getting different results in different circumstances due to deferred execution.
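As a small illustration of "different results due to deferred execution", the sketch below (with made-up names) enumerates the same LINQ query twice while the underlying list changes in between. The query object is only a recipe, so each enumeration sees the list as it is at that moment.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DeferredResultsDemo
{
    public static (int First, int Second) CountTwice()
    {
        var source = new List<int> { 1, 2, 3 };

        // Nothing is evaluated here; evens is just a description of the query.
        IEnumerable<int> evens = source.Where(n => n % 2 == 0);

        int first = evens.Count();  // enumerates now: only 2 matches
        source.Add(4);
        int second = evens.Count(); // enumerates again: 2 and 4 match

        return (first, second);
    }
}
```

Here `CountTwice()` returns `(1, 2)`: the same variable yields two different answers, which is exactly the kind of eventuality a method that enumerates its argument more than once has to account for.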

That said, as a point of best practise, I think it makes sense to try to avoid any side-effects in IEnumerables returned by your own code to the maximum extent possible - languages like Haskell can make use of lazy evaluation throughout safely because they go to great pains to avoid side effects. If nothing else, people who consume your code might not be as diligent as you in guarding against multiple enumeration.

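Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.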
