Should the BCL add an IEnumerableWithCount<T>?

NB: I’ve written and thrown away this blog post twice before in the past, as I would convince myself it was in the category of micro-optimization, but I’m finally going to post it this time, if only to get feedback to confirm that it’s a bad idea. Smile

There’s been a few times when it would be nice to have something be IEnumerable<T> but keep the data on how many items will be in the ‘stream’.  Certainly there are many cases where that’s not known ahead of time, and for those our existing IEnumerable<T> works perfectly. Smile

One thing in particular about IEnumerableWithCount<T> is that it could still be covariant like IEnumerable<T> is since we would only need to add a property with a getter.  That is a big difference from ICollection<T>, which is the current ‘lowest-level’ option for a collection interface that includes count.  ICollection<T> means you have to support lots of ‘write’ operations (Add/Clear/Remove) so it’s not what we want if we just want to have a ‘stream’ that has the additional ‘metadata’ of the count of items.

Even more more importantly, IEnumerableWithCount<T> would still support deferred-execution since it inherits from IEnumerable<T>

One key bit is that this interface will be able to ‘go between’ IEnumerable<T> and ICollection<T> – ICollection<T> (and therefore IList<T> and so forth) could also implement IEnumerableWithCount<T> (would just need to declare it, since it already has a Count property that matches).  This means most (if not all) of the collections we’re used to will get this interface ‘for free’.

Somewhat unfortunately, the canonical example of where an IEnumerableWithCount would have benefit is in the Select method of LINQ.

Much of LINQ-to-Objects includes interface checks to optimize performance if the input object supports a higher-level interface than the method requires.

For instance, let’s look at ToArray:

The real work is done in the ctor of Buffer, which is where we can see the first place that would use our new IEnumerableWithCount<T>:

This is where we see one of the places that could change if we added the new interface.

Currently the perf optimization is based on checking for ICollection<TElement> – if that interface is implemented by the source, then we get to skip all the reallocations (since we don’t know the final size) and create the final array directly.

The code also uses ICollection<T>’s CopyTo method to copy the actual elements, and since that seems out of scope for an IEnumerableWithCount, we’d probably just have 2 paths and IEnumerableWithCount would foreach, so that situation would be a little less CPU-efficient (but still better than today’s existing IEnumerable path in the code, since , but saving the reallocations (and generated garbage) is still a big potential perf win.

For the LINQ benefit in particular, we’d likely want a static System.Linq.EnumerableWithCount (similar to Enumerable and Queryable) that implemented the particular methods that can be done ‘better’ with count information.e

What methods would our EnumerableWithCount static class support?  These are the methods being used with LINQ-to-Objects that would benefit from having the compiler using an implementation by our EnumerableWithCount class instead of the current implementation in Enumerable.  If it’s not in this list, then falling back to the Enumerable version is fine (much like some methods aren’t implemented by Queryable), which is what the compiler will do for us.

  • Any method that doesn’t affect the number of items when it acts on the enumerable, of course
    • Select is the method that sticks out, certainly
    • OrderBy / OrderByDescending
    • ThenBy / ThenByDescending
    • Reverse
    • Cast
  • Concat can also benefit, as we’d be able to know the target item count in such a situation without enumerating (so concat’ing multiple inputs and then ToArray would also save all those reallocations)
  • Similarly, Skip and Take would benefit, since we’d know the resulting count due to both – in paging scenarios, for instance, this is very common
  • Count/LongCount is similar in its benefit, as it would be able to just return the Count property without actually enumerating.
  • DefaultIfEmpty would know if it’s already empty or not
  • Zip would know the final count (since it’s Min of the input counts)
  • SequenceEqual could check count mismatch first
  • Various methods that could either throw or return default(T) based on the count instead of having to iterate first, like:
    • ElementAt/ElementAtOrDefault
    • First/FirstOrDefault
    • Last/LastOrDefault
    • Single/SingleOrDefault (which would be able to throw for count > 1 without needing to enumerate twice)
  • ElementAt could throw on “index out of range” immediately without iterating
  • ElementAtOrDefault could similarly know whether to return default immediately
  • First/FirstOrDefault could similarly throw or return default without iterating
  • ToDictionary would know the final count and could preallocate buckets better
  • Range/Repeat/Empty would all be able to create this interface since the final count is known
  • Average would know the count to divide by without incrementing a counter 🙂

Some remaining questions:

  • If the compiler supported IEnumerableWithCount<T> (using the same ‘yield return’ syntax), how would it work?  Would the method declaration need an out param for count?
  • Should we create a matching IEnumeratorWithCount<T> as well?  I don’t think it’s necessary, but not really sure at this point.
Advertisements