New LINQ extensions in .NET 6 and benchmarks

LINQ has also received some attention during the development of .NET 6, and multiple new extensions have been added. Among these additions are: support for indices and ranges when using IEnumerable collections, key selector variants for many of the existing LINQ extensions (e.g. MaxBy and DistinctBy), and overloads for extensions like FirstOrDefault that let you decide the default value, plus a couple of other minor additions. In this article, we will go through these additions to LINQ and benchmark some of them against previous approaches.

Prerequisites

These additions were first introduced in .NET 6 Preview 4, so to use these new extensions you need to install that version of .NET 6 or newer. It can be downloaded by following this link: Download .NET 6

For the best developer experience, you also need the newest preview version of Visual Studio. Visual Studio Preview can be downloaded here: Download Visual Studio Preview

ElementAt with Index

C# 8 introduced indices, which can be used to index into lists and arrays from the end like so:

var array = new string[] {"You", "say", "goodbye", "I", "say", "hello"};
Console.WriteLine(array[^1]); // Writes `hello` because it is the first element from the end.

These were exclusive to collections like arrays and lists, which makes sense since it is easy to find the k'th element from the end when you know how long a collection is.

But now we can also do it for any IEnumerable collection using ElementAt(Index index). This does not mean that some magical way has been found to always know the count of every IEnumerable. Instead, the implementation checks whether the underlying type has a fixed size and uses it if available. Internally, it calls TryGetNonEnumeratedCount to try to get the count without enumerating, and only enumerates the collection if that is not possible.
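As a small sketch of this mechanism (assuming a .NET 6 runtime; TryGetNonEnumeratedCount is a public extension on Enumerable), a List<T> exposes its count cheaply while a plain iterator does not:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// A List<T> exposes its Count, so TryGetNonEnumeratedCount succeeds without iterating.
IEnumerable<int> list = new List<int> { 0, 1, 2, 3, 4 };
Console.WriteLine(list.TryGetNonEnumeratedCount(out var listCount)); // True
Console.WriteLine(listCount);                                        // 5

// A plain iterator has no fixed size, so the call returns false without enumerating.
Console.WriteLine(Numbers().TryGetNonEnumeratedCount(out _));        // False

// ElementAt with an Index works on both; for the iterator it has to enumerate.
Console.WriteLine(Numbers().ElementAt(^2));                          // 3

static IEnumerable<int> Numbers()
{
    for (var i = 0; i < 5; i++)
    {
        yield return i;
    }
}
```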

To test the performance we can set up a benchmark using BenchmarkDotNet.

public class SecondLast
{
    private readonly IEnumerable<int> Range;

    public SecondLast()
    {
        Range = Enumerable.Range(0, 1_000);
    }

    [Benchmark]
    public int SecondLastWithIndex() =>
                    Range.ElementAt(^2);

    [Benchmark]
    public int SecondLastWithCount() =>
                    Range.ElementAt(Range.Count()-2);

    [Benchmark]
    public int SecondLastWithTakeLast() =>
                    Range.TakeLast(2).ElementAt(0);
}

This setup uses an Enumerable range from 0 to 999. We then get the second-last element in the collection in three different ways that I imagine people would use, and run the benchmark. We get the following results.

Method                 |     Mean |    Error |   StdDev
SecondLastWithIndex    | 16.28 ns | 0.258 ns | 0.242 ns
SecondLastWithCount    | 13.58 ns | 0.160 ns | 0.142 ns
SecondLastWithTakeLast | 71.33 ns | 1.007 ns | 0.892 ns

This shows that the new ElementAt accepting an Index is close to just as fast as using the count when getting an element near the end of a somewhat large IEnumerable. The count is slightly faster here probably because the underlying type makes the count very easy to get, whereas other IEnumerables might require more work before the count is available. Using the index is, as expected, much better than taking the last two elements and then the first. The new overload seems like a great trade-off since it is easier to read and more concise while only being a tiny bit slower. If you are really interested in performance, then you probably weren't going to use LINQ for tasks like these anyway. An example where the simplicity is especially clear could be something like this:

var dataSource = // Some big data collection

// Before
var sortedSelection = dataSource
                .Where(e => e.x == 42)
                .OrderBy(e => e.y);
                
var result = sortedSelection
                .ElementAt(sortedSelection.Count()-2);

// Now
var result = dataSource
                .Where(e => e.x == 42)
                .OrderBy(e => e.y)
                .ElementAt(^2);

This is of course only one use case for the new extension overload, and you might see other results depending on the parameters for the method and the collection type you use.

Take with Range

With C# 8 we also got ranges, which could be used on arrays like so:

var array = new string[] {"You", "say", "goodbye", "I", "say", "hello"};
Console.WriteLine(string.Join(' ', array[^3..])); // Writes `I say hello` because these words are the last three.

Now we can use them on IEnumerable collections as well with a new overload of Take that accepts a Range. This new overload is very expressive and can replace many existing use cases. There are some rather simple ones like:

Take(..10), Take(^10..), Take(10..), and Take(..^10)

where the existing extensions like

Take(10), TakeLast(10), Skip(10), and SkipLast(10)

already have the exact same functionality.
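As a quick sanity check (a minimal sketch), we can verify that these range forms select exactly the same elements as the older extensions:

```csharp
using System;
using System.Linq;

var range = Enumerable.Range(0, 100);

// Each range-based Take yields element-for-element the same sequence as the older extension.
Console.WriteLine(range.Take(..10).SequenceEqual(range.Take(10)));      // True
Console.WriteLine(range.Take(^10..).SequenceEqual(range.TakeLast(10))); // True
Console.WriteLine(range.Take(10..).SequenceEqual(range.Skip(10)));      // True
Console.WriteLine(range.Take(..^10).SequenceEqual(range.SkipLast(10))); // True
```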

[Benchmark]
public int[] FirstTenElementsWithRange() =>
                Range.Take(..10).ToArray();
[Benchmark]
public int[] FirstTenElementsWithTakeCount() =>
                Range.Take(10).ToArray();

[Benchmark]
public int[] LastTenElementsWithRange() =>
                Range.Take(^10..).ToArray();
[Benchmark]
public int[] LastTenElementsWithTakeLastCount() =>
                Range.TakeLast(10).ToArray();

[Benchmark]
public int[] AllElementsExceptFirstTenWithRange() =>
                Range.Take(10..).ToArray();
[Benchmark]
public int[] AllElementsExceptFirstTenWithSkip() =>
                Range.Skip(10).ToArray();

[Benchmark]
public int[] AllElementsExceptTenLastWithRange() =>
                Range.Take(..^10).ToArray();
[Benchmark]
public int[] AllElementsExceptTenLastWithSkipLast() =>
                Range.SkipLast(10).ToArray();

Method                               |        Mean |      Error |     StdDev
FirstTenElementsWithRange            |    30.35 ns |   0.646 ns |   0.634 ns
FirstTenElementsWithTakeCount        |    21.91 ns |   0.499 ns |   0.534 ns
LastTenElementsWithRange             |   272.20 ns |   4.281 ns |   4.004 ns
LastTenElementsWithTakeLastCount     |   257.80 ns |   4.694 ns |   4.390 ns
AllElementsExceptFirstTenWithRange   | 9,989.97 ns | 151.537 ns | 141.748 ns
AllElementsExceptFirstTenWithSkip    |   645.23 ns |  10.512 ns |   8.778 ns
AllElementsExceptTenLastWithRange    | 9,752.49 ns | 151.745 ns | 141.942 ns
AllElementsExceptTenLastWithSkipLast | 9,906.53 ns | 196.073 ns | 261.752 ns

We indeed see very similar results for most of the equivalent extensions for simple ranges, except when skipping from the start, where the existing Skip is noticeably quicker. This is probably due to efficient deferred execution in Skip. These were some very simple examples, and we should probably still use the existing solutions, but it is nice that the extension is versatile enough to present us with these alternatives.

There are three other combinations of Ranges that can also be used instead of existing extensions like

.Take(10..20)

.Take(10..^10)

.Take(^20..^10)

The nice thing about these Ranges is that each expresses the equivalent of a combination of the previous extensions, just more compactly. Let's write up some benchmarks comparing these ranges with equivalent chains of other extensions.

[Benchmark]
public int[] ElementsFromTenthTillTwentiethWithRange() =>
                Range.Take(10..20).ToArray();
[Benchmark]
public int[] ElementsFromTenthTillTwentiethWithSkipAndTake() =>
                Range.Skip(10).Take(10).ToArray();

[Benchmark]
public int[] AllElementsExceptTenFirstAndLastWithRange() =>
                Range.Take(10..^10).ToArray();
[Benchmark]
public int[] AllElementsExceptTenFirstAndLastWithSkipAndTakeCount() =>
                Range.Skip(10).Take(Range.Count()-20).ToArray();
[Benchmark]
public int[] AllElementsExceptTenFirstAndLastWithSkipAndSkipLast() =>
                Range.Skip(10).SkipLast(10).ToArray();

[Benchmark]
public int[] ElementsFromTwentiethLastTillTenthLastWithRange() =>
                Range.Take(^20..^10).ToArray();
[Benchmark]
public int[] ElementsFromTwentiethLastTillTenthLastWithSkipCountAndTake() =>
                Range.Skip(Range.Count()-20).Take(10).ToArray();
[Benchmark]
public int[] ElementsFromTwentiethLastTillTenthLastWithTakeLastAndTake() =>
                Range.TakeLast(20).Take(10).ToArray();

Method                                                     |        Mean |      Error |     StdDev
ElementsFromTenthTillTwentiethWithRange                    |    35.04 ns |   0.749 ns |   0.700 ns
ElementsFromTenthTillTwentiethWithSkipAndTake              |    31.92 ns |   0.686 ns |   0.816 ns
AllElementsExceptTenFirstAndLastWithRange                  | 9,175.28 ns | 160.421 ns | 150.058 ns
AllElementsExceptTenFirstAndLastWithSkipAndTakeCount       |   660.95 ns |   4.969 ns |   3.880 ns
AllElementsExceptTenFirstAndLastWithSkipAndSkipLast        | 9,433.38 ns | 110.957 ns |  86.628 ns
ElementsFromTwentiethLastTillTenthLastWithRange            |   278.31 ns |   5.253 ns |   4.657 ns
ElementsFromTwentiethLastTillTenthLastWithSkipCountAndTake |    44.28 ns |   0.622 ns |   0.552 ns
ElementsFromTwentiethLastTillTenthLastWithTakeLastAndTake  |   277.14 ns |   3.216 ns |   3.008 ns

Using a Range is pretty similar to the existing extensions when taking elements in a range at the start of a collection. This is expected as taking elements from the start of an IEnumerable is pretty quick already.

Taking a range in the middle of an IEnumerable performs about the same as the LINQ extensions we would normally use, but it is still much slower than simply getting the count and using it with Skip and Take, similar to how ElementAt with Count was quicker than using an Index.

The same applies to taking a small range at the end of an IEnumerable. It is again faster to get the Count and use it to calculate how much to skip with Skip and Take. But the range is equivalent to the other approach using TakeLast and Take.

To sum this part up: using a Range with Take is as good as what we would normally do. It is of course not as good as getting the Count manually, but it is much more expressive and easier to read. The reason that using Count is quicker is, again, probably that the underlying type used in the test can get the Count easily.

FirstOrDefault, LastOrDefault, and SingleOrDefault

The FirstOrDefault, LastOrDefault, and SingleOrDefault extensions have gotten extra overloads that let you specify what they should return if they are not successful. Previously we would do the following.

var numbers = new List<int>() {3, 1, 4, 1, 5, 9};
var result = Enumerable
                .Range(0, numbers.Count())
                .Where(i => numbers[i] == 6)
                .FirstOrDefault();

This sets result to 0 since none of the numbers equal 6, so the filtered collection is empty and default(int) is 0. That makes it hard to tell whether the first match was at index 0 or there was no element fitting the condition. Instead, we can now do the following:

var numbers = new List<int>() {3, 1, 4, 1, 5, 9};
var result = Enumerable
                .Range(0, numbers.Count())
                .Where(i => numbers[i] == 6)
                .FirstOrDefault(-1);

And now we know that if the result is -1, then no match was found. I could have used this in one of my previous articles, Using the new PriorityQueue from .NET 6, where I had to add an extra condition in a function to return -1 if nothing was found.

The same applies to LastOrDefault and SingleOrDefault. LastOrDefault returns the given default value if no element is found, and SingleOrDefault returns it if the collection does not contain exactly one matching element.
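A small sketch of the matching overloads on LastOrDefault and SingleOrDefault (both also exist in predicate form in .NET 6):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var numbers = new List<int> { 3, 1, 4, 1, 5, 9 };

// No element is greater than 100, so we get our chosen default instead of 0.
Console.WriteLine(numbers.LastOrDefault(n => n > 100, -1)); // -1

// Exactly one element equals 9, so the default is not used.
Console.WriteLine(numbers.SingleOrDefault(n => n == 9, -1)); // 9

// Note: SingleOrDefault still throws if more than one element matches;
// the default only covers the "no match" case.
```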

MaxBy, MinBy, DistinctBy, UnionBy, IntersectBy, and ExceptBy

The extensions Max, Min, Distinct, Union, Intersect, and Except have all gotten By variants that make it possible to define order or equality using a key selector on each element, just as OrderBy has always done for ordering.

First, we can now do the following with MaxBy and MinBy

var group1 = new List<(string Name, int Age)>()
{
    (Name: "Vicki", Age: 24),
    (Name: "Leonard", Age: 24),
    (Name: "Eve", Age: 29),
};
var group2 = new List<(string Name, int Age)>()
{
    (Name: "Eric", Age: 43),
    (Name: "David", Age: 24),
    (Name: "Lucy", Age: 34),
};

var max = group1.MaxBy(p => p.Age).Name;
// Person with max age in group1: Eve

var min = group1.MinBy(p => p.Age).Name;
// Person with min age in group1: Vicki

And using the same groups we can do the following with DistinctBy, UnionBy, IntersectBy, and ExceptBy.

var distinct = group1
                .DistinctBy(p => p.Age)
                .Select(p => p.Name);
// People with unique age in group1: Vicki, Eve

var union = group1
                .UnionBy(group2, p => p.Age)
                .Select(p => p.Name);
// Union of group1 and group2 but only one with each age: Vicki, Eve, Eric, Lucy

var intersect = group1
                .IntersectBy(group2.Select(p => p.Age), p => p.Age)
                .Select(p => p.Name);
// Unique people by age in group1 that do have a person in group2 with same age: Vicki

var except = group1
                .ExceptBy(group2.Select(p => p.Age), p => p.Age)
                .Select(p => p.Name);
// Unique people by age in group1 that do not have a person in group2 with same age: Eve

Zip with 3 IEnumerables

Previously we could only Zip two IEnumerables together, which let us do things like this:

var times = Enumerable.Range(0,10);
var cords = times.Select(t => t * 3 + 1);
var timeCordTuples = times.Zip(cords);

foreach ((int time, int cord) in timeCordTuples) {
    Console.WriteLine($"{time} | cord:{cord}");
}

But if we wanted to loop over three connected values, we couldn't Zip those three together before. Now we can, like so:

var times = Enumerable.Range(0,10);
var cords = times.Select(t => t * 3 + 1);
var altitudes = times.Select(t => t * 1.1 + 100);
var timeCordAltitudesTuples = times.Zip(cords, altitudes);

foreach ((int time, int cord, double altitude) in timeCordAltitudesTuples) {
    Console.WriteLine($"{time} | cord:{cord}, alt: {altitude}");
}

Then the obvious question is: now that we can Zip three IEnumerables, what about four? It was a deliberate choice to stop at three, as the reviewers of the issue could see some use for three-way Zips while they had very rarely or never needed to Zip more. It seems trivial to just add another overload for four, but adding unneeded overloads to an API adds extra work for maintainers in the future with little to no value to show for it. So keeping it at three seems like a reasonable choice.

Chunk

If you have ever needed to process a large IEnumerable in batches of a fixed size (e.g. for streaming), then this will probably be useful for you. Using the new Chunk(int size), you get an IEnumerable of arrays of the type you called Chunk on, where each array is filled up to the given max size. An example could be a chunked data streamer in a SignalR Hub:

public async IAsyncEnumerable<DataType[]> DataChunkStreamer() {
    var data = // Some big IEnumerable of DataType to stream
    foreach(DataType[] chunk in data.Chunk(100)) {
        yield return chunk;
    }
}
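To make the chunk sizes concrete, here is a minimal sketch; the final chunk simply holds whatever remains:

```csharp
using System;
using System.Linq;

// Ten elements chunked by three: three full chunks and one remainder chunk.
foreach (var chunk in Enumerable.Range(1, 10).Chunk(3))
{
    Console.WriteLine(string.Join(", ", chunk));
}
// Output:
// 1, 2, 3
// 4, 5, 6
// 7, 8, 9
// 10
```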

Conclusion

In this article, we have seen how to use the Index overload for ElementAt and compared its speed with alternative solutions. We have then similarly seen how to use the new Range overload for Take and compared many different scenarios with alternative solutions using previous LINQ extensions. Then we have seen how to use the new overloads for LINQ extensions with defaults. We also presented the new By variants for existing extensions that use comparison or equality, like MinBy and DistinctBy. Finally, we have given small examples of how to use the Zip and Chunk extensions. The benchmarks in this article were run on a local machine with limited repetition, so the results should not be taken as actual metrics but rather as an indication of scale.

elmah.io: Error logging and Uptime Monitoring for your web apps

This blog post is brought to you by elmah.io. elmah.io is error logging, uptime monitoring, deployment tracking, and service heartbeats for your .NET and JavaScript applications. Stop relying on your users to notify you when something is wrong or dig through hundreds of megabytes of log files spread across servers. With elmah.io, we store all of your log messages, notify you through popular channels like email, Slack, and Microsoft Teams, and help you fix errors fast.
