New LINQ extensions in .NET 6 and benchmarks
LINQ has also gotten some attention during the development of .NET 6, and multiple new extensions have been added. Among these additions are: support for indices and ranges when using `IEnumerable` collections, key-selector variants for many of the existing LINQ extensions (e.g. `MaxBy` and `DistinctBy`), overloads for the extensions with defaults like `FirstOrDefault` so that you can decide the default value, and a couple of other minor additions. In this article we will go through these additions to LINQ and benchmark some of them against previous approaches.
Prerequisites
These additions were first introduced in .NET 6 Preview 4, so to use these new extensions you need to install that version of .NET 6 or newer. It can be downloaded by following this link: Download .NET 6
For the best developer experience, you also need the newest preview version of Visual Studio. Visual Studio Preview can be downloaded here: Download Visual Studio Preview
ElementAt with Index
C# 8 introduced indices, which can be used to index into lists and arrays from the end like so:

```csharp
var array = new string[] {"You", "say", "goodbye", "I", "say", "hello"};
Console.WriteLine(array[^1]); // Writes `hello` because it is the first element from the end.
```
These were exclusive to collections like arrays and lists, which makes sense since it is easy to find the k-th element from the end when you know how long a collection is. But now we can do it for any `IEnumerable` collection using `ElementAt(Index index)`. This does not mean that some magical way has been found to always know the count of every `IEnumerable`; instead, the implementation checks whether the underlying type has a fixed size and uses that when available. Internally it uses `TryGetNonEnumeratedCount` to try to get the count without enumerating, and only enumerates the collection if that is not possible.
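`TryGetNonEnumeratedCount` is itself a public extension in .NET 6, so you can use the same check in your own code. A minimal sketch (assuming a .NET 6 top-level program):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// A List<int> exposes its count without enumeration.
IEnumerable<int> list = new List<int> { 1, 2, 3 };
Console.WriteLine(list.TryGetNonEnumeratedCount(out int count)); // True
Console.WriteLine(count);                                        // 3

// A lazy Where iterator cannot report a count without enumerating.
IEnumerable<int> filtered = list.Where(x => x > 1);
Console.WriteLine(filtered.TryGetNonEnumeratedCount(out _));     // False
```

This is handy whenever you want to pre-allocate a buffer for an `IEnumerable` without accidentally enumerating it twice.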
To test the performance we can set up a benchmark using BenchmarkDotNet.
```csharp
public class SecondLast
{
    private readonly IEnumerable<int> Range;

    public SecondLast()
    {
        Range = Enumerable.Range(0, 1_000);
    }

    [Benchmark]
    public int SecondLastWithIndex() =>
        Range.ElementAt(^2);

    [Benchmark]
    public int SecondLastWithCount() =>
        Range.ElementAt(Range.Count() - 2);

    [Benchmark]
    public int SecondLastWithTakeLast() =>
        Range.TakeLast(2).ElementAt(0);
}
```
This setup uses an `Enumerable.Range` from 0 to 999. We then get the second-to-last element of the collection in three different ways that I thought people might use, and run the benchmark. We get the following results:
| Method | Mean | Error | StdDev |
|---|---|---|---|
| SecondLastWithIndex | 16.28 ns | 0.258 ns | 0.242 ns |
| SecondLastWithCount | 13.58 ns | 0.160 ns | 0.142 ns |
| SecondLastWithTakeLast | 71.33 ns | 1.007 ns | 0.892 ns |
This shows that the new `ElementAt` overload which accepts an index comes close to using the count when getting an element near the end of a somewhat large `IEnumerable`. The count-based approach is probably faster here because the underlying type makes the count very easy to get, whereas other `IEnumerable`s might need a more expensive process to obtain it. Using the index is, as expected, much better than taking the last two elements and then the first. The new overload seems like a great trade-off since it is easier to read and more concise while only being a tiny bit slower. If you are really interested in performance, you probably weren't going to use LINQ for tasks like these anyway. An example where the simplicity is especially clear could be something like this:
```csharp
var dataSource = // Some big data collection

// Before
var sortedSelection = dataSource
    .Where(e => e.x == 42)
    .OrderBy(e => e.y);
var result = sortedSelection
    .ElementAt(sortedSelection.Count() - 2);

// Now
var result = dataSource
    .Where(e => e.x == 42)
    .OrderBy(e => e.y)
    .ElementAt(^2);
```
This is of course only one use case for the new extension overload, and you might see other results depending on the parameters for the method and what collection type you use.
Take with Range
With C# 8 we also got ranges, which could be used on arrays like so:

```csharp
var array = new string[] {"You", "say", "goodbye", "I", "say", "hello"};
Console.WriteLine(string.Join(' ', array[^3..])); // Writes `I say hello` because these words are the last three.
```
Now we can use them on `IEnumerable` collections as well, with a new overload of `Take` which takes a `Range`. This new overload is very expressive and can replace many of the existing use cases. There are some rather simple cases like `Take(..10)`, `Take(^10..)`, `Take(10..)`, and `Take(..^10)`, where the existing extensions `Take(10)`, `TakeLast(10)`, `Skip(10)`, and `SkipLast(10)` already have the exact same functionality.
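To make these equivalences concrete before benchmarking them, here is a small sketch verifying that each range form yields the same sequence as its older counterpart:

```csharp
using System;
using System.Linq;

var numbers = Enumerable.Range(0, 100);

// Each new range form yields the same sequence as its older counterpart.
Console.WriteLine(numbers.Take(..10).SequenceEqual(numbers.Take(10)));       // True
Console.WriteLine(numbers.Take(^10..).SequenceEqual(numbers.TakeLast(10)));  // True
Console.WriteLine(numbers.Take(10..).SequenceEqual(numbers.Skip(10)));       // True
Console.WriteLine(numbers.Take(..^10).SequenceEqual(numbers.SkipLast(10)));  // True
```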
```csharp
[Benchmark]
public int[] FirstTenElementsWithRange() =>
    Range.Take(..10).ToArray();

[Benchmark]
public int[] FirstTenElementsWithTakeCount() =>
    Range.Take(10).ToArray();

[Benchmark]
public int[] LastTenElementsWithRange() =>
    Range.Take(^10..).ToArray();

[Benchmark]
public int[] LastTenElementsWithTakeLastCount() =>
    Range.TakeLast(10).ToArray();

[Benchmark]
public int[] AllElementsExceptFirstTenWithRange() =>
    Range.Take(10..).ToArray();

[Benchmark]
public int[] AllElementsExceptFirstTenWithSkip() =>
    Range.Skip(10).ToArray();

[Benchmark]
public int[] AllElementsExceptTenLastWithRange() =>
    Range.Take(..^10).ToArray();

[Benchmark]
public int[] AllElementsExceptTenLastWithSkipLast() =>
    Range.SkipLast(10).ToArray();
```
| Method | Mean | Error | StdDev |
|---|---|---|---|
| FirstTenElementsWithRange | 30.35 ns | 0.646 ns | 0.634 ns |
| FirstTenElementsWithTakeCount | 21.91 ns | 0.499 ns | 0.534 ns |
| LastTenElementsWithRange | 272.20 ns | 4.281 ns | 4.004 ns |
| LastTenElementsWithTakeLastCount | 257.80 ns | 4.694 ns | 4.390 ns |
| AllElementsExceptFirstTenWithRange | 9,989.97 ns | 151.537 ns | 141.748 ns |
| AllElementsExceptFirstTenWithSkip | 645.23 ns | 10.512 ns | 8.778 ns |
| AllElementsExceptTenLastWithRange | 9,752.49 ns | 151.745 ns | 141.942 ns |
| AllElementsExceptTenLastWithSkipLast | 9,906.53 ns | 196.073 ns | 261.752 ns |
We indeed see very similar results for most of the equivalent extensions for simple ranges, except when skipping from the start, where the existing `Skip` is noticeably quicker. This is probably due to efficient deferred execution in `Skip`. These were some very simple examples, and we should probably still use the existing solutions for them, but it is nice that the new extension is versatile enough to present us with these alternatives.
There are three other kinds of `Range`s that can be used instead of combinations of existing extensions: `Take(10..20)`, `Take(10..^10)`, and `Take(^20..^10)`. The nice thing about these ranges is that each of them expresses the equivalent of a chain of the previous extensions, just more compactly. Let's write up some benchmarks comparing these ranges with equivalent extension chains.
```csharp
[Benchmark]
public int[] ElementsFromTenthTillTwentiethWithRange() =>
    Range.Take(10..20).ToArray();

[Benchmark]
public int[] ElementsFromTenthTillTwentiethWithSkipAndTake() =>
    Range.Skip(10).Take(10).ToArray();

[Benchmark]
public int[] AllElementsExceptTenFirstAndLastWithRange() =>
    Range.Take(10..^10).ToArray();

[Benchmark]
public int[] AllElementsExceptTenFirstAndLastWithSkipAndTakeCount() =>
    Range.Skip(10).Take(Range.Count() - 20).ToArray();

[Benchmark]
public int[] AllElementsExceptTenFirstAndLastWithSkipAndSkipLast() =>
    Range.Skip(10).SkipLast(10).ToArray();

[Benchmark]
public int[] ElementsFromTwentiethLastTillTenthLastWithRange() =>
    Range.Take(^20..^10).ToArray();

[Benchmark]
public int[] ElementsFromTwentiethLastTillTenthLastWithSkipCountAndTake() =>
    Range.Skip(Range.Count() - 20).Take(10).ToArray();

[Benchmark]
public int[] ElementsFromTwentiethLastTillTenthLastWithTakeLastAndTake() =>
    Range.TakeLast(20).Take(10).ToArray();
```
| Method | Mean | Error | StdDev |
|---|---|---|---|
| ElementsFromTenthTillTwentiethWithRange | 35.04 ns | 0.749 ns | 0.700 ns |
| ElementsFromTenthTillTwentiethWithSkipAndTake | 31.92 ns | 0.686 ns | 0.816 ns |
| AllElementsExceptTenFirstAndLastWithRange | 9,175.28 ns | 160.421 ns | 150.058 ns |
| AllElementsExceptTenFirstAndLastWithSkipAndTakeCount | 660.95 ns | 4.969 ns | 3.880 ns |
| AllElementsExceptTenFirstAndLastWithSkipAndSkipLast | 9,433.38 ns | 110.957 ns | 86.628 ns |
| ElementsFromTwentiethLastTillTenthLastWithRange | 278.31 ns | 5.253 ns | 4.657 ns |
| ElementsFromTwentiethLastTillTenthLastWithSkipCountAndTake | 44.28 ns | 0.622 ns | 0.552 ns |
| ElementsFromTwentiethLastTillTenthLastWithTakeLastAndTake | 277.14 ns | 3.216 ns | 3.008 ns |
Using a `Range` is pretty similar to the existing extensions when taking elements in a range at the start of a collection. This is expected, as taking elements from the start of an `IEnumerable` is pretty quick already.
Taking a range in the middle of an `IEnumerable` is roughly equivalent to the LINQ extension chain we would normally use (`Skip` followed by `SkipLast`), but both are much slower than simply getting the count and using it with `Skip` and `Take`, similar to how `ElementAt` with a count was quicker than using an index.
The same applies to taking a small range at the end of an `IEnumerable`: it is faster to take the count and use it to calculate how much to skip with `Skip` and `Take`, while the range overload is equivalent to the other approach that uses `TakeLast` and `Take`.
To sum this part up: using a `Range` with `Take` is as good as what we would normally do. It is of course not as good as getting the count manually, but it is much more expressive and easier to read. Again, the reason that using the count is quicker is probably that the underlying type used in the test can get its count easily.
FirstOrDefault, LastOrDefault, and SingleOrDefault
The `FirstOrDefault`, `LastOrDefault`, and `SingleOrDefault` extensions have gotten extra overloads that let you specify what they should return if they are not successful. Previously we would do the following:
```csharp
var numbers = new List<int>() {3, 1, 4, 1, 5, 9};
var result = Enumerable
    .Range(0, numbers.Count())
    .Where(i => numbers[i] == 6)
    .FirstOrDefault();
```
This sets the result to 0, since none of the numbers are equal to 6 and the filtered collection is therefore empty. The result is 0 because `default(int)` is 0. This would make it tough to know whether the first element was 6 or there was no element fitting the condition. Instead, we can now do the following:
```csharp
var numbers = new List<int>() {3, 1, 4, 1, 5, 9};
var result = Enumerable
    .Range(0, numbers.Count())
    .Where(i => numbers[i] == 6)
    .FirstOrDefault(-1);
```
And now we know that if the result is -1, then no match was found. I could have used this in one of my previous articles, Using the new PriorityQueue from .NET 6, where I had to add an extra condition in a function to return -1 if nothing was found.
The same applies to `LastOrDefault` and `SingleOrDefault`: `LastOrDefault` returns the given default value if no element is found, and `SingleOrDefault` returns it if the collection does not contain exactly one matching element.
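A short sketch of the same pattern with these two extensions, using the predicate overloads that also take an explicit default:

```csharp
using System;
using System.Linq;

var numbers = new[] { 3, 1, 4, 1, 5, 9 };

// LastOrDefault: no element is greater than 10, so -1 is returned.
Console.WriteLine(numbers.LastOrDefault(n => n > 10, -1));   // -1

// SingleOrDefault: exactly one element equals 9, so it is returned.
Console.WriteLine(numbers.SingleOrDefault(n => n == 9, -1)); // 9

// SingleOrDefault: no element equals 6, so -1 is returned.
// (It still throws if more than one element matches.)
Console.WriteLine(numbers.SingleOrDefault(n => n == 6, -1)); // -1
```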
MaxBy, MinBy, DistinctBy, UnionBy, IntersectBy, and ExceptBy
The extensions `Max`, `Min`, `Distinct`, `Union`, `Intersect`, and `Except` have all gotten variants that make it possible to define order or equality using a key selector on each element. This follows the same pattern as `OrderBy`, the existing key-selector variant for ordering.
First, we can now do the following with `MaxBy` and `MinBy`:
```csharp
var group1 = new List<(string Name, int Age)>()
{
    (Name: "Vicki", Age: 24),
    (Name: "Leonard", Age: 24),
    (Name: "Eve", Age: 29),
};
var group2 = new List<(string Name, int Age)>()
{
    (Name: "Eric", Age: 43),
    (Name: "David", Age: 24),
    (Name: "Lucy", Age: 34),
};

var max = group1.MaxBy(p => p.Age).Name;
// Person with max age in group1: Eve
var min = group1.MinBy(p => p.Age).Name;
// Person with min age in group1: Vicki
```
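One behavior worth knowing when using these: for an empty collection, `MaxBy` and `MinBy` return `null` when the element type is a reference type or nullable value type, but throw for a non-nullable value type. A quick sketch:

```csharp
using System;
using System.Linq;

// For a reference type, an empty collection yields null.
var emptyStrings = Array.Empty<string>();
Console.WriteLine(emptyStrings.MaxBy(s => s.Length) is null); // True

// For a non-nullable value type, an empty collection throws instead.
try
{
    Array.Empty<int>().MaxBy(n => n);
}
catch (InvalidOperationException)
{
    Console.WriteLine("MaxBy on an empty sequence of int throws");
}
```

Also note that when several elements tie on the key, as Vicki and Leonard do above, the first one in the sequence is returned.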
And using the same groups, we can do the following with `DistinctBy`, `UnionBy`, `IntersectBy`, and `ExceptBy`:
```csharp
var distinct = group1
    .DistinctBy(p => p.Age)
    .Select(p => p.Name);
// People with unique age in group1: Vicki, Eve

var union = group1
    .UnionBy(group2, p => p.Age)
    .Select(p => p.Name);
// Union of group1 and group2, but only one with each age: Vicki, Eve, Eric, Lucy

var intersect = group1
    .IntersectBy(group2.Select(p => p.Age), p => p.Age)
    .Select(p => p.Name);
// Unique people by age in group1 that do have a person in group2 with the same age: Vicki

var except = group1
    .ExceptBy(group2.Select(p => p.Age), p => p.Age)
    .Select(p => p.Name);
// Unique people by age in group1 that do not have a person in group2 with the same age: Eve
```
Zip with three IEnumerables
Previously we could only `Zip` two `IEnumerable`s, which allowed things like this:
```csharp
var times = Enumerable.Range(0, 10);
var cords = times.Select(t => t * 3 + 1);

var timeCordTuples = times.Zip(cords);
foreach ((int time, int cord) in timeCordTuples)
{
    Console.WriteLine($"{time} | cord: {cord}");
}
```
But if we wanted to loop over three connected sequences of values, we couldn't `Zip` all three together before. Now we can, like so:
```csharp
var times = Enumerable.Range(0, 10);
var cords = times.Select(t => t * 3 + 1);
var altitudes = times.Select(t => t * 1.1 + 100);

var timeCordAltitudeTuples = times.Zip(cords, altitudes);
foreach ((int time, int cord, double altitude) in timeCordAltitudeTuples)
{
    Console.WriteLine($"{time} | cord: {cord}, alt: {altitude}");
}
```
Then the obvious question is: now that we have the capability to `Zip` three `IEnumerable`s, what about four? It was a deliberate choice to stop at three, as the reviewers of the proposal could see some use for zipping three sequences but had rarely or never needed to zip more. It seems trivial to just add another overload for four, but adding unneeded overloads to an API creates extra work for maintainers in the future with little to no value to show for it. So keeping it at three seems like a reasonable choice.
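If you do hit the rare case of four related sequences, you can still get there by chaining the three-way `Zip` with the older two-way overload that takes a result selector, at the cost of flattening the nested tuple yourself. A sketch:

```csharp
using System;
using System.Linq;

var a = Enumerable.Range(0, 3);
var b = a.Select(x => x * 2);
var c = a.Select(x => x * 3);
var d = a.Select(x => x * 4);

// Zip three sequences, then zip the result with the fourth using the
// two-way overload with a result selector, flattening the nested tuple.
var zipped = a.Zip(b, c)
    .Zip(d, (t, fourth) => (t.First, t.Second, t.Third, Fourth: fourth));

foreach (var (x, y, z, w) in zipped)
{
    Console.WriteLine($"{x} {y} {z} {w}");
}
```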
Chunk
If you have ever needed to process a large `IEnumerable` in batches of a fixed size (e.g. for streaming), then this will probably be useful for you. Using the new `Chunk(int size)` you get a new `IEnumerable` containing arrays of the element type you called `Chunk` on, where each array is filled up to the given maximum size. An example could be a chunked data streamer in a SignalR Hub:
```csharp
public async IAsyncEnumerable<DataType[]> DataChunkStreamer()
{
    var data = // Some big IEnumerable of DataType to stream

    foreach (DataType[] chunk in data.Chunk(100))
    {
        yield return chunk;
    }
}
```
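Note that only the final chunk can be smaller than the requested size; it simply holds whatever elements are left over. A small runnable sketch:

```csharp
using System;
using System.Linq;

// Ten elements in chunks of three: the last chunk only holds the remainder.
foreach (var chunk in Enumerable.Range(1, 10).Chunk(3))
{
    Console.WriteLine(string.Join(", ", chunk));
}
// Output:
// 1, 2, 3
// 4, 5, 6
// 7, 8, 9
// 10
```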
Conclusion
In this article, we have seen how to use the `Index` overload for `ElementAt` and compared its speed with alternative solutions. We have similarly seen how to use the new `Range` overload for `Take` and compared many different scenarios against alternative solutions using previous LINQ extensions. We have also covered the new overloads with explicit defaults, and presented the new `By` variants of existing extensions that use comparison or equality, such as `MinBy` and `DistinctBy`. Finally, we gave small examples of how to use the `Zip` and `Chunk` extensions. The benchmarks in this article were run on a local machine with limited repetition, so the results should not be read as precise metrics but rather as an indication of scale.