Handling Complex Query Scenarios with Entity Framework Core

The database is an integral part of every application. However, it contributes most to the performance of any application due to its presence in almost every operation. In the .NET world, Entity Framework Core (EFCore) is one of the developer's favorite Object-Relational Mapping (ORM). It has simple syntax, concise LINQ queries, and plenty of features to support operations offered in any database. In today's post, I will introduce some tips that help you speed up database operations with EFCore.

Handling Complex Query Scenarios with Entity Framework Core

Tip 1: Project only properties you need

When fetching records from the database, analyze your requirements and only fetch the columns that fall in the requirements. This way you can optimize retrieval and reduce memory usage. In the following example, we need only the names of buildings and floors.

Step 1: define models 

public class Building
{
    public int Id { get; set; }
    public string Name { get; set; }
    public ICollection<Floor> Floors { get; set; }
}

public class Floor
{
    public int Id { get; set; }
    public string Name { get; set; }
    public int BuildingId { get; set; }
    public Building Building { get; set; }
}

// A Dto represent the output
public class BuildingName
{
    public string Name {get;set;}
    public List<string>  FloorNames {get;set;}
}

Step 2: Query database

// Projecting only required properties 
var buildings = await context
    .Buildings 
    .Select(b => 
        new BuidlingName
        {
            Name = b.Name, 
            FloorNames = b
                .Floors
                .Select(f => f.Name)
                .ToList()
        })
    .ToListAsync();

Here we needed only the names of buildings along with their floor names. So we specifically fetched only names from the database. An optimized query will eliminate additional load on the application and the database.

Tip 2: Use AsNoTracking() for data retreival

AsNoTracking is a performance optimization technique when the query is only intended to read the data and not update it. It saves memory usage by disabling EF Core's update tracker. Especially in large-scale applications, this method can cut down overhead significantly.

The technique can be used like this:

var buildings = await context
    .Buildings 
    .AsNoTracking() 
    .Where(b => b.Name.StartsWith("State")) 
    .ToListAsync();

In most of your queries, you may want to just do read-only operations. So, you can configure it by default behavior and don't need to call AsNoTracking() with every fetch query like earlier example.

protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    modelBuilder
        .Entity<Building>()
        .HasQueryTrackingBehavior(QueryTrackingBehavior.NoTracking);
}

Whenever you are fetching for update or deletion when tracking is required

var trackedBuilding = await context
    .Buildings
    .AsTracking() // Enables tracking for this query only
    .FirstOrDefaultAsync(b => b.Id == 5);

Tip 3: Use Lazy Loading vs. Eager Loading properly

EFCore loads related data of an entity in two ways: Lazy loading and Eager loading.

In Lazy Loading, entities are not loaded from the database until they are accessed for the first time. While in Eager Loading, all the associated entities are loaded along with the main entity.

To optimize underlying SQL queries with EFCore, you need to understand when to use which type of loading. Let's check the implementation of each. 

Lazy loading

Step 1: Install proxies package 

dotnet add package Microsoft.EntityFrameworkCore.Proxies

Step 2: Define models 

using System.Collections.Generic;

public class Building
{
    public int Id { get; set; }
    public string Name { get; set; }

    // Lazy-loaded navigation property (must be virtual)
    public virtual ICollection<Floor> Floors { get; set; }
}

public class Floor
{
    public int Id { get; set; }
    public string Name { get; set; }
    public int BuildingId { get; set; }

    // Lazy-loaded navigation property
    public virtual Building Building { get; set; }
}

In this step, you need to make a navigation property virtual.

Step 3: Configure DB context

using Microsoft.EntityFrameworkCore;

public class ApplicationDbContext: DbContext
{
    public DbSet<Building> Buildings { get; set; }
    public DbSet<Floor> Floors { get; set; }

    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
    {
        optionsBuilder
            .UseLazyLoadingProxies() // Enable Lazy Loading
            .UseSqlServer("YourConnectionString");
    }
}

Step 4: use the context

var building = await context
    .Buildings
    .FirstOrDefaultAsync();
Console.WriteLine($"Building: {building.Name}"); // Floors will be loaded only when accessed 

foreach (var floor in building.Floors) 
{ 
    Console.WriteLine($" - Floor: {floor.Name}");	
}

Output

Building: Main Building
- Floor: Ground Floor
- Floor: First Floor

Eager loading 

Step 1: define models without any virtual keywords

public class Building
{
    public int Id { get; set; }
    public string Name { get; set; }

    // Navigation property (Not virtual)
    public ICollection<Floor> Floors { get; set; }
}

public class Floor
{
    public int Id { get; set; }
    public string Name { get; set; }
    public int BuildingId { get; set; }

    public Building Building { get; set; }
}

Step 2: Configure DB context

public class ApplicationDbContext : DbContext
{
    public DbSet<Building> Buildings { get; set; }
    public DbSet<Floor> Floors { get; set; }

    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
    {
        optionsBuilder.UseSqlServer("YourConnectionStringHere");
    }
}

Step 3: use context with include to fetch the required tables 

// Eager loading with Include() 
var buildings = await context
    .Buildings
    .Include(b => b.Floors) 
    .ToListAsync(); 

foreach (var building in buildings) 
{
    Console.WriteLine($"Building: {building.Name}"); 
    foreach (var floor in building.Floors) 
    {
        Console.WriteLine($" - Floor: {floor.Name}");
    }
}

You can leverage loading types by understanding your business logic and using the right approach. The rule of thumb is to use Lazy loading when related data is needed occasionally and shouldn't always be loaded or some navigating tables are not required always. It has multiple queries (N+1 problem). While opting for Eager loading when related data is always needed and should be loaded efficiently. It runs a single optimized query.

Tip 4: Use indexes for filtering column

Indexes work like a table of contents in a book. Instead of scanning the entire table row by row, the database quickly jumps to relevant data. EFCore allows indexes on columns for fast retrieval. While the primary key column is automatically indexed in EFCore.

Step 1: Set the index in the ApplicationDbContext

protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    modelBuilder
        .Entity<Building>()
        .HasIndex(b => b.Name) // Index on Building Name as it is frequently filtered
        .HasDatabaseName("IX_Building_Name");
}

Step 2: Use indexed property

var building = await context
    .Buildings
    .Where(b => b.Name == "Empire State")
    .FirstOrDefaultAsync();

The query is

SELECT TOP(1) * FROM Buildings WHERE Name = 'Empire State';

Inefficient usage of the index is 

var buildings = await context
    .Buildings
    .Where(b => b.Name.Contains("State"))
    .ToListAsync();

The Contains method does not use indexes and forces a full table scan because it searches anywhere in the string.

We can use the Like function that leverages the index

var buildings = await context
    .Buildings
    .Where(b => EF.Functions.Like(b.Name, "State%"))
    .ToListAsync();

Tip 5: Apply pagination in data fetching

Tables can be as large as millions of records in your application. And you rarely need all those records at once. Any mobile, web, or desktop application has a limited screen size where small chunks of data can be displayed usually less than 50. Fetching all data at once is quite inefficient and burdens the database. You can leverage EFcore's Skip and Take operators (OFFSET and LIMIT in SQL) to fetch only a window of data rather than taking all the data unnecessarily.  

Let's check how you can do it.

public async Task<List<Building>> GetBuildingsAsync(int pageSize = 20, int pageNo = 1)
{
    var buildings = await context
        .Buildings
        .Skip(pageSize * (pageNo - 1))
        .Take(pageSize)
        .ToListAsync()
}

By pagination only requested rows will be fetched and pageNo can shift to get the next records. The default values again ensure that the query fetches a small number of records.

Cartesian Explosion happens when a query unintentionally retrieves large associated data because of a table JOIN causing data duplication. Cartesian explosion results in performance costs on the database and the application.

For example the retrieval:

var buildings = context
    .Buildings
    .Include(b => b.Floors)
    .ToList();

Generates the query:

SELECT [b].[Id], [b].[Name], [f].[Id], [f].[Name], [f].[BuildingId]
FROM [Buildings] AS [b]
LEFT JOIN [Floors] AS [f] ON [b].[Id] = [f].[BuildingId]

That will fetch BuildingId and names multiple times unnecessarily. If a building has 5 floors then the building field will be fetched 5 times resulting in a cartesian explosion. You can imagine if your database is considerably large, how costly this problem can be on your user as well as pocket.

To avoid the problem use a project in a DTO model:

public class BuildingDto
{
    public int Id { get; set; }
    public string Name { get; set; }
    public int FloorCount { get;set; }	
}

var buildingsWithFloors = context
    .Buildings
    .Select(b => 
        new BuildingDto
        {
            b.Id,
            b.Name,
            FloorCount = b.Floors.Count()
        })
    .ToList();

It generates the following query:

SELECT * FROM Buildings

Also, you can use the new AsSplitQuery() for separate queries instead of a single massive join:

var buildings = context
    .Buildings
    .Include(b => b.Floors)
    .AsSplitQuery()
    .ToList();

Behind the scenes it generates the following query:

SELECT * FROM Floors WHERE BuildingId IN (...)

Tip 7: Using SQL queries

EFCore translates LINQ queries into SQL queries but in some cases, the translation is not optimal or may result in inaccurate results. Consider using raw SQL which is relatively faster and offers you more control. EFCore's FromSqlRaw method allows you to run SQL queries directly. Especially in complex operations such as dashboard metrics and data with complex filtration, you can use SQL queries directly. It can be more readable and maintainable than nested where clauses.

Let us look at an example where we need to concatenate all the floor names along with building stats.

Step 1: Define model

[NotMapped]
public class BuildingStatistics
{
    public int BuildingId { get; set; }
    public string BuildingName { get; set; }
    public int TotalFloors { get; set; }
    public string FloorNames { get; set; }  // Comma-separated list of floor names
}

Step 2: configure in dbContext

public DbSet<BuildingStatistics> BuildingStatistics { get; set; }

Step 3: Prepare and execute the SQL query

var buildingStats = await context
    .BuildingStatistics
    .FromSqlRaw(@"
        SELECT 
            b.Id AS BuildingId, 
            b.Name AS BuildingName, 
            COUNT(f.Id) AS TotalFloors, 
            STRING_AGG(f.Name, ', ') AS FloorNames
        FROM Buildings AS b
        LEFT JOIN Floors AS f ON b.Id = f.buildingId
        GROUP BY b.Id, b.Name
        ORDER BY b.Name;")
    .ToListAsync();

Output

Output

Tip 8: Understand Buffering and Streaming

Buffering refers to the default behavior of EFCore, where all the fetched results are stored in the memory. Buffering enables you to use in-memory data once it is fetched. 

var buildings = await context.Buildings.ToListAsync();  // Loads all data into memory
foreach (var building in buildings)
{
    // Perform operations on the full data set
}

The above query loads all the records in the memory, and you can work with that result. It provides fast operations as EFCore only requires one call to the database. 

It is done by streaming processed records one by one, or in chunks, as the query is executed, without loading all data into memory.

var buildings = context.Buildings.AsEnumerable(); // Use AsEnumerable to enable streaming

await foreach (var building in buildings) // Streaming each record as it is read
{
    // Process each building here
}

As both buffering and streaming use memory in different ways, they are useful in their arena. If you have a small dataset, buffering can offer fast and optimized operations. While streaming is optimal when the query result is expected to be large enough, loading them into memory would cause performance issues. Each technique aces its requirements and you need to implement one based on your needs. Usually buffering with a paginated result is preferred due to its high speed and single round to the database. 

Conclusion 

Database operations are a big player in the performance of any project. Whether it is a web mobile or desktop application, Most of the task involves database interaction. Hence, handling such operations can significantly enhance the application's performance and user experience.

In .NET, EFCore is a widely used ORM. In this blog post, I presented some tips to optimize database interaction. Projecting only required columns and using AsNoTracking can reduce overhead. Besides, choosing between lazy and eager loading can be worth using. Using indexes can minimize database burden during filtration. We also discussed the utilization of pagination, avoidance of cartesian explosion, and leveraging raw SQL queries for the cause. Finally, we elaborated on how buffering and streaming can be fruitful in different scenarios. By following the above tips you can eliminate the additional load on the database and the memory to make your app more smooth and cost-efficient.

elmah.io: Error logging and Uptime Monitoring for your web apps

This blog post is brought to you by elmah.io. elmah.io is error logging, uptime monitoring, deployment tracking, and service heartbeats for your .NET and JavaScript applications. Stop relying on your users to notify you when something is wrong or dig through hundreds of megabytes of log files spread across servers. With elmah.io, we store all of your log messages, notify you through popular channels like email, Slack, and Microsoft Teams, and help you fix errors fast.

See how we can help you monitor your website for crashes Monitor your website