How I Reduced API Response Time by 70% in a Large .NET Project

Slow APIs hurt both the user experience and the owner's pocket: users get frustrated, and servers cost more when APIs in large projects are not up to standard. I faced a similar issue in my own project and managed to cut response times significantly. This article breaks down the exact steps I took, one optimization at a time.

What are API response time and latency?

API response time indicates the time an API takes to process a request and send a response. Latency captures the time it takes for data to travel between the client and the server.

API response time and latency are key metrics that define the efficiency and user-friendliness of your application. Both are typically measured in milliseconds, and latency is one component of the overall response time: roughly, response time = network latency + server processing time.

A high-level representation of Latency, Processing Time and Response Time
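To put numbers on this from the caller's side, here is a minimal, hypothetical sketch that measures end-to-end response time with HttpClient and Stopwatch; the base address is an assumption for illustration:

using System.Diagnostics;

// Hypothetical client-side measurement (top-level statements, .NET 6+).
using var client = new HttpClient { BaseAddress = new Uri("https://localhost:5001") };

var sw = Stopwatch.StartNew();
var response = await client.GetAsync("api/posts/baseline"); // network latency + server processing
var body = await response.Content.ReadAsStringAsync();      // plus time to download the payload
sw.Stop();

Console.WriteLine($"End-to-end response time: {sw.ElapsedMilliseconds} ms");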

How to reduce API response time in a .NET application

To illustrate the tips in practice, I will use an ASP.NET Core API project with a PostgreSQL database. To keep the post relevant to a larger audience, I will use Entity Framework Core (EF Core), a popular ORM for database operations. To fast-forward the setup: we have one table with 10,000 rows inserted and a simple endpoint, and we will observe the improvement in response time after each tip.

The overview of the data looks like this:
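The original screenshot of the data is not reproduced here. For reference, the Post entity behind it presumably looks roughly like the sketch below; only Id and Content are confirmed by later snippets, so the remaining property names and the connection-string key are assumptions:

// Assumed shape of the Post entity (only Id and Content appear in later snippets)
public class Post
{
    public int Id { get; set; }
    public string Content { get; set; } = string.Empty;
    public string Author { get; set; } = string.Empty;  // assumed
    public DateTime CreatedAt { get; set; }              // assumed
}

// Program.cs: register the DbContext against PostgreSQL via the Npgsql EF Core provider
builder.Services.AddDbContext<AppDbContext>(options =>
    options.UseNpgsql(builder.Configuration.GetConnectionString("Default"))); // "Default" is an assumed key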

Here's the controller with a simple GET method that returns all data:

using System.Diagnostics;
using Microsoft.AspNetCore.Mvc;
using Microsoft.EntityFrameworkCore;
using Microsoft.Extensions.Caching.Memory;
using SocialApi.Data;

[ApiController]
[Route("api/[controller]")]
public class PostsController : ControllerBase
{
    private readonly AppDbContext _context;
    private readonly IMemoryCache _cache; // used in the caching tip below

    public PostsController(AppDbContext context, IMemoryCache cache)
    {
        _context = context;
        _cache = cache;
    }

    // Baseline: no optimization
    [HttpGet("baseline")]
    public async Task<IActionResult> GetAllBaseline()
    {
        var sw = Stopwatch.StartNew();
        var result = await _context.Posts.ToListAsync();
        sw.Stop();
        return Ok(new { DurationMs = sw.ElapsedMilliseconds, Count = result.Count });
    }
}

The end result in Swagger UI is a new endpoint we can call:

Let's go ahead and implement improvements one by one.

Tip 1: Use AsNoTracking()

[HttpGet("as-notracking")]
public async Task<IActionResult> GetAll_AsNoTracking()
{
    var sw = Stopwatch.StartNew();
    var result = await _context.Posts.AsNoTracking().ToListAsync();
    sw.Stop();
    return Ok(new { DurationMs = sw.ElapsedMilliseconds, Count = result.Count });
}

Output

AsNoTracking disables EF Core's change tracker, which can reduce response time by up to 30%. The change tracker records any changes made to the entities and, upon SaveChanges, writes those changes back to the database. For read-only queries this bookkeeping is wasted work, so AsNoTracking cuts the tracking overhead on CPU and memory.

Tip 2: Project only the necessary fields

[HttpGet("projection")]
public async Task<IActionResult> GetAll_Projection()
{
    var sw = Stopwatch.StartNew();
    var result = await _context.Posts
        .AsNoTracking()
        .Select(p => new { p.Id, p.Content })
        .ToListAsync();
    sw.Stop();
    return Ok(new { DurationMs = sw.ElapsedMilliseconds, Count = result.Count });
}

Output

Projection reduces both response time and the amount of data sent over the network. If an endpoint needs only specific columns, fetch those columns instead of the entire table. Applied consistently across your application, projection can cut query response time and network payload size and improve performance by 30-70%.

Tip 3: Paginate the data

[HttpGet("pagination")]
public async Task<IActionResult> GetAll_Pagination(int page = 1, int size = 100)
{
    var sw = Stopwatch.StartNew();
    var result = await _context.Posts
        .AsNoTracking()
        .Skip((page - 1) * size)
        .Take(size)
        .ToListAsync();
    sw.Stop();
    return Ok(new { DurationMs = sw.ElapsedMilliseconds, Count = result.Count });
}

Output

Another logical step is pagination. Hardly any mobile or web application needs thousands of rows in a single fetch to render one page, so return only the batch of records the client actually requested. Retrieving a page of records at a time is many times faster and more cost-efficient than fetching thousands of rows that nobody needs.

Tip 4: Cache response using MemoryCache

[HttpGet("cached")]
public async Task<IActionResult> GetAll_Cached()
{
    var sw = Stopwatch.StartNew();

    var result = await _cache.GetOrCreateAsync("posts", async entry =>
    {
        entry.AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(2);
        return await _context.Posts.AsNoTracking().ToListAsync();
    });

    sw.Stop();
    return Ok(new { DurationMs = sw.ElapsedMilliseconds, Count = result?.Count });
}

The code puts an in-memory cache in front of EF Core and stores the result on the first call. entry.AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(2) makes the cached entry expire after two minutes, and the database query runs only when the cache entry is missing.

Output

The first call takes some time, but subsequent calls report close to 0 milliseconds because the data is served from the in-memory cache.
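For this to work, IMemoryCache has to be registered and injected into the controller (as shown in the constructor earlier). A minimal sketch of the registration, assuming the default ASP.NET Core hosting setup:

// Program.cs
builder.Services.AddMemoryCache(); // registers IMemoryCache for constructor injection

// PostsController then receives it via the constructor shown above:
// public PostsController(AppDbContext context, IMemoryCache cache) { ... }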

Introducing another table into the database

Before moving to the next tips, we need another table so we can test out joins. I have added the Comment model:

public class Comment
{
    public int Id { get; set; }
    public int PostId { get; set; }
    public string Text { get; set; } = string.Empty;
    public string Commentator { get; set; } = string.Empty;
    public DateTime CreatedAt { get; set; }
    
    public virtual Post Post { get; set; } = null!;
}

// DbContext

public class AppDbContext : DbContext
{
    public AppDbContext(DbContextOptions<AppDbContext> options) : base(options) { }

    public DbSet<Post> Posts => Set<Post>();
    public DbSet<Comment> Comments => Set<Comment>();

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        modelBuilder.Entity<Post>(entity =>
        {
            entity.HasKey(e => e.Id).HasName("Post_pkey");
            entity.Property(e => e.Id).ValueGeneratedOnAdd();
        });

        modelBuilder.Entity<Comment>(entity =>
        {
            entity.HasKey(e => e.Id).HasName("Comment_pkey");
            entity.Property(e => e.Id).ValueGeneratedOnAdd();

            // Configure the Post <-> Comments relationship on the same entity builder
            entity.HasOne(x => x.Post)
                .WithMany(x => x.Comments)
                .HasForeignKey(x => x.PostId)
                .OnDelete(DeleteBehavior.Restrict);
        });
    }
}
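Note that the WithMany(x => x.Comments) mapping assumes the Post entity has a matching collection navigation property, roughly like this (an assumption, since the full Post class is not shown in the original):

public class Post
{
    // ...existing properties (Id, Content, etc.)...

    public virtual ICollection<Comment> Comments { get; set; } = new List<Comment>();
}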

The comments data is as follows.

Tip 5: Use split queries for complex includes

Let's try a simple join with Include:

[HttpGet("baseline")]
public async Task<IActionResult> GetAllBaseline()
{
    var sw = Stopwatch.StartNew();
    var result = await _context.Posts
        .Include(p => p.Comments)
        .ToListAsync();
    
    sw.Stop();
    return Ok(new { DurationMs = sw.ElapsedMilliseconds, Count = result.Count });
}

By default, EF Core generates a single query for includes (JOINs), which can lead to a Cartesian explosion. With Split Query:

[HttpGet("split-queries")]
public async Task<IActionResult> GetAll_SplitQueries()
{
    var sw = Stopwatch.StartNew();
    var result = await _context.Posts
        .Include(p => p.Comments)
        .AsSplitQuery()
        .ToListAsync();
    
    sw.Stop();
    if (result == null || result.Count == 0)
        return NoContent();

    return Ok(new { DurationMs = sw.ElapsedMilliseconds, Count = result.Count });
}

You can observe a considerable difference in the response time of the two approaches. The split-query version also avoids the Cartesian explosion, because the related comments are fetched in a separate query instead of being duplicated for every joined row.
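If most of your queries benefit from this behavior, EF Core also lets you make split queries the default at the provider level instead of calling AsSplitQuery everywhere. A minimal sketch, assuming the Npgsql registration shown earlier:

// Program.cs: make split queries the default for this DbContext
builder.Services.AddDbContext<AppDbContext>(options =>
    options.UseNpgsql(
        builder.Configuration.GetConnectionString("Default"), // assumed key from the earlier sketch
        npgsql => npgsql.UseQuerySplittingBehavior(QuerySplittingBehavior.SplitQuery)));

// Individual queries can still opt back in to a single SQL statement with .AsSingleQuery().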

Tip 6: Use compiled queries for frequently executed queries

The next step you can take is a compiled query. It improves performance by compiling the LINQ query once and reusing the compiled delegate, avoiding the repeated query-translation overhead on every call. It is best suited for read-heavy applications that execute the same query frequently.

First, compile a query with context, page, and size parameters into the following function:

private static readonly Func<AppDbContext, int, int, IAsyncEnumerable<Post>> _compiledQuery =
    EF.CompileAsyncQuery((AppDbContext context, int page, int size) =>
        context.Posts
            .AsNoTracking()
            .Skip((page - 1) * size)
            .Take(size));

[HttpGet("compiled")]
public async Task<IActionResult> GetAll_Compiled(int page = 1, int size = 100)
{
    var sw = Stopwatch.StartNew();
    var result = new List<Post>();

    await foreach (var post in _compiledQuery(_context, page, size))
    {
        result.Add(post);
    }

    sw.Stop();
    if (result.Count == 0)
        return NoContent();

    return Ok(new { DurationMs = sw.ElapsedMilliseconds, Count = result.Count });
}

Output

Tip 7: Use AsNoTrackingWithIdentityResolution (when needed)

[HttpGet("identityresolution")]
public async Task<IActionResult> GetAll_WithIdentityResolution()
{
    var sw = Stopwatch.StartNew();

    var result = await _context.Posts
        .AsNoTrackingWithIdentityResolution()
        .Include(p => p.Comments) // if you have a related entity
        .Take(100)
        .ToListAsync();

    sw.Stop();
    return Ok(new { DurationMs = sw.ElapsedMilliseconds, Count = result.Count });
}

AsNoTrackingWithIdentityResolution skips change tracking but still ensures each entity is only created once. Use it to avoid duplicates without the full overhead of tracking.

Tip 8: Avoid N+1 Queries

If you need related data from multiple tables, avoid the N+1 query pattern. In our example, we want to fetch comments along with posts, so how does this turn into an N+1 pitfall? The naive approach below loads the posts first and then issues one extra query per post:

[HttpGet("NPlusOne")]
public async Task<IActionResult> GetAll_WithNPlusOne()
{
    var sw = Stopwatch.StartNew();
    var result = await _context.Posts
        .Take(200)
        .ToListAsync();
    
    foreach (var post in result)
    {
        // One extra query per post (the classic N+1 pattern); the comments are not even used here.
        var comments = await _context.Comments
            .Where(c => c.PostId == post.Id)
            .ToListAsync();
    }
    
    sw.Stop();
    return Ok(new { DurationMs = sw.ElapsedMilliseconds, Count = result.Count });
}

But with Include, the related comments are loaded in a single round-trip:

[HttpGet("WithInclude")]
public async Task<IActionResult> GetAll_WithComments()
{
    var sw = Stopwatch.StartNew();

    var result = await _context.Posts
        .Include(p => p.Comments)
        .Take(200)
        .AsNoTracking()
        .ToListAsync();

    sw.Stop();
    return Ok(new { DurationMs = sw.ElapsedMilliseconds, Count = result.Count });
}

Including the related entities in one go drastically reduces the response time. The main reason is that the previous method hits the database N+1 times (one query for the posts plus one per post for its comments), which both adds network round-trip time and puts unnecessary load on the database.

Tip 9: Avoid loading data if you don't need it

If your app only needs the record count, avoid fetching the records just to count them. An inefficient way of doing it is shown below:

[HttpGet("getAndCount")]
public async Task<IActionResult> GetAll_Count()
{
    var sw = Stopwatch.StartNew();
    var posts = await _context.Posts.ToListAsync();
    var count = posts.Count;

    sw.Stop();
    return Ok(new { DurationMs = sw.ElapsedMilliseconds, Count = count });
}

It loads all the records in memory, unnecessarily wasting both memory and time. Instead, do the following:

[HttpGet("getCountAsync")]
public async Task<IActionResult> GetCountAsync()
{
    var sw = Stopwatch.StartNew();
    var count = await _context.Posts.CountAsync();

    sw.Stop();
    return Ok(new { DurationMs = sw.ElapsedMilliseconds, Count = count });
}

Tip 10: Use ValueTask in hot paths (when applicable)

Another micro-optimization for methods that often return a result synchronously, such as a cached response, is ValueTask:

// The count is produced synchronously here, so wrapping it in a ValueTask avoids allocating a Task.
private ValueTask<int> GetPostCountAsync() =>
    new ValueTask<int>(_context.Posts.Count());

[HttpGet("valuetask")]
public async ValueTask<IActionResult> GetCount_ValueTask()
{
    var sw = Stopwatch.StartNew();

    var count = await GetPostCountAsync();

    sw.Stop();
    return Ok(new { DurationMs = sw.ElapsedMilliseconds, Count = count });
}

The ValueTask struct pays off in tight loops and memory-constrained environments, such as mobile and game development, because it avoids Task allocations and reduces GC pressure when the result is often available synchronously. Do not use it when the result is almost always asynchronous, or on public APIs whose callers might await the result more than once.
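To make the "cached response" case more concrete, here is a hedged sketch of the pattern ValueTask is designed for: return a cached value synchronously on the fast path and fall back to an async database call only on a miss. The _cachedCount field is hypothetical and only serves this sketch (in a real app the cache would live in a longer-lived service, since controllers are created per request):

private int? _cachedCount; // hypothetical field, for illustration only

private ValueTask<int> GetPostCountCachedAsync()
{
    // Fast path: the value is already available, so no Task is allocated.
    if (_cachedCount.HasValue)
        return new ValueTask<int>(_cachedCount.Value);

    // Slow path: fall back to the asynchronous database query.
    return new ValueTask<int>(LoadCountAsync());
}

private async Task<int> LoadCountAsync()
{
    var count = await _context.Posts.CountAsync();
    _cachedCount = count;
    return count;
}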

Tip 11: Use CancellationToken to cancel long queries

ASP.NET Core automatically binds a CancellationToken parameter in your API methods to the request. If the client disconnects or cancels the request, the token is triggered. Passing it to ongoing operations such as database queries lets them stop early, freeing CPU and memory that would otherwise be wasted on an already-cancelled request:

[HttpGet("cancelable")]
public async Task<IActionResult> GetAll_Cancelable(CancellationToken token)
{
    var sw = Stopwatch.StartNew();

    var result = await _context.Posts
        .AsNoTracking()
        .ToListAsync(token); // respects cancellation

    sw.Stop();
    return Ok(new { DurationMs = sw.ElapsedMilliseconds, Count = result.Count });
}

Tip 12: Reduce DB round-trips with batching

If you need several results from the same data set, such as a page of records plus the total count, reduce database round-trips by batching them into one query. An inefficient two-round-trip version is shown here:

[HttpGet("mutiple-db-round")]
public async Task<IActionResult> GetAll_MultipleRound()
{
    var sw = Stopwatch.StartNew();
    var posts = await _context.Posts.Take(500).ToListAsync();
    var count = await _context.Posts.CountAsync();

    sw.Stop();
    return Ok(new { DurationMs = sw.ElapsedMilliseconds, Count = count, Posts = posts });
}

Apply batching and fetch records in one go:

[HttpGet("batched")]
public async Task<IActionResult> GetAll_Batched()
{
    var sw = Stopwatch.StartNew();

    var result = await _context.Posts
        .GroupBy(p => 1)
        .Select(g => new
        {
            Count = g.Count(),
            Posts = g
                .Take(500).ToList()
        })
        .FirstOrDefaultAsync();

    sw.Stop();
    return Ok(new { DurationMs = sw.ElapsedMilliseconds, Count = result?.Count, Posts = result?.Posts });
}

Output

Tip 13: Use caching

Caching is a reliable way to optimize your application's response time. It lets applications avoid repeating expensive API calls and database queries, which significantly improves the user experience. We already implemented a manual MemoryCache in a previous tip, but there are several other ways to cache. Here is a guide on how to implement caching in your .NET application: Caching Strategies in ASP.NET Core.
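Beyond the manual MemoryCache shown earlier, ASP.NET Core ships with response caching out of the box. A minimal sketch, assuming the standard middleware setup (the route and duration are illustrative; see the linked guide for the full range of strategies):

// Program.cs
builder.Services.AddResponseCaching();
// ...
app.UseResponseCaching();

// Controller action: sets Cache-Control so the response can be cached for 60 seconds
[HttpGet("response-cached")]
[ResponseCache(Duration = 60)]
public async Task<IActionResult> GetAll_ResponseCached()
{
    var result = await _context.Posts.AsNoTracking().Take(100).ToListAsync();
    return Ok(result);
}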

Tip 14: Use streaming (IAsyncEnumerable) for large datasets

For larger datasets, use streaming to return the result rather than loading the entire dataset into memory. The following code will return all data in one large response:

[HttpGet("load-all")]
public async Task<IActionResult> LoadAll()
{
    var sw = Stopwatch.StartNew();
    var result = await _context.Posts
        .Take(200).AsNoTracking().ToListAsync();
    sw.Stop();
    return Ok(new { DurationMs = sw.ElapsedMilliseconds, Posts = result });
}

Streaming can be implemented using the AsAsyncEnumerable method and by changing the return type to IAsyncEnumerable<T>:

[HttpGet("stream")]
public async IAsyncEnumerable<Post> StreamPosts()
{
    await foreach (var post in _context.Posts
                       .Take(200)
                       .AsNoTracking().AsAsyncEnumerable())
    {
        yield return post;
    }
}

For more information about streaming, check out the following blog post: IEnumerable vs. IAsyncEnumerable in .NET: Streaming vs. Buffering.

Tip 15: Divide your database into hot and cold storage

Apart from efficient data fetching, you can make architectural choices that improve user experience and response time. Splitting your data into hot and cold storage can save both performance overhead and cost. If your application deals with real-time data alongside a large amount of historical data, keep the frequently accessed, recent data in hot storage on a high-performance database, and move rarely accessed data to cheaper, lower-spec cold storage. Queries against the hot data then no longer have to filter through, or fetch from, that large pool of historical records, which eliminates a lot of overhead.
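One way to wire this up in an ASP.NET Core project is to register two DbContexts, one per storage tier, and route queries to the right one. This is a hypothetical sketch; the context names and connection-string keys are assumptions:

// Hypothetical contexts: one for recent ("hot") data, one for archived ("cold") data
public class HotDbContext : DbContext
{
    public HotDbContext(DbContextOptions<HotDbContext> options) : base(options) { }
    public DbSet<Post> Posts => Set<Post>();
}

public class ColdDbContext : DbContext
{
    public ColdDbContext(DbContextOptions<ColdDbContext> options) : base(options) { }
    public DbSet<Post> ArchivedPosts => Set<Post>();
}

// Program.cs: each context points at its own database server
builder.Services.AddDbContext<HotDbContext>(o =>
    o.UseNpgsql(builder.Configuration.GetConnectionString("HotDb")));   // assumed key
builder.Services.AddDbContext<ColdDbContext>(o =>
    o.UseNpgsql(builder.Configuration.GetConnectionString("ColdDb")))); // assumed key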

Tip 16: Use benchmarking to observe the application's performance

To closely monitor your system, set up benchmarking so you notice regressions early and can plan accordingly. The .NET ecosystem offers several trusted tools for this, such as BenchmarkDotNet, PerfView, and MiniProfiler. Check out How to Monitor Your App's Performance with .NET Benchmarking for more information.
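As an illustration, a minimal BenchmarkDotNet benchmark comparing the baseline and projection queries might look like the sketch below. It assumes a separate console project referencing the API project and BenchmarkDotNet, and the connection string is a placeholder:

using System.Linq;
using System.Threading.Tasks;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using Microsoft.EntityFrameworkCore;

[MemoryDiagnoser]
public class PostQueryBenchmarks
{
    private AppDbContext _context = null!;

    [GlobalSetup]
    public void Setup()
    {
        var options = new DbContextOptionsBuilder<AppDbContext>()
            .UseNpgsql("Host=localhost;Database=socialdb;Username=postgres;Password=postgres") // assumed connection string
            .Options;
        _context = new AppDbContext(options);
    }

    [Benchmark(Baseline = true)]
    public async Task<int> FullEntities()
    {
        var rows = await _context.Posts.AsNoTracking().ToListAsync();
        return rows.Count;
    }

    [Benchmark]
    public async Task<int> ProjectedColumns()
    {
        var rows = await _context.Posts.AsNoTracking()
            .Select(p => new { p.Id, p.Content })
            .ToListAsync();
        return rows.Count;
    }
}

public class Program
{
    public static void Main() => BenchmarkRunner.Run<PostQueryBenchmarks>();
}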

Conclusion

Response time is a key factor in user experience, and in many applications inefficient data access is the core reason behind slow responses. The good news is that it can be optimized. In this post, I have shared the tips I used to achieve better response times and, with them, higher user satisfaction.