Designing high-throughput .NET APIs for 1M requests/minute
Scaling horizontally is great… until the bill comes. Whether you are a startup or a large firm, you need to keep an eye on hosting costs while staying competitive, and every user who lands on your application expects fast responses. In today's post, I will share ways to improve the throughput of your .NET application. We will then benchmark the response time after applying those techniques.

High-throughput .NET API implementation
For the rest of this post, I'll use an ASP.NET Core Minimal API and measure performance using BenchmarkDotNet. Let's get started.
Step 1: Create the solution
dotnet new sln -n OrdersPerformanceDemo
cd OrdersPerformanceDemo
Step 2: Add projects to the solution
dotnet new web -n Orders.Api
dotnet new console -n Orders.Benchmark
dotnet sln add Orders.Api
dotnet sln add Orders.Benchmark
Step 3: Install packages
cd Orders.Api
dotnet add package Microsoft.EntityFrameworkCore
dotnet add package Npgsql.EntityFrameworkCore.PostgreSQL
dotnet add package Dapper
dotnet add package Microsoft.Extensions.Configuration
dotnet add package Microsoft.Extensions.Configuration.Json
dotnet add package Microsoft.Extensions.Configuration.FileExtensions
Also, in the benchmark project, we will add the required packages:
cd ../Orders.Benchmark
dotnet add package BenchmarkDotNet
dotnet add package Npgsql
dotnet add package Dapper
dotnet add package Microsoft.EntityFrameworkCore
dotnet add package Npgsql.EntityFrameworkCore.PostgreSQL
Step 4: Add Appsettings
In the appsettings.json file, add a connection string:
"ConnectionStrings": {
  "PostgresConnection": "Host=localhost;Port=5433;Database=shopDb;Username=postgres;Password=1234"
}
Then, in Program.cs of the API project, include the following code:
var connectionString = builder.Configuration.GetConnectionString("PostgresConnection");
builder.Services.AddDbContext<ApplicationDbContext>(options =>
options.UseNpgsql(connectionString));
builder.Services.AddScoped<NpgsqlConnection>(_ =>
new NpgsqlConnection(connectionString));
Step 5: Add models
Customer model.
namespace Orders.Api.Models;
public class Customer
{
public int Id { get; set; }
public string Name { get; set; }
public string Country { get; set; }
}
Order model.
namespace Orders.Api.Models;
public class Order
{
public int Id { get; set; }
public int? CustomerId { get; set; }
public decimal Amount { get; set; }
public string Status { get; set; }
public DateTime CreatedAt { get; set; }
public Customer Customer { get; set; }
}
Step 6: Set up the DB context
using Microsoft.EntityFrameworkCore;
using Orders.Api.Models;
namespace Orders.Api.Data;
public class ApplicationDbContext: DbContext
{
public DbSet<Order> Orders => Set<Order>();
public DbSet<Customer> Customers => Set<Customer>();
public ApplicationDbContext(DbContextOptions<ApplicationDbContext> options)
: base(options) { }
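}
Step 7: Run migrations
dotnet ef migrations add InitialCreate
Then apply the migration to the database. Both commands need the dotnet-ef tool and the Microsoft.EntityFrameworkCore.Design package in the Orders.Api project.
dotnet ef database update
I already seeded a few thousand rows of data in the table.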

Step 8: Add naive API
app.MapGet("/orders/naive", async (ApplicationDbContext db) =>
{
var data = await db.Orders
.Include(o => o.Customer)
.Take(50000)
.ToListAsync();
return Results.Ok(data);
});
This is deliberately one of the simplest and least efficient ways of fetching data, and it serves as the worst-case baseline. To spare the system the pressure of a million records and to save time, I capped it at 50,000 records.
Step 9: Create a dapper endpoint
app.MapGet("/orders/dapper", async (
int? customerId,
int page,
int pageSize,
NpgsqlConnection conn) =>
{
var sql = @"
SELECT o.""Id"", o.""Amount"", o.""CreatedAt"", c.""Name""
FROM ""Orders"" o
JOIN ""Customers"" c ON o.""CustomerId"" = c.""Id""
WHERE o.""Status"" = 'Completed'
AND (@customerId IS NULL OR o.""CustomerId"" = @customerId)
ORDER BY o.""CreatedAt"" DESC
LIMIT @pageSize OFFSET @offset";
var result = await conn.QueryAsync(sql, new
{
customerId,
pageSize,
offset = (page - 1) * pageSize
});
return Results.Ok(result);
});
Now, we are ready to start implementing optimizations.
Database indexing
CREATE INDEX IF NOT EXISTS idx_completed_orders
    ON public."Orders" USING btree
    ("CustomerId" ASC NULLS LAST)
    TABLESPACE pg_default
    WHERE "Status" = 'Completed'::text;
When fetching, filter on the same Status condition so that PostgreSQL can use the partial index.
var query = db.Orders
.AsNoTracking()
.Where(o => o.Status == "Completed");
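If you prefer to keep the index in the EF Core model rather than in a hand-written script, a sketch like the following could work. Treat it as an assumption, not the code from the repository: it is a variant of the ApplicationDbContext from Step 6, the index name is made up, and it adds CreatedAt as a second column so that the later ORDER BY has a chance of being served by the same index. HasFilter takes raw SQL, so the identifier quoting has to match PostgreSQL.

```csharp
using Microsoft.EntityFrameworkCore;
using Orders.Api.Models;

namespace Orders.Api.Data;

public class ApplicationDbContext : DbContext
{
    public DbSet<Order> Orders => Set<Order>();
    public DbSet<Customer> Customers => Set<Customer>();

    public ApplicationDbContext(DbContextOptions<ApplicationDbContext> options)
        : base(options) { }

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        // Partial index: only rows with Status = 'Completed' are indexed.
        // The name and the extra CreatedAt column are assumptions for illustration.
        modelBuilder.Entity<Order>()
            .HasIndex(o => new { o.CustomerId, o.CreatedAt })
            .HasDatabaseName("idx_completed_orders_customer_created")
            .HasFilter("\"Status\" = 'Completed'");
    }
}
```

After changing the model, you would generate and apply a new migration so that the index is actually created.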
Use pagination
Pagination is a win-win in database fetching. In most cases, APIs do not need to return large amounts of data in one response, because mobile and web applications usually display 20 to 100 items at once. Leverage this fact and paginate the fetches, which significantly reduces the cost of each request. You only get what you need and do not burden the system with extra rows.
var result = await query
.OrderByDescending(o => o.CreatedAt)
.Skip((page - 1) * pageSize)
.Take(pageSize)
.Select(o => new
{
o.Id,
o.Amount,
o.CreatedAt,
CustomerName = o.Customer.Name
})
.ToListAsync();
Add Pagination Limits
To prevent users from fetching large amounts of data at once, add a page size limit once you are sure about the API's use cases.
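The one-line cap that follows handles oversized pages, but a page of 0 would still turn into a negative offset, and a pageSize of 0 or less would reach Take unchanged. A minimal sketch of a fuller guard, assuming a hypothetical Paging helper that is not part of the original project:

```csharp
using System;

public static class Paging
{
    // Same upper bound as the cap used in this post.
    public const int MaxPageSize = 100;

    // Returns a (page, pageSize) pair that is safe to feed into
    // Skip/Take or LIMIT/OFFSET: page is at least 1, pageSize is 1..100.
    public static (int Page, int PageSize) Normalize(int page, int pageSize)
    {
        var safePage = Math.Max(page, 1);
        var safeSize = Math.Clamp(pageSize, 1, MaxPageSize);
        return (safePage, safeSize);
    }
}
```

In an endpoint, you could call it as `(page, pageSize) = Paging.Normalize(page, pageSize);` before building the query.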
pageSize = Math.Min(pageSize, 100);
Use Dapper for Hot Endpoints Only
app.MapGet("/orders/dapper", async (
int? customerId,
int page,
int pageSize,
NpgsqlConnection conn) =>
{
var sql = @"
SELECT o.""Id"", o.""Amount"", o.""CreatedAt"", c.""Name""
FROM ""Orders"" o
JOIN ""Customers"" c ON o.""CustomerId"" = c.""Id""
WHERE o.""Status"" = 'Completed'
AND (@customerId IS NULL OR o.""CustomerId"" = @customerId)
ORDER BY o.""CreatedAt"" DESC
LIMIT @pageSize OFFSET @offset";
var result = await conn.QueryAsync(sql, new
{
customerId,
pageSize,
offset = (page - 1) * pageSize
});
return Results.Ok(result);
});
Dapper runs the SQL you write directly, with less abstraction than EF Core. That gives it a performance edge, so you can reserve Dapper for high-traffic operations.
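The endpoint above returns dynamic rows. If you would rather have a typed result, Dapper can map columns to a class by name. The following is a sketch under assumptions: the OrderSummary class and the /orders/dapper-typed route are hypothetical, the customerId filter is left out for brevity, and the project is assumed to use the web SDK's implicit usings.

```csharp
using Dapper;
using Npgsql;

var builder = WebApplication.CreateBuilder(args);
var connectionString = builder.Configuration.GetConnectionString("PostgresConnection");
builder.Services.AddScoped(_ => new NpgsqlConnection(connectionString));

var app = builder.Build();

// Hypothetical route: same query as /orders/dapper, mapped to a typed DTO.
app.MapGet("/orders/dapper-typed", async (int page, int pageSize, NpgsqlConnection conn) =>
{
    // Guard the paging inputs before they reach LIMIT/OFFSET.
    page = Math.Max(page, 1);
    pageSize = Math.Clamp(pageSize, 1, 100);

    // The alias makes the column name match the DTO property name.
    var sql = @"
SELECT o.""Id"", o.""Amount"", o.""CreatedAt"", c.""Name"" AS ""CustomerName""
FROM ""Orders"" o
JOIN ""Customers"" c ON o.""CustomerId"" = c.""Id""
WHERE o.""Status"" = 'Completed'
ORDER BY o.""CreatedAt"" DESC
LIMIT @pageSize OFFSET @offset";

    var rows = await conn.QueryAsync<OrderSummary>(sql, new
    {
        pageSize,
        offset = (page - 1) * pageSize
    });
    return Results.Ok(rows);
});

app.Run();

// Hypothetical DTO; Dapper fills the properties by matching column names.
public class OrderSummary
{
    public int Id { get; set; }
    public decimal Amount { get; set; }
    public DateTime CreatedAt { get; set; }
    public string CustomerName { get; set; } = "";
}
```

A typed DTO also gives the serializer a fixed shape, which makes the response contract easier to keep stable.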
Response Compression
builder.Services.AddResponseCompression();
app.UseResponseCompression();
The middleware compresses server responses before sending them to clients, reducing bandwidth usage and improving load times.
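One detail worth knowing is that the middleware does not compress HTTPS responses unless you opt in. The sketch below shows one possible configuration, not the one from the repository. The /ping route is hypothetical, and enabling compression over HTTPS has known security trade-offs (CRIME/BREACH-style attacks), so treat EnableForHttps as a decision to make per application.

```csharp
using System.IO.Compression;
using Microsoft.AspNetCore.ResponseCompression;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddResponseCompression(options =>
{
    // Off by default; review the security implications before enabling.
    options.EnableForHttps = true;
    options.Providers.Add<BrotliCompressionProvider>();
    options.Providers.Add<GzipCompressionProvider>();
});

// Favor speed over compression ratio for API traffic (an assumption, tune as needed).
builder.Services.Configure<BrotliCompressionProviderOptions>(o =>
    o.Level = CompressionLevel.Fastest);
builder.Services.Configure<GzipCompressionProviderOptions>(o =>
    o.Level = CompressionLevel.Fastest);

var app = builder.Build();

// Register early so that later middleware and endpoints are compressed.
app.UseResponseCompression();

// Hypothetical endpoint just to have something to compress.
app.MapGet("/ping", () => Results.Ok(new { message = "pong" }));

app.Run();
```

Clients still have to send an Accept-Encoding header such as `br` or `gzip` for the response to be compressed.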
Avoid Over-Fetching
EF Core entities often have several navigation properties to other entities. In real applications, a single entity can have a cluster of dependent entities, such as a User with navigations to user roles, languages, and approvals. Loading all of them can be a serious drain on memory and processing power. Our model has a customers table and an orders table. If we only need order data, there is no need to pull in the other table with code like .Include(o => o.Customer). If you only need to send the order Id and Status over the API, avoid including the customer.
No tracking for faster reads with EF Core
var query = db.Orders
.AsNoTracking()
.Where(o => o.Status == "Completed");
EF Core tracks fetched entities so that it can detect changes. Use AsNoTracking to skip that work when you only read data and don't need to modify it.
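If most of your endpoints are read-only, you could also make no-tracking the default for the whole context and opt back in where you update data. A minimal sketch, assuming the ApplicationDbContext and connection string from the earlier steps; both routes are hypothetical and only illustrate the two cases.

```csharp
using Microsoft.EntityFrameworkCore;
using Orders.Api.Data;

var builder = WebApplication.CreateBuilder(args);
var connectionString = builder.Configuration.GetConnectionString("PostgresConnection");

builder.Services.AddDbContext<ApplicationDbContext>(options =>
    options.UseNpgsql(connectionString)
           // Every query is no-tracking unless it asks for tracking explicitly.
           .UseQueryTrackingBehavior(QueryTrackingBehavior.NoTracking));

var app = builder.Build();

// Read path: no AsNoTracking() call needed any more.
app.MapGet("/orders/first-ten", async (ApplicationDbContext db) =>
    Results.Ok(await db.Orders.OrderBy(o => o.Id).Take(10).ToListAsync()));

// Write path: opt back in with AsTracking() so SaveChanges sees the change.
app.MapPut("/orders/{id:int}/status", async (int id, string status, ApplicationDbContext db) =>
{
    var order = await db.Orders.AsTracking().FirstOrDefaultAsync(o => o.Id == id);
    if (order is null) return Results.NotFound();

    order.Status = status;
    await db.SaveChangesAsync();
    return Results.NoContent();
});

app.Run();
```

The trade-off is that a forgotten AsTracking() on a write path silently saves nothing, so this default suits read-heavy APIs best.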
Projection to reduce response fields
Reduce the fetched fields to only the required ones. Fetching and sending unnecessary fields costs CPU time and network bandwidth. Use projection to get what you need.
var result = await query
.OrderByDescending(o => o.CreatedAt)
.Skip((page - 1) * pageSize)
.Take(pageSize)
.Select(o => new
{
o.Id,
o.Amount,
o.CreatedAt,
CustomerName = o.Customer.Name
})
.ToListAsync();
Cache for frequently accessed data
Caching is another breakthrough in our journey. Keeping frequently accessed data in memory reduces database round trips. I will show two types of caching.
Output Cache
Output caching stores the whole web response in memory: the serialized JSON, headers, and status code. The next time you request the route with the same parameters, the stored response is returned. AddOutputCache registers output caching services in the DI container. UseOutputCache inserts the output-caching middleware into the request-processing pipeline. When a request comes in, this middleware intercepts it, checks if a valid cached response exists, and if so, returns it immediately. That way, output caching skips the whole cycle of hitting the database and serializing the response. In Program.cs, register the service and add the middleware:
builder.Services.AddOutputCache();
app.UseOutputCache();
The endpoint now looks like this:
app.MapGet("/orders/optimized_output", async (
int? customerId,
int page,
int pageSize,
ApplicationDbContext db) =>
{
// Technique: Limit page size (protect API)
pageSize = Math.Min(pageSize, 100);
var query = db.Orders
.AsNoTracking() // Technique: No tracking (faster reads)
.Where(o => o.Status == "Completed"); //Technique: Uses partial index
if (customerId.HasValue)
query = query.Where(o => o.CustomerId == customerId); // Technique: Uses index
var result = await query
.OrderByDescending(o => o.CreatedAt) //Technique: Sorting (optimize via index)
.Skip((page - 1) * pageSize)
.Take(pageSize)
.Select(o => new // Technique: Projection (avoid over-fetching)
{
o.Id,
o.Amount,
o.CreatedAt,
CustomerName = o.Customer.Name
})
.ToListAsync();
return Results.Ok(result);
})
.CacheOutput(p =>
p.Expire(TimeSpan.FromSeconds(600))); // Technique: Output caching
.CacheOutput takes a lambda expression that configures the cache policy. Here, Expire keeps the cached response for 600 seconds (10 minutes).
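When several endpoints share the same caching rules, a named policy keeps them in one place. The sketch below is an illustration under assumptions: the policy name, the tag, and both routes are hypothetical. It also shows how tagged entries could be evicted after a write, which the inline lambda version does not cover.

```csharp
using Microsoft.AspNetCore.OutputCaching;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddOutputCache(options =>
{
    // Hypothetical named policy shared by order listing endpoints.
    options.AddPolicy("OrdersPage", policy => policy
        .Expire(TimeSpan.FromSeconds(600))
        // Cache key varies only by these query parameters.
        .SetVaryByQuery("customerId", "page", "pageSize")
        // Tag entries so they can be evicted together.
        .Tag("orders"));
});

var app = builder.Build();
app.UseOutputCache();

// Demo endpoint: generatedAt stays the same while the entry is cached.
app.MapGet("/orders/cached-demo", (int page, int pageSize) =>
        Results.Ok(new { page, pageSize, generatedAt = DateTime.UtcNow }))
   .CacheOutput("OrdersPage");

// Hypothetical maintenance endpoint: drop every cached response tagged "orders".
app.MapPost("/orders/cache/evict", async (IOutputCacheStore store, CancellationToken ct) =>
{
    await store.EvictByTagAsync("orders", ct);
    return Results.NoContent();
});

app.Run();
```

In a real application, you would more likely call EvictByTagAsync from the code path that creates or updates orders than expose it as an endpoint.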
In-memory cache
In most cases, output caching should be sufficient. But sometimes, you may want to look into creating manual, private caches in your code.
To start creating a cache, import the following namespace in the Program.cs file:
using Microsoft.Extensions.Caching.Memory;
Register the caching service:
builder.Services.AddMemoryCache();
In our endpoint method, inject an IMemoryCache object and build the cache key on every request:
var cacheKey = $"orders:{customerId}:{page}:{pageSize}";
First, we check if the required data is in the in-memory cache. If found, we return it without going to the database. The cached value is read back as object because the projected list has an anonymous element type, which an out List<object> parameter would never match.
if (cache.TryGetValue(cacheKey, out object? cached))
return Results.Ok(cached);
Otherwise, store the fetched result in the cache.
cache.Set(cacheKey, result, TimeSpan.FromSeconds(600));
The endpoint now looks like this:
app.MapGet("/orders/optimized_inmemory", async (
int? customerId,
int page,
int pageSize,
ApplicationDbContext db,
IMemoryCache cache) =>
{
// Technique: Limit page size (protect API)
pageSize = Math.Min(pageSize, 100);
// Technique: Cache key
var cacheKey = $"orders:{customerId}:{page}:{pageSize}";
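// Technique: Memory cache (ultra fast)
// Read back as object: the cached list has an anonymous element type,
// so 'out List<object>' would never match and every request would miss.
if (cache.TryGetValue(cacheKey, out object? cached))
return Results.Ok(cached);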
var query = db.Orders
.AsNoTracking() // Technique: No tracking (faster reads)
.Where(o => o.Status == "Completed"); //Technique: Uses partial index
if (customerId.HasValue)
query = query.Where(o => o.CustomerId == customerId); //Technique: Uses index
var result = await query
.OrderByDescending(o => o.CreatedAt) //Technique: Sorting (optimize via index)
.Skip((page - 1) * pageSize)
.Take(pageSize)
.Select(o => new // Technique: Projection (avoid over-fetching)
{
o.Id,
o.Amount,
o.CreatedAt,
CustomerName = o.Customer.Name
})
.ToListAsync();
// Technique : Cache result
cache.Set(cacheKey, result, TimeSpan.FromSeconds(600));
return Results.Ok(result);
});
Although I have used both output and memory caches in the app, the output cache is better suited to APIs, dashboards, and repetitive data. In-memory caching is optimal for expensive calculations, permissions, and non-HTTP scenarios.
Study each caching strategy in detail so that you can apply it according to your application's needs.
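The check-then-set pattern shown above can also be written with the GetOrCreateAsync extension method, which combines the lookup and the store in one call. A minimal sketch under assumptions: the route and LoadExpensiveValueAsync are hypothetical stand-ins for a real database query.

```csharp
using Microsoft.Extensions.Caching.Memory;

var builder = WebApplication.CreateBuilder(args);
builder.Services.AddMemoryCache();

var app = builder.Build();

// Hypothetical endpoint that caches one value per customer.
app.MapGet("/orders/cached-count", async (int? customerId, IMemoryCache cache) =>
{
    var cacheKey = $"orders:count:{customerId}";

    // Returns the cached value, or runs the factory and stores its result.
    var value = await cache.GetOrCreateAsync(cacheKey, async entry =>
    {
        entry.AbsoluteExpirationRelativeToNow = TimeSpan.FromSeconds(600);
        return await LoadExpensiveValueAsync(customerId);
    });

    return Results.Ok(value);
});

app.Run();

// Stand-in for an expensive query; replace with a real database call.
static async Task<int> LoadExpensiveValueAsync(int? customerId)
{
    await Task.Delay(50);
    return customerId ?? 0;
}
```

Note that GetOrCreateAsync does not lock per key, so several concurrent requests that miss at the same time may each run the factory once.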
Step 10: Set up code in the Benchmark project
using BenchmarkDotNet.Attributes;
using Microsoft.EntityFrameworkCore;
using Dapper;
using Microsoft.Extensions.Configuration;
using Npgsql;
using Orders.Api.Data;
using Orders.Benchmark.Config;
namespace Orders.Benchmark;
[MemoryDiagnoser]
public class OrdersBenchmark
{
private HttpClient _client;
private string _baseUrl;
[GlobalSetup]
public void Setup()
{
var config = new ConfigurationBuilder()
.SetBasePath(Directory.GetCurrentDirectory())
.AddJsonFile("appsettings.json")
.Build();
_baseUrl = "http://localhost:5102";
_client = new HttpClient
{
BaseAddress = new Uri(_baseUrl)
};
}
[Benchmark]
public async Task Naive()
{
var response = await _client.GetAsync("/orders/naive");
response.EnsureSuccessStatusCode();
var data = await response.Content.ReadAsStringAsync();
}
[Benchmark]
public async Task OptimizedInmemory()
{
var response = await _client.GetAsync("/orders/optimized_inmemory?page=1&pageSize=50&customerId=223");
response.EnsureSuccessStatusCode();
var data = await response.Content.ReadAsStringAsync();
}
[Benchmark]
public async Task OptimizedOutput()
{
var response = await _client.GetAsync("/orders/optimized_output?page=1&pageSize=50&customerId=223");
response.EnsureSuccessStatusCode();
var data = await response.Content.ReadAsStringAsync();
}
[Benchmark]
public async Task Dapper()
{
var response = await _client.GetAsync("/orders/dapper?page=1&pageSize=50");
response.EnsureSuccessStatusCode();
var data = await response.Content.ReadAsStringAsync();
}
}
We set up an HTTP client for the API project, then define one benchmark method per API endpoint.
Step 11: Call the benchmark in the Program.cs
using BenchmarkDotNet.Running;
using Orders.Benchmark;
BenchmarkRunner.Run<OrdersBenchmark>();
Step 12: Run and test
Run the two projects one by one in different terminals. First, the API:
cd Orders.Api
dotnet run
Then the benchmark, in Release mode:
cd Orders.Benchmark
dotnet run -c Release
Result

Running again to check caching.

In both runs, the results show how well output caching performs. Once the data is served from the cache, the responses are even faster.
External factors to achieve High-Throughput API
Even the greatest soldier cannot stand in battle with a weak sword. Similarly, well-written code cannot perform well if it is not hosted properly. To achieve high throughput, you need to take care of the following external factors.
Choosing the Right infrastructure and hosting
When hosting your application, choose the server with a few things in mind. Go with an option that offers more parallel request handling, fast memory, and a location close to your users. If your code takes 30 ms to do the job but the response takes 200 ms to arrive because of the geographic distance to the user, the API will still be perceived as slow. Besides, SSDs are faster than HDDs, so choose accordingly. High-speed storage has a significant impact on the request-response cycle.
Using a reverse Proxy / Load balancer
Load balancing is necessary for scalable systems. Never expose the API directly; instead, put a reverse proxy such as HAProxy or NGINX in front of it for load balancing, SSL termination, and request buffering.
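Once a proxy sits in front of the API, the application sees the proxy's address and scheme instead of the client's unless it reads the forwarded headers. A minimal sketch of how that could be wired up in ASP.NET Core, assuming the proxy sets X-Forwarded-For and X-Forwarded-Proto; the /whoami route is hypothetical.

```csharp
using Microsoft.AspNetCore.HttpOverrides;

var builder = WebApplication.CreateBuilder(args);

builder.Services.Configure<ForwardedHeadersOptions>(options =>
{
    options.ForwardedHeaders =
        ForwardedHeaders.XForwardedFor | ForwardedHeaders.XForwardedProto;
    // Only loopback proxies are trusted by default. If the proxy runs on
    // another host, add its address to options.KnownProxies.
});

var app = builder.Build();

// Run before any middleware that reads the client IP or the request scheme.
app.UseForwardedHeaders();

// Hypothetical endpoint to verify what the app sees behind the proxy.
app.MapGet("/whoami", (HttpContext ctx) => Results.Ok(new
{
    ip = ctx.Connection.RemoteIpAddress?.ToString(),
    scheme = ctx.Request.Scheme
}));

app.Run();
```

This matters for anything keyed on the client IP, such as logging or per-client rate limits.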
Background Processing
Tasks such as analytics, email, and logging should run outside the main API call path so that they do not block the request-response pipeline.
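One way to do this without extra infrastructure is an in-process queue drained by a hosted service. The sketch below is an assumption for illustration, not code from the repository: EmailJob, EmailWorker, and the route are hypothetical, and the Task.Delay stands in for real email sending.

```csharp
using System.Threading.Channels;

var builder = WebApplication.CreateBuilder(args);

// Bounded queue: writers wait when 1000 jobs are already pending.
builder.Services.AddSingleton(Channel.CreateBounded<EmailJob>(
    new BoundedChannelOptions(1000) { FullMode = BoundedChannelFullMode.Wait }));
builder.Services.AddHostedService<EmailWorker>();

var app = builder.Build();

// The request only enqueues the job and returns 202 right away.
app.MapPost("/orders/{id:int}/confirm", async (int id, Channel<EmailJob> queue, CancellationToken ct) =>
{
    await queue.Writer.WriteAsync(new EmailJob(id), ct);
    return Results.Accepted();
});

app.Run();

public record EmailJob(int OrderId);

public class EmailWorker : BackgroundService
{
    private readonly Channel<EmailJob> _queue;
    private readonly ILogger<EmailWorker> _logger;

    public EmailWorker(Channel<EmailJob> queue, ILogger<EmailWorker> logger)
    {
        _queue = queue;
        _logger = logger;
    }

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        // Drains the queue until the host shuts down.
        await foreach (var job in _queue.Reader.ReadAllAsync(stoppingToken))
        {
            try
            {
                await Task.Delay(100, stoppingToken); // placeholder for real work
                _logger.LogInformation("Sent confirmation for order {OrderId}", job.OrderId);
            }
            catch (Exception ex) when (ex is not OperationCanceledException)
            {
                // One failed job should not stop the worker.
                _logger.LogError(ex, "Failed to process order {OrderId}", job.OrderId);
            }
        }
    }
}
```

An in-memory channel loses pending jobs when the process restarts, so work that must not be lost would need a durable queue instead.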
Rate limit API
Rate limiting is both a performance and a precautionary measure: it stops a single user or tenant from overusing or abusing the application. Overuse can spike the server load and slow the API for other users. With .NET 10, rate limiting has improved even further.
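ASP.NET Core ships a rate limiting middleware that can enforce such a limit per client. The following is a minimal sketch rather than a recommended configuration: the limit of 100 requests per minute, the choice of the client IP as the partition key, and the /orders/ping route are all assumptions.

```csharp
using System.Threading.RateLimiting;
using Microsoft.AspNetCore.RateLimiting;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddRateLimiter(options =>
{
    // Rejected requests get 429 Too Many Requests.
    options.RejectionStatusCode = StatusCodes.Status429TooManyRequests;

    // One fixed-window limiter per client IP address.
    options.GlobalLimiter = PartitionedRateLimiter.Create<HttpContext, string>(ctx =>
        RateLimitPartition.GetFixedWindowLimiter(
            partitionKey: ctx.Connection.RemoteIpAddress?.ToString() ?? "unknown",
            factory: _ => new FixedWindowRateLimiterOptions
            {
                PermitLimit = 100,                 // assumed limit
                Window = TimeSpan.FromMinutes(1),
                QueueLimit = 0                     // reject instead of queueing
            }));
});

var app = builder.Build();

app.UseRateLimiter();

// Hypothetical endpoint covered by the global limiter.
app.MapGet("/orders/ping", () => Results.Ok("pong"));

app.Run();
```

Behind a reverse proxy, RemoteIpAddress is the proxy's address unless forwarded headers are processed first, so a tenant ID or API key is often a better partition key.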
Conclusion
Every system needs to achieve optimal performance. In this post, I shared a few important techniques that a .NET API can use to achieve high throughput. Testing with 1 million requests is difficult on a personal computer, but we still observed clear improvements with all the tactics, and you can improve your application by following them as well. I also discussed external factors that contribute to user experience and system response time.
Code: https://github.com/elmahio-blog/OrdersPerformanceDemo.git