How to bulk insert with EF Core
You may have encountered a situation where your application must insert massive amounts of data at once. Scenarios such as analytics applications, IoT monitoring systems, legacy systems, and data migrations depend heavily on bulk insertion of records. Sometimes you must deal with thousands of records at once, and inserting them naively significantly harms memory and other resource usage. In this post, I'll provide solutions for such bulk insertion using Entity Framework Core (EF Core).
Insert with Add and SaveChangesAsync
This is the most naive way of inserting data. We iterate over a list of entities and insert them one by one.
Step 1: Prepare a list of entities to insert.
Create a collection (like a List<Order>) of the entities you want to insert.
Step 2: Iterate over the entities and add them one by one to the DbContext.
This adds all the entities to EF Core's change tracker, marking them as new and ready for insertion.
Step 3: Call SaveChangesAsync to commit the changes to the database.
This sends the insert commands for all the tracked entities to the database.
Let's jump into a full code sample. We have already installed the EF Core library and set up ApplicationDbContext:
public class Order
{
    public int Id { get; set; }
    public string CustomerName { get; set; }
    public DateTime OrderDate { get; set; }
    public decimal TotalAmount { get; set; }
}
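// Note: a minimal ApplicationDbContext is assumed to be registered elsewhere (for
// example through dependency injection); _context in the method below is an injected
// instance of it. This sketch is illustrative and not part of the original sample:
public class ApplicationDbContext : DbContext
{
    public ApplicationDbContext(DbContextOptions<ApplicationDbContext> options) : base(options) { }

    public DbSet<Order> Orders { get; set; }
}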
public async Task BulkInsertOrdersAsync()
{
    // Step 1: Prepare a list of orders
    var orders = new List<Order>
    {
        new Order { Id = 1, CustomerName = "Zack Crawley", OrderDate = DateTime.Now, TotalAmount = 200.50m },
        new Order { Id = 2, CustomerName = "Ollie Pope", OrderDate = DateTime.Now.AddDays(-2), TotalAmount = 125.10m },
        new Order { Id = 3, CustomerName = "Michel Johnson", OrderDate = DateTime.Now.AddDays(-1), TotalAmount = 234.50m },
        new Order { Id = 4, CustomerName = "Emily Wills", OrderDate = DateTime.Now.AddDays(-4), TotalAmount = 355.25m },
        new Order { Id = 5, CustomerName = "Joe Root", OrderDate = DateTime.Now.AddDays(-5), TotalAmount = 89.00m }
    };

    // Step 2: Add the records one by one
    foreach (var order in orders)
    {
        _context.Orders.Add(order);
    }

    // Step 3: Save changes to the database
    await _context.SaveChangesAsync();
}
Advantages
- Simple and built into EF Core with no extra dependencies. It leverages EF Core's automatic change tracking.
Disadvantages
- Under the hood, each record is tracked and inserted individually, which is slow for large datasets. This is the most expensive approach, and the cost grows quickly as the dataset gets bigger.
Using AddRange with SaveChangesAsync
This is the simplest method of bulk insertion. It inserts the whole list using DbSet's built-in AddRange method.
Step 1: Prepare a list of entities to insert.
Create a collection (like a List<Order>) of the entities you want to insert.
Step 2: Use AddRange to add the entities to the DbContext.
This adds all the entities to EF Core's change tracker, marking them as new and ready for insertion.
Step 3: Call SaveChangesAsync to commit the changes to the database.
This sends the insert commands for all the tracked entities to the database.
Here's a simple example showing the three steps:
public async Task BulkInsertOrdersAsync()
{
    // Step 1: Prepare a list of orders
    // ...

    // Step 2: Add the orders to the DbContext Orders
    await _context.Orders.AddRangeAsync(orders);

    // Step 3: Save changes to the database
    await _context.SaveChangesAsync();
}
Advantages
- The easiest built-in way to bulk insert with EF Core. It still uses EF Core's change tracking and performs better than adding entities one by one.
Disadvantages
- Still pushes every record through the change tracker and ordinary insert statements, which is slow for large datasets. It performs multiple round-trips to the database and adds tracking overhead, resulting in high memory usage.
Using BulkInsertAsync from EFCore.BulkExtensions
To use BulkInsertAsync, you need to install the EFCore.BulkExtensions NuGet package, which provides an implementation of bulk insertion with a reduced number of SQL insert statements.
Step 1: Install the EFCore.BulkExtensions NuGet package.
Run this command:
dotnet add package EFCore.BulkExtensions
Step 2: Prepare a list of entities to insert.
Create a collection of orders, the same as in the previous examples.
Step 3: Call BulkInsertAsync to insert the entities.
This method performs the bulk insert operation using a single batch of SQL commands.
Here's a code sample containing steps 2 and 3:
public async Task BulkInsertOrdersAsync()
{
    // Step 2: Prepare a list of orders
    // ...

    // Step 3: Use BulkInsertAsync to insert the entities in bulk
    await _context.BulkInsertAsync(orders);
}
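If you need more control over the operation, the library also accepts a BulkConfig object. As a minimal sketch (the option names below come from EFCore.BulkExtensions and may vary slightly between versions; remember to add using EFCore.BulkExtensions;):
var bulkConfig = new BulkConfig
{
    BatchSize = 2000 // number of rows sent to the database per batch; tune for your workload
};
await _context.BulkInsertAsync(orders, bulkConfig);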
Advantages
- This library is designed for high performance and is extremely fast for large datasets, and its abstraction keeps the code readable. EFCore.BulkExtensions uses native database bulk insert commands (for example, SqlBulkCopy on SQL Server), which significantly improves performance and makes it much faster than EF Core's built-in methods.
Disadvantages
- Requires a third-party library, which is not always desirable for certain projects. Because the implementation is abstracted away, you have limited options and control over the generated SQL.
Using ExecuteSqlRaw for Raw SQL Insert
Another solution is to bulk insert using raw SQL with EF Core's ExecuteSqlRawAsync method.
Step 1: Prepare a list of entities or data to insert.
Create your data manually or extract it from entities.
Step 2: Construct the SQL insert command.
Manually write the SQL query that inserts your data. Use parameterized queries, as recommended, for security.
Step 3: Use ExecuteSqlRawAsync to execute the query.
Run the raw SQL command using EF Core.
With ExecuteSqlRawAsync, you have two options. The first is to insert the records one by one by calling the method for each order:
// Step 2: Construct the parameterized SQL command
var sql = "INSERT INTO Orders (Id, CustomerName, OrderDate, TotalAmount) VALUES (@p0, @p1, @p2, @p3)";

// Step 3: Execute the SQL command for each order
foreach (var order in orders)
{
    await _context.Database.ExecuteSqlRawAsync(sql, order.Id, order.CustomerName, order.OrderDate, order.TotalAmount);
}
Or, you can insert all records at once:
// Step 2: Construct one SQL command with multiple value rows
var sql = "INSERT INTO Orders (Id, CustomerName, OrderDate, TotalAmount) VALUES ";
var parameters = new List<object>();

for (int i = 0; i < orders.Count; i++)
{
    var order = orders[i];
    sql += $"(@p{i * 4}, @p{i * 4 + 1}, @p{i * 4 + 2}, @p{i * 4 + 3}),";
    parameters.Add(order.Id);
    parameters.Add(order.CustomerName);
    parameters.Add(order.OrderDate);
    parameters.Add(order.TotalAmount);
}

// Remove the trailing comma from the SQL command
sql = sql.TrimEnd(',');

// Step 3: Execute the SQL command with all parameters
await _context.Database.ExecuteSqlRawAsync(sql, parameters.ToArray());
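One thing to keep in mind with the multi-row variant: databases limit the number of parameters a single command can take (SQL Server, for example, allows roughly 2,100), so for very large lists you need to split the insert into several commands.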
Advantages
- It gives you more control over the SQL and allows you to write complex queries. It is also faster because it bypasses EF Core's change tracking.
Disadvantages
- It requires writing SQL by hand, which is harder to maintain for large or complex inserts. Unlike EF Core's LINQ queries, which work against mapped entities and catch many mistakes at compile time, raw SQL strings are prone to errors that only surface at runtime.
Best practices and considerations for bulk inserts in EF Core
You can use any of the approaches discussed above depending on your needs. However, keeping the following best practices in mind benefits your application.
Keep batch sizes reasonable: Split large inserts into reasonably sized batches to avoid locking issues and excessive memory use. A batch usually contains 1,000 to 10,000 records (see the sketch after this list).
Add transaction management: Wrap bulk inserts in transactions to ensure data consistency. If the operation stops partway through, the previously inserted records would otherwise leave your data in an inconsistent state.
Use database indexes wisely: Indexes optimize read operations but slow down inserts, so consider disabling or dropping non-essential indexes before a large insert and rebuilding them afterward.
Apply error handling: Use proper logging and exception-handling strategies. This is generally recommended, but it is especially important when performing bulk inserts so that you can accurately identify which records in the batch caused a failure.
Use data validation: Validate the data before inserting it to avoid errors during the operation. You can validate formats or transform the data as needed, which is often helpful when importing data from files such as CSV or TXT.
Handle concurrency concerns in highly transactional systems: Implement a solid concurrency strategy when working with transactional systems such as banking applications. It prevents concurrent inserts or updates of the same data, which is problematic in such applications, and you should consider it for bulk insertion as well to avoid anomalies.
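To illustrate the first two practices, here is a minimal sketch (not from the original post) that combines batching and transaction management with the AddRange approach. It assumes .NET 6 or later for Enumerable.Chunk and EF Core 5 or later for ChangeTracker.Clear, and the batch size is just an example value:
public async Task BulkInsertOrdersInBatchesAsync(List<Order> orders)
{
    const int batchSize = 1000; // example value; keep batches between 1,000 and 10,000 records

    // Wrap the whole operation in a transaction so a failure rolls everything back
    await using var transaction = await _context.Database.BeginTransactionAsync();
    try
    {
        foreach (var batch in orders.Chunk(batchSize))
        {
            _context.Orders.AddRange(batch);
            await _context.SaveChangesAsync();
            _context.ChangeTracker.Clear(); // drop tracked entities to keep memory usage low
        }

        await transaction.CommitAsync();
    }
    catch
    {
        await transaction.RollbackAsync();
        throw; // log and rethrow according to your error-handling strategy
    }
}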
Conclusion
In many cases, you need to insert large amounts of data while keeping performance and resource usage under control. Inserting thousands of records one by one is a nightmare for everything that matters to your application: execution time, memory usage, and processing. In this post, I discussed several ways to bulk insert data using Entity Framework Core.
elmah.io: Error logging and Uptime Monitoring for your web apps
This blog post is brought to you by elmah.io. elmah.io is error logging, uptime monitoring, deployment tracking, and service heartbeats for your .NET and JavaScript applications. Stop relying on your users to notify you when something is wrong or dig through hundreds of megabytes of log files spread across servers. With elmah.io, we store all of your log messages, notify you through popular channels like email, Slack, and Microsoft Teams, and help you fix errors fast.
See how we can help you monitor your website for crashes.