Bulk download from Azure Blob Storage with C#
I had to write some C# code to download multiple blobs from Azure Blob Storage using the Azure.Storage.Blobs NuGet package. To my surprise, no bulk download option exists (at least not to my knowledge). Here's a quick summary of how I achieved it anyway.
As mentioned already, the Azure.Storage.Blobs NuGet package has no method for downloading multiple blobs in one call. There's a nice paging API, and you can list either all blobs in a container or all blobs with a specified prefix. What these methods have in common is that they only fetch metadata about the blobs, not the content. Let me illustrate with a simple example that lists all blobs in a container and downloads them one by one:
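using System.Collections.Generic;
using System.IO;
using Azure.Storage.Blobs;

var containerClient = blobServiceClient.GetBlobContainerClient("mycontainer");
// GetBlobs returns metadata (BlobItem) only, not the blob contents.
var blobs = containerClient.GetBlobs();
var results = new List<string>();
foreach (var blob in blobs)
{
    // BlobItem exposes the blob name directly.
    var blobClient = containerClient.GetBlobClient(blob.Name);
    using var stream = new MemoryStream();
    await blobClient.DownloadToAsync(stream);
    // Rewind the stream before reading what was just written to it.
    stream.Position = 0;
    using var streamReader = new StreamReader(stream);
    var result = await streamReader.ReadToEndAsync();
    results.Add(result);
}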
The GetBlobs method returns blob metadata that can be used to download the blobs sequentially with the GetBlobClient method and some streaming magic. As you can imagine, downloading blobs one at a time like this may end up taking a lot of time. Downloading multiple blobs in parallel is a piece of cake using C#'s Task.WhenAll:
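using System.Collections.Generic;
using System.IO;
using System.Threading;
using System.Threading.Tasks;
using Azure.Storage.Blobs;

var containerClient = blobServiceClient.GetBlobContainerClient("mycontainer");
var blobs = containerClient.GetBlobs();
// Allow at most 50 downloads to run at the same time.
var semaphore = new SemaphoreSlim(50);
// Each task returns its blob's content. A shared List<string> is not safe
// to mutate from tasks that run in parallel.
var tasks = new List<Task<string>>();
foreach (var blob in blobs)
{
    // Wait for a free slot before starting the next download.
    await semaphore.WaitAsync();
    tasks.Add(Task.Run(async () =>
    {
        try
        {
            var blobClient = containerClient.GetBlobClient(blob.Name);
            using var stream = new MemoryStream();
            await blobClient.DownloadToAsync(stream);
            stream.Position = 0;
            using var streamReader = new StreamReader(stream);
            return await streamReader.ReadToEndAsync();
        }
        finally
        {
            // Free the slot even if the download throws.
            semaphore.Release();
        }
    }));
}
// The results come back in the same order as the tasks were added.
var results = await Task.WhenAll(tasks);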
The code uses the Task.WhenAll method to wait for download tasks that run in parallel, while the SemaphoreSlim caps the number of simultaneous downloads at 50. In my case, I will never need to download more than 30 blobs, so I can add parallel downloading without having to worry about paging, the number of threads, and similar issues. In case you need to download a lot of blobs, check out the AsPages method in the Azure.Pageable<T> class.
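For larger containers, a paged variant could look roughly like the sketch below. Treat it as a minimal illustration under stated assumptions, not a tested implementation: the PagedBlobDownloader class, its DownloadAllAsync and DownloadTextAsync helpers, the default page size of 30, and the "logs/" prefix in the usage line are made-up names and values. It assumes the asynchronous GetBlobsAsync listing and C# 8 or later for await foreach. Each page of listings is downloaded in parallel before the next page is requested, so the page size doubles as the concurrency limit.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading.Tasks;
using Azure.Storage.Blobs;

// Hypothetical helper class, named for illustration only.
public static class PagedBlobDownloader
{
    // Downloads every blob under an optional prefix as text, one page of
    // listings at a time, so roughly pageSize downloads are in flight at once.
    public static async Task<List<string>> DownloadAllAsync(
        BlobContainerClient containerClient,
        string prefix = null,
        int pageSize = 30)
    {
        var results = new List<string>();

        // GetBlobsAsync only returns metadata; AsPages groups it into pages.
        // pageSizeHint is a hint, so a page may hold fewer items than requested.
        var pages = containerClient
            .GetBlobsAsync(prefix: prefix)
            .AsPages(pageSizeHint: pageSize);

        await foreach (var page in pages)
        {
            // Start one download per blob in this page, then wait for all of them
            // before asking the service for the next page.
            var downloads = page.Values
                .Select(blob => DownloadTextAsync(containerClient, blob.Name));
            results.AddRange(await Task.WhenAll(downloads));
        }

        return results;
    }

    // Same stream handling as in the examples above, wrapped in a method.
    private static async Task<string> DownloadTextAsync(
        BlobContainerClient containerClient,
        string blobName)
    {
        var blobClient = containerClient.GetBlobClient(blobName);
        using var stream = new MemoryStream();
        await blobClient.DownloadToAsync(stream);
        stream.Position = 0;
        using var reader = new StreamReader(stream);
        return await reader.ReadToEndAsync();
    }
}
```

A call could then look like var texts = await PagedBlobDownloader.DownloadAllAsync(containerClient, prefix: "logs/"); where the prefix is optional. Waiting for a whole page before fetching the next one is simpler than a semaphore but lets one slow blob hold up the page, so the semaphore approach may still be preferable when blob sizes vary a lot.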
That's it. Blobs are now downloaded in parallel. This approach is only relevant when you need to download blobs from within a .NET program. In case you need to download or upload files from the file system, there's a range of different tools to help you. On the command line, I prefer AzCopy, and as a Windows app, I use Azure Storage Explorer.