Bulk download from Azure Blob Storage with C#

I had to write some C# to download multiple blobs from Azure Blob Storage using the Azure.Storage.Blobs NuGet package. To my surprise, no bulk option exists (at least not to my knowledge). Here's a quick summary of how I achieved something similar.

The package has a nice paging API, and you can list either all blobs in a container or all blobs with a specified prefix. Common to these methods is that they only fetch metadata about the blobs, so each blob still has to be downloaded on its own. Let me illustrate with a simple example that lists and downloads all blobs from a container:

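// blobServiceClient is an existing Azure.Storage.Blobs.BlobServiceClient
var containerClient = blobServiceClient.GetBlobContainerClient("mycontainer");
// Lists metadata only; no blob content is downloaded yet
var blobs = containerClient.GetBlobs();

var results = new List<string>();

foreach (var blob in blobs)
{
    // GetBlobs yields BlobItem objects, which expose the name directly
    var blobClient = containerClient.GetBlobClient(blob.Name);
    using var stream = new MemoryStream();
    await blobClient.DownloadToAsync(stream);
    // Rewind before reading the downloaded bytes as text
    stream.Position = 0;
    using var streamReader = new StreamReader(stream);
    var result = await streamReader.ReadToEndAsync();
    results.Add(result);
}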
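
Listing by prefix works the same way. Here is a minimal sketch, assuming the GetBlobs overload with an optional prefix parameter. The BlobListing class and the PrintBlobsWithPrefix method are hypothetical names made up for illustration:

```csharp
using System;
using Azure.Storage.Blobs;

public static class BlobListing
{
    // Hypothetical helper: prints the name and size of every blob under a prefix.
    public static void PrintBlobsWithPrefix(BlobContainerClient containerClient, string prefix)
    {
        // Only metadata is returned here; no blob content is downloaded.
        foreach (var blob in containerClient.GetBlobs(prefix: prefix))
        {
            Console.WriteLine($"{blob.Name} ({blob.Properties.ContentLength} bytes)");
        }
    }
}
```

Calling it with a prefix such as "reports/" (a made-up value) would list only the blobs whose names start with that string.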

The GetBlobs method returns blob metadata. Each blob is then downloaded sequentially using the GetBlobClient method and some streaming magic. Downloading blobs one at a time like this may end up taking a lot of time. Downloading multiple blobs in parallel is a piece of cake using C#'s Task.WhenAll:

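var containerClient = blobServiceClient.GetBlobContainerClient("mycontainer");
var blobs = containerClient.GetBlobs();

// Several tasks add results at the same time, so List<string> is not safe here.
// ConcurrentBag<T> lives in System.Collections.Concurrent and does not preserve order.
var results = new ConcurrentBag<string>();

// Allow at most 50 downloads to run at the same time
var semaphore = new SemaphoreSlim(50);
var tasks = new List<Task>();

foreach (var blob in blobs)
{
    // Wait for a free slot before starting the next download
    await semaphore.WaitAsync();

    tasks.Add(Task.Run(async () =>
    {
        try
        {
            // GetBlobs yields BlobItem objects, which expose the name directly
            var blobClient = containerClient.GetBlobClient(blob.Name);
            using var stream = new MemoryStream();
            await blobClient.DownloadToAsync(stream);
            stream.Position = 0;
            using var streamReader = new StreamReader(stream);
            var result = await streamReader.ReadToEndAsync();
            results.Add(result);
        }
        finally
        {
            // Free the slot even if the download fails
            semaphore.Release();
        }
    }));
}

// Wait for all started downloads to finish
await Task.WhenAll(tasks);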

The code starts the downloads as tasks that run in parallel, uses a SemaphoreSlim to cap the number of concurrent downloads at 50, and waits for all of them with Task.WhenAll. In my case, I will never need to download more than 30 blobs, so I can add parallel downloading without having to worry about paging, the number of threads, and similar issues. In case you need to download a lot of blobs, check out the AsPages method on the Azure.Pageable<T> class.
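
For larger containers, paging and parallel downloading can be combined. The following is only a sketch and not something I have needed myself. It assumes the async variant GetBlobsAsync, whose AsPages method accepts a pageSizeHint, and it reads content with DownloadContentAsync rather than the stream approach above. The PagedBlobDownloader class, the DownloadAllAsync method, and the page size of 100 are made up for illustration:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Azure.Storage.Blobs;

public static class PagedBlobDownloader
{
    // Hypothetical helper: downloads every blob in the container as text, one page at a time.
    public static async Task<List<string>> DownloadAllAsync(
        BlobContainerClient containerClient, int pageSize = 100)
    {
        var results = new List<string>();

        // pageSizeHint is only a hint; the service may return fewer items per page.
        await foreach (var page in containerClient.GetBlobsAsync().AsPages(pageSizeHint: pageSize))
        {
            // Start one download per blob in the current page.
            var downloads = page.Values.Select(async blob =>
            {
                var blobClient = containerClient.GetBlobClient(blob.Name);
                var response = await blobClient.DownloadContentAsync();
                return response.Value.Content.ToString(); // BinaryData decoded as UTF-8
            });

            // Task.WhenAll returns the results in the same order as the tasks,
            // so no shared collection is written to from several tasks.
            results.AddRange(await Task.WhenAll(downloads));
        }

        return results;
    }
}
```

With this shape, the page size doubles as the concurrency limit: no more than one page of downloads is in flight at a time, and the next page is not requested until the current one has finished.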

That's it. Blobs are now downloaded in parallel. The code is only relevant for scenarios where you need to download blobs from within a .NET program. In case you need to download or upload files from the file system, there's a range of different tools to help you. On the command line, I prefer AzCopy, and for a Windows app, I use Azure Storage Explorer.