The Repository Pattern is simple yet misunderstood

This post revisits the Repository Pattern, addresses common implementation pitfalls in the .NET world, and reviews proposed solutions that avoid unnecessary complexity.

What should Repositories do?

It's been 20 years since Martin Fowler documented the Repository Pattern in his book "Patterns of Enterprise Application Architecture" ("P of EAA"). This pattern is probably the most discussed one, as it deals with a very common problem: how to persist and retrieve objects to and from a data store without polluting the Domain with technical knowledge about the store itself.

The Repository Pattern proposes a different view of persistence. It is a view from the objects' perspective: it speaks their language and doesn't care about the storage behind the Repository. An object is added to or removed from an existing collection of objects. Furthermore, the collection itself can be iterated and a business criterion can be applied, so that only the objects of interest are returned. With such a simple view of how Domain Objects are retrieved, added, or removed, the Application Layer resolves the Use Cases without depending on any knowledge of the persistence technology, acting as if the objects were there, in memory.

For many years the first choice for persistence was a relational database. Design decisions were translated into database schema changes rather than into data and behavior encapsulated by objects. As a consequence, the domain objects mirrored the database schema and the repositories mirrored the database operations:

interface ICustomersRepository
{
    void Create(CustomerEntity customer);
    
    CustomerEntity GetById(Guid customerId);
    
    List<CustomerEntity> GetAll();
    
    void Update(CustomerEntity customer);
    
    void Delete(Guid customerId);
}

Then NoSQL databases became more and more prevalent. Domain objects no longer had to be Entities; with fewer restrictions dictated by ORM implementations, they came to resemble rich Documents. Even so, the repositories would sometimes still leak NoSQL-related abstractions.

In either case, the Repositories did the opposite of what the pattern intends: instead of presenting operations on collections of objects, their methods were in fact direct instructions to open and commit database transactions or to execute HTTP requests.

How should the Repositories be defined?

No matter the persistence type, the repositories get smarter. Why should we have ICustomersRepository and ITransactionsRepository when we could have the Generic Repository? It's even very easy to write one, as the implementation is delegated to ORMs or Client Libraries. However, the Application does not want smart repositories that can get anything from, or store anything into, the persistence. What the Application really wants is Repositories that provide minimal and meaningful methods and that are defined by the Application itself.

interface ICustomersRepository
{
    void Add(Customer customer);
    
    Customer GetById(Guid customerId);
    
    IReadOnlyCollection<Customer> GetGoldCustomers();
    
    int NumberOfCustomersForStore(Guid storeGuid);
}

Imagine a Payment Provider that is required to preserve all Transactions. With a Generic Repository, the typical way to ensure that no transaction is deleted is to inherit from the BaseRepository class and override the Delete method so that it either throws an Exception or, even worse, does nothing. What if the transactions should also be immutable? Override the Update method to add a new transaction? Is this an honest method? A more meaningful Repository would be:

interface ITransactionsRepository
{
    void Add(Transaction transaction);
    
    IReadOnlyCollection<Transaction> GetTransactionsForCustomer(Guid customerId);
}

The Application is only interested in seeing the transactions for that customer, so that it can complete some Business Use Case. It retrieves an IReadOnlyCollection of these transactions because it needs no more and no less. If it were a List, should the Application think that it is OK to Insert or Add a transaction into that List, and that the Transaction somehow gets persisted? This is not the case. And if it were an IEnumerable, the Application could only reason that it has something to enumerate: the result might not have been retrieved yet, and there is no guarantee that enumerating it will succeed.
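The difference between an IEnumerable and an IReadOnlyCollection can be shown with a minimal sketch. Everything here is hypothetical and only for illustration: DeferredExecutionDemo, LazyAmounts, and LoadedAmounts are made-up names, and the iterator merely stands in for a lazily evaluated query.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionDemo
{
    // Counts how many times the simulated query actually runs.
    public static int QueryRuns;

    // Stands in for a lazily evaluated query: nothing executes until
    // the caller starts enumerating, and every enumeration runs it again.
    public static IEnumerable<decimal> LazyAmounts()
    {
        QueryRuns++;
        yield return 10m;
        yield return 20m;
    }

    // Materializes the result once, so the caller receives a fixed,
    // countable snapshot that exposes no Add or Insert.
    public static IReadOnlyCollection<decimal> LoadedAmounts() =>
        LazyAmounts().ToList();
}
```

Calling LazyAmounts() alone runs nothing, and summing its result twice runs the simulated query twice. LoadedAmounts() runs it exactly once and hands back something the Application can count and iterate, but not modify through that interface.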

But we are not paid to type

That's true, and nothing stops us from using Entity Framework to build a Generic Data Access Object that can be reused by the repository implementations. What we really need here is a simple way to define generic CRUD methods:

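using System;
using System.Linq;
using System.Linq.Expressions;

interface IReadEntities
{
    // The class constraint matches what Entity Framework's Set<T>() requires.
    T GetById<T>(Guid id) where T : class;

    // Expression (not a plain Func) so an IQueryable provider can translate the criteria.
    IQueryable<T> FindBy<T>(Expression<Func<T, bool>> criteria) where T : class;
}

interface IWriteEntities
{
    void Add<T>(T entity) where T : class;
}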

Then all the Repository implementations would reuse these interfaces, whose own implementations are delegated to Entity Framework.

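public class TransactionsRepository : ITransactionsRepository
{
    private readonly IReadEntities _readEntities;
    private readonly IWriteEntities _writeEntities;

    public TransactionsRepository(IReadEntities readEntities, IWriteEntities writeEntities)
    {
        _readEntities = readEntities;
        _writeEntities = writeEntities;
    }

    public void Add(Transaction transaction)
    {
        // Map the domain object to its persistence entity before handing it over.
        var transactionEntity = transaction.ToEntity();
        _writeEntities.Add(transactionEntity);
    }

    public IReadOnlyCollection<Transaction> GetTransactionsForCustomer(Guid customerId)
    {
        // TransactionEntity is an assumed name for the type returned by ToEntity();
        // the type argument must be explicit because it cannot be inferred from the lambda.
        var transactions = _readEntities
            .FindBy<TransactionEntity>(t => t.CustomerId == customerId)
            .OrderByDescending(t => t.TransactionDate)
            .ToList();

        // Map back to domain objects in memory, after the query has run.
        return transactions.Select(t => t.ToModel()).ToList();
    }
}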

Notice there is no IReadEntities<T> or IWriteEntities<T>: there are no Generic Types and no Base Class to inherit from, only Generic Methods. The Repositories are easy to implement because their constructors take a small, fixed number of arguments rather than one argument for every entity they need. Simpler constructors also mean less mocking code, which becomes a growing pain as the number of arguments increases.
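To illustrate the point about mocking, here is a minimal sketch of a single in-memory stand-in that could back both interfaces in a test. It rests on several assumptions: the interfaces are restated in trimmed form (GetById is left out), the criteria is taken as an expression, and InMemoryEntities and this shape of TransactionEntity are hypothetical names that do not come from any library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Trimmed restatement of the two interfaces, for a self-contained sketch.
public interface IReadEntities
{
    IQueryable<T> FindBy<T>(Expression<Func<T, bool>> criteria) where T : class;
}

public interface IWriteEntities
{
    void Add<T>(T entity) where T : class;
}

// Hypothetical test double: one object serves every entity type,
// because the interfaces use generic methods, not generic types.
public sealed class InMemoryEntities : IReadEntities, IWriteEntities
{
    private readonly List<object> _items = new List<object>();

    public void Add<T>(T entity) where T : class => _items.Add(entity);

    public IQueryable<T> FindBy<T>(Expression<Func<T, bool>> criteria) where T : class =>
        _items.OfType<T>().AsQueryable().Where(criteria);
}

// Assumed shape of a persistence entity, for illustration only.
public sealed class TransactionEntity
{
    public Guid CustomerId { get; set; }
    public decimal Amount { get; set; }
}
```

A repository under test would receive the same InMemoryEntities instance for both constructor arguments, so a test would need no per-entity mock setup.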

Who saves the changes though?

When working with an in-memory collection of objects, there is no extra step to call after an object is added: adding an object to a collection does not require a SaveChanges call. A Business Use Case might involve several domain objects that are modified, deleted, or created, and it would be impractical to save to the persistence after each individual change. For this reason, another pattern is used: UnitOfWork. The changes are tracked and saved by the UnitOfWork. Some designs couple the Repositories with the UnitOfWork by defining an interface like the one below:

public interface IUnitOfWork
{
    ICustomersRepository CustomersRepository {get; }

    ITransactionsRepository TransactionsRepository {get; }

    void Commit();
}

This is nothing more than a Repositories Locator. While it might be handy to inject and mock the IUnitOfWork, the main disadvantage is that it hides which Repositories are used by the Use Case. An IUnitOfWork interface with a single Commit method is enough. When Entity Framework is used, the UnitOfWork and the Repositories share the same context instance. For other kinds of repositories, a common object can track which Domain Objects are added, changed, or deleted. The UnitOfWork would traverse these lists and call the appropriate methods to materialize the changes.
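For the non-Entity Framework case, the idea of a shared tracking object can be sketched as follows. This is a simplified illustration under stated assumptions, not a definitive implementation: PendingChanges, TrackingUnitOfWork, and InMemoryTransactionsRepository are hypothetical names, the tracker queues persistence actions instead of keeping separate added/changed/deleted lists, and a plain list of strings stands in for the real storage.

```csharp
using System;
using System.Collections.Generic;

public interface IUnitOfWork
{
    void Commit();
}

// Hypothetical shared tracker: repositories register work, the UnitOfWork drains it.
public sealed class PendingChanges
{
    private readonly List<Action> _pending = new List<Action>();

    public int Count => _pending.Count;

    public void Register(Action persist) => _pending.Add(persist);

    // Returns the queued actions and empties the queue.
    public IReadOnlyList<Action> Drain()
    {
        var snapshot = _pending.ToArray();
        _pending.Clear();
        return snapshot;
    }
}

public sealed class TrackingUnitOfWork : IUnitOfWork
{
    private readonly PendingChanges _changes;

    public TrackingUnitOfWork(PendingChanges changes) => _changes = changes;

    // Materializes every change registered since the last Commit.
    public void Commit()
    {
        foreach (var persist in _changes.Drain())
        {
            persist();
        }
    }
}

// The repository knows the tracker, not the UnitOfWork.
public sealed class InMemoryTransactionsRepository
{
    private readonly PendingChanges _changes;
    private readonly List<string> _store; // stands in for real storage

    public InMemoryTransactionsRepository(PendingChanges changes, List<string> store)
    {
        _changes = changes;
        _store = store;
    }

    // Nothing reaches the store until the UnitOfWork commits.
    public void Add(string transactionId) =>
        _changes.Register(() => _store.Add(transactionId));
}
```

The Use Case would take the repositories it needs and an IUnitOfWork as separate constructor arguments, which keeps its dependencies visible while Commit stays the only method on the interface.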

Takeaways:

  • The Repository Pattern is used to simplify the Application Layer and is defined by the Application Layer. The repositories evolve with the Business Use Cases.
  • The Repository handles domain objects and is agnostic of the technical details.
  • Generics can still be used to implement the repositories.
  • The UnitOfWork takes care of the actual changes. The UnitOfWork and the Repositories don't have to know about each other.
  • Not all Technical Solutions require the use of the Repository Pattern. If the requirements are mostly CRUD, it is more efficient to use the ORM or Client Libraries. Keep it simple.
