Working in the .Net world, you get pretty used to dealing with IEnumerable collections. However, you have to be aware of performance issues that can arise when using them with EntityFramework. Sometimes I forget about IQueryable because LINQ to Entities obfuscates much of the difference of retrieving objects from a database vs dealing with an in-memory collection and IQueryable is pretty specific to dealing with querying a database.

When using the Repository pattern, one of the things I love to do is add a flexible “Find” method to the repository. Below is an example:

public partial class Order
{
    public int ID { get; set; }
    public string CustomerItemNumber { get; set; }
    public DateTime? ShipDate { get; set; }
    public int OrderTypeID { get; set; }
}

public class FindOrdersReqeust
{
    public IEnumerable<int> OrderIDs { get;set; }
    public IEnumerable<string> CustomerOrderNumbers { get; set; }
    public IEnumerable<int> OrderTypeIDs { get; set; }
    public bool? HasShipDate { get; set; }
    public DateTime? ShipDateBefore { get; set; }
    public DateTime? ShipDateAfter { get; set; }
    
    public FindOrdersReqeust()
    {
        OrderIDs = new List<int>();
        CustomerOrderNumbers = new List<string>();
        OrderTypeIDs = new List<int>();
    }
}

//I would normally implement an interface here, but for the sake of brevity am excluding from this example
public class OrderRepository
{
    private IDbContext context;
    public OrderRepository(IDbContext context)
    {
        this.context = context;
    }

    public IEnumerable<Order> Find(FindOrdersRequest request)
    {
        IEnumerable<Order> orders = context.Set<Order>().AsEnumerable();
        if(request.OrderIDs.Any())
        {
            orders = orders.Where(o => request.OrderIDs.Contains(o.ID));
        }
        if(request.CustomerOrderNumbers.Any())
        {
            orders = orders.Where(o => request.CustomerOrderNumbers.Any(x => o.CustomerOrderNumber.Equals(x)));
        }
        if(request.OrderTypeIDs.Any())
        {
            orders = orders.Where(o => request.OrderTypeIDs.Contains(o.OrderTypeID));
        } 
        if(request.ShipDateAfter.HasValue)
        {
            orders = orders.Where(o => o.ShipDate.HasValue && o.ShipDate >= request.ShipDateAfter);
        }
        if (request.ShipDateBefore.HasValue)
        {
            orders = orders.Where(o => o.ShipDate.HasValue && o.ShipDate <= request.ShipDateBefore);
        }
        if(request.HasShipDate.HasValue)
        {
            orders = orders.Where(o => o.ShipDate.HasValue == reqeust.HasShipDate.Value);
        }

        return orders;
    }
}

public class OrderService
{
    private OrderRepository orderRepository;
    public OrderService(OrderRepository orderRepository)
    {
        this.orderRepository = orderRepository;
    }

    public void GetUnshippedOrders()
    {
        return orderRepository.Find(new FindOrdersRequest()
        {
            HasShipDate = false
        }).ToList();
    }
}

The major problem with this code is highlighted in the example above – namely that AsEnumerable() call on the context.Set<T> will enumerate every row from the database in full. The Sql generated will be equivalent to a SELECT *, and will probably look something like:

SELECT [Extent1].[ID] AS [ID], 
    [Extent1].[CustomerOrderNumber] AS [CustomerOrderNumber], 
    [Extent1].[ShipDate] AS [ShipDate]
    FROM [dbo].[Orders] AS [Extent1]

Now, you might not notice if you have 10 rows, but if you have 1,000,000, you’ll notice as your application burns to the ground and consumes all the memory on whatever server it’s running on.

So, an easy fix is to change that one line to IQueryable / AsQueryable() like so:

IQueryable<Order> orders = context.Set<Order>().AsQueryable();

Now we get the benefits of deferred execution until ToList is called on the results of the Find method from OrderRespository. The SQL generated will now be something like:

SELECT [Extent1].[ID] AS [ID], 
    [Extent1].[CustomerOrderNumber] AS [CustomerOrderNumber], 
    [Extent1].[ShipDate] AS [ShipDate],
    [Extent1].[OrderTypeID] AS [OrderTypeID]
    FROM [dbo].[Orders] AS [Extent1]
    WHERE [Extent1].[ShipDate] IS NULL

This is a huge improvement already, but it can be much better than this. In the OrderRepository class, if we also make the return type IQueryable, we can then further query the database before pulling the results into memory.

public IQueryable<Orders> Find(FindOrdersRequest request)
{
    //the rest of the method remains the same
}

This distinction is important, and I will provide an example.

After I add this “Find” functionality to my repositories, I tend to build reports using those methods, which frequently utilize .GroupBy() after filtering. If we were to leave the Find method as returning an IEnumerable<Order> collection, we would find that the SQL generated would not be what we wanted.

For example, let’s say I now wanted a report that showed the number of shipped and unshipped orders by OrderTypeID. I would add a method to the OrderService as such:

//assume that an object ReportItem exists with properties as defined below
public IEnumerable<ReportItem> GetUnshippedOrdersByTypeReport()
{
    var report = new List<ReportItem>();
    var results = orderRepository.Find(new FindOrdersRequest(){ HasShipDate = false })
                                 .GroupBy(order => new 
                                 {
                                     OrderTypeID: order.OrderTypeID
                                 })
                                 .Select(g => new
                                 {
                                     OrderTypeID: g.Key.ID,
                                     ShippedOrderCount: g.Sum(x => x.ShipDate.HasValue),
                                     UnshippedOrderCount: g.Sum(x => !x.ShipDate.HasValue)   
                                 });
    foreach(var result in results)
    {
        report.Add(new ReportItem()
        {
            OrderTypeID: result.OrderTypeID,
            ShippedOrderCount: result.ShippedOrderCount,
            UnshippedOrderCount: result.UnshippedOrderCount,
        });
    }
   
    return report;
}

With this code, the benefits of using IQueryable in the repository are clear over IEnumerable.

If we leave the repository with IEnumerable, the SQL generated will be done in two phases:

  1. The filtering of the “FindOrdersRequest” will be executed as the SQL statement above and the results will be stored into a temporary IEnumerable collection
  2. The Group By operation will operate on this temporary collection. More trips to the database will be taken if any navigation properties are referenced in the GroupBy (there are none in this example)

If we change the repository to use IQueryable, the SQL generated will be done in a single, neat statement. It will filter and perform the group by at once, resulting in much better performance. Another performance benefit is that we are projecting specific columns and not populating an entire Order object with every field. For this trivial example, it doesn’t make much of a difference, but if you’re dealing with tables/objects that have many columns/properties, you will notice. If you’re deploying to a cloud-based environment, you know that compute time and efficiency matter a lot, so following best practices for performance will help you out a lot in that respect.