EntityFramework Performance and IEnumerable vs IQueryable

Working in the .NET world, you get pretty used to dealing with IEnumerable collections. However, you have to be aware of the performance issues that can arise when using them with EntityFramework. Sometimes I forget about IQueryable because LINQ to Entities hides much of the difference between retrieving objects from a database and working with an in-memory collection, and IQueryable is specific to querying a database.

When using the Repository pattern, one of the things I love to do is add a flexible “Find” method to the repository. Below is an example:

public partial class Order
{
    public int ID { get; set; }
    public string CustomerOrderNumber { get; set; }
    public DateTime? ShipDate { get; set; }
    public int OrderTypeID { get; set; }
}

public class FindOrdersRequest
{
    public IEnumerable<int> OrderIDs { get; set; }
    public IEnumerable<string> CustomerOrderNumbers { get; set; }
    public IEnumerable<int> OrderTypeIDs { get; set; }
    public bool? HasShipDate { get; set; }
    public DateTime? ShipDateBefore { get; set; }
    public DateTime? ShipDateAfter { get; set; }
    
    public FindOrdersRequest()
    {
        OrderIDs = new List<int>();
        CustomerOrderNumbers = new List<string>();
        OrderTypeIDs = new List<int>();
    }
}

//I would normally implement an interface here, but I'm excluding it from this example for brevity
public class OrderRepository
{
    private IDbContext context;
    public OrderRepository(IDbContext context)
    {
        this.context = context;
    }

    public IEnumerable<Order> Find(FindOrdersRequest request)
    {
        IEnumerable<Order> orders = context.Set<Order>().AsEnumerable();
        if(request.OrderIDs.Any())
        {
            orders = orders.Where(o => request.OrderIDs.Contains(o.ID));
        }
        if(request.CustomerOrderNumbers.Any())
        {
            orders = orders.Where(o => request.CustomerOrderNumbers.Contains(o.CustomerOrderNumber));
        }
        if(request.OrderTypeIDs.Any())
        {
            orders = orders.Where(o => request.OrderTypeIDs.Contains(o.OrderTypeID));
        } 
        if(request.ShipDateAfter.HasValue)
        {
            orders = orders.Where(o => o.ShipDate.HasValue && o.ShipDate >= request.ShipDateAfter);
        }
        if (request.ShipDateBefore.HasValue)
        {
            orders = orders.Where(o => o.ShipDate.HasValue && o.ShipDate <= request.ShipDateBefore);
        }
        if(request.HasShipDate.HasValue)
        {
            orders = orders.Where(o => o.ShipDate.HasValue == request.HasShipDate.Value);
        }

        return orders;
    }
}

public class OrderService
{
    private OrderRepository orderRepository;
    public OrderService(OrderRepository orderRepository)
    {
        this.orderRepository = orderRepository;
    }

    public IEnumerable<Order> GetUnshippedOrders()
    {
        return orderRepository.Find(new FindOrdersRequest()
        {
            HasShipDate = false
        }).ToList();
    }
}

The major problem with this code is the AsEnumerable() call on context.Set<T>() in the Find method – it will pull every row in the table back from the database. The SQL generated will be equivalent to a SELECT *, and will probably look something like:

SELECT [Extent1].[ID] AS [ID], 
    [Extent1].[CustomerOrderNumber] AS [CustomerOrderNumber], 
    [Extent1].[ShipDate] AS [ShipDate],
    [Extent1].[OrderTypeID] AS [OrderTypeID]
    FROM [dbo].[Orders] AS [Extent1]

Now, you might not notice if you have 10 rows, but if you have 1,000,000, you’ll notice as your application burns to the ground and consumes all the memory on whatever server it’s running on.

So, an easy fix is to change that one line to IQueryable / AsQueryable() like so:

IQueryable<Order> orders = context.Set<Order>().AsQueryable();
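Incidentally, the DbSet<Order> returned by context.Set<Order>() already implements IQueryable<Order> (assuming your IDbContext exposes EF's DbSet/IDbSet), so the AsQueryable() call isn't strictly required. What matters is that the variable is declared as IQueryable<Order>, so the Where calls that follow compose into the query instead of running in memory:

IQueryable<Order> orders = context.Set<Order>(); // equivalent – DbSet<Order> is already an IQueryable<Order>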

Now we get the benefit of deferred execution: no query runs against the database until ToList is called on the results of the Find method from OrderRepository. The SQL generated will now be something like:

SELECT [Extent1].[ID] AS [ID], 
    [Extent1].[CustomerOrderNumber] AS [CustomerOrderNumber], 
    [Extent1].[ShipDate] AS [ShipDate],
    [Extent1].[OrderTypeID] AS [OrderTypeID]
    FROM [dbo].[Orders] AS [Extent1]
    WHERE [Extent1].[ShipDate] IS NULL

This is a huge improvement already, but it can be much better than this. In the OrderRepository class, if we also make the return type IQueryable, we can then further query the database before pulling the results into memory.

public IQueryable<Order> Find(FindOrdersRequest request)
{
    //the rest of the method remains the same
}
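Because the repository now hands back an IQueryable<Order>, callers can keep composing the query and nothing hits the database until the results are enumerated. A minimal sketch of what that looks like from the service layer (the OrderBy/Take calls are purely illustrative and not part of the original example):

var recentUnshipped = orderRepository.Find(new FindOrdersRequest { HasShipDate = false })
                                     .OrderBy(o => o.ID) // still composing the query
                                     .Take(50)           // folded into the generated SQL
                                     .ToList();          // the query executes here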

This distinction is important, and I will provide an example.

After I add this “Find” functionality to my repositories, I tend to build reports using those methods, which frequently utilize .GroupBy() after filtering. If we were to leave the Find method as returning an IEnumerable<Order> collection, we would find that the SQL generated would not be what we wanted.

For example, let’s say I now wanted a report that showed the number of shipped and unshipped orders by OrderTypeID. I would add a method to the OrderService as such:

//assume that an object ReportItem exists with properties as defined below
//assume that an object ReportItem exists with properties as defined below
public IEnumerable<ReportItem> GetShippedAndUnshippedOrdersByTypeReport()
{
    var report = new List<ReportItem>();
    var results = orderRepository.Find(new FindOrdersRequest())
                                 .GroupBy(order => new
                                 {
                                     OrderTypeID = order.OrderTypeID
                                 })
                                 .Select(g => new
                                 {
                                     OrderTypeID = g.Key.OrderTypeID,
                                     ShippedOrderCount = g.Count(x => x.ShipDate.HasValue),
                                     UnshippedOrderCount = g.Count(x => !x.ShipDate.HasValue)
                                 });
    foreach(var result in results)
    {
        report.Add(new ReportItem()
        {
            OrderTypeID = result.OrderTypeID,
            ShippedOrderCount = result.ShippedOrderCount,
            UnshippedOrderCount = result.UnshippedOrderCount,
        });
    }
   
    return report;
}

With this code, the benefits of using IQueryable over IEnumerable in the repository become clear.

If we leave the repository returning IEnumerable, the work will be done in two phases:

  1. Any filtering from the “FindOrdersRequest” will be executed as SQL (similar to the statements above), and all of the matching rows will be pulled into a temporary in-memory collection
  2. The GroupBy operation will then run in memory over that temporary collection. Additional trips to the database will be made if any navigation properties are referenced in the GroupBy (there are none in this example)

If we change the repository to return IQueryable, the query will be executed as a single, neat SQL statement that filters and groups in one round trip, resulting in much better performance. Another performance benefit is that we are projecting specific columns rather than populating an entire Order object for every row. For this trivial example it doesn’t make much of a difference, but if you’re dealing with tables/objects that have many columns/properties, you will notice. And if you’re deploying to a cloud-based environment, where compute time and efficiency matter, following best practices like this will help you out considerably.
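If you want to verify which of these behaviors you’re getting, one option (assuming the repository’s IDbContext is backed by an EF6 DbContext, with OrdersContext below being a purely hypothetical implementation) is to log the generated SQL and run the grouping both ways:

// Hypothetical verification sketch (EF6). OrdersContext is assumed to be a DbContext
// that also implements the IDbContext interface used by OrderRepository.
using (var context = new OrdersContext())
{
    context.Database.Log = Console.Write; // EF6 writes every SQL statement it sends to the console

    var repository = new OrderRepository(context);

    // With Find returning IQueryable<Order>, this logs one SELECT containing a GROUP BY.
    // With Find returning IEnumerable<Order>, it logs a plain SELECT of all orders and the
    // grouping runs in memory.
    var counts = repository.Find(new FindOrdersRequest())
                           .GroupBy(o => o.OrderTypeID)
                           .Select(g => new { OrderTypeID = g.Key, OrderCount = g.Count() })
                           .ToList();
}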

 

How to Add and Manage Outlook Rules/Filters for Office 365 Shared Mailboxes

It isn’t obvious, but you can set up and manage rules for shared mailboxes in Office 365 just as you do for users’ mailboxes. The reason it isn’t obvious is that you can’t administer these rules through the desktop client (or any other client) like you can with user mailboxes, and the settings for managing them don’t even appear to be available from the Office 365 or Exchange administration panels. Here are the steps to take:

  • Log in to Office 365 with an account that has administrative access to the Shared Mailbox
  • Enter the following URL into your browser to get to the shared mailbox options page: https://outlook.office365.com/ecp/<email address>, where <email address> is the email address of the shared mailbox with the rules you want to manage.
    • For example, if your email address was info@contoso.com, you would enter the address https://outlook.office365.com/ecp/info@contoso.com
  • Click the “Organize email” section on the left menu

This method also works for editing user mailbox rules, provided you have access.

Note: As of this writing, Internet Explorer is the only browser I have tried that successfully adds a rule involving a sent-from or sent-to criterion. Chrome and Firefox both give a CORS error because selecting people tries to open the contacts app for the account, and that application is served from a different domain. The message I receive from Firefox is:

08:59:07.114 Load denied by X-Frame-Options: https://outlook.office.com/owa/#viewmodel=OwaOptionRichPeoplePickerViewModelFactory does not permit cross-origin framing.

Managing Global Rules

If you simply want to edit global rules that affect all mail flowing to/from your organization, you can follow the steps below:

Getting to the Exchange Admin Center

  • Log in to Office 365 with an account that has administrative access to the Shared Mailbox
  • Open the Admin Center by clicking the “Admin” button
  • When the Admin Center opens, click the “Admin Centers” link on the left-side menu and choose “Exchange”

Managing Rules

  • From the Exchange Admin Center, there is a “Rules” link under the mail flow section. Remember, this area only allows you to perform a limited set of actions that does NOT include moving a message to a specific folder, since these rules apply at a global level. The actions you can perform here are:
    • Forward the message for approval…
    • Redirect the message to…
    • Block the message…
    • Add recipients…
    • Apply a disclaimer to the message…
    • Modify the message properties…
    • Modify the message security…
    • Prepend the subject of the message with…
    • Generate incident report and send it to…
    • Notify the recipient with a message…

Windows Server Backup Trials and Tribulations

I know I’m not the first person to say this (2 people advised me of this today), but don’t use Windows Server Backup. Get a real backup solution and save yourself some serious headaches. I know I will be after my latest fun with WSB. *Caveat – my experience is with W2K8r2 servers, so perhaps it’s been improved in newer iterations.

I’ve been managing server hardware for a couple of small businesses for about 10 years now, and I’ve always just used Windows Server Backup (in conjunction with Azure backup, as of a couple of years ago) to backup said servers. Today was one of those moments where you have fleeting thoughts of “Am I going to get fired? Did I just lose all of the data for this organization?” I’d like to believe that I have better safeguards in place than to allow that to happen (and it turns out, I was able to get it working again), but man does it scare the crap out of you when you can’t get things working.

Considering my server is in a RAID 5 configuration, you might think “What is the problem? Swap out the failed drive and move on.” That’s what I normally do when I encounter a degraded virtual disk, but this time it didn’t work because a second disk failed during the rebuild process (different from the disk that caused the degradation in the first place). The system was fine until a several-hours-long power outage depleted my battery backup’s power, causing the system to go down in the dead of night. As a side effect, my virtual disk was degraded, and two of the physical disks comprising the virtual drive reported problems. The virtual disk kept failing to rebuild, and pulling out the drive that was keeping it from rebuilding would have rendered the machine useless, since it wasn’t the hot spare.

I woke up extra early the next morning to come in and restore the backup image that had been taken a few hours before the outage. The goal was to replace all of the failed physical disks, delete the virtual disk and create a new one with the same parameters as the old, and then restore the image – all before business started that day.

I guess I forgot how much of a PITA Windows Server Backup was to use. Here are the instructions I left myself from the last time I had a multi-disk failure:

  • A Windows System Image backup must be performed before attempting to restore.
    • Verify this before you nuke your virtual disk
  • Since the server has a Dell PERC s300 RAID controller, Windows needs the appropriate drivers to work with the RAID controller when restoring the image. Go to the Dell Support website and enter the Service Tag for the server.
    • Find the Windows Server 2008 R2 RAID / PERC s300 driver in the list of available drivers.
    • Download the “Hard Drive” format file (an EXE) and run it on the computer where you downloaded it (running the file will extract the drivers to a folder of your choosing)
    • Place the folder of drivers from the previous step onto an external drive (or burn to CD)
  • Check the settings of your RAID configuration – verify size and caching options so you can use them for your new image
  • Power off the server
  • Swap out any bad hardware for new drives
  • Restart the server and enter the RAID configuration menu (Ctrl + R).
    • Find the virtual disk and delete it (this can also be done in OpenManage Server Administrator prior to rebooting).
    • Initialize any new physical disks for use in the virtual disk.
    • Create a new virtual disk, matching the configuration to what it was previously (usually all available space).
    • Make sure to swap the virtual disk into virtual disk slot 1 (the only bootable one), if you have multiple virtual disks in your array.
  • Insert the Windows Server OS disc into the DVD tray
    • Continue to boot
  • When Windows Setup loads, choose the language and hit the next arrow.
  • Choose the “Repair an installation” link.
  • Click the “Load Drivers” button
  • Plug the external drive with the downloaded drivers into the server and choose it (or the optical drive, if you burned the drivers to CD) when Windows prompts you for the driver location.
  • Plug the external drive with the backup into the server
  • Choose the Restore from System image option and hit next
  • Windows Server Backup should now detect any backups from the external drive
    • Note, this can take a reallllly long time (> 30 minutes)
  • Choose next and Finish. The image should restore after a few hours.

One would think that with these detailed instructions, it should be fairly easy to restore the image. The answer is no. For one thing, Windows Recovery seemed to really struggle to recognize my external backup drive. This is especially disconcerting and leads you down some incorrect paths (starting to think the data is corrupted and trying to fix problems with the disk). Eventually, I got it to work through trial and error, but it took far too long and interrupted business operations far longer than it should have. Some problems I encountered:

  • The “Load Drivers” step is essential if you are restoring your image over an existing drive (and not starting from scratch). It may also be essential if you are starting from scratch – it never worked for me without loading those drivers, so I think it’s a good idea anyway (assuming you’re working with a RAID controller)
  • Feedback with the System Image Recovery is extremely poor. You have no idea what is going on most of the time. You are simply stuck with progress bars that never end. You really just have to wait and hope that it completes.
  • My first attempt at creating the virtual disk resulted in a size just slightly smaller than “all available space”, so during the image restore process I received the helpful error “A data disk is currently set as active in BIOS. Set some other disk as active or use the DiskPart utility to clean the data disk, and then retry the restore operation. (0x80042406).” I opened up the DiskPart utility and cleaned the data disk, but as I suspected, the problem was really something else: the new virtual disk’s size has to match (or exceed) that of the cloned drive
  • At one point, I loaded Windows as a fresh install and tried to restore from Windows Server Backup within the OS. The only problem with that is that you can’t do a bare metal restore – you can only restore files and drives. While this was better than nothing, I didn’t like the idea of trying to figure out how to reconfigure all the software on that machine. I’m glad I stuck with the image restore.
    • Windows Server Backup has always been incredibly slow, as well. For the most part, I use the command line when running backups with it, but the user interface is ungodly slow. Just opening it and getting to the initial view is really laggy.
  • It seemed like it mattered when I plugged my external drive in. If I had the drive plugged in when I started Windows Recovery, the image restore never found it. But when I plugged it in after loading the drivers, it seemed to be okay

In general, unresponsive software is a huge pet peeve of mine. Let me know that something is going on. I don’t know whether something is hanging or just takes forever. Also, no software should be this picky – especially something involved in a potentially mission critical application.

If you’ve encountered problems trying to restore your Win2K8r2 server and you’re using RAID-5, these steps might help. As I discovered today, the internet is full of reports of problems restoring from WSB (most posts are older because Server 2008 R2 is pretty old now, in OS years).

This episode did help me reflect on some problems though – namely, that I need more resilient processes in place for problems like this. For example, what if the motherboard failed on this server? What do I do then? Have Dell overnight a motherboard? Redundancy is non-existent here. Furthermore, this is an area where you need to practice – I’m sure if I had done restores more than 2-3 times in my entire life, it wouldn’t have been so bad. Running some kind of failure-scenario exercise like the infamous “Chaos Monkey” would help make things much more resilient. At the very least, better documentation of the services and applications installed on the server will help in the event of a catastrophic loss or even just a migration.

As a developer, though, the correct answer is really to just virtualize all the things so I don’t have to worry about disk failures anymore 🙂