Too Many Layers of Abstraction: January 2009

Thursday, January 29, 2009

LINQ to SQL vs. LINQ to Entity Framework

No, this isn't a cage match. Sorry to those who expected otherwise.

What I do want to point out is that the programming models are different, and differ in ways that are going to make some people think that certain features are missing from one tool or the other. If you're one of those people who thinks that LINQ to SQL has a performance problem because of lazy loading, or that LINQ to Entity Framework doesn't give you a way to turn off change tracking for query-only usage, then I'm talking to you, friend!

For instance, let's look at that change tracking issue. Both LINQ to SQL and LINQ to EF will track changes to entities so that you can change data in an object and save the data simply by submitting changes through the DataContext or ObjectContext. But for query-only operations, this change tracking is unnecessary, and you can turn it off. Here's how you do it in LINQ to SQL:

public static List<Customer> GetCustomers()
{
    using (NorthwindData dc = new NorthwindData())
    {
        dc.ObjectTrackingEnabled = false;
        return (from c in dc.Customers select c).ToList();
    }
}

And here's how you do it in LINQ to Entity Framework:

public static List<Customers> GetCustomers()
{
    using (NorthwindEntities ne = new NorthwindEntities())
    {
        ne.Customers.MergeOption = MergeOption.NoTracking;
        return (from c in ne.Customers select c).ToList();
    }
}

Note the big difference here. In LINQ to SQL, you turn off tracking in the DataContext. In LINQ to EF, you turn off tracking in the entity collection. If you're looking in the wrong place, you'll miss it.

The same pattern holds true with eager loading and lazy loading. In LINQ to SQL, it's controlled with a DataLoadOptions object attached to the DataContext. In LINQ to EF, it's set at the entity level, either in the query (eager loading) or in processing the results (lazy loading).

I'll cover eager loading and lazy loading in a follow-up post, because there's a nasty surprise waiting in LINQ to SQL eager loading.

Sunday, January 25, 2009

Entity Framework and LINQ to SQL Performance

Updated: Modified embedded SQL queries to use parameters where appropriate. Results updated.

I've been playing with Entity Framework recently, and noticed that it seemed to be much slower than LINQ to SQL. I ran some tests, and sure enough, I was right. The numbers are interesting:

The code is available here if you want to run these tests yourself.

Methodology

The structure of the test was to set up a static method to return data from the Customers table of Northwind, suitable for binding to an ObjectDataSource in ASP.NET. I ran two sets of tests, one to return six columns from all rows, and one to return the same six columns from a single row. Each set contained the following variations:

DataReader, to provide baseline performance to compare against other technologies.
DataTable, using classic ADO.NET tools (DataAdapter running a command to fill a table).
LINQ to SQL, using a compiled query, and with object tracking turned off, to maximize performance. The results list was projected directly from the query.
LINQ to Entity Framework, using a compiled query to maximize performance. As with LINQ to SQL, the results list was projected directly from the query.
Entity SQL, as an alternative to LINQ, querying the Entity Framework. The code structure for Entity SQL uses a reader, similar to using a DataReader with T-SQL.

For both LINQ to SQL and Entity Framework, I used the visual designer tools to include only the Customers table in the model.

The test measured elapsed time and total processor time. The difference could be assumed to include time used by SQL Server, as well as any other out-of-process time. I ran the tests on a Dell Latitude E6500 with Vista Ultimate, SQL Server 2008, an Intel Core 2 Duo P9500 (2.5 GHz), 4GB RAM, and 7200 RPM disk. The system was idle except for tests; test runs were fairly consistent in timings, as measured by standard deviations over a set of 10 test runs.

The test program ran each query once to ensure that all code was loaded and JITed, and all access plans and data were cached, so that startup time was excluded for each scenario. The program then ran 10,000 queries and collected aggregate time and working set information. For each scenario, the test program was run once, then run 10 times to record timing data.

Results

Keep in mind that the test was designed to measure only the code execution for queries. There is no business logic, and the test design ensured that start-up costs were excluded from test results.

As expected, using a DataReader with raw T-SQL is the best performer, and the technology of choice for extremely large data volumes and for applications where performance is the only thing that matters. The DataReader used .40 milliseconds (elapsed) to retrieve 92 rows and store the data in a list, and only .15 milliseconds for a single row.

The DataTable with classic ADO.NET performed almost as well, using .58 milliseconds (elapsed) for 92 rows and .18 milliseconds for a single row. In the chart above, the DataReader is used as a baseline for comparison, so the relative cost of using a DataTable and DataAdapter was 1.4 for 92 rows, and 1.2 for a single row. That's not a lot of overhead in exchange for using a standardized structure that includes metadata on names and data types. Memory usage was virtually identical to memory usage for the DataReader.

LINQ to SQL also performed very well, using .63 milliseconds (elapsed) for 92 rows and .36 milliseconds for a single row. The performance ratio compared to the DataReader is 1.6 for 92 rows and 2.3 for a single row. Compared to the DataTable, the performance ratio (not charted) was 1.2 for 92 rows and 1.9 for a single row. LINQ to SQL used 40 MB additional memory, based on the final working set size at the end of each run.

That's very decent performance, considering the additional overhead, although Rico Mariani of Microsoft got even better numbers (and I'd love to know how to get closer to those results). In my tests, all queries established new connection objects (or data contexts) for each query, but I can't tell if Rico did the same in his performance tests. This may account for the difference in performance.

With Entity Framework, I found significant additional performance costs. LINQ to EF used 2.73 milliseconds (elapsed) to retrieve 92 rows, and 2.43 milliseconds for a single row. For 92 rows, that's a performance ratio of 6.8 compared to the DataReader, 4.7 compared to the DataTable, and 4.4 compared to LINQ to SQL (the latter two are not charted above). For a single row, LINQ to EF used 2.43 millisecond (elapsed), with performance ratios of 16.0 compared to the DataReader, 13.2 compared to the DataTable, and 6.8 compared to LINQ to SQL. Memory usage for LINQ to EF was about 130 MB more than for the DataReader.

Entity SQL queries to EF performed about the same as LINQ to EF, with 2.78 milliseconds (elapsed) for 92 rows and 2.32 milliseconds for a single row. Memory usage was similar to LINQ to EF.

Conclusions

Some of the conclusions are obvious. If performance is paramount, go with a DataReader! Entity Framework uses two layers of object mapping (compared to a single layer in LINQ to SQL), and the additional mapping has performance costs. At least in EF version 1, application designers should choose Entity Framework only if the modeling and ORM mapping capabilities can justify that cost.

In between those extremes, the real surprise is that LINQ to SQL can perform so well. (The caveat is that tuning LINQ to SQL is not always straight-forward.) The advantage that LINQ (including LINQ to EF) offers is in code quality, resulting from two key improvements over classic ADO.NET:

Names and data types are strongly enforced from top to bottom of your application. Very specifically, that means all the way down to the tables in the database. LINQ uses .NET types, further simplifying the developer's life.
DataTables and DataSets bring the relational data model rather intrusively into the code. To process data in a DataTable, you must adapt your code to the DataTable structure, including DataRows, DataColumns, and (with DataSets and multiple tables) DataRelationships. By contrast, LINQ is a first-class language component in .NET, with object-relational mapping inherent in the LINQ data providers and models. Processing data with LINQ feels like processing object collections, because that's exactly what you're doing.

So for now, LINQ to SQL is a winner! As Entity Framework version 2 takes shape, it will be time to re-evaluate.

Edited to fix math errors (blush). These affect timings only, but since the chart is based on ratios, those numbers are still correct.

Sunday, January 18, 2009

Triad SQL Server Group

I'll be speaking at the Triad SQL Server Group on Tuesday, February 17, at 6:00 pm. The topic is "The LINQ Revolution":

Microsoft's inclusion of query capabilities as a first-class language component in .NET, along with two object-relational mapping (ORM) solutions in LINQ to SQL and the Entity Framework, will change the ways that you develop database applications. This will be an open discussion, where you choose the topics of greatest interest, including anything from LINQ syntax and code generation to ORM, domain-driven design, n-tiered design issues as they related to LINQ and ORM, the changing role of stored procedures with ORM, and entity-relationship modeling now that E-R models can be represented directly in code.

Presentation materials and sample code are here.

If you're in the area, I hope to see you there!

ETA: Date changed again. This time we won't let it snow that day!

Wednesday, January 7, 2009

LINQ of the Day

One of the delights of LINQ to SQL is that it's an extension of LINQ to objects, but it doesn't implement everything that LINQ to objects implements. The constraint on LINQ to SQL is that the expression must be translatable into T-SQL. If you write LINQ expressions that don't translate to T-SQL, your code will compile cleanly, but will fail when LINQ is ready to do the translation. (The exception and call stack can best be described as "educational.") Let's look at an example that fixes the problem.

public static List<MultiplierDefinition> GetAllMultipliers()
{
    using (FSP dc = FSP.DC())
    {
        return GetMultipliers(dc)
            .AsEnumerable()
            .OrderBy(m => m, new MultiplierDescComparer<MultiplierDefinition>())
            .ToList();
    }
}

Before tearing into this, let's take a quick look at GetMultpliers, which is a compiled LINQ-to-SQL query. There's nothing strange about it; I write stuff like this routinely. The thing to note is that it returns an IQueryable<MultiplierDefinition>.

private static Func<FSP, IQueryable<MultiplierDefinition>> GetMultipliers = 
    CompiledQuery.Compile(
    (FSP dc) => from m in dc.Pm_Multipliers
                where m.RecordDeleted != true
                select new MultiplierDefinition
                {
                    MultiplierID = m.MultiplierID,
                    Description = m.Description,
                    DiscountPercent = m.DiscountPercent,
                    CreatedByUser = m.CreatedByUser,
                    DateCreated = m.DateCreated,
                    ModifiedByUser = m.ModifiedByUser,
                    DateModified = m.DateModified,
                    RecordDeleted = m.RecordDeleted
                });

So my method GetAllMultipliers() starts by getting a DataContext, and calling a method to get a LINQ-to-SQL expression tree. But the next thing I need to do is sort the data, using a custom sorting algorithm contained in an IComparer<T> object called MultiplierDescComparer<MultiplierDefinition>. There's no way that T-SQL can sort using a C# object! Once the sorting is done, I simply return a generic List<MultiplierDefinition> to my client code.

There are two secrets to making this work. One is that you can add LINQ expressions to a compiled query; they'll be added to the expression tree before the query runs. The second secret, and the important one, is that innocuous little method, .AsEnumerable(). Hey! IQueryable<T> implements IEnumerable<T>, so we're already enumerable. Why do we have to this explicit... Conversion? Reminder? Expositional countersinking? No-op?

It's a conversion, and here's what it means. Everything before .AsEnumerable() will translate to T-SQL and run in the database. Everything after .AsEnumerable() is a LINQ to objects expression, and will run on the data returned from the database. (That's always a core difference between IQueryable<T> and IEnumerable<T>, and that's why you don't want to store a LINQ-to-SQL query as an IEnumerable<T>, unless you want to force any subsequent LINQ expressions to be run in memory.)

None of this forces immediate execution; it's all still deferred execution, right up to the .ToList() method. Evaluating and understanding the entire code and execution path is a bit convoluted, but definitely enlightening. .AsEnumerable() is always useful when you want to use LINQ that doesn't have a counterpart in T-SQL (for example, .SkipWhile() and .TakeWhile()).

Thursday, January 1, 2009

Happy New Year

A small reminder that even in a rough year, some things remain beautiful.

Also noting that my job has been extended through almost the end of January. So that's a good start.

Too Many Layers of Abstraction