Updated: Modified embedded SQL queries to use parameters where appropriate. Results updated.
I've been playing with Entity Framework recently, and noticed that it seemed to be much slower than LINQ to SQL. I ran some tests, and sure enough, I was right. The numbers are interesting:
The code is available here if you want to run these tests yourself.
The structure of the test was to set up a static method to return data from the Customers table of Northwind, suitable for binding to an ObjectDataSource in ASP.NET. I ran two sets of tests, one to return six columns from all rows, and one to return the same six columns from a single row. Each set contained the following variations:
- DataReader, to provide baseline performance to compare against other technologies.
- DataTable, using classic ADO.NET tools (DataAdapter running a command to fill a table).
- LINQ to SQL, using a compiled query, and with object tracking turned off, to maximize performance. The results list was projected directly from the query.
- LINQ to Entity Framework, using a compiled query to maximize performance. As with LINQ to SQL, the results list was projected directly from the query.
- Entity SQL, as an alternative to LINQ, querying the Entity Framework. The code structure for Entity SQL uses a reader, similar to using a DataReader with T-SQL.
For both LINQ to SQL and Entity Framework, I used the visual designer tools to include only the Customers table in the model.
The test measured elapsed time and total processor time. The difference could be assumed to include time used by SQL Server, as well as any other out-of-process time. I ran the tests on a Dell Latitude E6500 with Vista Ultimate, SQL Server 2008, an Intel Core 2 Duo P9500 (2.5 GHz), 4GB RAM, and 7200 RPM disk. The system was idle except for tests; test runs were fairly consistent in timings, as measured by standard deviations over a set of 10 test runs.
The test program ran each query once to ensure that all code was loaded and JITed, and all access plans and data were cached, so that startup time was excluded for each scenario. The program then ran 10,000 queries and collected aggregate time and working set information. For each scenario, the test program was run once, then run 10 times to record timing data.
Keep in mind that the test was designed to measure only the code execution for queries. There is no business logic, and the test design ensured that start-up costs were excluded from test results.
As expected, using a DataReader with raw T-SQL is the best performer, and the technology of choice for extremely large data volumes and for applications where performance is the only thing that matters. The DataReader used .40 milliseconds (elapsed) to retrieve 92 rows and store the data in a list, and only .15 milliseconds for a single row.
The DataTable with classic ADO.NET performed almost as well, using .58 milliseconds (elapsed) for 92 rows and .18 milliseconds for a single row. In the chart above, the DataReader is used as a baseline for comparison, so the relative cost of using a DataTable and DataAdapter was 1.4 for 92 rows, and 1.2 for a single row. That's not a lot of overhead in exchange for using a standardized structure that includes metadata on names and data types. Memory usage was virtually identical to memory usage for the DataReader.
LINQ to SQL also performed very well, using .63 milliseconds (elapsed) for 92 rows and .36 milliseconds for a single row. The performance ratio compared to the DataReader is 1.6 for 92 rows and 2.3 for a single row. Compared to the DataTable, the performance ratio (not charted) was 1.2 for 92 rows and 1.9 for a single row. LINQ to SQL used 40 MB additional memory, based on the final working set size at the end of each run.
That's very decent performance, considering the additional overhead, although Rico Mariani of Microsoft got even better numbers (and I'd love to know how to get closer to those results). In my tests, all queries established new connection objects (or data contexts) for each query, but I can't tell if Rico did the same in his performance tests. This may account for the difference in performance.
With Entity Framework, I found significant additional performance costs. LINQ to EF used 2.73 milliseconds (elapsed) to retrieve 92 rows, and 2.43 milliseconds for a single row. For 92 rows, that's a performance ratio of 6.8 compared to the DataReader, 4.7 compared to the DataTable, and 4.4 compared to LINQ to SQL (the latter two are not charted above). For a single row, LINQ to EF used 2.43 millisecond (elapsed), with performance ratios of 16.0 compared to the DataReader, 13.2 compared to the DataTable, and 6.8 compared to LINQ to SQL. Memory usage for LINQ to EF was about 130 MB more than for the DataReader.
Entity SQL queries to EF performed about the same as LINQ to EF, with 2.78 milliseconds (elapsed) for 92 rows and 2.32 milliseconds for a single row. Memory usage was similar to LINQ to EF.
Some of the conclusions are obvious. If performance is paramount, go with a DataReader! Entity Framework uses two layers of object mapping (compared to a single layer in LINQ to SQL), and the additional mapping has performance costs. At least in EF version 1, application designers should choose Entity Framework only if the modeling and ORM mapping capabilities can justify that cost.
In between those extremes, the real surprise is that LINQ to SQL can perform so well. (The caveat is that tuning LINQ to SQL is not always straight-forward.) The advantage that LINQ (including LINQ to EF) offers is in code quality, resulting from two key improvements over classic ADO.NET:
- Names and data types are strongly enforced from top to bottom of your application. Very specifically, that means all the way down to the tables in the database. LINQ uses .NET types, further simplifying the developer's life.
- DataTables and DataSets bring the relational data model rather intrusively into the code. To process data in a DataTable, you must adapt your code to the DataTable structure, including DataRows, DataColumns, and (with DataSets and multiple tables) DataRelationships. By contrast, LINQ is a first-class language component in .NET, with object-relational mapping inherent in the LINQ data providers and models. Processing data with LINQ feels like processing object collections, because that's exactly what you're doing.
So for now, LINQ to SQL is a winner! As Entity Framework version 2 takes shape, it will be time to re-evaluate.
Edited to fix math errors (blush). These affect timings only, but since the chart is based on ratios, those numbers are still correct.