Sunday, January 25, 2009

Entity Framework and LINQ to SQL Performance

Updated: Modified embedded SQL queries to use parameters where appropriate. Results updated.

I've been playing with Entity Framework recently, and noticed that it seemed to be much slower than LINQ to SQL. I ran some tests, and sure enough, I was right. The numbers are interesting:

[Chart: performance ratios relative to the DataReader baseline]

The code is available here if you want to run these tests yourself.

Methodology

The structure of the test was to set up a static method to return data from the Customers table of Northwind, suitable for binding to an ObjectDataSource in ASP.NET. I ran two sets of tests, one to return six columns from all rows, and one to return the same six columns from a single row. Each set contained the following variations:

  1. DataReader, to provide baseline performance to compare against other technologies.
  2. DataTable, using classic ADO.NET tools (DataAdapter running a command to fill a table).
  3. LINQ to SQL, using a compiled query, and with object tracking turned off, to maximize performance. The results list was projected directly from the query. (A sketch of this pattern follows the list.)
  4. LINQ to Entity Framework, using a compiled query to maximize performance. As with LINQ to SQL, the results list was projected directly from the query.
  5. Entity SQL, as an alternative to LINQ, querying the Entity Framework. The code structure for Entity SQL uses a reader, similar to using a DataReader with T-SQL.
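To make the scenarios concrete, here is a sketch of the LINQ to SQL variation (item 3). The NorthwindDataContext and CustomerRow names and the exact column list are illustrative assumptions, not necessarily the literal test code; the download above has the real thing.

using System;
using System.Collections.Generic;
using System.Data.Linq;
using System.Linq;

// Plain class to hold the projected columns. A compiled query stored in a
// static field can't return an anonymous type, so a small DTO stands in.
public class CustomerRow
{
    public string CustomerID { get; set; }
    public string CompanyName { get; set; }
    public string ContactName { get; set; }
    public string City { get; set; }
    public string Region { get; set; }
    public string Country { get; set; }
}

static class LinqToSqlScenario
{
    // CompiledQuery.Compile translates the expression tree to SQL once;
    // each call afterwards only binds parameters and executes.
    static readonly Func<NorthwindDataContext, IQueryable<CustomerRow>> AllCustomers =
        CompiledQuery.Compile((NorthwindDataContext dc) =>
            from c in dc.Customers
            select new CustomerRow
            {
                CustomerID = c.CustomerID,
                CompanyName = c.CompanyName,
                ContactName = c.ContactName,
                City = c.City,
                Region = c.Region,
                Country = c.Country
            });

    public static List<CustomerRow> GetCustomers()
    {
        using (var dc = new NorthwindDataContext())
        {
            // Read-only scenario: turn off the identity-map bookkeeping.
            dc.ObjectTrackingEnabled = false;
            return AllCustomers(dc).ToList();
        }
    }
}

The other scenarios below reuse the same projection shape.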

For both LINQ to SQL and Entity Framework, I used the visual designer tools to include only the Customers table in the model.

The test measured elapsed time and total processor time; the difference between the two can be assumed to include time used by SQL Server, as well as any other out-of-process time. I ran the tests on a Dell Latitude E6500 with Vista Ultimate, SQL Server 2008, an Intel Core 2 Duo P9500 (2.5 GHz), 4 GB RAM, and a 7200 RPM disk. The system was idle except for the tests, and timings were fairly consistent from run to run, as measured by standard deviations over a set of 10 test runs.

The test program ran each query once to ensure that all code was loaded and JITed, and that all access plans and data were cached, so that startup time was excluded for each scenario. The program then ran 10,000 queries and collected aggregate time and working-set information. For each scenario, the test program was run once as a shakedown, then run 10 more times to record timing data.
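In outline, the measurement loop looked like the following sketch; the scenario method named here is a placeholder for whichever variation is under test:

using System;
using System.Diagnostics;

static class Harness
{
    public static void Run()
    {
        // Warm-up: JIT the code path and prime SQL Server's plan and
        // data caches so startup cost stays out of the numbers.
        LinqToSqlScenario.GetCustomers();

        var proc = Process.GetCurrentProcess();
        TimeSpan cpuBefore = proc.TotalProcessorTime;
        Stopwatch sw = Stopwatch.StartNew();

        for (int i = 0; i < 10000; i++)
            LinqToSqlScenario.GetCustomers();

        sw.Stop();
        proc.Refresh();
        TimeSpan cpuUsed = proc.TotalProcessorTime - cpuBefore;

        Console.WriteLine("Elapsed ms/query: {0:F2}", sw.Elapsed.TotalMilliseconds / 10000);
        Console.WriteLine("CPU ms/query:     {0:F2}", cpuUsed.TotalMilliseconds / 10000);
        Console.WriteLine("Working set (MB): {0}", proc.WorkingSet64 / (1024 * 1024));
    }
}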

Results

Keep in mind that the test was designed to measure only the code execution for queries. There is no business logic, and the test design ensured that start-up costs were excluded from test results.

As expected, using a DataReader with raw T-SQL is the best performer, and the technology of choice for extremely large data volumes and for applications where performance is the only thing that matters. The DataReader used 0.40 milliseconds (elapsed) to retrieve 92 rows and store the data in a list, and only 0.15 milliseconds for a single row.
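For reference, here's roughly what the DataReader scenario boils down to (the connection string and column list, and the CustomerRow class from the earlier sketch, are my assumptions):

using System.Collections.Generic;
using System.Data.SqlClient;

static class DataReaderScenario
{
    public static List<CustomerRow> GetCustomers(string connectionString)
    {
        var result = new List<CustomerRow>();
        using (var cn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(
            "SELECT CustomerID, CompanyName, ContactName, City, Region, Country " +
            "FROM Customers", cn))
        {
            cn.Open();
            using (SqlDataReader rdr = cmd.ExecuteReader())
            {
                while (rdr.Read())
                {
                    result.Add(new CustomerRow
                    {
                        CustomerID = rdr.GetString(0),
                        CompanyName = rdr.GetString(1),
                        ContactName = rdr.IsDBNull(2) ? null : rdr.GetString(2),
                        City = rdr.IsDBNull(3) ? null : rdr.GetString(3),
                        // Region is null for many Northwind rows.
                        Region = rdr.IsDBNull(4) ? null : rdr.GetString(4),
                        Country = rdr.IsDBNull(5) ? null : rdr.GetString(5)
                    });
                }
            }
        }
        return result;
    }
}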

The DataTable with classic ADO.NET performed almost as well, using 0.58 milliseconds (elapsed) for 92 rows and 0.18 milliseconds for a single row. In the chart above, the DataReader is used as a baseline for comparison, so the relative cost of using a DataTable and DataAdapter was 1.4 for 92 rows, and 1.2 for a single row. That's not a lot of overhead in exchange for using a standardized structure that includes metadata on names and data types. Memory usage was virtually identical to that of the DataReader.
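The DataTable scenario is essentially a thin wrapper around SqlDataAdapter.Fill (same assumed connection string and columns):

using System.Data;
using System.Data.SqlClient;

static class DataTableScenario
{
    public static DataTable GetCustomers(string connectionString)
    {
        var table = new DataTable("Customers");
        using (var da = new SqlDataAdapter(
            "SELECT CustomerID, CompanyName, ContactName, City, Region, Country " +
            "FROM Customers", connectionString))
        {
            // Fill opens and closes the connection by itself.
            da.Fill(table);
        }
        return table;
    }
}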

LINQ to SQL also performed very well, using 0.63 milliseconds (elapsed) for 92 rows and 0.36 milliseconds for a single row. The performance ratio compared to the DataReader is 1.6 for 92 rows and 2.3 for a single row. Compared to the DataTable, the performance ratio (not charted) was 1.2 for 92 rows and 1.9 for a single row. LINQ to SQL used 40 MB additional memory, based on the final working set size at the end of each run.

That's very decent performance, considering the additional abstraction, although Rico Mariani of Microsoft got even better numbers (and I'd love to know how to get closer to those results). In my tests, every query created a new connection object (or data context); I can't tell whether Rico did the same in his performance tests, which may account for the difference.

With Entity Framework, I found significant additional performance costs. LINQ to EF used 2.73 milliseconds (elapsed) to retrieve 92 rows and 2.43 milliseconds for a single row. For 92 rows, that's a performance ratio of 6.8 compared to the DataReader, 4.7 compared to the DataTable, and 4.4 compared to LINQ to SQL (the latter two are not charted above). For a single row, the performance ratios are 16.0 compared to the DataReader, 13.2 compared to the DataTable, and 6.8 compared to LINQ to SQL. Memory usage for LINQ to EF was about 130 MB more than for the DataReader.
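For completeness, here is a sketch of the LINQ to EF variation. Note that the Entity Framework has its own CompiledQuery class in System.Data.Objects, and it takes an ObjectContext rather than a DataContext; the NorthwindEntities name is a designer-style assumption again:

using System;
using System.Collections.Generic;
using System.Data.Objects;
using System.Linq;

static class LinqToEntitiesScenario
{
    // EF's CompiledQuery takes an ObjectContext instead of a DataContext.
    static readonly Func<NorthwindEntities, IQueryable<CustomerRow>> AllCustomers =
        CompiledQuery.Compile((NorthwindEntities ctx) =>
            from c in ctx.Customers
            select new CustomerRow
            {
                CustomerID = c.CustomerID,
                CompanyName = c.CompanyName,
                ContactName = c.ContactName,
                City = c.City,
                Region = c.Region,
                Country = c.Country
            });

    public static List<CustomerRow> GetCustomers()
    {
        using (var ctx = new NorthwindEntities())
        {
            return AllCustomers(ctx).ToList();
        }
    }
}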

Entity SQL queries to EF performed about the same as LINQ to EF, with 2.78 milliseconds (elapsed) for 92 rows and 2.32 milliseconds for a single row. Memory usage was similar to LINQ to EF.
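Entity SQL goes through the EntityClient provider and reads much like T-SQL with a DataReader. One quirk worth noting: EntityCommand readers require CommandBehavior.SequentialAccess. The connection string name and container name here are assumptions from the designer defaults:

using System.Collections.Generic;
using System.Data;
using System.Data.EntityClient;

static class EntitySqlScenario
{
    public static List<CustomerRow> GetCustomers()
    {
        var result = new List<CustomerRow>();
        using (var cn = new EntityConnection("name=NorthwindEntities"))
        using (EntityCommand cmd = cn.CreateCommand())
        {
            cmd.CommandText =
                "SELECT c.CustomerID, c.CompanyName, c.ContactName, " +
                "c.City, c.Region, c.Country " +
                "FROM NorthwindEntities.Customers AS c";
            cn.Open();
            // EntityCommand only supports sequential access.
            using (EntityDataReader rdr =
                cmd.ExecuteReader(CommandBehavior.SequentialAccess))
            {
                while (rdr.Read())
                {
                    result.Add(new CustomerRow
                    {
                        CustomerID = rdr.GetString(0),
                        CompanyName = rdr.GetString(1),
                        ContactName = rdr.IsDBNull(2) ? null : rdr.GetString(2),
                        City = rdr.IsDBNull(3) ? null : rdr.GetString(3),
                        Region = rdr.IsDBNull(4) ? null : rdr.GetString(4),
                        Country = rdr.IsDBNull(5) ? null : rdr.GetString(5)
                    });
                }
            }
        }
        return result;
    }
}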

Conclusions

Some of the conclusions are obvious. If performance is paramount, go with a DataReader! Entity Framework uses two layers of object mapping (compared to a single layer in LINQ to SQL), and the additional mapping has performance costs. At least in EF version 1, application designers should choose Entity Framework only if the modeling and ORM mapping capabilities can justify that cost.

In between those extremes, the real surprise is that LINQ to SQL can perform so well. (The caveat is that tuning LINQ to SQL is not always straightforward.) The advantage that LINQ (including LINQ to EF) offers is in code quality, resulting from two key improvements over classic ADO.NET:

  1. Names and data types are strongly enforced from top to bottom of your application. Very specifically, that means all the way down to the tables in the database. LINQ uses .NET types, further simplifying the developer's life.
  2. DataTables and DataSets bring the relational data model rather intrusively into the code. To process data in a DataTable, you must adapt your code to the DataTable structure, including DataRows, DataColumns, and (with DataSets and multiple tables) DataRelations. By contrast, LINQ is a first-class language component in .NET, with object-relational mapping inherent in the LINQ data providers and models. Processing data with LINQ feels like processing object collections, because that's exactly what you're doing (as sketched below).
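To make the second point concrete, here is the same read expressed both ways (a sketch, reusing the hypothetical NorthwindDataContext from the earlier examples):

using System;
using System.Data;
using System.Linq;

static class StrongTypingContrast
{
    // DataTable: column names are strings and values need casts, so a
    // typo in "City" surfaces only at run time.
    static void PrintCities(DataTable customers)
    {
        foreach (DataRow row in customers.Rows)
            Console.WriteLine((string)row["City"]);
    }

    // LINQ to SQL: Customers is a typed collection, so c.City is checked
    // by the compiler and already carries a .NET type.
    static void PrintCities(NorthwindDataContext dc)
    {
        var cities = from c in dc.Customers
                     select c.City;
        foreach (string city in cities)
            Console.WriteLine(city);
    }
}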

So for now, LINQ to SQL is a winner! As Entity Framework version 2 takes shape, it will be time to re-evaluate.

Edited to fix math errors (blush). These affect timings only, but since the chart is based on ratios, those numbers are still correct.

13 comments:

--rj said...

I was never able to duplicate Rico Mariani's results, although I came close.

See http://oakleafblog.blogspot.com/search?q=rico+mariani

Mark said...

Diane,

Your post comparing performance is very interesting.

We are looking at replacing a lot of existing software with new code that could use EF. But the old code delivers an exceptional user experience by heavily relying on API server cursors.

API server cursors: http://technet.microsoft.com/en-us/library/ms187088.aspx

If I am in plain old .NET 2.0 land I believe I can craft a data access layer that will leverage the ExecutePageReader method to give me the benefits of an API server cursor.

But how will we be able to implement and use API server cursors if all we write is EF code? Where should I look for answers and examples about using API server cursors with EF? Can you point me to a good reference?

If EF does not expose API server cursors, is there another way to accomplish the same thing that ExecutePageReader does? (fast, forward-only cursors that can return chunks of data before the whole set is done)

Thank you for your help!

Diane Wilson said...

Mark,

I'm certainly not an expert on EF, but I've seen nothing in EF (or LINQ to SQL) to suggest support for cursors. Cursors appear to be deprecated in the .NET world.

Even before LINQ, the pattern for ADO.NET queries has been short-lived connections - open a connection, grab a chunk of data, and close the connection immediately. That was true of DataAdapters all the way back to 1.1, at least.

The alternative with LINQ is to use the .Skip() and .Take() methods to implement your own paging:

var page = dc.Customers.OrderBy(c => c.CustomerID).Skip(20).Take(10);

(The explicit OrderBy keeps the paging deterministic; skipping over an unordered query can return different rows on each call.)

prodigal747 said...

Diane,

I downloaded your performance tests today and retargeted them toward .NET 4.0 using Visual Studio 2010 beta 1.

The relative performance of the Entity Framework was much better.

Sorry for the poor formatting, but here are my ratio values for the new entity framework copied from the spreadsheet you provided:

LINQ to Entities 1.6 2.0 1.8 4.5
ESQL Reader 1.4 1.7 1.5 2.8

Leonid Ganeline said...

SQL for 1 row??? Are you kidding me?
It is like testing trucks by towing bikes or carts.

Matias said...

Interesting. One point here... the test NEEDS to put a network between the DB and your tests (and document the bandwidth/latency); otherwise it is quite difficult to translate to real-life expectations. Network speed and latency will, I expect, further emphasize the performance differences between EF and LINQ to SQL.

-Mat

Shankar said...

Hi Diane,

I would be very interested to know your opinion on the current version of EF as compared to the other approaches. I am on the verge of choosing between EF and L2S for one of my projects, which handles considerable transactional data. Given MSFT's 'official' recommendation that EF is their preferred approach to ORM, I am a bit skeptical about picking it up after looking at performance tests like yours.

TIA,

Shankar

Zhou Feng said...

Entity Framework spends a huge amount of time building itself up, such as retrieving the metadata workspace.

The ideal pattern is to build the entity context once and use it for a long time; constructing it over and over is a big deal.

On another note, Entity Framework 4.0 is much better than 3.5. If you are starting now, start with 4.0 (VS2010, .NET 4.0).

62316e said...

Bullshit!

All my EF tests give me 10x the performance of LINQ2SQL with POCO classes.

batistis said...

You forgot typed DataSets!

danielkzu said...

Amazing how the mere passage of time makes things SO much better.

I just upgraded the projects to VS2010 and made them .NET 4.0 (so they automatically picked up EF 4.0 instead), and now the full-table performance (a single row is still expensive in EF) is roughly equal to LINQ to SQL and very competitive against DataTable and DataReader.

It's no longer a concern of mine.

Would be great to update this blog post with that information.

Dave said...

Please consider updating this post with new results... Three years can change a lot.

NWest said...

I'd like to see results for different orders of magnitude:

100 rows, 10,000 rows, 1,000,000 rows.

Would this negatively affect EF? I would expect so.