Ten LINQ Myths
Here are ten root causes of the most common misunderstandings, distilled
from many hundreds of questions on the LINQ forums.
Myth #1
All LINQ queries must start with the ‘var’ keyword. In fact,
the very purpose of the ‘var’ keyword is to start a LINQ query!
The var keyword and LINQ queries are separate concepts. The purpose of
var is to let the compiler infer the type of a local variable from its
initializer (implicit typing). For example, the following:
var s = "Hello";
is precisely equivalent to:
string s = "Hello";
because the compiler infers that s is a string.
Similarly, the following query:
string[] people = new [] { "Tom", "Dick", "Harry" };
var filteredPeople = people.Where (p => p.Length > 3);
is precisely equivalent to:
string[] people = new [] { "Tom", "Dick", "Harry" };
IEnumerable<string> filteredPeople = people.Where (p => p.Length > 3);
You can see here that all that we're achieving with var is to
abbreviate IEnumerable<string>. Some people like this
because it cuts clutter; others argue that implicit typing can make it less clear
what's going on.
Now, there are times when a LINQ query necessitates the use of var. This is when projecting an anonymous type, because an anonymous type has no name that you could write in place of var:
string[] people = new [] { "Tom", "Dick", "Harry" };
var filteredPeople = people.Select (p => new { Name = p, p.Length });
Here is an example of using an anonymous type outside the context of a
LINQ query:
var person = new { Name="Foo", Length=3 };
Myth #2
All LINQ queries must use query syntax.
There are two kinds of syntax for queries: lambda syntax
and query syntax (or query comprehension syntax). Here's an
example of lambda syntax:
string[] people = new [] { "Tom", "Dick", "Harry" };
var filteredPeople = people.Where (p => p.Length > 3);
Here's the same thing expressed in query syntax:
string[] people = new [] { "Tom", "Dick", "Harry" };
var filteredPeople = from p in people where p.Length > 3 select p;
Logically, the compiler translates query syntax into lambda syntax. This
means that everything that can be expressed in query syntax can also be
expressed in lambda syntax. Query syntax can be a lot simpler, though, with
queries that involve more than one range variable. (In this example,
we used just a single range variable, p, so the two syntaxes were similarly simple.)
Not all operators are supported in query syntax, so the two syntax styles
are complementary. For the best of both worlds, you can mix query
styles in a single statement (see Myth #5 for an example).
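As a minimal sketch of the two styles complementing each other, the following runs against an in-memory array (the extra names Mary and Jay are added here purely for illustration). Query syntax does the filtering and ordering, while Skip, Take and Count, which have no query-syntax keywords in C#, are applied as method calls:

```csharp
using System;
using System.Linq;

string[] people = { "Tom", "Dick", "Harry", "Mary", "Jay" };

// Query syntax handles the filter and the ordering; Skip and Take are
// then applied to the parenthesized query as ordinary method calls.
var page = (from p in people
            where p.Length > 3
            orderby p
            select p)
           .Skip (1)
           .Take (2);

// Count has no query-syntax form either, so it is also called as a method.
int longNames = (from p in people where p.Length > 3 select p).Count();

Console.WriteLine (string.Join (", ", page));   // Harry, Mary
Console.WriteLine (longNames);                  // 3
```

The parentheses around the query expression are what allow further operators to be chained onto it.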
Myth #3
To retrieve all customers from the customer table, you must perform a query
similar to the following:
var query = from c in db.Customers select c;
The expression:
from c in db.Customers select c
is a frivolous query! You can simply write:
db.Customers
Similarly, the following LINQ to XML query:
var xe = from e in myXDocument.Descendants ("phone") select e;
can be simplified to:
var xe = myXDocument.Descendants ("phone");
And this:
Customer customer = (from c in db.Customers where c.ID == 123 select c)
                    .Single();
can be simplified to:
Customer customer = db.Customers.Single (c => c.ID == 123);
Myth #4
To reproduce a SQL query in LINQ, you must make the LINQ query look as
similar as possible to the SQL query.
LINQ and SQL are different languages that employ very
different concepts.
Possibly the biggest barrier in becoming productive with LINQ is the
"thinking in SQL" syndrome: mentally formulating your queries in SQL and
then transliterating them into LINQ. The result is that you're constantly
fighting the API!
Once you start thinking
directly in LINQ, your queries will often bear little resemblance to
their SQL counterparts. In many cases, they'll be radically simpler, too.
Myth #5
To do joins efficiently in LINQ, you must use the join keyword.
This is true, but only when querying local
collections. When querying a database, the join keyword is completely
unnecessary: all ad-hoc joins can be accomplished using multiple from
clauses and subqueries. Multiple from clauses and subqueries are
more versatile too: you can also perform
non-equi-joins.
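To illustrate what a non-equi-join with multiple from clauses looks like, here is a small sketch. It runs against in-memory arrays only so that it is self-contained; the customers, products, credit limits and prices are hypothetical sample data, not part of any schema described here. Against a database, the same query shape would be translated to SQL by the provider.

```csharp
using System;
using System.Linq;

// Hypothetical sample data (assumptions for illustration only).
var customers = new[]
{
    (Name: "Tom",  CreditLimit: 500m),
    (Name: "Dick", CreditLimit: 1500m),
};
var products = new[]
{
    (Description: "Bike",  Price: 400m),
    (Description: "Kayak", Price: 1200m),
};

// The second from clause pairs every customer with every product. The where
// clause then keeps the pairs that satisfy an inequality, which the join
// keyword cannot express because it supports equality comparisons only.
var affordable =
    from c in customers
    from p in products
    where p.Price <= c.CreditLimit
    select $"{c.Name} can buy {p.Description}";

foreach (string line in affordable)
    Console.WriteLine (line);
```

With this data the query yields three pairings: Tom with the bike, and Dick with both products.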
Better still, in LINQ to SQL and Entity Framework, you can query association
properties, alleviating the need to join altogether! For instance, here's
how to retrieve the names and IDs of all customers who have made no
purchases:
from c in db.Customers
where !c.Purchases.Any()
select new { c.ID, c.Name }
Or, to retrieve customers who have made no purchases over $1000:
from c in db.Customers
where !c.Purchases.Any (p => p.Price > 1000)
select new { c.ID, c.Name }
Notice that we're mixing lambda (fluent) syntax and query syntax. See LINQPad for more
examples on association properties, manual joins, and mixed-syntax queries.
Myth #6
Because SQL emits flat result sets, LINQ queries
must be structured to emit flat result sets, too.
This is a consequence of Myth #4. One of LINQ's big
benefits is that you can:
1. Query a structured object graph through association
   properties (rather than having to manually join)
2. Project directly into object hierarchies
The two are independent, although 1 helps 2. For example,
if you want to retrieve the names of customers in the state of WA along with
all their purchases, you can simply do the following:
from c in db.Customers
where c.State == "WA"
select new
{
  c.Name,
  c.Purchases    // An EntitySet (collection)
}
The hierarchical result from this query is much easier to work with than
a flat result set!
We can achieve the same result without association properties as
follows:
from c in db.Customers
where c.State == "WA"
select new
{
  c.Name,
  Purchases = db.Purchases.Where (p => p.CustomerID == c.ID)
}
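To show how such a hierarchical result is consumed, here is a sketch of the same query shape run against in-memory arrays. The arrays and their contents are hypothetical stand-ins for the Customers and Purchases tables, chosen only so the example runs without a database:

```csharp
using System;
using System.Linq;

// Hypothetical in-memory stand-ins for the two tables.
var customers = new[]
{
    (ID: 1, Name: "Tom",   State: "WA"),
    (ID: 2, Name: "Dick",  State: "WA"),
    (ID: 3, Name: "Harry", State: "CA"),
};
var purchases = new[]
{
    (CustomerID: 1, Description: "Bike",   Price: 400m),
    (CustomerID: 1, Description: "Helmet", Price: 50m),
    (CustomerID: 3, Description: "Kayak",  Price: 1200m),
};

// Each result element carries a name plus a nested list of purchases.
var query =
    from c in customers
    where c.State == "WA"
    select new
    {
        c.Name,
        Purchases = purchases.Where (p => p.CustomerID == c.ID).ToList()
    };

// Consuming the hierarchy takes one nested loop and no regrouping of rows.
foreach (var customer in query)
{
    Console.WriteLine ($"{customer.Name} ({customer.Purchases.Count} purchases)");
    foreach (var p in customer.Purchases)
        Console.WriteLine ($"  {p.Description}: {p.Price}");
}
```

Note that Dick, who has no purchases in this sample data, still appears in the output with an empty list.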
Myth #7
To do outer joins in LINQ to SQL, you must always use DefaultIfEmpty().
This is true only if you want a flat result set.
The examples in the preceding myth, for instance, translate to a left outer join in
SQL, and require no DefaultIfEmpty operator.
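The difference between the two result shapes can be sketched with in-memory data. The arrays below are hypothetical sample data, and the queries run in LINQ to Objects rather than LINQ to SQL, so this shows only the shapes, not the generated SQL:

```csharp
using System;
using System.Linq;

// Hypothetical sample data: Dick has no purchases.
var customers = new[]
{
    (ID: 1, Name: "Tom"),
    (ID: 2, Name: "Dick"),
};
var purchases = new[]
{
    (CustomerID: 1, Description: "Bike"),
    (CustomerID: 1, Description: "Helmet"),
};

// Flat shape: one row per customer/purchase pair. DefaultIfEmpty() supplies
// a single default element for a customer without purchases, so that
// customer still produces a row.
var flat =
    from c in customers
    from p in purchases.Where (x => x.CustomerID == c.ID).DefaultIfEmpty()
    select new { c.Name, Description = p.Description ?? "(none)" };

// Hierarchical shape: one element per customer, with a possibly empty
// nested sequence. No DefaultIfEmpty() is needed.
var nested =
    from c in customers
    select new { c.Name, Purchases = purchases.Where (x => x.CustomerID == c.ID) };

foreach (var row in flat)
    Console.WriteLine ($"{row.Name}: {row.Description}");

foreach (var c in nested)
    Console.WriteLine ($"{c.Name}: {c.Purchases.Count()} purchases");
```

In both shapes the customer without purchases is preserved; only the flat one needs DefaultIfEmpty() to achieve that.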
Myth #8
A LINQ to SQL or EF query will be executed in one round-trip only if the query
was built in a single step.
LINQ follows a lazy evaluation model, which means queries
execute not when constructed, but when enumerated. This means
you can build up a query in as many steps as you like, and it won't actually
hit the server until you eventually start consuming the results.
For instance, the following query retrieves the names of all customers
whose name starts with the letter 'A', and who have made at least two
purchases. We build this query in three steps:
var query = db.Customers.Where (c => c.Name.StartsWith ("A"));
query = query.Where (c => c.Purchases.Count() >= 2);
var result = query.Select (c => c.Name);
foreach (string name in result)   // Only now is the query executed!
  Console.WriteLine (name);
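The same lazy evaluation can be observed without a database. The following sketch uses an in-memory list and a counter (both hypothetical, for illustration only) to show that composing a query in steps runs nothing until the results are enumerated:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var names = new List<string> { "Alice", "Bob" };
int predicateCalls = 0;

// Steps 1 and 2 only compose the query; the predicate is not called yet.
IEnumerable<string> query = names.Where (n => { predicateCalls++; return n.StartsWith ("A"); });
query = query.Select (n => n.ToUpperInvariant());

Console.WriteLine (predicateCalls);   // 0

// An element added after the query was built is still seen by it.
names.Add ("Anna");

// Enumerating (here via ToList) is what triggers execution.
List<string> result = query.ToList();

Console.WriteLine (string.Join (", ", result));   // ALICE, ANNA
Console.WriteLine (predicateCalls);               // 3
```

With LINQ to SQL or EF, the equivalent moment is when the composed query is translated and sent to the server.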
Myth #9
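A method cannot return a query if the query ends in the 'new'
operator.
The difficulty arises only with anonymous types, which have no name that
you can write in the method's return type. The trick is to project into
an ordinary named type with an object initializer: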
public IQueryable<NameDetails> GetCustomerNamesInState (string state)
{
  return
    from c in db.Customers
    where c.State == state
    select new NameDetails
    {
      FirstName = c.FirstName,
      LastName = c.LastName
    };
}
NameDetails is a class that you'd define as follows:
public class NameDetails
{
  public string FirstName, LastName;
}
Myth #10
The best way to use LINQ to SQL is to instantiate a single DataContext to a static property, and use that shared instance for the life of the application.
This strategy will result in stale data, because objects
tracked by a DataContext instance are not refreshed simply by requerying.
Using a single static DataContext instance in the middle tier of a
distributed application will cause further trouble, because DataContext
instances are not thread-safe.
The correct approach is to instantiate fresh DataContext objects as
required, keeping DataContext instances fairly short-lived. The same applies with
Entity Framework.