Given a partitioned table and a simple SELECT query that compares the partitioning column to a single literal value, why does SQL Server read all the partitions when it seems obvious that only one partition needs to be examined?
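To make the question concrete, here is a minimal sketch of the kind of setup it describes. All the object names (PF_SalesDate, PS_SalesDate, dbo.Sales) are invented for illustration, and the script is not a reproduction of any particular case; whether the resulting plan touches one partition or all of them is exactly the puzzle the article explores.

-- Hypothetical setup: a table partitioned on a datetime column,
-- queried with a single literal comparison on that column.
CREATE PARTITION FUNCTION PF_SalesDate (datetime)
    AS RANGE RIGHT FOR VALUES ('20120101', '20120201', '20120301');

CREATE PARTITION SCHEME PS_SalesDate
    AS PARTITION PF_SalesDate ALL TO ([PRIMARY]);

CREATE TABLE dbo.Sales
(
    SaleDate datetime NOT NULL,
    Amount   money    NOT NULL
)
ON PS_SalesDate (SaleDate);

-- Only one partition can possibly contain this date, yet the
-- execution plan may still show more partitions being accessed.
SELECT SUM(S.Amount)
FROM dbo.Sales AS S
WHERE S.SaleDate = '20120115';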
Compute Scalars, Expressions and Execution Plan Performance
The humble Compute Scalar is one of the least well-understood of the execution plan operators, and usually the last place people look for query performance problems. It often appears in execution plans with a very low (or even zero) cost, which goes some way to explaining why people ignore it.
Some readers will already know that a Compute Scalar can contain a call to a user-defined function, and that any T-SQL function with a BEGIN…END block in its definition can have truly disastrous consequences for performance (see When is a SQL function not a function? by Rob Farley for details).
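As a hedged illustration (the function name and the 15% figure are invented for this example), consider a scalar T-SQL function with a BEGIN…END body applied to every row of Production.Product:

CREATE FUNCTION dbo.ufn_TaxedPrice (@Price money)
RETURNS money
AS
BEGIN
    RETURN @Price * 1.15;
END;
GO

-- The Compute Scalar that evaluates dbo.ufn_TaxedPrice typically reports
-- a tiny estimated cost, even though the function is invoked once per row.
SELECT P.ProductID, dbo.ufn_TaxedPrice(P.ListPrice) AS TaxedPrice
FROM Production.Product AS P;

The plan shape gives little visual hint that a separate BEGIN…END function call is being made for every row that flows through the Compute Scalar.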
Yes. It sounds counter-intuitive on the face of it. Deleting rows frees up space on a page, and page splitting occurs when a page needs additional space. Nevertheless, there are circumstances when deleting rows causes them to expand before they can be deleted.
SQL Server (since 2005) caches temporary tables and table variables referenced in stored procedures for reuse, reducing contention on tempdb allocation structures and catalogue tables.
A number of things can prevent this caching (none of which are allowed when working with table variables):
Named constraints (a bad idea anyway, since concurrent executions can cause a name collision; see the sketch below)
DDL after creation (though what is considered DDL is interesting)
Creation using dynamic SQL
Table created in a different scope
Procedure executed using WITH RECOMPILE
Temporary objects are often created and destroyed at a high rate in production systems, so caching can be an important optimization.
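A hedged sketch of the first item in that list (procedure and table names are invented): the only difference between the two procedures below is that the second names its PRIMARY KEY constraint, which is enough to prevent its temporary table from being cached.

CREATE PROCEDURE dbo.P_Cached
AS
BEGIN
    -- Unnamed constraint: the temporary table can be cached for reuse.
    CREATE TABLE #Work (id integer PRIMARY KEY, payload varchar(50));
    INSERT #Work (id, payload) VALUES (1, 'example');
END;
GO

CREATE PROCEDURE dbo.P_NotCached
AS
BEGIN
    CREATE TABLE #Work
    (
        id integer CONSTRAINT PK_Work PRIMARY KEY,  -- named constraint prevents caching
        payload varchar(50)
    );
    INSERT #Work (id, payload) VALUES (1, 'example');
END;
GO

The named constraint is also the source of the name-collision risk mentioned above: two concurrent executions would both try to create a constraint called PK_Work.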
Ask anyone what the primary advantage of temporary tables over table variables is, and the chances are they will say that temporary tables support statistics and table variables do not.
This is true, of course. The indexes that enforce PRIMARY KEY and UNIQUE constraints on table variables do not have populated statistics associated with them. Neither do any non-constraint table variable indexes (using inline index definitions, available starting with SQL Server 2014). Finally, it is not possible to manually create statistics on table variables.
Intuitively, then, any query that has alternative execution plans to choose from ought to benefit from using a temporary table rather than a table variable. This is also true, up to a point.
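A quick hedged illustration of the difference (object names invented): manual statistics can be created on a temporary table, while nothing equivalent exists for a table variable.

CREATE TABLE #Products
(
    ProductID integer NOT NULL PRIMARY KEY,
    Name      nvarchar(50) NOT NULL
);
CREATE STATISTICS s_Name ON #Products (Name);   -- allowed on a temporary table

DECLARE @Products table
(
    ProductID integer NOT NULL PRIMARY KEY,
    Name      nvarchar(50) NOT NULL
);
-- No CREATE STATISTICS is possible for @Products, and the index that
-- enforces its PRIMARY KEY carries no populated statistics.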
This is the final part in a series of posts based on the content of the Query Optimizer Deep Dive presentations I have given over the last month or so at the Auckland SQL Users’ Group, and SQL Saturday events in Wellington, New Zealand and Adelaide, Australia.
Our AdventureWorks test query produces an optimized physical execution plan that is quite different from the logical form of the query.
The estimated cost of the execution plan shown below is 0.0295 units.
Since we know the database schema very well, we might wonder why the optimizer did not choose to use the unique nonclustered index on Name in the Product table to filter rows based on the LIKE predicate.
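One way to investigate is to force the index and compare the two estimated plan costs. The sketch below assumes the standard AdventureWorks index name AK_Product_Name and uses an illustrative LIKE pattern rather than the one from the test query:

-- Force the unique nonclustered index on Name so the resulting plan and
-- its estimated cost can be compared with the optimizer's unhinted choice.
SELECT P.Name
FROM Production.Product AS P WITH (INDEX(AK_Product_Name))
WHERE P.Name LIKE N'K%';

Comparing this hinted plan's estimated cost with the cost of the unhinted plan shows which alternative the optimizer considered cheaper.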
This is the third in a series of posts based on the content of the Query Optimizer Deep Dive presentations I have given over the last month or so at the Auckland SQL Users’ Group, and SQL Saturday events in Wellington, New Zealand and Adelaide, Australia.
We saw in part 2 how optimizer rules are used to explore logical alternatives for parts of the query tree, and how implementation rules are used to find physical operations to perform each logical step.
To keep track of all these options, the cost-based part of the SQL Server query optimizer uses a structure called the Memo. This structure is part of the Cascades general optimization framework developed by Goetz Graefe.
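For readers who want to peek at the Memo on a disposable test instance, the following undocumented trace flags are commonly reported to print its contents to the messages tab. They are unsupported and their behaviour may change between builds, so treat this purely as an exploratory sketch:

DBCC TRACEON (3604);    -- send trace output to the client

SELECT P.Name
FROM Production.Product AS P
WHERE P.Name LIKE N'K%'
OPTION (QUERYTRACEON 8608,   -- reported to show the initial memo
        QUERYTRACEON 8615);  -- reported to show the final memo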
This is the second in a series of posts based on the content of the Query Optimizer Deep Dive presentations I have given over the last month or so at the Auckland SQL Users’ Group, and SQL Saturday events in Wellington, New Zealand and Adelaide, Australia.
The input to cost-based optimization is a tree of logical operations produced by the previous optimization stages discussed in part one.
Cost-based optimization takes this logical tree, explores logical alternatives (different logical tree shapes that will always produce the same results), generates physical implementations, assigns an estimated cost to each, and finally chooses the cheapest physical option overall.
The goal of cost-based optimization is not to find the best possible physical execution plan by exploring every possible alternative. Rather, the goal is to find a good plan quickly.
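A simple, hedged way to see this trade-off in action is to compare the optimizer's unhinted choice with a forced physical alternative and look at the estimated subtree costs of the two plans. The query below is just an example against AdventureWorks:

-- Unhinted: the optimizer picks whichever physical join it costs cheapest.
SELECT P.Name
FROM Production.Product AS P
JOIN Production.TransactionHistory AS TH
    ON TH.ProductID = P.ProductID;

-- Same logical query, restricted to hash join implementations only.
SELECT P.Name
FROM Production.Product AS P
JOIN Production.TransactionHistory AS TH
    ON TH.ProductID = P.ProductID
OPTION (HASH JOIN);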
This is the first in a series of posts based on the content of the Query Optimizer Deep Dive presentations I have given over the last month or so at the Auckland SQL Users’ Group, and SQL Saturday events in Wellington, New Zealand and Adelaide, Australia.
The motivation for writing these sessions came from finding that relatively few people have a good intuition for the way the optimizer works. This is partly because the official documentation is rather sparse, and partly because what information is available is dispersed across many books and blog posts.
The content presented here is very much geared to my preferred way of learning. It shows the concepts in what seems to me to be a reasonably logical sequence, and then provides tools to enable the interested reader to explore further, if desired.
There are interesting things to be learned from even the simplest queries.
For example, imagine you are asked to write a query that lists AdventureWorks product names, where the product has at least one entry in the transaction history table, but fewer than ten.
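One reasonable way to express that requirement is sketched below; it is not the only possible formulation:

SELECT P.Name
FROM Production.Product AS P
JOIN Production.TransactionHistory AS TH
    ON TH.ProductID = P.ProductID
GROUP BY P.Name
HAVING COUNT_BIG(*) < 10;

-- The inner join guarantees at least one history row per product returned;
-- the HAVING clause filters out products with ten or more.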