Wednesday 12 September 2012
Why Doesn’t Partition Elimination Work?
Given a partitioned table and a simple SELECT query that compares the partitioning column to a single literal value, why does SQL Server read all the partitions when it seems obvious that only one partition needs to be examined?
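The post works through the answer; as a sketch of the scenario only (the object names below are mine, not from the post), the setup looks something like this:

-- Illustrative setup only: a table partitioned on an integer column.
CREATE PARTITION FUNCTION PF (integer)
AS RANGE RIGHT FOR VALUES (10, 20, 30);

CREATE PARTITION SCHEME PS
AS PARTITION PF ALL TO ([PRIMARY]);

CREATE TABLE dbo.Sales
(
    sale_id integer NOT NULL,
    region integer NOT NULL,
    amount money NOT NULL
) ON PS (region);

-- The question: with a single literal compared to the partitioning
-- column, why would the plan ever need to read every partition?
SELECT S.sale_id, S.amount
FROM dbo.Sales AS S
WHERE S.region = 15;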
Wednesday 5 September 2012
Compute Scalars, Expressions and Execution Plan Performance
The humble Compute Scalar is one of the least well-understood of the execution plan operators, and usually the last place people look for query performance problems. It often appears in execution plans with a very low (or even zero) cost, which goes some way to explaining why people ignore it.
Some readers will already know that a Compute Scalar can contain a call to a user-defined function, and that any T-SQL function with a BEGIN…END block in its definition can have truly disastrous consequences for performance (see When is a SQL function not a function? by Rob Farley for details).
This post is not about those sorts of concerns.
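For reference, the sort of function mentioned above looks like the following minimal sketch (the function name is mine; the linked post covers the performance details):

-- Sketch only: a T-SQL scalar function with a BEGIN...END body.
CREATE FUNCTION dbo.AddTax (@amount money)
RETURNS money
AS
BEGIN
    RETURN @amount * 1.15;
END;
GO

-- The call surfaces in the plan as a cheap-looking Compute Scalar,
-- but the function body runs once per qualifying row.
SELECT SOD.SalesOrderID, dbo.AddTax(SOD.LineTotal) AS taxed_total
FROM Sales.SalesOrderDetail AS SOD;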
Friday 31 August 2012
Deletes that Split Pages and Forwarded Ghosts
Can DELETE operations cause pages to split?
Yes. It sounds counter-intuitive on the face of it. Deleting rows frees up space on a page, and page splitting occurs when a page needs additional space. Nevertheless, there are circumstances when deleting rows causes them to expand before they can be deleted.
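The full explanation is in the post; one scenario I believe can produce the effect (sketched here as an assumption, not quoted from the post) is row versioning: under READ_COMMITTED_SNAPSHOT or SNAPSHOT isolation, each deleted row must first be stamped with a 14-byte versioning tag, and on a completely full page that expansion can force a split.

-- Sketch only: assumes the current database has READ_COMMITTED_SNAPSHOT
-- (or SNAPSHOT isolation) enabled, and that pages end up packed full.
CREATE TABLE dbo.Packed
(
    pk integer IDENTITY NOT NULL PRIMARY KEY,
    padding char(2000) NOT NULL DEFAULT ''
);

INSERT dbo.Packed (padding)
SELECT TOP (10000) ''
FROM sys.all_columns AS C1
CROSS JOIN sys.all_columns AS C2;

-- With row versioning active, deleted rows are expanded by the
-- 14-byte version tag before being ghosted, which can split pages
-- that have no free space left.
DELETE dbo.Packed
WHERE pk % 3 = 0;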
Friday 17 August 2012
Temporary Table Caching Explained
SQL Server (since 2005) caches temporary tables and table variables referenced in stored procedures for reuse, reducing contention on tempdb allocation structures and catalogue tables.
A number of things can prevent this caching (none of which are allowed when working with table variables):
- Named constraints (bad idea anyway, since concurrent executions can cause a name collision)
- DDL after creation (though what is considered DDL is interesting)
- Creation using dynamic SQL
- Table created in a different scope
- Procedure executed using WITH RECOMPILE
Temporary objects are often created and destroyed at a high rate in production systems, so caching can be an important optimization.
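As a sketch of the first point in the list above (procedure names are mine), a named constraint is enough to make the difference between a non-cacheable and a cacheable temporary table:

-- Not cacheable: the PRIMARY KEY constraint has an explicit name.
CREATE PROCEDURE dbo.NotCached
AS
BEGIN
    CREATE TABLE #T
    (
        col1 integer NOT NULL
            CONSTRAINT PK_T PRIMARY KEY
    );
END;
GO

-- Cacheable: the same constraint, left unnamed.
CREATE PROCEDURE dbo.Cached
AS
BEGIN
    CREATE TABLE #T
    (
        col1 integer NOT NULL PRIMARY KEY
    );
END;

One common way to observe the difference (my suggestion, not taken from the post) is to execute each procedure and then look in tempdb.sys.tables: a cached temporary table lingers there with a name made up of # followed by hexadecimal digits.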
Wednesday 15 August 2012
Temporary Table Caching in Stored Procedures
Introduction
Ask anyone what the primary advantage of temporary tables over table variables is, and the chances are they will say that temporary tables support statistics and table variables do not.
This is true, of course. The indexes that enforce PRIMARY KEY and UNIQUE constraints on table variables do not have populated statistics associated with them. Neither do any non-constraint table variable indexes (using inline index definitions, available starting with SQL Server 2014). Finally, it is not possible to manually create statistics on table variables.
Intuitively, then, any query that has alternative execution plans to choose from ought to benefit from using a temporary table rather than a table variable. This is also true, up to a point.
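A minimal sketch of the difference (the data source is just a convenient numbers table; estimates will vary by version and settings):

-- Temporary table: the optimizer can create and use statistics on
-- col1, so the estimated row count for the filter reflects the data.
CREATE TABLE #T (col1 integer NOT NULL);
INSERT #T (col1)
SELECT SV.number FROM master.dbo.spt_values AS SV WHERE SV.type = N'P';

SELECT COUNT_BIG(*) FROM #T AS T WHERE T.col1 < 100;

-- Table variable: no statistics, so the same filter is costed from a
-- fixed guess rather than the actual distribution.
DECLARE @T table (col1 integer NOT NULL);
INSERT @T (col1)
SELECT SV.number FROM master.dbo.spt_values AS SV WHERE SV.type = N'P';

SELECT COUNT_BIG(*) FROM @T AS T WHERE T.col1 < 100;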
Thursday 3 May 2012
Parallel Execution Plans Suck
Summary: A deep dive into SQL Server parallelism, and a potential performance problem with parallel plans that use TOP.
Tuesday 1 May 2012
Query Optimizer Deep Dive - Part 4
This is the final part in a series of posts based on the content of the Query Optimizer Deep Dive presentations I have given over the last month or so at the Auckland SQL Users’ Group, and SQL Saturday events in Wellington, New Zealand and Adelaide, Australia.
Links to other parts of this series: Part 1 Part 2 Part 3
Beating the Optimizer
Our AdventureWorks test query produces an optimized physical execution plan that is quite different from the logical form of the query.
The estimated cost of the execution plan shown below is 0.0295 units.
Since we know the database schema very well, we might wonder why the optimizer did not choose to use the unique nonclustered index on Name in the Product table to filter rows based on the LIKE predicate.
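The exact test query is built up earlier in the series, so the following is only an illustration of the kind of comparison involved: force the unique nonclustered index on Name (AK_Product_Name in AdventureWorks) with a hint, and compare the estimated cost of the two plans.

-- Illustrative comparison only, not the series' test query.
SELECT P.[Name]
FROM Production.Product AS P
WHERE P.[Name] LIKE N'[K-P]%';

-- Same query with the unique nonclustered index on Name forced.
SELECT P.[Name]
FROM Production.Product AS P WITH (INDEX (AK_Product_Name))
WHERE P.[Name] LIKE N'[K-P]%';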
Sunday 29 April 2012
Query Optimizer Deep Dive – Part 3
This is the third in a series of posts based on the content of the Query Optimizer Deep Dive presentations I have given over the last month or so at the Auckland SQL Users’ Group, and SQL Saturday events in Wellington, New Zealand and Adelaide, Australia.
Links to other parts of this series: Part 1 Part 2 Part 4
Storage of Alternative Plans
We saw in part 2 how optimizer rules are used to explore logical alternatives for parts of the query tree, and how implementation rules are used to find physical operations to perform each logical step.
To keep track of all these options, the cost-based part of the SQL Server query optimizer uses a structure called the Memo. This structure is part of the Cascades general optimization framework developed by Goetz Graefe.
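One way to inspect the memo directly is with undocumented, unsupported trace flags, so this sketch belongs on a test instance only:

-- Undocumented and unsupported: test systems only.
-- 3604 routes trace output to the client; 8615 displays the final
-- memo once cost-based optimization completes.
SELECT P.[Name]
FROM Production.Product AS P
WHERE P.[Name] LIKE N'D%'
OPTION (QUERYTRACEON 3604, QUERYTRACEON 8615);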
Saturday 28 April 2012
Query Optimizer Deep Dive – Part 2
This is the second in a series of posts based on the content of the Query Optimizer Deep Dive presentations I have given over the last month or so at the Auckland SQL Users’ Group, and SQL Saturday events in Wellington, New Zealand and Adelaide, Australia.
Links to other parts of this series: Part 1 Part 3 Part 4
Cost-Based Optimization Overview
The input to cost-based optimization is a tree of logical operations produced by the previous optimization stages discussed in part one.
Cost-based optimization takes this logical tree, explores logical alternatives (different logical tree shapes that will always produce the same results), generates physical implementations, assigns an estimated cost to each, and finally chooses the cheapest physical option overall.
The goal of cost-based optimization is not to find the best possible physical execution plan by exploring every possible alternative. Rather, the goal is to find a good plan quickly.
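One documented way to see which cost-based search phases a compilation used is to difference the sys.dm_exec_query_optimizer_info counters around it. This is only a sketch; the counters are instance-wide, so a quiet test instance gives the cleanest numbers.

-- Snapshot the optimizer counters, compile a query, then diff:
-- the "search 0" / "search 1" / "search 2" rows show which phases ran.
SELECT QOI.counter, QOI.occurrence, QOI.value
INTO #Before
FROM sys.dm_exec_query_optimizer_info AS QOI;

SELECT P.[Name]
FROM Production.Product AS P
WHERE P.[Name] LIKE N'D%'
OPTION (RECOMPILE);

SELECT QOI.counter,
       occurrence_delta = QOI.occurrence - B.occurrence
FROM sys.dm_exec_query_optimizer_info AS QOI
JOIN #Before AS B ON B.counter = QOI.counter
WHERE QOI.occurrence <> B.occurrence;

DROP TABLE #Before;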
Query Optimizer Deep Dive - Part 1
This is the first in a series of posts based on the content of the Query Optimizer Deep Dive presentations I have given over the last month or so at the Auckland SQL Users’ Group, and SQL Saturday events in Wellington, New Zealand and Adelaide, Australia.
Links to other parts of this series: Part 2 Part 3 Part 4
Introduction
The motivation behind writing these sessions is finding that relatively few people have a good intuition for the way the optimizer works. This is partly because the official documentation is rather sparse, and partly because what information is available is dispersed across many books and blog posts.
The content presented here is very much geared to my preferred way of learning. It shows the concepts in what seems to me to be a reasonably logical sequence, and then provides tools to enable the interested reader to explore further, if desired.
Monday 12 March 2012
Fun with Scalar and Vector Aggregates
There are interesting things to be learned from even the simplest queries.
For example, imagine you are asked to write a query that lists AdventureWorks product names, where the product has at least one entry in the transaction history table, but fewer than ten.
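The excerpt stops before the query, but the requirement translates naturally into a join plus a grouped count (my sketch):

-- Product names with at least one but fewer than ten rows in the
-- transaction history table (the inner join guarantees "at least one").
SELECT P.[Name]
FROM Production.Product AS P
JOIN Production.TransactionHistory AS TH
    ON TH.ProductID = P.ProductID
GROUP BY P.[Name]
HAVING COUNT_BIG(*) < 10;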
Wednesday 18 January 2012
Dynamic Seeks and Hidden Implicit Conversions
A LIKE predicate with only a trailing wildcard can usually use an index seek, as the following AdventureWorks sample database query shows:
SELECT P.[Name]
FROM Production.Product AS P
WHERE P.[Name] LIKE N'D%';
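The interesting behaviour appears when the pattern is not a literal. Replacing it with a variable (a sketch) means the seek boundaries have to be computed at run time, giving the dynamic seek of the title:

DECLARE @pattern nvarchar(50) = N'D%';

-- The start and end of the seek range now depend on the variable's
-- value, so the plan computes them at run time (a dynamic seek).
SELECT P.[Name]
FROM Production.Product AS P
WHERE P.[Name] LIKE @pattern;

The hidden implicit conversions of the title are the sort that can occur inside those run-time range calculations, where they are much easier to miss than a CONVERT_IMPLICIT sitting on a predicate.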
Friday 23 December 2011
Forcing a Parallel Query Execution Plan
This article is for SQL Server developers who have experienced the special kind of frustration that only comes from spending hours trying to convince the query optimizer to generate a parallel execution plan.
This situation often occurs when making an apparently innocuous change to the text of a moderately complex query — a change which somehow manages to turn a parallel plan that executes in ten seconds, into a five-minute serially-executing monster.
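One well-known lever in this area is trace flag 8649, which removes the optimizer's cost-based bias against parallel plans for a single query. It is undocumented and unsupported, so the sketch below is for test systems only:

-- Undocumented and unsupported: test systems only.
-- QUERYTRACEON 8649 removes the cost penalty attached to parallel
-- plans for this statement, often producing a parallel plan where the
-- optimizer would otherwise settle on a serial one.
SELECT COUNT_BIG(*)
FROM Sales.SalesOrderDetail AS SOD
OPTION (QUERYTRACEON 8649, MAXDOP 0);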
Tuesday 6 December 2011
SQL Server Optimizer Bug with JOIN and GROUP BY
I came across a SQL Server optimizer bug recently that made me wonder how on earth I never noticed it before.
As the title of this post suggests, the bug occurs in common JOIN and GROUP BY queries. While it does not cause incorrect results to be returned, it will often cause a poor query plan to be selected by the optimizer.
If you are just interested in the bug itself, you will find a description in the section headed “the bug revealed”. It relates to cardinality estimation for serial partial aggregates.
As the regular reader will be expecting, though, I am going to work up to it with a bit of background. The lasting value of this post (once the bug is fixed) lies in the background details anyway.
Sunday 4 December 2011
Is Distinct Aggregation Still Considered Harmful?
Back in 2008, Marc Friedman of the SQL Server Query Processor Team wrote a blog entry entitled “Distinct Aggregation Considered Harmful”.
Marc shows a way to work around the poor performance that often results simply from adding the keyword DISTINCT to an otherwise perfectly reasonable aggregate function in a query.
This post is an update to that work, presenting a query optimizer enhancement in SQL Server 2012 that reduces the need to perform the suggested rewrite manually.
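For context, the manual rewrite being discussed replaces a distinct aggregate with an explicit de-duplication step followed by an ordinary aggregate. A sketch of the general pattern against AdventureWorks:

-- Original shape: a distinct aggregate alongside the grouping.
SELECT TH.ProductID,
       distinct_orders = COUNT_BIG(DISTINCT TH.ReferenceOrderID)
FROM Production.TransactionHistory AS TH
GROUP BY TH.ProductID;

-- Manual rewrite: de-duplicate first, then count, which gives the
-- optimizer more freedom in how it performs the two aggregation steps.
SELECT D.ProductID,
       distinct_orders = COUNT_BIG(*)
FROM
(
    SELECT DISTINCT TH.ProductID, TH.ReferenceOrderID
    FROM Production.TransactionHistory AS TH
) AS D
GROUP BY D.ProductID;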
Wednesday 21 September 2011
Finding the Statistics Used to Compile an Execution Plan
In this post, I show you how to determine the statistics objects used by the query optimizer in producing an execution plan.
Note: This technique only applies to queries compiled using the original (70) cardinality estimation model.
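The technique itself relies on undocumented, unsupported trace flags, so the sketch below is for test systems only:

SELECT P.[Name]
FROM Production.Product AS P
WHERE P.[Name] LIKE N'D%'
OPTION
(
    RECOMPILE,
    QUERYTRACEON 3604,  -- route trace output to the client
    QUERYTRACEON 9292,  -- statistics considered "interesting" during compilation
    QUERYTRACEON 9204   -- statistics actually loaded and used
);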
Tuesday 30 August 2011
Can a SELECT query cause page splits?
The SQL Server documentation has this to say about page splits:
When a new row is added to a full index page, the Database Engine moves approximately half the rows to a new page to make room for the new row. This reorganization is known as a page split. A page split makes room for new records, but can take time to perform and is a resource intensive operation. Also, it can cause fragmentation that causes increased I/O operations.
Given that, how can a SELECT statement be responsible for page splits?
Well, I suppose we could SELECT from a function that adds rows to a table variable as part of its internal implementation, but that would clearly be cheating, and no fun at all from a blogging point of view.
Tuesday 9 August 2011
SQL Server, Seeks, and Binary Search
The following table summarizes the results from my last two articles, Enforcing Uniqueness for Performance and Avoiding Uniqueness for Performance. It shows the CPU time used when performing 5 million clustered index seeks into a unique or non-unique index:
In test 1, making the clustered index unique improved performance by around 40%.
In test 2, making the same change reduced performance by around 70% (on 64-bit systems – more on that later).
Thursday 4 August 2011
Avoiding Uniqueness for Performance
In my last post, Enforcing Uniqueness for Performance, I showed how using a unique index could speed up equality seeks by around 40%.
Friday 29 July 2011
Enforcing Uniqueness for Performance
A little while back, I posted a short series on seeks and scans:
One of the things I highlighted in the middle post was the difference between a singleton seek and a range scan:
- A singleton equality seek always retrieves exactly one row, and is guaranteed to do so because a unique index exists to enforce it.
- A range scan seeks down the B-tree to a starting (or ending) point, and scans forward (or backward) from that point using the next or previous page pointers.
Today’s short post shows how much faster a singleton seek is, compared with a range scan, even when both return exactly the same number of records.
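A minimal sketch of the two shapes (table and index names are mine):

-- Unique clustered index: an equality seek is a singleton lookup.
CREATE TABLE #Unique (col1 integer NOT NULL PRIMARY KEY CLUSTERED);

-- Non-unique clustered index: the same equality predicate is
-- internally a range scan that must check for further matches
-- after the first row is found.
CREATE TABLE #NonUnique (col1 integer NOT NULL);
CREATE CLUSTERED INDEX c ON #NonUnique (col1);

INSERT #Unique    (col1) VALUES (42);
INSERT #NonUnique (col1) VALUES (42), (42);

SELECT U.col1 FROM #Unique    AS U WHERE U.col1 = 42;
SELECT N.col1 FROM #NonUnique AS N WHERE N.col1 = 42;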
Tuesday 19 July 2011
Join Performance, Implicit Conversions, and Residuals
Introduction
You probably already know that it’s important to be aware of data types when writing queries, and that implicit conversions between types can lead to poor query performance.
Some people have gone so far as to write scripts to search the plan cache for CONVERT_IMPLICIT elements, and others routinely inspect plans for that type of thing when tuning.
Now, that’s all good, as far as it goes. It may surprise you to learn that not all implicit conversions are visible in query plans, and there are other important factors to consider too.
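As a sketch of the general problem area (table names are mine): joining columns with mismatched string types forces an implicit conversion of the lower-precedence side, and that extra work is not always where you would think to look in the plan.

-- Illustrative tables with deliberately mismatched string types.
CREATE TABLE #VarcharSide  (code varchar(10)  NOT NULL PRIMARY KEY);
CREATE TABLE #NvarcharSide (code nvarchar(10) NOT NULL PRIMARY KEY);

INSERT #VarcharSide  (code) VALUES ('A1'), ('B2'), ('C3');
INSERT #NvarcharSide (code) VALUES (N'A1'), (N'B2'), (N'X9');

-- Data type precedence converts the varchar side to nvarchar before
-- the comparison; inspecting the join's predicate and residual
-- properties in the plan shows where that work ends up.
SELECT V.code
FROM #VarcharSide AS V
JOIN #NvarcharSide AS N
    ON N.code = V.code;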
Thursday 7 July 2011
Bitmap Magic (or… how SQL Server uses bitmap filters)
Question
Can a parallel query use less CPU than the same serial query, while executing faster?
The answer is yes! To demonstrate, I’ll use the following two (heap) tables, each containing a single column typed as integer:
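The excerpt cuts off before the table definitions; as hypothetical stand-ins only (the names and any row counts are not from the post), two single-column heaps look like this:

-- Hypothetical stand-ins for the two heap tables.
CREATE TABLE dbo.BuildSide (col1 integer NOT NULL);
CREATE TABLE dbo.ProbeSide (col1 integer NOT NULL);

A parallel hash join between tables like these is the setting in which the optimizer can introduce a bitmap filter to eliminate probe-side rows early.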
Saturday 2 July 2011
Undocumented Query Plans: The ANY Aggregate
As usual, here’s a sample table:
CREATE TABLE #Example
(
    pk numeric IDENTITY PRIMARY KEY NONCLUSTERED,
    col1 sql_variant NULL,
    col2 sql_variant NULL,
    thing sql_variant NOT NULL
);
Some sample data:
And an index that will be useful shortly:
CREATE INDEX nc1
ON #Example
(col1, col2, thing);
There’s a complete script to create the table and add the data at the end of this post. There’s nothing special about the table or the data (except that I wanted to have some fun with values and data types).
Wednesday 22 June 2011
Undocumented Query Plans: Equality Comparisons
The diagram below shows two data sets, with differences highlighted:
To find changed rows using T-SQL, we might write a query like this:
The logic is clear: Join rows from the two sets together on the primary key column, and return rows where a change has occurred in one or more data columns.
Unfortunately, this query only finds one of the expected four rows:
The problem is that our query does not correctly handle NULLs.
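The query is not reproduced in this excerpt, but a sketch of the usual NULL-unsafe form and one NULL-safe alternative (table and column names are mine) illustrates the trap:

-- NULL-unsafe: <> yields UNKNOWN when either side is NULL, so rows
-- where a value changed to or from NULL are silently filtered out.
SELECT S1.pk
FROM #Set1 AS S1
JOIN #Set2 AS S2
    ON S2.pk = S1.pk
WHERE S1.col1 <> S2.col1
   OR S1.col2 <> S2.col2;

-- NULL-safe alternative: EXCEPT treats two NULLs as not distinct,
-- so a change to or from NULL is still detected.
SELECT S1.pk
FROM #Set1 AS S1
JOIN #Set2 AS S2
    ON S2.pk = S1.pk
WHERE EXISTS
(
    SELECT S1.col1, S1.col2
    EXCEPT
    SELECT S2.col1, S2.col2
);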
Saturday 5 March 2011
How Parallelism Works in SQL Server
You might have noticed that January was a quiet blogging month for me.
Part of the reason was that I was working on an article for Simple Talk, looking at how parallel query execution really works. The first part is published today at:
Understanding and Using Parallelism in SQL Server.
This introductory piece is not quite as technical as normal, but I hope there is enough interesting material there to make it worth a read.
© Paul White
email: SQLkiwi@gmail.com
twitter: @SQL_Kiwi