About This Blog

Including my content from SQLBlog.com and some from SQLPerformance.com
Showing posts with label Performance. Show all posts
Showing posts with label Performance. Show all posts

Monday 13 November 2023

Why Batch Mode Sort Spills Are So Slow

Why Batch Mode Sort Spills Are So Slow

Batch mode sorting was added to SQL Server in the 2016 release under compatibility level 130. Most of the time, a batch mode sort will be much faster than the row mode equivalent.

This post is about an important exception to this rule, as recently reported by Erik Darling (video).

No doubt you’ll visit both links before reading on, but to summarize, the issue is that batch mode sorts are very slow when they spill—much slower than an equivalent row mode sort.

This also seems like a good opportunity to write down some sorting details I haven’t really covered before. If you’re not interested in those details and background to the current issue, you can skip down to the section titled, “Erik’s Demo”.

Friday 20 October 2023

Fast Key Optimization for Row Mode Sorts

Fast Key Optimization for Row Mode Sorts

SQL Server row-mode sorts generally use a custom implementation of the well-known merge sort algorithm to order data.

As a comparison-based algorithm, this performs a large number of value comparisons during sorting—usually many more than the number of items to sort.

Although each comparison is typically not expensive, even moderately sized sorting can involve a very large number of comparisons.

SQL Server can be called upon to sort a variety of data types. To facilitate this, the sorting code normally calls out to a specific comparator to determine how two compared values should sort: lower, higher, or equal.

Although calling comparator code has low overhead, performing enough of them can cause noticeable performance differences.

To address this, SQL Server has always (since at least version 7) supported a fast key optimization for simple data types. This optimization performs the comparison using highly optimized inline code rather than calling out to a separate routine.

Tuesday 30 August 2022

Reducing Contention on the NESTING_TRANSACTION_FULL latch

Reducing Contention on the NESTING_TRANSACTION_FULL latch

Each additional worker thread in a parallel execution plan executes inside a nested transaction associated with the single parent transaction.

Parallel worker access to shared parent transaction structures is protected by a latch. A NESTING_TRANSACTION_READONLY latch is used for a read-only transaction. A NESTING_TRANSACTION_FULL latch is used if the transaction has modified the database.

This design has its roots in SQL Server 7, where read-only query parallelism was introduced. SQL Server 2000 built on this with parallel index builds, which for the first time allowed multiple threads to cooperate to change a persistent database structure. Many improvements have followed since then, but the fundamental parent-child transaction design remains today.

Though lightweight, a latch can become a point of contention when requested sufficiently frequently in incompatible modes by many different threads. Some contention on shared resources is to be expected; it becomes a problem when latch waits start to affect CPU utilisation and throughput.

Thursday 18 November 2021

Be Careful with LOBs and OPTION (RECOMPILE)

Be Careful with LOBs and OPTION (RECOMPILE)

It sometimes makes sense to add OPTION (RECOMPILE) to a query. Typically this will be when:

  • A good enough plan for the query is very sensitive to one or more parameters
  • No good single value exists for the parameter to use in a hint
  • Optimize for unknown doesn’t give a good result
  • The plan might be expected to change over time
  • The cost of recompiling the statement is much less than the expected execution time
  • Recompiling every time is very likely to save more time and resources than it costs overall

All that is fairly well-known. The point of this short post is to draw your attention to another side-effect of adding OPTION (RECOMPILE) — the parameter embedding optimization (PEO).

Thursday 8 October 2020

Closest Match with Sort Rewinds

Closest Match with Sort Rewinds

In When Do SQL Server Sorts Rewind? I described how most sorts can only rewind when they contain at most one row. The exception is in-memory sorts, which can rewind at most 500 rows and 16KB of data.

These are certainly tight restrictions, but we can still make use of them on occasion.

To illustrate, I am going reuse a demo Itzik Ben-Gan provided in part one of his Closest Match series, specifically solution 2 (modified value range and indexing).

As Itzik’s title suggests, the task is to find the closest match for a value in one table in a second table.

As Itzik describes it:

The challenge is to match to each row from T1 the row from T2 where the absolute difference between T2.val and T1.val is the lowest. In case of ties (multiple matching rows in T2), match the top row based on val ascending, keycol ascending order.

That is, the row with the lowest value in the val column, and if you still have ties, the row with the lowest keycol value. The tiebreaker is used to guarantee determinism.

Tuesday 4 August 2020

SQL Server 2019 Aggregate Splitting

SQL Server 2019 Aggregate Splitting

The SQL Server 2019 query optimizer has a new trick available to improve the performance of large aggregations. The new exploration abilities are encoded in two new closely-related optimizer rules:

  • GbAggSplitToRanges
  • SelOnGbAggSplitToRanges

The extended event query_optimizer_batch_mode_agg_split is provided to track when this new optimization is considered. The description of this event is:

Occurs when the query optimizer detects batch mode aggregation is likely to spill and tries to split it into multiple smaller aggregations.

Other than that, this new feature hasn’t been documented yet. This article is intended to help fill that gap.

Saturday 24 August 2019

Batch Mode Bitmap Demos

Batch Mode Bitmap Demos

This is a companion post to my main article Batch Mode Bitmaps in SQL Server. This post provides demos and illustrations to supplement the technical article.

The scripts presented here were run on SQL Server 2017 CU 16.

Tuesday 2 May 2017

SQL Server Temporary Object Caching

SQL Server Temporary Object Caching

Creating a table is a relatively resource-intensive and time-consuming operation. The server must locate and allocate storage space for the new data and index structures and make the corresponding entries in multiple system metadata tables. All this work has to be done in ways that will always work correctly under high concurrency, and which meet all of the ACID guarantees expected of a relational database.

In SQL Server, this means taking the right kinds of locks and latches, in the correct sequence, while also ensuring that detailed transaction log entries are safely committed to persistent storage in advance of any physical changes to the database. These log entries ensure the system can bring the database back to a consistent state in the event of a transaction rollback or system crash.

Dropping a table is a similarly expensive operation. Luckily, most databases do not create or drop tables with any great frequency. The obvious exception to this is the system database tempdb. This single database contains the physical storage, allocation structures, system metadata, and transaction log entries for all temporary tables and table variables across the entire SQL Server instance.

It is in the nature of temporary tables and table variables to be created and dropped much more frequently than other database object types. When this naturally high frequency of creation and destruction is combined with the concentrating effect of all temporary tables and table variables being associated with a single database, it is hardly surprising that contention can arise in the allocation and metadata structures of the tempdb database.

Monday 17 June 2013

Improving Partitioned Table Join Performance

Improving Partitioned Table Join Performance

The query optimizer does not always choose an optimal strategy when joining partitioned tables. This post looks at an example of that, showing how a manual rewrite of the query can almost double performance, while reducing the memory grant to almost nothing.

Thursday 4 April 2013

Optimizer Limitations with Filtered Indexes

Optimizer Limitations with Filtered Indexes

One of the filtered index use cases mentioned in the product documentation concerns a column that contains mostly NULL values. The idea is to create a filtered index that excludes the NULLs, resulting in a smaller nonclustered index that requires less maintenance than the equivalent unfiltered index.

Another popular use of filtered indexes is to filter NULLs from a UNIQUE index, giving the behaviour users of other database engines might expect from a default UNIQUE index or constraint: Uniqueness enforced only for non-NULL values.

Unfortunately, the query optimizer has limitations where filtered indexes are concerned. This post looks at a couple of less well-known examples.

Thursday 21 February 2013

Halloween Protection – The Complete Series

Halloween Protection – The Complete Series

I have written a four-part series on the Halloween Problem.

Some of you will never have heard about this issue. Those that have might associate it only with T-SQL UPDATE queries. In fact, the Halloween Problem affects execution plans for INSERT, UPDATE, DELETE and MERGE statements.

This is a topic I have been meaning to write about properly for years, ever since I read Craig Freedman’s 2008 blog post on the topic, which ended with the cryptic comment:

“…although I’ve used update statements for all of the examples in this post, some insert and delete statements also require Halloween protection, but I’ll save that topic for a future post.”

That future post never materialized, so I thought I would have a go. The four parts of the series are summarized and linked below, I hope you find the material interesting.

Wednesday 12 September 2012

Why Doesn’t Partition Elimination Work?

Why Doesn’t Partition Elimination Work?

Given a partitioned table and a simple SELECT query that compares the partitioning column to a single literal value, why does SQL Server read all the partitions when it seems obvious that only one partition needs to be examined?

Wednesday 5 September 2012

Compute Scalars, Expressions and Execution Plan Performance

Compute Scalars, Expressions and Execution Plan Performance

The humble Compute Scalar is one of the least well-understood of the execution plan operators, and usually the last place people look for query performance problems. It often appears in execution plans with a very low (or even zero) cost, which goes some way to explaining why people ignore it.

Compute Scalar

Some readers will already know that a Compute Scalar can contain a call to a user-defined function, and that any T-SQL function with a BEGIN…END block in its definition can have truly disastrous consequences for performance (see When is a SQL function not a function? by Rob Farley for details).

This post is not about those sorts of concerns.

Friday 31 August 2012

Deletes that Split Pages and Forwarded Ghosts

Deletes that Split Pages and Forwarded Ghosts

Can DELETE operations cause pages to split?

Yes. It sounds counter-intuitive on the face of it. Deleting rows frees up space on a page, and page splitting occurs when a page needs additional space. Nevertheless, there are circumstances when deleting rows causes them to expand before they can be deleted.

Thursday 3 May 2012

Parallel Execution Plans Suck

Parallel Execution Plans Suck

Summary: A deep dive into SQL Server parallelism, and a potential performance problem with parallel plans that use TOP.

Wednesday 18 January 2012

Dynamic Seeks and Hidden Implicit Conversions

Dynamic Seeks and Hidden Implicit Conversions

A LIKE predicate with only a trailing wildcard can usually use an index seek, as the following AdventureWorks sample database query shows:

SELECT 
    P.[Name]
FROM Production.Product AS P
WHERE 
    P.[Name] LIKE N'D%';

Sunday 4 December 2011

Is Distinct Aggregation Still Considered Harmful?

Is Distinct Aggregation Still Considered Harmful?

Back in 2008, Marc Friedman of the SQL Server Query Processor Team wrote a blog entry entitled “Distinct Aggregation Considered Harmful”.

Marc shows a way to work around the poor performance that often results simply from adding the keyword DISTINCT to an otherwise perfectly reasonable aggregate function in a query.

This post is an update to that work, presenting a query optimizer enhancement in SQL Server 2012 that reduces the need to perform the suggested rewrite manually.

Tuesday 30 August 2011

Can a SELECT query cause page splits?

Can a SELECT query cause page splits?

The SQL Server documentation has this to say about page splits:

When a new row is added to a full index page, the Database Engine moves approximately half the rows to a new page to make room for the new row. This reorganization is known as a page split. A page split makes room for new records, but can take time to perform and is a resource intensive operation. Also, it can cause fragmentation that causes increased I/O operations.

Given that, how can a SELECT statement be responsible for page splits?

Well, I suppose we could SELECT from a function that adds rows to a table variable as part of its internal implementation, but that would clearly be cheating, and no fun at all from a blogging point of view.

Tuesday 9 August 2011

SQL Server, Seeks, and Binary Search

SQL Server, Seeks, and Binary Search

The following table summarizes the results from my last two articles, Enforcing Uniqueness for Performance and Avoiding Uniqueness for Performance. It shows the CPU time used when performing 5 million clustered index seeks into a unique or non-unique index:

Test summary

In test 1, making the clustered index unique improved performance by around 40%.

In test 2, making the same change reduced performance by around 70% (on 64-bit systems – more on that later).

Thursday 4 August 2011

Avoiding Uniqueness for Performance

Avoiding Uniqueness for Performance

In my last post, Enforcing Uniqueness for Performance, I showed how using a unique index could speed up equality seeks by around 40%.