SQL is a declarative language. We use SQL to write a logical query specification that defines the results we want. For example, we might write a query using either APPLY
or JOIN
that logically describes exactly the same results.
It is up to the query optimizer to find an efficient physical implementation of that logical requirement. SQL Server is free to choose any plan it likes, so long as the results are guaranteed to be the same as specified in the original SQL.
The optimizer is capable of transforming an apply to a join and vice versa. It generally tries to rewrite apply to join during initial compilation to maximize the searchable plan space during cost-based optimization. Having transformed an apply to a join early on, it may also consider a transformation back to an apply shape later on to assess the merits of e.g. an index loops join.
Physical Operators
The optimizer’s output may contain both apply and nested loops join physical operations. Both are shown in execution plans as a Nested Loops Join operator, but they have different properties:
- Apply
- The Nested Loops Join operator has Outer References. These describe parameter values passed from the outer (upper) side of the join to operators on the inner (lower) side of the join. The value of the each parameter may change on each iteration of the loop. The join predicate is evaluated (given the current parameter values) by one or more operators on the inner side of the join. The join predicate is not evaluated at the join itself.
- Join
- The Nested Loops Join operator has a Predicate (unless it is a cross join). It does not have any Outer References. The join predicate is always evaluated at the join operator.
Nested Loops Join Example
The following queries both produce the same Nested Loops Join plan whether APPLY
or JOIN
syntax is used in the SQL query specification:
DECLARE @T1 table (c1 integer NULL);
DECLARE @T2 table (c2 integer NULL);
INSERT @T1 (c1)
VALUES (1), (2), (3);
INSERT @T2 (c2)
VALUES (1), (2), (3);
-- Join syntax
SELECT
T1.c1,
T2.c2
FROM @T1 AS T1
JOIN @T2 AS T2
ON T2.c2 > T1.c1;
-- Apply syntax
SELECT
T1.c1,
CA.c2
FROM @T1 AS T1
CROSS APPLY
(
SELECT T2.c2
FROM @T2 AS T2
WHERE T2.c2 > T1.c1
) AS CA;
The queries both describe the same result set regardless of the contents of the tables. The execution plans are identical, and have the same QueryPlanHash
value.
In both cases, the query optimizer chose a Nested Loops Join rather than an Apply — the join operator has a Predicate, and no Outer References:
This plan executes as follows:
- For each row from table
@T2
:- For each row from table
@T1
:- Test the predicate
T2.c2 > T1.c1
at the join
- Test the predicate
- For each row from table
Join syntax optimization
Starting with the SQL JOIN
form of the query:
-- Join syntax
SELECT
T1.c1,
T2.c2
FROM @T1 AS T1
JOIN @T2 AS T2
ON T2.c2 > T1.c1;
We can see the input tree passed to the optimizer using trace flag 8606 (TF 3604 will also need to be on to direct the debug output shown to the SSMS messages tab):
*** Input Tree: ***
LogOp_Project QCOL: [T1].c1 QCOL: [T2].c2
LogOp_Select
LogOp_Join
LogOp_Get TBL: @T1
LogOp_Get TBL: @T2
ScaOp_Const TI(bit,ML=1) XVAR(bit,Value=1)
ScaOp_Comp x_cmpGt
ScaOp_Identifier QCOL: [T2].c2
ScaOp_Identifier QCOL: [T1].c1
AncOp_PrjList
This is logically a cross join of the two tables followed by a relational selection (filter) on the join predicate. The optimizer simplifies this via rule SELonJN
(relational selection on join) to an inner join:
*** Simplified Tree: ***
LogOp_Join
LogOp_Get TBL: @T1
LogOp_Get TBL: @T2
ScaOp_Comp x_cmpGt
ScaOp_Identifier QCOL: [T2].c2
ScaOp_Identifier QCOL: [T1].c1
The optimizer considers rewriting this logical join to a logical apply during cost-based optimization via the exploration rule JoinToIndexOnTheFly
. The replacement candidate features an inner apply instead of a join, and an inner-side Eager Index Spool (extract of trace flag 8619 output shown):
LogOp_Apply (x_jtInner)
Leaf-Op 3
LogOp_Spool Index on fly Eager
Leaf-Op 4
ScaOp_Comp x_cmpGt
ScaOp_Identifier QCOL: [T2].c2
ScaOp_Identifier QCOL: [T1].c1
It also considers swapping the inner join inputs via JoinCommute
:
Apply Rule: JoinCommute - A JOIN B -> B JOIN A
LogOp_Join
Leaf-Op 4
Leaf-Op 3
Leaf-Op 2
Having generated a number of logical alternatives, the optimizer moves on to implementation using physical operators. The apply alternatives generated via ApplyToNL
and BuildSpool
include options for an eager spool, a lazy spool, and no spool at all:
-- Eager spool
PhyOp_Spool Index-on-fly EAGER
Leaf-Op 4
Leaf-Op 2
-- Logical Lazy spool
PhyOp_Apply (x_jtInner)
Leaf-Op 3
LogOp_Spool Lazy
Leaf-Op 6
-- Physical Lazy spool
PhyOp_Spool LAZY
Leaf-Op 6
-- No spool
PhyOp_Apply (x_jtInner)
Leaf-Op 3
Leaf-Op 6
The optimizer also considers implementing the logical join as a physical Nested Loops Join via implementation rule JNtoNL
(join to nested loops):
PhyOp_LoopsJoin x_jtInner
Leaf-Op 3
Leaf-Op 4
ScaOp_Comp x_cmpGt
ScaOp_Identifier QCOL: [T2].c2
ScaOp_Identifier QCOL: [T1].c1
Notice the physical operator is PhyOp_LoopsJoin
rather than the PhyOp_Apply
seen previously. The optimizer is considering both options.
After costing the promising combinations of join type, join order, and spool type, the optimizer decides that the Nested Loops Join option is cheapest, with a total cost of 0.00657068 units.
The final optimizer output can be seen with TF 8607:
*** Output Tree: ***
PhyOp_LoopsJoin x_jtInner
PhyOp_Range TBL: @T2
PhyOp_Range TBL: @T1
ScaOp_Comp x_cmpGt
ScaOp_Identifier QCOL: [T2].c2
ScaOp_Identifier QCOL: [T1].c1
After conversion to a form usable by the execution engine (TF 7352):
Inner Join Nested Loops (0)
[QCOL: [T1].c1 TI(int,Null,ML=4)]
[QCOL: [T2].c2 TI(int,Null,ML=4)]
Table Scan Table Scan (1)
[QCOL: [T2].c2 TI(int,Null,ML=4)]
Table Scan Table Scan (2)
[QCOL: [T1].c1 TI(int,Null,ML=4)]
Apply syntax optimization
Starting now with the APPLY
query form:
SELECT
T1.c1,
CA.c2
FROM @T1 AS T1
CROSS APPLY
(
SELECT T2.c2
FROM @T2 AS T2
WHERE T2.c2 > T1.c1
) AS CA;
The input tree now features a logical apply instead of the join we saw before:
LogOp_Project QCOL: [T1].c1 QCOL: [T2].c2
LogOp_Apply (x_jtInner)
LogOp_Get TBL: @T1
LogOp_Project
LogOp_Select
LogOp_Get TBL: @T2
ScaOp_Comp x_cmpGt
ScaOp_Identifier QCOL: [T2].c2
ScaOp_Identifier QCOL: [T1].c1
AncOp_PrjList
AncOp_PrjList
The ApplyHandler
simplification rule transforms this tree into a join (output from TF 8621 shown):
Rule applied: APPLY stack -> Join stack
*** Simplified Tree: ***
LogOp_Join
LogOp_Get TBL: @T1
LogOp_Get TBL: @T2
ScaOp_Comp x_cmpGt
ScaOp_Identifier QCOL: [T2].c2
ScaOp_Identifier QCOL: [T1].c1
This is exactly the same logical tree as when the query was written using JOIN
SQL syntax. Optimization continues the same as before with the same resulting Nested Loops Join plan, even though we started with APPLY
syntax.
Apply Example
We can modify the test script to produce an Apply plan (whether APPLY
or JOIN
syntax is used in the query specification) by adding an index to one of the tables:
DECLARE @T1 table (c1 integer NULL);
DECLARE @T2 table (c2 integer NULL INDEX i); -- New index
INSERT @T1 (c1)
VALUES (1), (2), (3);
INSERT @T2 (c2)
VALUES (1), (2), (3);
-- Apply syntax
SELECT T1.c1, CA.c2
FROM @T1 AS T1
CROSS APPLY
(
SELECT *
FROM @T2 AS T2
WHERE T2.c2 > T1.c1
) AS CA;
-- Join syntax
SELECT
T1.c1,
T2.c2
FROM @T1 AS T1
JOIN @T2 AS T2
ON T2.c2 > T1.c1;
Again, these queries describe the same result set and the execution plans are the same for both query forms.
This time, the query optimizer chooses Apply rather than Nested Loops Join, so the join has Outer References and no Predicate:
This plan executes as follows:
- For each row from table
@T1
- Return each row from table
@T2
that matches the predicateT2.c2 > T1.c1
. The predicate is evaluated by the Index Seek operator, not at the join.
- Return each row from table
The important difference is that the current value of column c1
is passed as an outer reference to the inner side of the join so the Index Seek can efficiently locate matching rows:
This is more efficient than testing all rows at the join operator.
Apply Syntax Optimization
Starting our analysis with the APPLY
query form:
SELECT T1.c1, CA.c2
FROM @T1 AS T1
CROSS APPLY
(
SELECT *
FROM @T2 AS T2
WHERE T2.c2 > T1.c1
) AS CA;
The input tree features a logical apply:
*** Input Tree: ***
LogOp_Project QCOL: [T1].c1 QCOL: [T2].c2
LogOp_Apply (x_jtInner)
LogOp_Get TBL: @T1
LogOp_Project
LogOp_Select
LogOp_Get TBL: @T2
ScaOp_Comp x_cmpGt
ScaOp_Identifier QCOL: [T2].c2
ScaOp_Identifier QCOL: [T1].c1
AncOp_PrjList
AncOp_PrjList
It is transformed to a join by ApplyHandler
as before:
Rule applied: APPLY stack -> Join stack
*** Simplified Tree: ***
LogOp_Join
LogOp_Get TBL: @T1
LogOp_Get TBL: @T2
ScaOp_Comp x_cmpGt
ScaOp_Identifier QCOL: [T2].c2
ScaOp_Identifier QCOL: [T1].c1
During cost-based optimization, the optimizer considers the same options as before, including:
JoinCommute
to swap the join input order.JoinToIndexOnTheFly
to convert the join to an apply with inner-side eager index spool.
The changed schema (new index on table @T2
) means a new rule JNtoIdxLookup
now matches the optimization tree. As the name suggests, this new exploration rule generates a logically-equivalent subtree by replacing a join with an apply plus inner-side index lookup:
Apply Rule: JNtoIdxLookup - JN -> IDX LOOKUP
PhyOp_Apply lookup TBL: @T2 (1) (x_jtInner)
Leaf-Op 3
LogOp_SelectIdx(1)
LogOp_GetIdx TBL: @T2 index 1
ScaOp_Comp x_cmpGt
ScaOp_Identifier QCOL: [T2].c2
ScaOp_Identifier QCOL: [T1].c1
The optimizer also considers lazy spool and no spool options for the apply alternatives generated by JoinToIndexOnTheFly
, but the apply option generated by JNtoIdxLookup
turns out to be cheapest, with a final plan cost of 0.00657038 units:
*** Output Tree: ***
PhyOp_Apply lookup TBL: @T2 (1) (x_jtInner)
PhyOp_Range TBL: @T1
PhyOp_Range TBL: @T2
ScaOp_Comp x_cmpGt
ScaOp_Identifier QCOL: [T2].c2
ScaOp_Identifier QCOL: [T1].c1
The copy out tree is:
Inner Join Nested Loops (0)
[QCOL: [T1].c1 TI(int,Null,ML=4)]
[QCOL: [T2].c2 TI(int,Null,ML=4)]
Table Scan Table Scan (1)
[QCOL: [T1].c1 TI(int,Null,ML=4)]
Index Seek Index Seek (2) CP
[QCOL: [T2].c2 TI(int,Null,ML=4)]
Note that we started with SQL APPLY
and had a logical apply in the input tree. This was transformed into a join, and later converted to a physical apply with index lookup during cost-based optimization.
Join Syntax Optimization
Starting instead with the JOIN
SQL syntax:
SELECT
T1.c1,
T2.c2
FROM @T1 AS T1
JOIN @T2 AS T2
ON T2.c2 > T1.c1;
The input tree features a cross join and relational selection, which is simplified to an inner join by SELonJN
:
*** Input Tree: ***
LogOp_Project QCOL: [T1].c1 QCOL: [T2].c2
LogOp_Select
LogOp_Join
LogOp_Get TBL: @T1
LogOp_Get TBL: @T2
ScaOp_Const TI(bit,ML=1) XVAR(bit,Value=1)
ScaOp_Comp x_cmpGt
ScaOp_Identifier QCOL: [T2].c2
ScaOp_Identifier QCOL: [T1].c1
AncOp_PrjList
Rule applied: SEL JN -> JN
*** Simplified Tree: ***
LogOp_Join
LogOp_Get TBL: @T1
LogOp_Get TBL: @T2
ScaOp_Comp x_cmpGt
ScaOp_Identifier QCOL: [T2].c2
ScaOp_Identifier QCOL: [T1].c1
This is exactly the same as the simplified tree for the APPLY
syntax after its apply was transformed to a join by ApplyHandler
. The remainder of the optimization process continues exactly as before, with the same physical apply plan chosen via the JNtoIdxLookup
implementation rule.
Summary
The written form of the SQL query can have an impact on the initial logical tree passed into the compilation and optimization process, but the optimizer can often convert an apply into a join, and can pretty much always consider an apply alternative to a join.
Whether APPLY
or JOIN
syntax is used in the source query does not necessarily determine whether the execution plan uses an apply (nested loops join with outer references) or a regular loops join (predicate evaluated at the join).
An apply plan may be more intuitive, but the correlations (outer references) mean the only available physical implementation has to be based on nested loops. A join without correlations can be implemented by nested loops, hash match, and sort-merge. The optimizer explores the relative merits of apply and join as it tries to find a reasonable execution plan in a sensible amount of time.
Very well explained, thanks!
ReplyDelete