Millisecond Improvements

In my last post, I spoke about optimizing a procedure that was being executed hundreds of millions of times per day, and yes, that is expected behavior.

The difficult thing about trying to optimize this procedure is that it only takes 2.5ms on average to run. Tuning this isn’t a matter of changing a scan to a seek; we’ll have to look hard to find the opportunities here. A one millisecond improvement on a procedure running 100 million times a day would save 100,000 seconds every day.

Well, I’ve found a few more options since my last post, and wanted to share my findings.

Setup

The procedure has some complex logic but only runs a few queries.

  • There are a few simple SELECT statements to populate some variables. These take a small percentage of the overall runtime.
  • There are two UPDATE statements, and we will run one or the other. Both join a table to a table variable; one has a second CTE doing some aggregation that the other lacks. The majority of our time is spent running these UPDATEs.
  • An INSERT statement that runs every time. This ensures that if we didn’t update a record because it didn’t exist, we still insert the row. On a given run, it’s very likely we will INSERT 0 rows.

Brainstorming

Since we spend 80% of our time in the UPDATE (I love Query Store), that’s the place with the most potential for gain.

But on first look (or the first couple), it seems difficult to see room for improvement here. We’re doing index seeks with small row counts. The index scans are against memory optimized table variables, and you may notice they are cheaper than the index seeks.

But looking at the plan, one thing did draw my attention:

There’s a table spool, and following that I see the plan is updating an indexed view, which we would do every time there’s an UPDATE. Hundreds of millions of times a day…huh. So, removing the index on that view would eliminate this entire middle branch from the plan.

The view is based on two columns that are the first two columns of the clustered PK of the underlying table. The view does some aggregation, but the difference between querying the view or the table is reading 1 row versus maybe 2 or 3 rows, most of the time. Dropping that index seems like a good thing to try.

And I did mention this in my last blog post, but we perform the INSERT statement every execution of the procedure, and we run the trigger on this table even if we inserted 0 rows. So, if we can detect whether the INSERT is needed, we can potentially skip the majority of the executions of the statement and the trigger.

The logic for the procedure uses a TVP and a couple of table variables, which isn’t optimal. SQL Server doesn’t have statistics on table variables, so it’s not able to make good estimates of how many rows are going to be returned (unless you are using table variable deferred compilation in SQL Server 2019). We could change these to temp tables, and see if we have better results. Hopefully, we’ll have a more stable plan across the many databases running it.

Both of the UPDATE statements have a bookmark lookup. We’ll be looking up only a few rows, but removing the lookup could still be a meaningful improvement for a query that takes so little time. Also, one of the UPDATE statements references the main table an additional time in its CTE. So we have two index seeks plus the key lookup. How much of our time is spent in the second access and the bookmark lookup?
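
If the lookup does turn out to be significant, one way to remove it is to cover the columns the UPDATE needs with the index it already seeks. A minimal sketch of that idea; every table, index, and column name here is hypothetical:

    -- Recreate the seek index with the UPDATE's extra columns as INCLUDEs,
    -- so the bookmark lookup disappears (hypothetical names throughout).
    CREATE NONCLUSTERED INDEX IX_TargetTable_KeyId_Covering
        ON dbo.TargetTable (KeyId)
        INCLUDE (StatusId, LastModifiedDate);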

Results

I’ve been working on releasing these changes individually, and the first one is complete. Removing the index on the view resulted in a 17% reduction in the duration of the procedure (from 2.34ms to 1.94ms) and a 20% reduction in CPU. Come to think of it, not having to update that index would have helped with our INSERT statement as well.

I’ll update this post once I have details on the other changes.

If you liked this post, please follow me on twitter and contact me if you have questions.


Reducing Trigger Executions

I’ve never been a fan of triggers. I don’t like the idea of them adding an additional tax on every operation. It’s easy to forget they are even there, consuming your cycles. I’ve even seen a few nasty death-by-a-thousand-cuts scenarios with them. But I found something out this week that makes me like them even less.

I was tuning a procedure that runs 284 million times a day.

Over a number of servers and databases, but yes, that number is correct. It takes 2.5ms to run on average, with 1.0ms of CPU time. I’ll spare you the math, but that means over 3 cores of SQL Server are doing nothing but running this procedure 24/7/365. Anything we can do to improve this will be significant, even if we just shave off half a millisecond.

Small Improvements

The procedure is attempting to update or insert a few records based on a TVP input. Depending on one input parameter, the UPDATE statement may set the column to its current value; depending on another, it may filter out all rows and update nothing. In both cases, even though we haven’t changed any data, we still took the time to run the UPDATE statement. I’m unsure how often those two parameters match one of those two checks, but if we check them first we can skip the UPDATE entirely.

Like I said, every little bit helps.

After the UPDATE, there is an INSERT to make sure if we didn’t UPDATE the row because it doesn’t exist, we INSERT it. There are conditionals around the UPDATE, but not the INSERT. So, we run it every time we run the proc. 284 million times a day. Even if there’s nothing to insert.

Trigger Happy

The revelation is this:

“These triggers fire when any valid event fires, whether table rows are affected or not.”

Ouch. And I really wasn’t aware of this. So, we pay for the INSERT execution every time, even if we don’t need to insert anything. And the INSERT trigger on that table fires as well, trying to update another table even though the INSERTED and DELETED tables are empty. Charming.

But if our TVP had x items to update\insert and our UPDATE statement handled y of them, we only need to try the INSERT if y<x.

We won’t even need to query the underlying table or the TVP; just check the row counts.
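
A minimal sketch of that check; the object names are hypothetical stand-ins for the real procedure’s TVP and target table:

    -- @InputRows stands in for the procedure's TVP parameter; dbo.TargetTable is hypothetical.
    DECLARE @InputRows TABLE (KeyId int PRIMARY KEY, SomeValue int);
    DECLARE @TvpRowCount int = 0, @UpdatedRowCount int = 0;

    SELECT @TvpRowCount = COUNT(*) FROM @InputRows;

    UPDATE t
    SET    t.SomeValue = i.SomeValue
    FROM   dbo.TargetTable AS t
    JOIN   @InputRows AS i ON i.KeyId = t.KeyId;

    SET @UpdatedRowCount = @@ROWCOUNT;

    -- Only attempt the INSERT (and fire its trigger) when the UPDATE didn't cover every input row.
    IF @UpdatedRowCount < @TvpRowCount
        INSERT INTO dbo.TargetTable (KeyId, SomeValue)
        SELECT i.KeyId, i.SomeValue
        FROM   @InputRows AS i
        WHERE  NOT EXISTS (SELECT 1 FROM dbo.TargetTable AS t WHERE t.KeyId = i.KeyId);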

We’re likely to update these records many times, and we’ll obviously only insert them once. Checking the execution counts for the INSERT and UPDATE triggers and subtracting the number of executions from this proc, it seems we UPDATE the table several hundred times for each INSERT.

Well then. I’ll look forward to seeing those executions drop shortly.

If you liked this post, please follow me on twitter or contact me if you have questions.

Slow Garbage Collection

I encountered a curious issue recently, and immediately knew I needed to blog about it. Having already blogged about implicit conversions and how the TOP operator interacts with blocking operators, I found a problem that looked like the combination of the two.

I reviewed a garbage collection process that’s been in place for some time. The procedure populates a temp table with the key values for the table that is central to the GC. We use the temp table to delete from the related tables, then delete from the primary table. However, the query populating our temp table was taking far too long, 84 seconds when I tested it. We’re scanning and returning 1.4 million rows from the first table, doing a key lookup on all of them. We scan another table to look up the retention period for the related account, as this database has information from multiple accounts, and return 8.4 million rows there. We join two massive record sets, then have a filter operator that gets us down to a more reasonable size.

In general, when running a complex query you want your most effective filter, the one that eliminates the most rows, to happen first. We want to get the result set small early, because every operation after that becomes less expensive. Here’s part of our anonymized query:

So, Object5 is our main table. Column10 is a status column, and we compare that to a scalar value to make sure this record is ready for GC. That’s fine. The second is checking if the record is old enough to GC, and this is where things get interesting. The code here is odd; we’re subtracting a COALESCE of three values from a DATETIME variable, relying on behavior that subtracting an INT from a DATETIME subtracts that number of days from the DATETIME. There’s an implicit conversion, but it is on the variable side of the inequality. And the plan above appears to be our result.

So, Bad Date Math Code?

Seems like a good root cause, and I don’t like this lazy coding pattern, but let’s test it out with tables we can all look at. Would this be any better if we had written this to use the DATEADD function to do the math properly?
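
Since the test script itself isn’t shown here, this is a hedged version of that comparison using AdventureWorks, with @RetentionDays standing in for the COALESCE’d values:

    DECLARE @RetentionDays int = 90;   -- stand-in for the COALESCE of three values

    -- The "lazy" pattern: subtract an integer from a DATETIME.
    SELECT SalesOrderID, OrderDate
    FROM   Sales.SalesOrderHeader
    WHERE  OrderDate < GETUTCDATE() - @RetentionDays;

    -- The explicit version using DATEADD.
    SELECT SalesOrderID, OrderDate
    FROM   Sales.SalesOrderHeader
    WHERE  OrderDate < DATEADD(DAY, -@RetentionDays, GETUTCDATE());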

So, if we had an index on the OrderDate column, would it perform differently?

Apparently not. Same plan, same cost. But when you think about it, this tests the date math by subtracting an integer from the DATETIME provided by GETUTCDATE(). The original was subtracting a COALESCE of three values. One of those values was a float. Could the COALESCE or the resulting data type have made this more complicated?

Testing COALESCE

Running this, again we see a nice seek that reads and returns 100 rows. So the different data type and the COALESCE make no difference.

Looking at the original query again, the first value isn’t a variable, it’s a column from a different table. We can’t filter by this column until we’ve read the other table, which affects our join order. But we have no criteria to seek the second table with.

Joined Filtering

One more test. Let’s see what the behavior looks like if we join to the Customers table to look for the RetentionPeriod. First, I’ll create and populate some data in that column:
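
The setup script isn’t shown, so this is a hedged sketch of what it might look like; which customers get populated is an assumption:

    -- Add a retention period to Sales.Customer and populate only a subset of accounts.
    ALTER TABLE Sales.Customer ADD RetentionPeriod int NULL;
    GO
    UPDATE Sales.Customer
    SET    RetentionPeriod = 90
    WHERE  CustomerID % 100 = 0;   -- an arbitrary slice, mimicking "only some accounts have a value"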

I only populated a few of the records to better match the production issue; only some accounts have the value I’m looking for, hence the COALESCE.

Now we’re trying to filter records in SalesOrderHeader, based on the Customer’s RetentionPeriod. How does this perform?

 

Well, the row counts aren’t terrible, but we are scanning both tables. The optimizer opted to start with the Customer table, which is much smaller. We’re not filtering on the date until the filter operator, after the merge join.

I’d be worried that with a larger batch size or tables that don’t line up for a merge join, we’d just end up doing a hash match. That would force us to scan the first table, and without any filter criteria that would be a lot of reads.

Solution

The solution I applied to my production query was to create a temp table that had the account ID and the related retention period. I joined this to my primary table, and the query that was taking 84 seconds was replaced with one that took around 20 milliseconds.

 

Here’s how I’d apply that thought to our example. I’ve created a temp table with CustomerID and its associated RetentionDate, so we could use that to search Sales.SalesOrderHeader.
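
The query isn’t reproduced above, so this is a hedged sketch of that approach; the temp table name and the filter used to shrink it are assumptions:

    -- Build a small temp table of customers that actually have a retention period.
    SELECT c.CustomerID,
           DATEADD(DAY, -c.RetentionPeriod, GETUTCDATE()) AS RetentionDate
    INTO   #CustomerRetention
    FROM   Sales.Customer AS c
    WHERE  c.RetentionPeriod IS NOT NULL;

    -- Use it to drive the search against Sales.SalesOrderHeader.
    SELECT soh.SalesOrderID
    FROM   #CustomerRetention AS cr
    JOIN   Sales.SalesOrderHeader AS soh
           ON soh.CustomerID = cr.CustomerID
    WHERE  soh.OrderDate < cr.RetentionDate;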

You may have noticed my filter on the Sales.Customer table. In the live issue, the temp table had dozens of rows for dozens of accounts, not the tens of thousands I’d get from using all rows in Sales.Customer in my example. So, I filtered it to get a similar size in my temp table.

Infinitely better. Literally in this case, since the ElapsedTime and ActualElapsedms indicators in the plan XML are all 0 milliseconds. I’m declaring victory.

If you liked this post, please follow me on twitter or contact me if you have questions.

Understanding Cost in SQL Server

Cost is an important concept in SQL Server. It is key in how plans are compared and chosen by the optimizer, and it can guide us to problem operators as we tune a query’s performance. It can also lead us astray if we follow it blindly. In this post, I want to explain what cost is and how we use it.

Most non-trivial queries can be executed in a number of different ways. Each index on a table is a possible path for the optimizer to use, and statistics allow the optimizer to estimate the cost of a given operation. SQL Server determines which potential plan it will use in large part based on cost. The optimizer isn’t exhaustive; it won’t compare all possible plans, because that would take too long.

Let’s take a look at an example query from the AdventureWorks database.
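
The query itself isn’t shown here, so this is a hedged stand-in that joins five AdventureWorks tables; the author’s actual query (and the costs discussed below) will differ:

    SELECT p.FirstName, p.LastName, prod.Name, sod.OrderQty, soh.OrderDate
    FROM   Sales.SalesOrderHeader AS soh
    JOIN   Sales.SalesOrderDetail AS sod ON sod.SalesOrderID = soh.SalesOrderID
    JOIN   Sales.Customer AS c          ON c.CustomerID = soh.CustomerID
    JOIN   Person.Person AS p           ON p.BusinessEntityID = c.PersonID
    JOIN   Production.Product AS prod   ON prod.ProductID = sod.ProductID
    WHERE  soh.OrderDate >= '20130101';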

There is a cost provided for each query relative to the batch; that can help you narrow down which statement is the issue if a large batch or procedure is performing poorly. This query is the only one in the batch.

This query hits 5 tables, and has a number of joins and other operations. Each operation has a cost displayed here as a percentage of the total query. You’ll notice one operation has a cost of 94%; let’s zoom in. 

This has a cost of 0.54456. This is broken further into an I/O Cost and a CPU Cost. There is also a Subtree Cost, which would include any operators that feed into this one, but in this case there aren’t any. The operator returned 234 rows, but it read over 31k. So the cost seems appropriate; we really are doing some work here. 

The number for cost is always presented without a unit, so you may ask: 0.54456 of what, exactly? Pounds? Parsecs?

Calibration

The story as I recall was that cost was derived from how long it took an early developer of SQL Server to run a given operation on his desktop computer, in seconds. So initially cost was expressed in seconds, but that’s not the case anymore. It’s a more generic expression of how much work is involved in executing the operation.

Given the cost value is fairly generic, you need an idea of what is a cheap operator or an expensive one. Here’s one of the other index seeks from the query above.

So, we’re doing an index seek not a scan. We not only returned 1 row, we only read one row. In terms of reading a normal table, it’s not going to get any cheaper than this. This cost of about 0.003 is something you’ll see many times for that reason.

So, what constitutes an expensive query? That’s a matter of opinion. You could gauge this by the “Cost Threshold for Parallelism” setting in SQL Server. This is a server-level setting that determines how expensive an operation has to be before SQL Server will consider a parallel plan. The default setting is 5, so you could argue anything above that is expensive, but that default is probably quite low for current servers. At work, this setting is 100 on most of the servers I work on.
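
For reference, the setting is exposed through sp_configure; the value below is purely illustrative:

    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;
    EXEC sp_configure 'cost threshold for parallelism', 50;   -- pick a value that fits your workload
    RECONFIGURE;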

Keep in mind, the cost threshold is for the operation, not the entire plan. 

Cost isn’t exact

One thing to remember is that cost in SQL Server is always an estimate. This is a number SQL Server calculates when considering multiple potential plans to determine which would be the best. But the number of rows it expects a given operation to return or how many times that operation runs can be off. All of that is based on statistics.

It doesn’t then go back and update the cost number later if those numbers were incorrect. So while we can use the cost as an indicator of which query or operator we should focus on, don’t completely tunnel-vision that one thing.

I could talk at this point about estimated plans versus actual plans, but fortunately Grant Fritchey has already done so. The gist of his post is that an actual plan is one that has actual runtime metrics. For example, “Number of Rows Read”, “Actual Number of Rows”, and “Number of Executions” in the images above.

It’s helpful to have plans with these actual numbers; they can help you confirm if the costs look accurate. “Number of Rows Read” is the main data point I look at most of the time.

You may also find the following situation in plans as you look at them:

The optimizer estimated 887 or 888 rows for these operators, but the number of rows returned is much higher. So the cost of 72% for the one operator isn’t really accurate. That operator read and returned many more rows, as did the other related operators. If you saw an estimated plan without these runtime numbers, you may come away with a very different impression of how this query is running.

Conclusion

I’ve always felt cost is not well explained, so hopefully this post will help answer some questions. Understanding cost can be really helpful in troubleshooting poorly performing queries, but don’t focus solely on it when analyzing a problem.

If you liked this post, please follow me on twitter or contact me if you have questions.

The Transitive Property

You may recall the transitive property from elementary school math class. It states that if a = b and b = c, then a = c.

The SQL Server optimizer can and will use this property, and it can lead to issues in your queries. When I’m writing a query, I have a clear idea of how I want it to operate. But using the transitive property, SQL Server has additional options one might not expect, and this may occasionally cause things to go awry. Consider this:
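
Since the original query isn’t shown, here’s a hedged sketch of the shape being described, using WideWorldImporters (the specific OrderID value is arbitrary):

    DECLARE @OrderID int = 76;

    SELECT so.OrderID, so.OrderDate, sol.StockItemID, sol.Quantity
    FROM   Sales.Orders AS so
    JOIN   Sales.OrderLines AS sol ON sol.OrderID = so.OrderID
    WHERE  so.OrderID = @OrderID;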

I’d expect the optimizer to seek the Orders table, then join to OrderLines. But since @OrderID is also equal to sol.OrderID, it could start the query there instead. There would be little difference; the OrderID column is indexed in both tables and neither table is that large. There are only 3 OrderLines for this OrderID.

But what if:

  • The CustomerID was present in all tables. And was part of the primary key in each.
  • The data set was larger, with hundreds of millions of rows in our OrderLines table.
  • Statistics were last updated by an auto-update. A table of this size would have a very small sampling rate by default, <1%.
  • Our database has a large number of customers, and we’re running a query to get recent orders for a large customer that haven’t been picked yet.

We may end up with a query like this:
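
Again, a hedged sketch of what that might look like; remember the schema here is hypothetical (stock WideWorldImporters doesn’t have CustomerID on OrderLines):

    DECLARE @CustomerID int = 832, @StartDate date = '2016-05-01';

    SELECT so.OrderID, sol.OrderLineID, sol.StockItemID
    FROM   Sales.Orders AS so
    JOIN   Sales.OrderLines AS sol
           ON  sol.CustomerID = so.CustomerID   -- hypothetical column on OrderLines
           AND sol.OrderID    = so.OrderID
    WHERE  so.CustomerID = @CustomerID
      AND  so.OrderDate  >= @StartDate
      AND  so.PickingCompletedWhen IS NULL;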

If our PK has the CustomerID first, we could use that to search either table. If we don’t have an index with the CustomerID and our dates on Orders, those fields will be in our predicate, not our seek predicate. With poor statistics, the optimizer may seek OrderLines first, returning all lines for that customer, before joining and filtering it down later. In this situation, that could be a mountain of reads.

Countering the Transitive Property

So what if you experience or anticipate this issue with a query? I see a few approaches to prevent the bad plan from sneaking up on your server.

  • Join Hints or Force Order: The FORCE ORDER hint would direct the optimizer to hit the Orders table first. A join hint like INNER LOOP JOIN would set the join type, and also force the order. Either would result in seeking the Orders table before joining to OrderLines, preventing our worst case (see the sketch after this list).
  • Index hints: If you hint the optimizer to use an index based on CustomerID and OrderDate, that would point it in the direction of using the Orders table first, avoiding the problem.
  • Better stats: An argument can definitely be made not to rely on auto-updated stats on a sufficiently large table. I’ve seen sampling rates below 0.1%, and estimates can be badly skewed at that level. If you update stats with a large sampling rate, the optimizer will have better information to work with, and that may avoid the worst case.
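
Here’s a hedged illustration of the first and third options against the same hypothetical schema; the hint and statistics syntax is standard, but the object and variable names follow the imagined example above:

    DECLARE @CustomerID int = 832, @StartDate date = '2016-05-01';

    SELECT so.OrderID, sol.OrderLineID
    FROM   Sales.Orders AS so
    JOIN   Sales.OrderLines AS sol
           ON  sol.CustomerID = so.CustomerID   -- hypothetical column, as above
           AND sol.OrderID    = so.OrderID
    WHERE  so.CustomerID = @CustomerID
      AND  so.OrderDate  >= @StartDate
    OPTION (FORCE ORDER);   -- join order follows the FROM clause: Orders first, then OrderLines

    -- Better statistics: refresh with a full scan instead of a tiny auto-update sample.
    UPDATE STATISTICS Sales.OrderLines WITH FULLSCAN;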

I thought this property of the optimizer was interesting when I first saw it in play, and I haven’t seen it referred to much. Hopefully this post will give you a little more insight into what the optimizer can do, and how we sometimes need to respond.

Reviewing the concept

A quick search will tell you that implicit conversions are pretty awful for performance, and in particular drive CPU usage. That’s not news. There is an aspect of this I think a lot of engineers don’t understand; why does it cause performance issues?

An implicit conversion occurs when you try to compare between or assign data across two different data types, without converting one yourself. If you used a CAST or CONVERT, it would be an explicit conversion. When SQL Server does it for you, it’s an implicit conversion, and these can have a real impact on your execution plans. Not all combinations of types can be converted implicitly, for a full list look here.

So let’s look at an example based on WideWorldImporters:
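
The statement itself isn’t reproduced here, so this is a hedged sketch of the kind of query described; the index name and the literal value are arbitrary:

    CREATE NONCLUSTERED INDEX IX_OrderLines_Description
        ON Sales.OrderLines (Description);

    SELECT OrderLineID, OrderID, Description
    FROM   Sales.OrderLines
    WHERE  Description = 'Some description';   -- non-unicode literal vs the nvarchar column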

The Description column is an nvarchar type, but I’m comparing it to a string that is non-unicode. And for the sake of this example, I did create an index on this column.

And we see an implicit conversion. In this case, the optimizer converted our string to a nvarchar(4000) type to match the Description column. But in this case, the implicit conversion isn’t going to cause a problem. Converting one value isn’t that expensive; you can confirm that from the ~0.025 cost for this operation. We even used the index on the column.

The real problem comes when SQL Server converts the column to match the variable.

Reproducing the issue

I’ve seen implicit conversions many times, and have one example in my head I recall very clearly. So I was fairly astonished when I tried to reproduce this using WideWorldImporters. SQL Server tries very hard to convert the parameter or literal, rather than the column. It’s quite difficult to get it to behave badly, in part because of the data types chosen for the tables.

Microsoft’s documentation on data type precedence indicates that SQL Server will convert from one type to another based on the priority in the chart; but I have an example that seems to counter this.

The Quantity column is an int, which has a lower priority than our decimal type variable. So according to the chart, the column should be converted, meaning every value on every row; more on that later.
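
A hedged sketch of the kind of test being described (the actual query isn’t shown):

    DECLARE @Quantity decimal(9, 2) = 10.00;

    SELECT OrderLineID, OrderID, Quantity
    FROM   Sales.OrderLines
    WHERE  Quantity = @Quantity;   -- int column compared to a decimal variable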

But looking at the plan, we don’t convert the Quantity column. Instead there is a scalar operator on the variable that appears to be obscuring the underlying conversion:

So, the behavior here doesn’t match expectations. In my testing, all the int\decimal\money types worked similarly, and the optimizer would convert the variable. One takeaway I had from writing this is that implicit conversions occur much more often than I thought, but SQL Server is smart enough to convert the variable unless it has no choice. You likely wouldn’t notice the many plans that convert the column, because they aren’t causing a performance problem. Instead, you’ll see and remember the one that caused a massive issue.

Another angle

An easy example would be to compare a varchar column to an nvarchar variable. This forces SQL Server’s hand, only in one direction. SQL can easily convert any varchar to an nvarchar, but the opposite is more problematic. If your nvarchar string contains kanji or something else that can’t be represented with a varchar string, the operation would fail. So, SQL Server would rather convert the varchar, even if that means converting a column.

Note, I had to change databases here, because there are no varchar columns in WideWorldImporters (thus preventing this convert).
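
A hedged sketch of that test; it assumes AdventureWorks, where Sales.Customer.AccountNumber is a varchar column, plus a nonclustered index on it:

    DECLARE @AccountNumber nvarchar(20) = N'AW00000013';

    SELECT CustomerID, AccountNumber
    FROM   Sales.Customer
    WHERE  AccountNumber = @AccountNumber;   -- varchar column vs nvarchar variable: the column gets converted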

Even though I’ve created an index on the AccountNumber column, this results in a scan.

Usually I’ve seen the CONVERT_IMPLICIT on the scan operator itself, but here we see it as part of the filter. The query only returns 1 row, but we read all 19820 rows in the table to fulfill this. Imagine the effect if this table had 10 million rows.

Why the Table Scan?

Reasonable question. SQL Server doesn’t really have a choice. The index is built on the AccountNumber, not the output of this function call:

So, SQL Server can’t search the index for this information; it’s not in the index. And SQL Server can’t really predict what the output of the function will be before it has converted the data, so it has to look at every AccountNumber value, convert it, then compare. The most expensive part isn’t the convert itself, but the table scan it forces.

So, pay attention to your data types when writing procedures and queries to avoid these issues. The effect of an implicit conversion on a column is much more pronounced the larger the table is, and it can easily drive your CPU to 100%.

Hidden Implicit Conversions

One of my attempts to create an implicit conversion worked, but didn’t show it clearly in the plan. I tried the following example using the sql_variant type:
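
A hedged sketch of that attempt, against the same table and index as before:

    DECLARE @AccountNumber sql_variant = 'AW00000013';

    SELECT CustomerID, AccountNumber
    FROM   Sales.Customer
    WHERE  AccountNumber = @AccountNumber;   -- comparing to a sql_variant converts the column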

The plan is similar to the last one, but we don’t see the CONVERT_IMPLICIT when we mouseover any of the three operators.

However, if you hit F4 and examine the Compute Scalar operator, you can see the definition of expression Expr1001 in the Properties window.

This clearly shows the column being converted, but it’s odd to need to do this to see the convert.

This would be a good time to refer to a bit of brilliance Jonathan Kehayias posted quite some time ago. This script will allow you to see any implicit conversions on columns in the plan cache, along with the statement, plan, and other relevant details. This can be useful in a number of cases, but especially when the operation is hidden as in this example.

Also, Any Other Function Call

This post is primarily about understanding how implicit conversions work and why they occur, but it bears mentioning that basically any function call wrapping a column you want to search on will behave the same way.

For example, using SUBSTRING on AccountNumber gives largely the same result:
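
A hedged sketch of that query (the exact SUBSTRING arguments are assumptions):

    SELECT CustomerID, AccountNumber
    FROM   Sales.Customer
    WHERE  SUBSTRING(AccountNumber, 8, 3) = '013';   -- wrapping the column defeats the index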

If you need to use a function call in your WHERE, hopefully you have another clause that can do the majority of your filtering.

Happy New Year

Hope you had a great holiday season! I’ll keep up the posting going forward, so check back in, and reach out if you have any questions I can help answer.

How Relevant is the TOP operator?

I’ve explained what a blocking operator is and provided a few examples, but maybe this doesn’t seem important. It’s affecting the TOP operator, sure, but don’t people just use this to look at the TOP 1000 rows of their tables in SSMS?

The TOP operator is useful for many operations, especially in a large environment. Large operations can time out or fail for a variety of reasons, consuming resources without providing the results you need. A small, batch-sized operation is more likely to succeed and tends to perform more consistently. Many maintenance operations make sense to run with a TOP operator, so we should make sure those operations aren’t stymied by blocking operators. Some examples:

  • Garbage collection on a table with many millions of rows. You want this to perform quickly, but you really can’t afford for this to time out (whatever that timeout period may be). We can limit how long this runs by GC’ing a small batch at a time, but this can be hampered badly by an extra sort or a hash join.
  • Archiving data applies for the same reasoning as GC. If you archive data to another table\database\server, you’ll want to keep your operation small enough to manage.
  • Backfilling a new column. If the existing table is large, you can’t just UPDATE the whole table; you’ll lock the table and block all your users. A batched UPDATE in a loop or in a scheduled process can resolve this without causing an outage (a sketch of that pattern follows this list).
  • GDPR is here, and CCPA is coming. You may need to anonymize data across many related tables. We need this to perform well to cleanse our existing data, and we’ll continue running this process going forward.
  • Queries producing potentially large results to your application may need to be batched as well. If this times out, you’re still wasting a lot of time and reads, even if no data is changed.
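
A minimal sketch of the batched pattern from the backfill bullet, with entirely hypothetical table and column names:

    DECLARE @BatchSize int = 1000, @RowsAffected int = 1;

    WHILE @RowsAffected > 0
    BEGIN
        -- Touch only a small batch per iteration so locks stay short-lived.
        UPDATE TOP (@BatchSize) t
        SET    t.NewColumn = t.OldColumn
        FROM   dbo.LargeTable AS t
        WHERE  t.NewColumn IS NULL;

        SET @RowsAffected = @@ROWCOUNT;
    END;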

Out the Window

I examined one process recently that was similar to this query, causing a GC operation to time out.

It was a SELECT statement that inserted the important fields into a temp table; we then ran DELETEs against multiple tables based on the contents of the temp table. But it was the initial SELECT that had a poor plan and caused the timeout.

I quickly saw a SORT in the execution plan and wondered why. The actual query didn’t have an ORDER BY clause. But it did have a ROW_NUMBER OVER in the select list; took me a minute to notice that.
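
A hedged sketch of the shape of that query (not the production code; the table and column names are invented):

    SELECT TOP (100)
           e.EventId,
           ROW_NUMBER() OVER (ORDER BY e.CreatedDate) AS rn   -- this OVER clause is what forces the sort
    INTO   #ToDelete
    FROM   dbo.EventLog AS e
    WHERE  e.CreatedDate < DATEADD(DAY, -90, GETUTCDATE());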

But was the sorting necessary? “We need to delete really old records in this table, but it’s vitally important that we delete them in the order they were originally inserted!”

It seemed a poor reason to sort a table with tens of millions of rows. Coupled with the very small batch size, we were doing an extraordinary amount of work to get rid of a few rows. So what if we commented the ROW_NUMBER and the ORDER BY out?

Even though the new plan has a scan operator, we only read 110 rows from it because we are using the TOP operator properly. Note the row counts from the first plan again. We have 479 thousand rows going through multiple operators in the first, but only 100 per operation in the second.

Avoid one Blocking Operator, find another

Here’s an example from recent work. I was looking at a query in a GC process that was populating a temp table to use for later DELETEs. I anticipated the optimizer would try a hash match join, so I used an INNER LOOP JOIN hint to avoid that blocking operator. The results were quite unpleasant, as you can see in this anonymized plan.

So, I avoided the hash match join, but the SQL Server optimizer didn’t see it the way I did. The first table is a temp table, Object6, which we are joining to a normal table, Object11. But this plan includes a table spool that not only forces us to read all 587741 rows in the seek against our table, it seems to create a cross join in memory between the results of the clustered index scan on the temp table and the clustered index seek against the base table (538 * 587741 = 316204658).

That hint obviously wouldn’t work. I reversed the order of the tables, then removed the hint altogether, giving the following:

 

2000 rows returned across the board. There is a SORT operation, but it’s after the TOP so there’s no harm; our results are being sorted before inserting them into a second temp table. The result is a much more performant query, taking ~8 ms instead of 75000 ms.

Sort out blocking operators

Hopefully this has been informative. I honestly wasn’t aware of blocking operators until a month or two ago. That’s the frustrating and interesting thing about SQL Server sometimes; you can work on it as long as I have and, even putting new development aside, there’s always more to learn.

Hope this helps you optimize some of your own operations.

It seems so simple

The TOP operator seems pretty straightforward. “Hey SQL Server, give me the first 100 rows that match this criteria, then stop.” But when certain operations get involved it can go sideways.

Let’s start with a simple example in the WideWorldImporters database.
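
The statement itself isn’t shown here, so this is a hedged stand-in; the filter is an assumption, chosen as an example where the seek may read more rows than the TOP returns:

    SELECT TOP (100) OrderID, CustomerID, OrderDate
    FROM   Sales.Orders
    WHERE  OrderID > 70000
      AND  SalespersonPersonID = 2   -- residual filter: more rows read than returned
    ORDER BY OrderID;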

The query here is simple, and we can see the TOP operator only returns 100 rows, but so does the index seek underneath. The way this works is that the TOP asks the operation connected to it for 1 row, and keeps asking until it has 100 or the operation below can’t find any more rows that match.

 

Because of the WHERE clause I chose, we actually had to read 943 rows in the table to get 100 that matched. At that point the TOP stopped asking for more rows.

If the TOP operator kept asking for more rows and only displayed the first 100, that would have been a waste of effort. Still, we’re only querying 1 index here.

Let’s add a wrinkle

Now we have a key lookup, and we can follow the flow of what happened. We read and returned 270 rows from the index seek. There’s a nested loops operator connecting us to our key lookup, and that returned only 100 rows. So, we had to read 270 rows from the index seek to find 100 rows that met all our filters. 

Ultimately the TOP operator was asking for 1 row repeatedly until it met its quota, and then it stopped asking for more. As expected. 

What’s a blocking operator?

So, with another change we’ll see one of those operators I mentioned earlier.

Here, we’re seeking based on a range of OrderIDs, but getting results sorted in a different order. We’re reading many more records from the index and the key lookup (where the estimate is wayyy off). We don’t need to read this much to get our results, because we returned 2160 rows from the key lookup and the nested loops join, but then we reduce when we hit the sort.

It makes sense that you can’t return the top 100 rows that meet this criteria, until you’ve seen them all. Well, unless the data is in the order you want, and you can just seek it that way from the index. Our previous query had an ORDER BY clause but no sort operation because our sort matched our range seek.

Sort is a blocking operator. Don’t feel bad if you haven’t heard of the term; I’ve been working with SQL Server for 15 years, and I’m sure I never heard the term until the incomparable Grant Fritchey mentioned it while he was lecturing at my place of employment.

So sorts and several other types of operators (eager spools, remote query\scan\etc, hash match joins, and more) will block the normal flow and gather all their results before passing any rows on. The hash match join only blocks while building its hash table from the first input, before probing the second.

Let’s hash it out
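
The query isn’t shown, so here’s a hedged sketch of the kind of statement described below; a join hint is used to reproduce the hash match, which may not match the author’s exact repro:

    SELECT TOP (100) o.OrderID, o.OrderDate, ol.StockItemID, ol.Quantity, ol.LastEditedWhen
    FROM   Sales.Orders AS o
    INNER HASH JOIN Sales.OrderLines AS ol
           ON ol.OrderID = o.OrderID
    WHERE  o.OrderID < 5000;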

We filtered on OrderID < 5000, and we read 4999 rows from the build table. So we read everything that fit that criteria; we didn’t stop early because we had our 100 rows. So, definitely blocking behavior.

Then we probe the second table and return 900 rows, and they pass through the hash match. The results get reduced to 100 by the TOP operator. 

Why 900 rows from the OrderLines table, not 100? That’s less than clear. As I vary the result set or the TOP size, I get a number of behaviors using different indexes and returning batches of various sizes. It appears the probe may be trying to do a batch of rows at a time, or it may be related to the memory allocated. (If I can get a clearer answer to this aspect, I’ll update the post).

Update: I consulted with my colleague, the superb Kevin Feasel, who suggested this was operating in batch mode. I was testing this using SQL Server 2019, and batch mode on rowstore is a new feature that everyone should be aware of.

I thought I had ruled this out however, and indeed the probe operation was in row mode:

However, the hash match operation was not!

So the hash match requested a batch of 900 rows, which is why we saw the unexpected number of rows.

If you try this in SQL Server 2019, you may see different behaviors as you vary the result set (when I removed LastEditedWhen from the result set, it changed to row mode and only returned 100 rows) or the TOP size (TOP 1000 dropped it back to row mode). I also saw some variation with the index used against OrderLines, including the columnstore index.

Summing up

These have been relatively simple examples of the TOP operator in action, and how it interacts with other operators. In my next post, I’ll provide some more complicated plans, and discuss how we can keep our TOPs in top shape.