Understanding Cost in SQL Server

Cost is an important concept in SQL Server. It is key in how plans are compared and chosen by the optimizer, and it can guide us to problem operators as we tune a query’s performance. It can also lead us astray if we follow it blindly. In this post, I want to explain what cost is and how we use it.

Many queries (that aren’t trivial) can be executed in a number of different ways. Each index on a table is a possible path for the optimizer to use, and statistics allow the optimizer to determine the cost of a given operation. SQL Server determines what potential plan it will use in large part based on cost. The optimizer isn’t exhaustive. It won’t compare all possible plans; that would take too long.

Let’s take a look at an example query from the AdventureWorks database.

SELECT * FROM Sales.vSalesPersonSalesByFiscalYears 
WHERE SalesPersonID=278

There is a cost provided for each query relative to the batch; that can help you narrow down which statement is the issue if a large batch or procedure is performing poorly. This query is in the only one in the batch. 

This query hits 5 tables, and has a number of joins and other operations. Each operation has a cost displayed here as a percentage of the total query. You’ll notice one operation has a cost of 94%; let’s zoom in. 

This has a cost of 0.54456. This is broken further into an I/O Cost and a CPU Cost. There is also a Subtree Cost, which would include any operators that feed into this one, but in this case there aren’t any. The operator returned 234 rows, but it read over 31k. So the cost seems appropriate; we really are doing some work here. 

The number for cost is always presented without a unit, you may ask 0.54456 of what exactly? Pounds? Parsecs?


The story as I recall was that cost was derived from how long it took an early developer of SQL Server to run a given operation on his desktop computer, in seconds. So initially cost was expressed in seconds, but that’s not the case anymore. It’s a more generic expression of how much work is involved in executing the operation.

Given the cost value is fairly generic, you need an idea of what is a cheap operator or an expensive one. Here’s one of the other index seeks from the query above.

So, we’re doing an index seek not a scan. We not only returned 1 row, we only read one row. In terms of reading a normal table, it’s not going to get any cheaper than this. This cost of about 0.003 is something you’ll see many times for that reason.

So, what constitutes an expensive query? That’s a matter of opinion. You could gauge this by the “Cost Threshold for Parallelism” setting in SQL Server. This is a server level setting that sets how expensive an operation has to be for SQL Server to consider going parallel to perform it. The default setting is 5, so you could argue that’s an expensive query. But this setting is probably quite low for current servers. I think at work, this setting is 100 for most of the servers I work on.

Keep in mind, the cost threshold is for the operation, not the entire plan. 

Cost isn’t exact

One thing to remember is that cost in SQL Server is always an estimate. This is a number SQL Server calculates when considering multiple potential plans to determine which would be the best. But the number of rows it expects a given operation to return or how many times that operation runs can be off. All of that is based on statistics.

It doesn’t then go back and update the cost number later if those numbers were incorrect. So while we can use the cost as an indicator of which query or operator we should focus on, don’t completely tunnel-vision that one thing.

I could talk at this point about estimated plans versus actual plans, but fortunately Grant Fritchey has already done so. The gist of his post is that an actual plan is one that has actual runtime metrics. For example, “Number of Rows Read”, “Actual Number of Rows”, and “Number of Executions” in the images above.

It’s helpful to have plans with these actual numbers; they can help you confirm if the costs look accurate. “Number of Rows Read” is the main data point I look at most of the time.

You may also find the following situation in plans as you look at them:

The optimizer estimated 887 or 888 rows for these operators, but the number of rows returned is much higher. So the cost of 72% for the one operator isn’t really accurate. That operator read and returned many more rows, as did the other related operators. If you saw an estimated plan without these runtime numbers, you may come away with a very different impression of how this query is running.


I’ve always felt cost is not well explained, so hopefully this post will help answer some questions. Understanding cost can be really helpful in troubleshooting poorly performing queries, but don’t focus solely on it when analyzing a problem.

If you liked this post, please follow me on twitter or contact me if you have questions.

The transitive Property

You may recall the transitive property from elementary school math class. It states:

If A = B, and B = C, then A = C

The SQL Server optimizer can and will use this property, and it can lead to issues in your queries. When I’m writing a query, I have a clear idea of how I want it to operate. But using the transitive property, SQL Server has additional options one might not expect, and this may occasionally cause things to go awry. Consider this:

DECLARE @OrderID INT = 110001;

FROM Sales.Orders so
JOIN Sales.OrderLines sol
	ON sol.OrderID = so.OrderID
	so.OrderID = @OrderID;

I’d expect the optimizer to seek the Orders table, then join to OrderLines. But since @OrderID is also equal to sol.OrderID, it could start the query there. There would be little difference; the OrderID column is indexed in both tables and neither table is that large. There’s only 3 OrderLines for this OrderID.

But what if:

  • The CustomerID was present in all tables. And was part of the primary key in each.
  • The data set was larger, with 100’s of millions of rows in our OrderLines table.
  • Statistics were last updated by an auto-update. A table of this size would have a very small sampling rate by default, <1%.
  • Our database has a large number of customers, and we’re running a query to get recent orders for a large customer that haven’t been picked yet.

We may end up with a query like this:

	@CustomerID INT = 55;

FROM Sales.Orders so
JOIN Sales.OrderLines sol
	ON sol.OrderID = so.OrderID
	AND sol.CustomerID = so.CustomerID
	so.CustomerID = @CustomerID
	AND so.OrderDate > DATEADD(DAY,-90,GETDATE())
	AND so.PickingCompletedWhen IS NULL;

If our PK has the CustomerID first, we could use that to search either table. If we don’t have an index with the CustomerID and our dates on Orders, those fields will be in our predicate, not our seek predicate. With poor statistics, the optimizer may seek OrderLines first, returning all lines for that customer, before later joining an filtering it down. In this situation, that could be a mountain of reads.

Countering The transitive Property

So what if you experience or anticipate this issue with a query? I see a few approaches to prevent the bad plan from sneaking up on your server.

  • Join Hints or Force Order: The FORCE ORDER hint would direct the optimizer to hit the Orders table first. A join hint like INNER LOOP JOIN would set the join type, and also force the order. Either would results in seeking the Orders table before joining to OrderLines, preventing our worst case.
  • Index hints: If you hint the optimizer to use an index based on CustomerID and OrderDate, that would point it in the direction of using the Orders table first, avoiding the problem.
  • Better stats: An argument can definitely be made to not use auto-updated stats on a sufficiently large table. I’ve seen sampling rates below 0.1%, and mistakes can be made then. If you updated stats with a large sampling rate, the optimizer will have better information to work with, and that may avoid the worst case.

I thought this property of the optimizer was interesting when I first saw it in play, and I haven’t seen it referred to much. Hopefully this post will give you a little more insight into what the optimizer can do, and how we sometimes need to respond.

Reviewing the concept

A quick search will tell you that implicit conversions are pretty awful for performance, and in particular drive CPU usage. That’s not news. There is an aspect of this I think a lot of engineers don’t understand; why does it cause performance issues?

An implicit conversion occurs when you try to compare between or assign data across two different data types, without converting one yourself. If you used a CAST or CONVERT, it would be an explicit conversion. When SQL Server does it for you, it’s an implicit conversion, and these can have a real impact on your execution plans. Not all combinations of types can be converted implicitly, for a full list look here.

So let’s look at an example based on WideWorldImporters:

--CREATE INDEX IX_Sales_OrderLines_Description ON Sales.OrderLines(Description);

SELECT sol.OrderLineID, sol.Description
FROM Sales.OrderLines sol
WHERE sol.Description = 'Plush shark slippers (Gray) XL'

The Description column is an nvarchar type, but I’m comparing it to a string that is non-unicode. And for the sake of this example, I did create an index on this column.

And we see an implicit conversion. In this case, the optimizer converted our string to a nvarchar(4000) type to match the Description column. But in this case, the implicit conversion isn’t going to cause a problem. Converting one value isn’t that expensive; you can confirm that from the ~0.025 cost for this operation. We even used the index on the column.

The real problem comes when SQL Server converts the column to match the variable.

Reproducing the issue

I’ve seen implicit conversions many times, and have one example in my head I recall very clearly. So I was fairly astonished when I tried to reproduce this using WideWorldImporters. SQL Server tries very hard to convert the parameter or literal, rather than the column. It’s quite difficult to get it behave badly, in part because of the data types chosen for the tables. 

Microsoft’s documentation on data type precedence indicates that SQL Server will convert from one type to another based on the priority in the chart; but I have an example that seems to counter this.

DECLARE @dec decimal(18,2) = 27.0;

	sol.OrderLineID, sol.Quantity
FROM Sales.OrderLines sol
	sol.Quantity = @dec;

The Quantity column is a int, which has a lower priority than our decimal type variable. So according to the chart, the column should be converted, meaning every value on every row; more on that later.

But looking at the plan, we don’t convert the Quantity column. Instead there is a scalar operator on the variable that appears to be obscuring the underlying conversion:

So, the behavior here doesn’t match expectations. In my testing, all the int\decimal\money types worked similarly, and the optimizer would convert the variable. One takeaway I had from writing this is that implicit conversion occur much more often than I thought, but SQL Server is smart enough to convert the variable unless it has no choice. You likely wouldn’t see the many plans that convert the column, because they aren’t causing a performance problem. Instead, you’ll see and remember the one that caused a massive issue.

another angle

An easy example would be to compare a varchar column to an nvarchar variable. This forces SQL Server’s hand, only in one direction. SQL can easily convert any varchar to an nvarchar, but the opposite is more problematic. If your nvarchar string contains kanji or something else that can’t be represented with a varchar string, the operation would fail. So, SQL Server would rather convert the varchar, even if that means converting a column.

Note, I had to change databases here, because there are no varchar columns in WideWorldImporters (thus preventing this convert).

USE AdventureWorks2014
DECLARE @AcctNum NVARCHAR(20) = N'AW00000146';

FROM Sales.Customer cus
WHERE cus.AccountNumber = @AcctNum;

--CREATE INDEX IX_Customer_AccountNumber ON dbo.Customer(AccountNumber);
--DROP INDEX IX_Customer_AccountNumber ON dbo.Customer;

Even though I’ve created an index on the AccountNumber column, this results in a scan.

Usually I’ve seen the CONVERT_IMPLICIT on the scan operator itself, but here we see it as part of the filter. The query only returns 1 row, but we read all 19820 rows on the table to fulfill this. Image the effect if this table had 10 million rows.

Why the Table Scan?

Reasonable question. SQL Server doesn’t really have a choice. The index is built on the AccountNumber, not the output of this function call:

CONVERT_IMPLICIT(nvarchar(10),[AdventureWorks2014].[Sales].[Customer].[AccountNumber] as [cus].[AccountNumber],0)

So, SQL Server can’t search the index for this information; it’s not in the index. And SQL Server can’t really predict what the output of the function will be before it has converted the data, so it has to look at every AccountNumber value, convert it, then compare. The most expensive part isn’t the convert itself, but the table scan it forces.

So, pay attention to your data types when writing procedures and queries to avoid these issues. The effect of an implicit conversion on a column is much more pronounced the larger the table is, and it can easily drive your CPU to 100%.

Hidden Implicit Conversions

One of my attempts to create an implicit conversion worked, but didn’t show it clearly in the plan. I tried the following example using the sql_variant type:

DECLARE @svar sql_variant = 90.1;

	sol.OrderLineID, sol.UnitPrice
FROM Sales.OrderLines sol
	sol.UnitPrice = @svar;

The plan is similar to the last one, but we don’t see the CONVERT_IMPLICIT when we mouseover any of the three operators.

However, if you hit F4 and examine the Compute Scalar operator, you can see the definition of expression Expr1001 in the Properties window.

This clearly shows the column being converted, But it’s odd to need to do this to see the convert. 

This would be a good time to refer to a bit of brilliance Jonathan Kehayias posted quite some time ago. This script will allow you to see any implicit conversions on columns in the plan cache, along with the statement, plan, and other relevant details. This can be useful in a number of cases, but especially when the operation is hidden as in this example.

Also, Any Other Function CALL

This post is primarily about understanding how implicit conversions work and why the occur, but it bears mentioning that basically any function call wrapping a column you want to search on will work the same way. 

For example, using SUBSTRING on AccountNumber gives largely the same result:

FROM Sales.Customer cus
	SUBSTRING(cus.AccountNumber,7,4) = '0146';

If you need to use a function call in your WHERE, hopefully you have another clause that can do the majority of your filtering.

Happy New Year

Hope you had a great holiday season! I’ll keep up the posting going forward, so check back in, and reach out if you have any questions I can help answer.

How Relevant is the TOP operator?

I’ve explained what a blocking operator is and provided a few examples, but maybe this doesn’t seem important. It’s affecting the TOP operator, sure, but don’t people just use this to look at the TOP 1000 rows of their tables in SSMS?

The TOP operator is useful for many operations, especially in a large environment. Large operation can timeout or fail for a variety of reasons, consuming resources without providing the results you need. A small, batch-sized operation is more likely to succeed and tends to perform more consistently. Many maintenance operations make sense to run with a TOP operator, so we should make sure those operations aren’t stymied by blocking operators. Some examples:

  • Garbage collection on a table with many millions rows. You want this to perform quickly, but you really can’t afford for this to time out (whatever that timeout period may be). We can limit how long this runs by GC’ing a small batch at a time, but this can be hampered badly by an extra sort or a hash join.
  • Archiving data applies for the same reasoning as GC. If you archive data to another table\database\server, you’ll want to keep your operation small enough to manage.
  • Backfilling a new column. If the existing table is large, you can’t just UPDATE the whole table; you’ll lock the table and block all your users. A batched UPDATE in a loop or in a scheduled process can resolve this without causing an outage.
  • GDPR is here, and CCPA is coming. You may need to anonymize data across many related tables. We need this to perform well to cleanse our existing data, and we’ll continue running this process going forward.
  • Queries producing potentially large results to your application may need to be batched as well. If this times out, you’re still wasting a lot of time and reads, even if no data is changed.

Out the Window

I examined one process recently that was similar to this query, causing a GC operation to time out.

	ROW_NUMBER() OVER(Partition By inv.CustomerID ORDER BY inv.InvoiceDate) AS SortID
FROM Sales.Invoices inv
JOIN Sales.InvoiceLines ili
	ON ili.InvoiceID = inv.InvoiceID
	inv.InvoiceDate < DATEADD(day, -60, GETUTCDATE()) 
	AND ili.LastEditedWhen < DATEADD(day, -60, GETUTCDATE()) 

It was a SELECT statement, and it inserted the important fields into a temp table, and ran DELETEs against multiple tables based on the contents of the temp table. But it was the initial SELECT that had a poor plan and caused the timeout.

I quickly saw a SORT in the execution plan and wondered why. The actual query didn’t have an ORDER BY clause. But it did have a ROW_NUMBER OVER in the select list; took me a minute to notice that.

But was the sorting necessary? “We need to delete really old records in this table, but it’s vitally important that we delete them in the order they were originally inserted!”

It seemed a poor reason to sort a table with tens of millions of rows. Coupled with the very small batch size, we were doing an extraordinary amount of work to get rid of a few rows. So what if we commented the ROW_NUMBER and the ORDER BY out?

Even though the new plan has a scan operator, we only read 110 rows from it because we are using the TOP operator properly. Note the row counts from the first plan again. We have 479 thousand rows going through multiple operators in the first, but only 100 per operation in the second.

Avoid one Blocking Operator, find another

Here’s an example from recent work. I was looking at a query in a GC that was populating a temp table to use for later DELETEs. I was anticipating the optimizer to try a hash match join, so I used an INNER LOOP JOIN hint to avoid that blocking operator. The results were quite unpleasant, as you can see in this anonymized plan.

So, I avoided the hash match join, but the SQL Server optimizer didn’t see it the way I did. The first table is a temp table, Object6, which we are joining to a normal table, Object11. But this plan includes a table spool that not only forces us to read all 587741 rows in the seek against our table, it seems to create a cross join in memory between the results of the clustered index scan on the temp table and the clustered index seek against the base table (538 * 587741 = 316204658).

That hint obviously wouldn’t work. I reversed the order of the tables, then removed the hint altogether, giving the following:


2000 rows returned across the board. There is a SORT operation, but it’s after the TOP so there’s no harm.. Our results are being sorted before inserting them into a second temp table. And a much more performant query, taking ~8 ms instead of 75000 ms.

Sort out blocking operators

Hopefully this has been informative. I honestly wasn’t aware of blocking operators until a month or two ago. That’s the frustrating and interesting thing of SQL Server sometimes; you can work on this as long as I have and, even putting new development aside, there’s always more to learn.

Hope this helps you optimize some of your own operations.

It seems so simple

The TOP operator seems pretty straightforward. “Hey SQL Server, give me the first 100 rows that match this criteria, then stop.” But when certain operations get involved it can go sideways.

Let’s start with a simple example in the WorldWideImporters database.

FROM Sales.OrderLines sol 
WHERE sol.OrderLineID < 1000 
	AND sol.UnitPrice > 50

The query here is simple, and we can see the TOP operator only returns 100 rows, but so does the index seek underneath. The way this works is that the TOP asks the operation connected to it for 1 row, and keeps asking until it has 100 or the operation below can’t find anymore rows that match.

Because of the WHERE clause I chose, we actually had to read 943 rows in the table to get 100 that matched. At that point the TOP stopped asking for more rows.

If the TOP operator kept asking for more rows and only displayed the first 100, that would have been a waste of effort. Still, we’re only querying 1 index here.

Let’s add a wrinkle

FROM Sales.OrderLines sol 
	sol.OrderID < 1000 
	AND sol.Quantity > 5
ORDER BY sol.OrderID

Now we have a key lookup, and we can follow the flow of what happened. We read and returned 270 rows from the index seek. There’s a nested loops operator connecting us to our key lookup, and that returned only 100 rows. So, we had to read 270 rows from the index seek to find 100 rows that met all our filters. 

Ultimately the TOP operator was asking for 1 row repeatedly until it met its quota, and then it stopped asking for more. As expected. 

What’s a blocking operator?

So, with another change we’ll see one of those operators I mentioned earlier.

FROM Sales.OrderLines sol 
	sol.OrderID < 5000 
	AND sol.Quantity > 5
ORDER BY sol.LastEditedWhen

Here, we’re seeking based on a range of OrderIDs, but getting results sorted in a different order. We’re reading many more records from the index and the key lookup (where the estimate is wayyy off). We don’t need to read this much to get our results, because we returned 2160 rows from the key lookup and the nested loops join, but then we reduce when we hit the sort.

It makes sense that you can’t return the top 100 rows that meet this criteria, until you’ve seen them all. Well, unless the data is in the order you want, and you can just seek it that way from the index. Our previous query had an ORDER BY clause but no sort operation because our sort matched our range seek.

Sort is a blocking operator. Don’t feel bad if you haven’t heard of the term; I’ve been working with SQL Server for 15 years, and I’m sure I never heard the term until the incomparable Grant Fritchey mentioned it while he was lecturing at my place of employment.

So sorts and several other types of operators (eager spools, remote query\scan\etc, hash match joins, and more) will block the normal flow and gather all their results before passing any rows on. The hash match join only blocks while building its hash table from the first input, before probing the second.

Let’s hash it out

FROM Sales.OrderLines sol 
JOIN Sales.Orders so
	ON so.OrderID = sol.OrderID
	sol.OrderID < 5000 
	AND sol.Quantity > 5
--ORDER BY sol.LastEditedWhen

We filtered on OrderID < 5000, and we read 4999 rows from the build table. So we read everything that fit that criteria; we didn’t stop early because we had our 100 rows. So, definitely blocking behavior.

Then we probe the second table and return 900 rows, and they pass through the hash match. The results get reduced to 100 by the TOP operator. 

Why 900 rows from the OrderLines table, not 100? That’s less than clear. As I vary the result set or the TOP size, I get a number of behaviors using different indexes and returning batches of various sizes. It appears the probe may be trying to do a batch of rows at a time, or it may be related to the memory allocated. (If I can get a clearer answer to this aspect, I’ll update the post).

Update: I consulted with my colleague, the superb Kevin Feasel, who suggested this was operating in batch mode. I was testing this using SQL Server 2019, and batch mode on rowstore is a new feature that everyone should be aware of.

I thought I had ruled this out however, and indeed the probe operation was in row mode:

However, the hash match operation was not!

So the hash match requested a batch of 900 rows, which is why we saw the unexpected number of rows.

If you try this in SQL Server 2019, you may see different behaviors as you vary the result set (when I removed LastEditedWhen from the result set, it changed to row mode and only returned 100 rows) or the TOP size (TOP 1000 dropped it back to row mode). I also saw some variation with the index used against OrderLines, including the columnstore index.

Summing up

These have been relatively simple examples of the TOP operator in action, and how it interacts with other operators. In my next post, I’ll provide some more complicated plans, and discuss how we can keep our TOPs in top shape.