Slow Garbage Collection

I encountered a curious issue recently, and immediately knew I needed to blog about it. Having already blogged about implicit conversions and how the TOP operator interacts with blocking operators, I found a problem that looked like the combination of the two.

I reviewed a garbage collection process that’s been in place for some time. The procedure populates a temp table with the key values for the table that is central to the GC. We use the temp table to delete from the related tables, then delete from the primary table. However, the query populating our temp table was taking far too long, 84 seconds when I tested it. We’re scanning and returning 1.4 million rows from the first table, doing a key lookup on all of them. We scan another table to look up the retention period for the related account, as this database has information from multiple accounts, and return 8.4 million rows there. We join two massive record sets, then have a filter operator that gets us down to a more reasonable size.

In general, when running a complex query you want your most effective filter, the one that leaves the fewest rows, to happen first. We want to get the result set small early, because every operation after that becomes less expensive. Here’s part of our anonymized query:

So, Object5 is our main table. Column10 is a status column, and we compare that to a scalar value to make sure this record is ready for GC. That’s fine. The second predicate checks whether the record is old enough to GC, and this is where things get interesting. The code here is odd; we’re subtracting a COALESCE of three values from a DATETIME variable, relying on the behavior that subtracting an INT from a DATETIME subtracts that number of days. There’s an implicit conversion, but it is on the variable side of the inequality. And the plan above appears to be our result.
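The date arithmetic described here can be reproduced in isolation. This is a minimal sketch rather than the production query; the variable names and the fixed date are made up for illustration.

```sql
-- Hypothetical values; only the arithmetic matters here.
DECLARE @Now datetime = '2024-01-31T00:00:00';
DECLARE @RetentionDays int = 30;

SELECT
    @Now - @RetentionDays               AS ImplicitMath,  -- the int is implicitly converted to datetime
    DATEADD(DAY, -@RetentionDays, @Now) AS ExplicitMath;  -- same cutoff, with the unit stated explicitly
```

Both columns come back as 2024-01-01, which is why the shorthand works even though it hides the unit (days) from anyone reading the code.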

So, Bad Date Math Code?

Seems like a good root cause, and I don’t like this lazy coding pattern, but let’s test it out with tables we can all look at. Would this be any better if we had written this to use the DATEADD function to do the math properly?

So, if we had an index on the OrderDate column, would it perform differently?
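A comparison along these lines might look like the following. This is a sketch that assumes an AdventureWorks sample database; the index name, the TOP (100), and the 30-day window are all assumptions, not the values from the original test.

```sql
-- Assumes AdventureWorks; the index name is made up.
CREATE NONCLUSTERED INDEX IX_SalesOrderHeader_OrderDate
    ON Sales.SalesOrderHeader (OrderDate);

-- Version 1: subtract an int directly from the datetime.
SELECT TOP (100) SalesOrderID, OrderDate
FROM Sales.SalesOrderHeader
WHERE OrderDate < GETUTCDATE() - 30;

-- Version 2: the same cutoff written with DATEADD.
SELECT TOP (100) SalesOrderID, OrderDate
FROM Sales.SalesOrderHeader
WHERE OrderDate < DATEADD(DAY, -30, GETUTCDATE());
```

In both versions the math happens on the GETUTCDATE() side of the comparison, so the OrderDate column itself is left untouched and remains eligible for an index seek.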

Apparently not. Same plan, same cost. But when you think about it, this tests the date math by subtracting an integer from the DATETIME provided by GETUTCDATE(). The original was subtracting a COALESCE of three values. One of those values was a float. Could the COALESCE or the resulting data type have made this more complicated?

Testing COALESCE

Running this, again we see a nice seek that reads and returns 100 rows. So the different data type and the COALESCE make no difference.
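A test of this shape could be written as below. It is only a sketch: the three variables are hypothetical stand-ins for the COALESCE inputs, and the values are arbitrary. Because one input is a float, COALESCE returns a float, which is then implicitly converted for the subtraction.

```sql
-- Hypothetical variables standing in for the three COALESCE inputs.
DECLARE @AccountRetention int   = NULL;
DECLARE @GroupRetention   float = NULL;   -- the float makes COALESCE return float
DECLARE @DefaultRetention int   = 90;

SELECT TOP (100) SalesOrderID, OrderDate
FROM Sales.SalesOrderHeader
WHERE OrderDate < GETUTCDATE()
      - COALESCE(@AccountRetention, @GroupRetention, @DefaultRetention);
```

Everything on the right-hand side is known before the first row is read, so the optimizer can evaluate it once and still seek on OrderDate.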

Looking at the original query again, the first value isn’t a variable, it’s a column from a different table. We can’t filter by this column until we’ve read the other table, which affects our join order. But we have no criteria to seek the second table with.

Joined Filtering

One more test. Let’s see what the behavior looks like if we join to the Customers table to look for the RetentionPeriod. First, I’ll create and populate some data in that column:

I only populated a few of the records to better match the production issue; only some accounts have the value I’m looking for, hence the COALESCE.
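Setting that up might look something like this. It is a sketch under assumptions: the column type, the modulo filter, and the 180-day value are arbitrary choices for illustration, not the ones used in the original test.

```sql
-- Assumption: RetentionPeriod is a new nullable int column, measured in days.
ALTER TABLE Sales.Customer ADD RetentionPeriod int NULL;
GO

-- Populate only a subset of customers; the filter and the value are arbitrary.
UPDATE Sales.Customer
SET RetentionPeriod = 180
WHERE CustomerID % 100 = 0;
```

The GO separator matters here, since the UPDATE cannot compile in the same batch that adds the column.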

Now we’re trying to filter records in SalesOrderHeader, based on the Customer’s RetentionPeriod. How does this perform?
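A query of that shape, assuming the RetentionPeriod column described above exists, could be sketched like this. The default of 90 days and the TOP (100) are assumptions.

```sql
DECLARE @DefaultRetention int = 90;  -- hypothetical fallback when the customer has no value

SELECT TOP (100) soh.SalesOrderID, soh.OrderDate
FROM Sales.SalesOrderHeader AS soh
INNER JOIN Sales.Customer AS c
    ON c.CustomerID = soh.CustomerID
WHERE soh.OrderDate < GETUTCDATE() - COALESCE(c.RetentionPeriod, @DefaultRetention);
```

The cutoff now depends on a column from Sales.Customer, so it cannot be computed until the two tables have been joined.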

Well, the row counts aren’t terrible, but we are scanning both tables. The optimizer opted to start with the Customer table, which is much smaller. We’re not filtering on the date until the filter operator, after the merge join.

I’d be worried that with a larger batch size or tables that don’t line up for a merge join, we’d just end up doing a hash match. That would force us to scan the first table, and without any filter criteria that would be a lot of reads.

Solution

The solution I applied to my production query was to create a temp table that had the account ID and the related retention period. I joined this to my primary table, and the query that was taking 84 seconds was replaced with one that took around 20 milliseconds.

Here’s how I’d apply that thought to our example. I’ve created a temp table with CustomerID and its associated RetentionDate, so we could use that to search Sales.SalesOrderHeader.

You may have noticed my filter on the Sales.Customer table. In the live issue, the temp table had dozens of rows for dozens of accounts, not the tens of thousands I’d get from using all rows in Sales.Customer in my example. So, I filtered it to get a similar size in my temp table.
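The approach can be sketched as follows, again assuming AdventureWorks and the RetentionPeriod column from earlier. The temp table name, the 90-day default, and the CustomerID range are illustrative assumptions.

```sql
-- Precompute one cutoff date per customer (names are illustrative).
CREATE TABLE #CustomerRetention
(
    CustomerID    int      NOT NULL PRIMARY KEY,
    RetentionDate datetime NOT NULL
);

INSERT INTO #CustomerRetention (CustomerID, RetentionDate)
SELECT c.CustomerID,
       DATEADD(DAY, -COALESCE(c.RetentionPeriod, 90), GETUTCDATE())
FROM Sales.Customer AS c
WHERE c.CustomerID BETWEEN 11000 AND 11050;   -- arbitrary filter to keep the temp table small

-- The date comparison is now a plain column-to-column predicate.
SELECT soh.SalesOrderID, soh.OrderDate
FROM #CustomerRetention AS cr
INNER JOIN Sales.SalesOrderHeader AS soh
    ON soh.CustomerID = cr.CustomerID
   AND soh.OrderDate < cr.RetentionDate;
```

The design choice is to do the COALESCE and date math once per account, up front, so the big table is only ever compared against a finished value.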

Infinitely better. Literally in this case, since the ElapsedTime and ActualElapsedms indicators in the plan XML are all 0 milliseconds. I’m declaring victory.

If you liked this post, please follow me on twitter or contact me if you have questions.

Reviewing the concept

A quick search will tell you that implicit conversions are pretty awful for performance, and in particular drive CPU usage. That’s not news. But there is an aspect of this I think a lot of engineers don’t understand: why does it cause performance issues?

An implicit conversion occurs when you compare or assign data across two different data types without converting one yourself. If you used a CAST or CONVERT, it would be an explicit conversion. When SQL Server does it for you, it’s an implicit conversion, and these can have a real impact on your execution plans. Not all combinations of types can be converted implicitly; Microsoft’s data type conversion documentation has the full chart.
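The difference between the two is easy to see in a tiny, self-contained example. The variable names and values here are arbitrary.

```sql
DECLARE @Text   varchar(10) = '42';
DECLARE @Number int         = 8;

-- Implicit: varchar has lower precedence than int, so SQL Server converts @Text for us.
SELECT @Text + @Number AS ImplicitResult;               -- 50

-- Explicit: we state the conversion ourselves.
SELECT CAST(@Text AS int) + @Number AS ExplicitResult;  -- 50
```

The results match; the only difference is who decided which side gets converted.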

So let’s look at an example based on WideWorldImporters:

The Description column is an nvarchar type, but I’m comparing it to a string that is non-unicode. And for the sake of this example, I did create an index on this column.
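A query like the one described might look as follows. This is a sketch with assumptions: it guesses Sales.OrderLines as the table, and the index name and search string are made up.

```sql
-- Assumes WideWorldImporters; the table choice and index name are guesses for illustration.
CREATE NONCLUSTERED INDEX IX_OrderLines_Description
    ON Sales.OrderLines (Description);

SELECT OrderLineID, Description
FROM Sales.OrderLines
WHERE Description = 'Chocolate frogs 250g';   -- varchar literal (no N prefix) against an nvarchar column
```

Writing the literal as N'Chocolate frogs 250g' would avoid the conversion entirely.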

And we see an implicit conversion. In this case, the optimizer converted our string to an nvarchar(4000) type to match the Description column. But here the implicit conversion isn’t going to cause a problem. Converting one value isn’t that expensive; you can confirm that from the ~0.025 cost for this operation. We even used the index on the column.

The real problem comes when SQL Server converts the column to match the variable.

Reproducing the issue

I’ve seen implicit conversions many times, and have one example in my head I recall very clearly. So I was fairly astonished when I tried to reproduce this using WideWorldImporters. SQL Server tries very hard to convert the parameter or literal, rather than the column. It’s quite difficult to get it to behave badly, in part because of the data types chosen for the tables.

Microsoft’s documentation on data type precedence indicates that SQL Server will convert from one type to another based on the priority in the chart; but I have an example that seems to counter this.

The Quantity column is an int, which has a lower priority than our decimal type variable. So according to the chart, the column should be converted, meaning every value on every row; more on that later.
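A test of this kind could be sketched like so, assuming the Quantity column in Sales.OrderLines; the variable's precision and value are arbitrary.

```sql
DECLARE @Quantity decimal(18, 2) = 10;  -- decimal outranks int in the precedence chart

SELECT OrderLineID, Quantity
FROM Sales.OrderLines
WHERE Quantity = @Quantity;
```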

But looking at the plan, we don’t convert the Quantity column. Instead there is a scalar operator on the variable that appears to be obscuring the underlying conversion:

So, the behavior here doesn’t match expectations. In my testing, all the int/decimal/money types worked similarly, and the optimizer would convert the variable. One takeaway I had from writing this is that implicit conversions occur much more often than I thought, but SQL Server is smart enough to convert the variable unless it has no choice. You likely wouldn’t notice the many plans that convert the variable, because they aren’t causing a performance problem. Instead, you’ll see and remember the one that caused a massive issue.

Another angle

An easy example would be to compare a varchar column to an nvarchar variable. This forces SQL Server’s hand, because the conversion is only safe in one direction. SQL Server can easily convert any varchar to an nvarchar, but the opposite is more problematic. If your nvarchar string contains kanji or something else that can’t be represented in a varchar string, the conversion would lose data. So, SQL Server would rather convert the varchar, even if that means converting a column.

Note, I had to change databases here, because there are no varchar columns in WideWorldImporters (thus preventing this conversion).

Even though I’ve created an index on the AccountNumber column, this results in a scan.
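The pattern in question can be sketched like this, assuming the AdventureWorks Sales.Customer table; the account number value is arbitrary.

```sql
DECLARE @AccountNumber nvarchar(10) = N'AW00000011';  -- arbitrary value

SELECT CustomerID, AccountNumber
FROM Sales.Customer
WHERE AccountNumber = @AccountNumber;   -- varchar column compared to an nvarchar variable
```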

Usually I’ve seen the CONVERT_IMPLICIT on the scan operator itself, but here we see it as part of the filter. The query only returns 1 row, but we read all 19820 rows in the table to fulfill this. Imagine the effect if this table had 10 million rows.

Why the Table Scan?

Reasonable question. SQL Server doesn’t really have a choice. The index is built on the AccountNumber, not the output of this function call:

So, SQL Server can’t search the index for this information; it’s not in the index. And SQL Server can’t really predict what the output of the function will be before it has converted the data, so it has to look at every AccountNumber value, convert it, then compare. The most expensive part isn’t the convert itself, but the table scan it forces.

So, pay attention to your data types when writing procedures and queries to avoid these issues. The effect of an implicit conversion on a column is much more pronounced the larger the table is, and it can easily drive your CPU to 100%.
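In practice, matching types usually means one of two things, sketched below. This assumes AccountNumber is a varchar(10) column, as in AdventureWorks; the values are arbitrary.

```sql
-- Option 1: declare the variable with the column's type.
DECLARE @AccountNumber varchar(10) = 'AW00000011';

SELECT CustomerID, AccountNumber
FROM Sales.Customer
WHERE AccountNumber = @AccountNumber;

-- Option 2: if the incoming type cannot change, convert the variable once, not the column.
DECLARE @Incoming nvarchar(10) = N'AW00000011';

SELECT CustomerID, AccountNumber
FROM Sales.Customer
WHERE AccountNumber = CAST(@Incoming AS varchar(10));
```

Either way the conversion happens on a single value, and the column is compared as stored. Option 2 is only safe when the incoming value is known to fit in the column's type.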

Hidden Implicit Conversions

One of my attempts to create an implicit conversion worked, but didn’t show it clearly in the plan. I tried the following example using the sql_variant type:
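An attempt along these lines might look like the sketch below. The variable name and value are assumptions, not the original test. Since sql_variant sits near the top of the precedence chart, the column is the side that has to be converted.

```sql
DECLARE @AccountNumber sql_variant = CAST('AW00000011' AS varchar(10));  -- arbitrary value

SELECT CustomerID, AccountNumber
FROM Sales.Customer
WHERE AccountNumber = @AccountNumber;   -- the varchar column is converted to sql_variant
```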

The plan is similar to the last one, but we don’t see the CONVERT_IMPLICIT when we mouseover any of the three operators.

However, if you hit F4 and examine the Compute Scalar operator, you can see the definition of expression Expr1001 in the Properties window.

This clearly shows the column being converted, but it’s odd to need to do this to see the convert.

This would be a good time to refer to a bit of brilliance Jonathan Kehayias posted quite some time ago. This script will allow you to see any implicit conversions on columns in the plan cache, along with the statement, plan, and other relevant details. This can be useful in a number of cases, but especially when the operation is hidden as in this example.
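For a quick first look, a much cruder search is also possible. To be clear, the sketch below is not Jonathan Kehayias’s script; it is a plain text search over cached plans using the standard DMVs, and it will match conversions on variables and literals as well as on columns.

```sql
-- Crude text search of the plan cache for any CONVERT_IMPLICIT.
SELECT TOP (20)
    st.text       AS StatementText,
    qp.query_plan AS QueryPlan,
    cp.usecounts  AS UseCounts
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle)   AS st
WHERE CAST(qp.query_plan AS nvarchar(max)) LIKE N'%CONVERT_IMPLICIT%'
ORDER BY cp.usecounts DESC;
```

Casting every cached plan to text is expensive, so this is something to run sparingly on a busy server.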

Also, Any Other Function Call

This post is primarily about understanding how implicit conversions work and why they occur, but it bears mentioning that basically any function call wrapping a column you want to search on will work the same way.

For example, using SUBSTRING on AccountNumber gives largely the same result:
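A sketch of that comparison, assuming the same Sales.Customer table and an arbitrary prefix:

```sql
-- Function wrapped around the column: the index on AccountNumber cannot be used for a seek.
SELECT CustomerID, AccountNumber
FROM Sales.Customer
WHERE SUBSTRING(AccountNumber, 1, 6) = 'AW0000';

-- A prefix search that leaves the column bare and can seek on the index.
SELECT CustomerID, AccountNumber
FROM Sales.Customer
WHERE AccountNumber LIKE 'AW0000%';
```

Both should return the same rows here; the second simply asks the question in a form the index can answer.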

If you need to use a function call in your WHERE, hopefully you have another clause that can do the majority of your filtering.

Happy New Year

Hope you had a great holiday season! I’ll keep up the posting going forward, so check back in, and reach out if you have any questions I can help answer.