QDS Toolbox: Waits Variation

Posted on 02/07/202202/07/2022 by sqljared

In my most recent blog post, I looked at the Query Variation report, which compares the recent performance of queries versus their historical performance to either highlight improvements or regressions in performance. The Waits Variation component does the same, but comparing the recent waits for a query to its historical waits.

One thing to keep in mind, is that if a given query is changed in any way (to change the filter, return additional columns, or include a hint), the changed query will have a different query_id in Query Store. In which case, both the Waits Variation and Query Variation procedures will not compare the historical performance of the old query to the recent performance of the new one.

That being said, let’s look at the dbo.WaitsVariation procedure’s options.

Waits Variation Options

@ServerIdentifier: Defaults to the current instance. Set this to gather data from another instance of SQL Server.
@DatabaseName: Defaults to the current database. I’m querying data from WideWorldImporters for my examples.
@ReportIndex: Default NULL. When used, stores information about the parameters used for the report.
@ReportTable: Default NULL. Allows you to store the report data in a table like dbo.WaitsVariationStore, created by the installer.
@WaitType: Which wait category are we concerned with? Total by default. [Total, CPU, Lock, Latch, BufferLatch, BufferIO, WorkerThread, NetworkIO, Parallelism…] There are 36 options here, so check the header of dbo.WaitsVariation for the full list.
@Metric: Do we want to compare the total wait time for the query, or the average? Avg by default. [Avg, Total]
@VariationType: Are we looking for queries that have improved [I] or regressed [R]. ‘R’ by default.
@ResultsRowCount: Number of rows to include in the report. Only the largest regressions or improvements are included. Default is 25, and if this value is NULL or negative, the procedure uses the default.
@RecentStartTime and @RecentEndTime: Defines what the ‘recent’ period is, which will be compared against the historical. The defaults are 1 hour ago for @RecentStartTime, and now for @RecentEndTime.
@HistoryStartTime and @HistoryEndTime: Defines what the ‘history’ period is, which will be compared against the recent. The defaults are 30 days ago for @HistoryStartTime, and 1 hour ago for @HistoryEndTime.
@IncludeQueryText: Includes the text for any identified queries in the output. Default is 0.
@ExcludeAdhoc: Ignores ad hoc queries, anything that isn’t part of a procedure or other defined object. Default is 0.
@ExcludeInternal: Excludes internal queries\operations run by SQL Server itself. Default is 0.
@VerboseMode: Default 0. Provides the queries being used in the messages tab.
@TestMode: Default 0. When enabled, does everything except actually run the generated queries to create the report.
OUTPUT @ReportID: When you @ReportIndex and @ReportTable, a saved report is generated. You can then use the @ReportID to see that report separately.

So, these parameters are very similar to dbo.QueryVariation. You’ll likely want to alter your time parameters, and you could generate multiple reports to check different wait types. Or you could do a regular report for total waits, and review to see which type of wait had the biggest change.

Best Total Wait Improvement

Given that the default is to look for regressions instead of improvements, if I wanted to compare the last two day’s waits to the previous month, I could use the following:

DECLARE @RecentStartTime	DATETIME2 = DATEADD(HOUR, -48, GETUTCDATE());
DECLARE @RecentEndTime	DATETIME2 = GETUTCDATE();
DECLARE @HistoryStartTime	DATETIME2 = DATEADD(DAY, -30, @RecentEndTime);
DECLARE @HistoryEndTime	DATETIME2 = @RecentStartTime;

	EXECUTE [dbo].[WaitsVariation]
		@DatabaseName	= 'WideWorldImporters',
		@WaitType	= 'Total',
		@Metric		= 'Avg',
		@VariationType = 'I',
		@ExcludeAdhoc = 1,
		@HistoryStartTime = @HistoryStartTime,
		@HistoryEndTime	= @HistoryEndTime,
		@RecentStartTime = @RecentStartTime,
		@RecentEndTime = @RecentEndTime;

Choosing ‘Total’ as your @Metric may work for some cases, but would tend to give results with large variation. If a query executed more times in one period than the other, I would expect that period to have more waits.

I ran a workload to generate some activity, and only one query in it regressed, so I’m returning improvements so there’s a little more to look at.

So, the four procs listed had some rather dramatic improvements. We can see the amount of the total wait reduction, and columns for unknown and CPU waits. This is only a portion of the incredibly wide result set. There are history, recent, and variation % columns for each wait type.

We ran this script looking for ‘Total’ as our @WaitType, which means we are sorting and returning the top rows with the most improvement in that area. But it did record the improvement for all wait types. It only takes a little scrolling to find our causes:

So, the first procedure had relatively large parallelism waits in its history, compared with none in the recent period. I would argue the variation % should be 100% not 0%, but the logic here may be special to prevent us from diving by 0.

All four procedures had NetworkIO waits. It’s likely the tool I was using to run these queries in a loop was not consuming the results fast enough, causing delays within SQL Server. These queries were taking so long that I used a smaller date range on my more recent calls. Fewer records, less time for the app to consume them, and smaller NetworkIO waits.

One thing I have noticed is that if you use @IncludeQueryText, the text returned is compressed and unreadable. Unless you create a saved report by using the @ReportIndex and @ReportTable options, which leads us to the second procedure.

Waits Variation from Store

If I save the results, there are records in the tables indicated, I can run dbo.WaitsVariationFromStore to read out the header of the report and its details in two result sets.

The result set is much more limited, but focuses on the metric we chose. We can see the number of executions, the related object, the change in our totals waits, and the decompressed text for the query itself.

There are only three parameters for this proc, but here’s an example:

USE QDSToolBox
GO

	EXECUTE dbo.WaitsVariationFromStore
		@ReportID = 1,
		@VerboseMode = 0,
		@TestMode = 0;

Nearly done

There are two more components to the QDS Dashboard I want to detail, but I may be spreading them out between some other topics I want to blog on.

Time to go update slides for a user group meeting later this week.

If you have any topics related to performance in SQL Server you would like to hear more about, please feel free to make a suggestion. You can follow me on twitter (@sqljared) and contact me if you have questions.

QDS Toolbox: Query Variation

Posted on 12/14/202112/14/2021 by sqljared

Since Query Store stores statistics long term, we can use it to look for trends or major changes in performance. The Query Variation component of the QDS Toolbox helps us review changes and has a lot of options to allow us to select the kinds of changes we are interested in.

So, let’s review our options and go through some examples.

Options

Let’s discuss the options and how they interact.

@ServerIdentifier: Defaults to the current instance. Set this to gather data from another instance of SQL Server.
@DatabaseName: Defaults to the current database. I’m querying data from WideWorldImporters for my examples.
@ReportIndex: Default NULL. When used, stores information about the parameters used for the report.
@ReportTable: Default NULL. Allows you to store the report data in a table like dbo.QueryVariationStore, created by the installer.
@Measurement: What statistic are we basing our report on? CPU by default. [CLR, CPU, DOP, Duration, Log, LogicalIOReads, LogicalIOWrites, MaxMemory, PhysicalIOReads, Rowcount, TempDB]
@Metric: How are we comparing our measurements? Avg by default. [Avg, Max, Min, StdDev, Total]
@VariationType: Are we looking for queries that have improved [I] or regressed [R]. ‘R’ by default.
@ResultsRowCount: Number of rows to include in the report. Default is 25.
@RecentStartTime and @RecentEndTime: Defines what the ‘recent’ period is, which will be compared against the historical. The defaults are 1 hour ago for @RecentStartTime, and now for @RecentEndTime.
@HistoryStartTime and @HistoryEndTime: Defines what the ‘history’ period is, which will be compared against the recent. The defaults are 30 days ago for @HistoryStartTime, and 1 hour ago for @HistoryEndTime.
@MinExecCount: Only considers queries with at least this many recent executions. Default is 1.
@MinPlanCount and @MaxPlanCount: Only includes queries with a number of plans in this range. Defaults are 1 and 99999.
@IncludeQueryText: Includes the text for any identified queries in the output. Default is 0.
@ExcludeAdhoc: Ignores ad hoc queries, anything that isn’t part of a procedure or other defined object. Default is 0.
@ExcludeInternal: Excludes internal queries\operations run by SQL Server itself. Default is 0.
@VerboseMode: Default 0. Provides the queries being used in the messages tab.
@TestMode: Default 0. When enabled, does everything except actually run the generated queries to create the report.

So let’s look at a few examples Pablo Lozano put in the header for the procedure.

Worst Regressions

The first example using the Query Variation proc creates a list of the 25 queries with the largest regression in performance for the last hour, compared to the history over the last 30 days.

	EXECUTE [dbo].[QueryVariation]
		@DatabaseName	= 'WideWorldImporters',
		@Measurement	= 'Duration',
		@Metric			= 'Avg';

We’re relying a lot on default values here, but it shows how simple it can be to run this procedure and get usable results.

Our results include queries that are in procedures and those that aren’t. We have the QueryID values for each if we want to run another query to look them up, but we didn’t include the option to get the text for the statements.

Since the default is to look for regressed queries, our list has 25 queries in descending order based on how large the regression was. We can see the average duration recently and historically, the number of executions is each window, and the overall number of plans.

The first two queries on this list from Integration.GetCustomerUpdates both have regressed by almost 300%. Both also have two execution plans. That could be the explanation for their regression; we could have a new plan that is worse for a variety of reasons. We also are comparing 8 recent executions to 100 in our history, so it could just be the small sample size (or cold cache) is making a big difference.

We have the information we need here to find the query and follow up from there. I’m fond of using the “Tracked Queries” report in SSMS under Query Store, and that would show us the query and it’s recent plans.

Biggest CPU Improvements

Here’s the second example from the procedure’s comments.

		EXECUTE [dbo].[QueryVariation]
			@DatabaseName	= 'WideWorldImporters',
			--@ReportIndex		= '[dbo].[QueryVariationIndex]',
			--@ReportTable		= '[dbo].[QueryVariationReport]',
			@Measurement		= 'CPU',
			@Metric				= 'Total',
			@VariationType		= 'I',
			@ResultsRowCount	= 10,
			@MaxPlanCount		= 1,
			@IncludeQueryText	= 1;

I’ve added my database name and left the report tables in comments, and I did change the option to include query text.

This report should show us queries that have improved the most, based on total CPU usage. It will only give us 10 results, and only includes query with a single plan.

Results here are showing us procedure with the biggest reduction in overall CPU usage, and several are around 90%. But, that’s not surprising since we measured off the total numbers instead of average, and we’ve run the queries far less in the recent window. If we run the query 92% less in the recent window, this will show up as a reduction.

If our executions were more even, it could draw attention to statements that have improved despite the plan not changing, since we set our @MaxPlanCount to 1.

The QuerySQLText has information on variables at the beginning, but we see which statement is being referred to by each line of the report.

Query Variation Usage

The Query Variation report is good for a general comparison of performance. Run this daily, check your average duration\CPU\logical IO reads against the historical to look for regressions, and you get a good list of queries and procs you may want to look into.

Do keep in mind that you can’t use this check the results of a change to a query. If you are changing the text of a statement in any way, the modified statement is a different query_id in Query Store. So this report wouldn’t compare it to the original statement.

This can identify queries that have a change in performance because of a new plan, depending on how you set the @MaxPlanCount. So if you are changing an index to improve a query, a query variation report could show the statement’s improvement (or regression).

Speaking at SQL Friday this week

If you are not aware of SQL Friday, there’s a talk done online live every week at noon CET (GMT +6). You can also view previous sessions through links here.

I’ll be speaking this week with a case study on optimizing a very frequently run procedure. It is the same subject I presented at Virtual PASS 2021.

I have a few more blogs coming on the QDS Dashboard, and a few other topics I’m likely to write on soon. If you have any topics related to performance in SQL Server you would like to hear more about, please feel free to make a suggestion. You can follow me on twitter (@sqljared) and contact me if you have questions.

QDS TOOLBOX: Pivoted Waits Stats

Posted on 11/03/202111/03/2021 by sqljared

In contrast to the previous QDS Toolbox components, this one is relatively simple. Instead of a tool that’s ready to be scheduled to give you a report, this just adds the dbo.query_store_wait_stats_pivoted view.

So, what’s the value here?

The typical case

The way I use Query Store most often, is to investigate the performance of a specific procedure or statement. I tend to do this when I know there is an issue, or when I want to evaluate the improvement of a new change.

Here’s a query I’ll often run (or some variation thereof):

--Statement numbers for the last two hours, no aggregation
SELECT 
	qrs.avg_duration * qrs.count_executions AS total_duration,
	qrs.count_executions, 
	qrs.avg_duration, 
	qsq.query_id,
	qsi.end_time,
	qt.query_sql_text, 
	qsq.query_hash
FROM sys.query_store_query qsq 
JOIN sys.query_store_query_text qt 
	ON qt.query_text_id = qsq.query_text_id
JOIN sys.query_store_plan qp 
	ON qp.query_id = qsq.query_id
JOIN sys.query_store_runtime_stats qrs 
	ON qp.plan_id = qrs.plan_id
JOIN sys.query_store_runtime_stats_interval qsi
	ON qsi.runtime_stats_interval_id = qrs.runtime_stats_interval_id
WHERE 
	qsq.object_id = OBJECT_ID('Integration.GetStockItemUpdates')
	AND qsi.end_time > DATEADD(hour, -2, GETUTCDATE())
	--AND qt.query_sql_text LIKE '%something%'
ORDER BY 
	qt.query_sql_text, 
	qsq.query_id,
	qsi.end_time

Now, Query Store’s performance statistics are separated into intervals. We can set the size for an interval when we turn on Query Store (60 minutes by default, I have it at 15 on my test environment). So, this query will show me the performance of each statement for each interval over the last two hours.

Sometimes I’ll aggregate the data so I can look at the averages over several days, but sometimes I like seeing each interval to see if there is a lot of variation.

So, here’s that result set:

sys.query_store_wait_stats

But let’s say we look at this data and are concerned one of these statements is taking longer than expected, and we want to see if it is waiting on anything. So, we take that query and add a join to sys.query_store_wait_stats. Seems simple enough.

--Statement numbers for the last two hours, no aggregation, with wait stats for one statement
SELECT 
	qrs.avg_duration * qrs.count_executions AS total_duration,
	qrs.count_executions, 
	qrs.avg_duration, 
	qsq.query_id,
	qsi.end_time,
	qt.query_sql_text, 
	ws.wait_category_desc,
	ws.execution_type_desc,
	ws.total_query_wait_time_ms,
	ws.avg_query_wait_time_ms
FROM sys.query_store_query qsq 
JOIN sys.query_store_query_text qt 
	ON qt.query_text_id = qsq.query_text_id
JOIN sys.query_store_plan qp 
	ON qp.query_id = qsq.query_id
JOIN sys.query_store_runtime_stats qrs 
	ON qp.plan_id = qrs.plan_id
JOIN sys.query_store_runtime_stats_interval qsi
	ON qsi.runtime_stats_interval_id = qrs.runtime_stats_interval_id
LEFT JOIN sys.query_store_wait_stats ws
	ON ws.plan_id = qrs.plan_id
	AND ws.runtime_stats_interval_id = qsi.runtime_stats_interval_id
WHERE 
	qsq.object_id = OBJECT_ID('Integration.GetStockItemUpdates')
	AND qsi.end_time > DATEADD(day, -2, GETUTCDATE())
ORDER BY 
	qt.query_sql_text, 
	qsq.query_id,
	qsi.end_time,
	ws.wait_category_desc

And let’s see our results:

Now we have 9 rows instead of 5. And the number under avg_duration look redundant.

What’s happened, is that sys.query_store_wait_stats has multiplied our result set. Since it stores a record for each wait type observed in each interval, we have multiple combined rows for several of these intervals. And one of our rows would have been removed from this set if we didn’t LEFT JOIN to sys.query_store_wait_stats, since it had no waits.

This is probably not how you would image it to work on first glance. And you might want to pivot those results instead, but the Pivoted Wait Stats view has done that for you.

Using Pivoted Wait Stats

If I take my original query and join to dbo.query_store_wait_stats_pivoted, I will get one row of output for each plan_id and runtime_stats_interval_id.

And the result set will be rather wide. There’s a total and average column for each wait type in sys.query_store_wait_stats. You may or may not want to include all in your query.

So, here’s an example with the average for several interesting wait types:

--Statement numbers for the last two hours, no aggregation
SELECT 
	qrs.avg_duration * qrs.count_executions AS total_duration,
	qrs.count_executions, 
	qrs.avg_duration, 
	qsq.query_id,
	qsi.end_time,
	qt.query_sql_text, 
	wsp.Average_BufferIO,
	wsp.Average_BufferLatch,
	wsp.Average_CPU,
	wsp.Average_Lock,
	wsp.Average_Memory,
	wsp.Average_NetworkIO,
	wsp.Average_Parallelism,
	wsp.Average_Tracing
FROM sys.query_store_query qsq 
JOIN sys.query_store_query_text qt 
	ON qt.query_text_id = qsq.query_text_id
JOIN sys.query_store_plan qp 
	ON qp.query_id = qsq.query_id
JOIN sys.query_store_runtime_stats qrs 
	ON qp.plan_id = qrs.plan_id
JOIN sys.query_store_runtime_stats_interval qsi
	ON qsi.runtime_stats_interval_id = qrs.runtime_stats_interval_id
LEFT JOIN dbo.query_store_wait_stats_pivoted wsp 
	ON wsp.plan_id = qp.plan_id
	AND wsp.runtime_stats_interval_id = qrs.runtime_stats_interval_id
WHERE 
	qsq.object_id = OBJECT_ID('Integration.GetSaleUpdates')
	AND qsi.end_time > DATEADD(day, -5, GETUTCDATE())
ORDER BY 
	qt.query_sql_text, 
	qsq.query_id,
	qsi.end_time

Runtime Stats with Pivoted Wait Stats

We have one record for this query for each of three intervals. In each row, I can see the average wait time for CPU, disk (BufferIO), buffer latch, and other waits. In this case, the largest waits were with NetworkIO. Likely, SSMS was having trouble keeping up with displaying multiple result sets, because I ran this batch 50 times in a row.

One last thing, keep in mind the times from the main tables in Query Store like sys.query_store_runtime_stats are reported in microseconds, while the wait tables are in milliseconds whether you use sys.query_store_wait_stats or dbo.query_store_wait_stats_pivoted. A query that takes 7,000,000 microseconds to run but waited for CPU time for 6,900 milliseconds spent nearly its entire duration waiting.

Don’t forget about PASS Data Community Summit 2021!

PASS Data Community Summit 2021 is next week, and the event is virtual and free, so please go and register if you haven’t already. I’m happy to see PASS and the summit are Red Gate’s hands, and expect it to be a great event.

Many of the sessions, including my session, are prerecorded and can be watched at any time. This should make it easier to get around conflicts between sessions you want to hear. There are also Q&A sessions for the prerecorded sessions, and those are scheduled and live. So get registered and make sure you don’t miss out.