Over the last few months I've been lecturing at classes and conferences about getting SQL Server's view of the I/O subsystem and what latencies it is experiencing, so time for a blog post to help everyone else.
Most SQL Server's today are I/O bound - that's a generally agreed-on statement by DBAs and consultants in the field. This means that the major factor in server performance is its ability to perform I/O operations quickly. If the I/O subsystem cannot keep up with the demand being placed on it, then the SQL Server workload will suffer performance problems.
Now, saying that, one trap that many people fall into is equating increased I/O subsystem latencies with poor I/O subsystem performance. This is often not the case at all. It's usually the case that the I/O subsystem performs fine when the designed-for I/O workload is happening, but becomes the performance bottleneck when the I/O workload increases past the I/O subsystem design point. The I/O workload increase is what's causing the problem, not the I/O subsystem - if you design an I/O subsystem to support 1000 IOPS (I/O operations per second) and SQL Server is trying to push 2000 IOPS, performance is going to suffer.
If you find that I/O latency has increased, look to a change in SQL Server behavior before blaming the I/O subsystem. For instance:
-
Query plan changes from out-of-date statistics, code changes, implicit conversions, poor indexing that cause table scans rather than index seeks
-
Additional indexes being created that cause increased index maintenance workload and logging
-
Access pattern/key changes that cause page splits and hence extra page reads/writes and logging
-
Adding change data capture or change tracking that causes extra writes and logging
-
Enabling snapshot isolation that causes tempdb I/Os, plus potentially page splits
-
Decreased server memory leading to a smaller buffer pool and increased lazy writer and read activity
and a whole host of other reasons can lead to an increased I/O workload where it's not the I/O subsystem's fault.
On the other hand, however, it may very well be the I/O subsystem that has an issue if the SQL Server workload is the same. A SAN administrator may have decided to give some of the space on one of the SQL Server LUNs to another server, which can lead to an overload.
Using performance counters, you can see at the physical disk level what the latencies are (looking at the Avg. Disk sec/Read and Avg. Disk sec/Write counters) but if you have your databases spread across a few LUNs, that doesn't help you pinpointing which database files are experiencing latency issues and driving the most I/Os to the I/O subsystem.
This is where sys.dm_io_virtual_file_stats comes in. It was introduced in SQL Server 2005 as a beefed-up replacement for fn_virtualfilestats and shows you how many I/Os have occured, with latencies for all files.
You can give it a database ID and a file ID, but I found it most useful to look at all the files on the server and order by one of the statistics.
Here's the default output (with some column name changed slightly to make things fit nicely):
SELECT * FROM sys.dm_io_virtual_file_stats (NULL, NULL);
GO
database_id file_id sample_ms reads bytes_read io_stall_read_ms writes bytes_written io_stall_write_ms io_stall size_on_disk_bytes file_handle
----------- ------- --------- ----- ---------- ---------------- ------ ------------- ----------------- -------- ------------------ ------------------
1 1 14433009 52 3227648 58 10 90112 12 70 515964928 0x000000000000013C
1 2 14433009 55 1241088 36 28 122880 115 151 12648448 0x0000000000000474
2 1 14433009 22 1261568 22 4 147456 1 23 8388608 0x000000000000065C
2 2 14433009 6 385024 2 7 154112 3 5 524288 0x0000000000000660
3 1 14433009 31 1859584 18 1 8192 0 18 1310720 0x0000000000000524
3 2 14433009 6 385024 5 3 16384 1 6 524288 0x000000000000056C
4 1 14433009 139 8175616 369 13 106496 45 414 260767744 0x0000000000000958
4 2 14433009 136 712704 485 7 53248 25 510 104595456 0x00000000000009F0
5 1 14433009 20 1138688 204 0 0 0 204 419430400 0x0000000000000980
5 2 14433009 13 411136 466 1 2048 13 479 104857600 0x0000000000000A44
6 1 14433009 15 811008 81 1 8192 8 89 1310720 0x0000000000000954
6 2 14433009 7 386560 89 4 11264 80 169 516096 0x00000000000009EC
7 1 14433009 15 811008 80 1 8192 9 89 1310720 0x0000000000000970
7 2 14433009 7 386560 130 3 9216 17 147 516096 0x00000000000009B4
8 1 14433009 16 876544 37 1 8192 103 140 136577024 0x0000000000000988
8 2 14433009 9 229888 11 4 10752 59 70 5242880 0x00000000000009E4
.
.
This isn't that useful because a) I don't have database IDs and file paths memorized, and b) it gives aggregate latencies (io_stall_read_ms and io_stall_write_ms).
What I usually do is use the script below - part of my standard set of scripts I use when doign a server health audit for a client. It's based in part on code from by good friend Jimmy May (blog|twitter), with a bunch of tweaks. It allows me to filter on read or write latencies and it joins with sys.master_files to get database names and file paths.
SELECT
--virtual file latency
ReadLatency = CASE WHEN num_of_reads = 0
THEN 0 ELSE (io_stall_read_ms / num_of_reads) END,
WriteLatency = CASE WHEN num_of_writes = 0
THEN 0 ELSE (io_stall_write_ms / num_of_writes) END,
Latency = CASE WHEN (num_of_reads = 0 AND num_of_writes = 0)
THEN 0 ELSE (io_stall / (num_of_reads + num_of_writes)) END,
--avg bytes per IOP
AvgBPerRead = CASE WHEN num_of_reads = 0
THEN 0 ELSE (num_of_bytes_read / num_of_reads) END,
AvgBPerWrite = CASE WHEN io_stall_write_ms = 0
THEN 0 ELSE (num_of_bytes_written / num_of_writes) END,
AvgBPerTransfer = CASE WHEN (num_of_reads = 0 AND num_of_writes = 0)
THEN 0 ELSE ((num_of_bytes_read + num_of_bytes_written) /
(num_of_reads + num_of_writes)) END,
LEFT (mf.physical_name, 2) AS Drive,
DB_NAME (vfs.database_id) AS DB,
--vfs.*,
mf.physical_name
FROM sys.dm_io_virtual_file_stats (NULL,NULL) AS vfs
JOIN sys.master_files AS mf
ON vfs.database_id = mf.database_id
AND vfs.file_id = mf.file_id
--WHERE vfs.file_id = 2 -- log files
-- ORDER BY Latency DESC
-- ORDER BY ReadLatency DESC
ORDER BY WriteLatency DESC;
GO
The (partial) output is below:
ReadLatency WriteLatency Latency AvgBPerRead AvgBPerWrite AvgBPerTransfer Drive DB physical_name
----------- ------------ ------- ----------- ------------ --------------- ----- ----------------------- ------------------------------------------
2 103 8 54784 8192 52043 D: HotSpot D:\SQLskills\HotSpot_data.mdf
7 48 13 63061 5632 54857 C: DemoCorruptSystemTables C:\SQLskills\corrupt_log.LDF
32 41 33 43861 2048 39680 D: DemoSystemIndex3 D:\SQLskills\DemoSystemIndex3_LOG.ldf
27 40 28 43861 1024 39577 D: DemoSystemIndex5 D:\SQLskills\DemoSystemIndex5_LOG.ldf
23 39 24 43861 7168 40192 D: DemoSystemIndex4 D:\SQLskills\DemoSystemIndex4_LOG.ldf
7 38 10 43861 4608 39936 D: DemoSystemIndex1 D:\SQLskills\DemoSystemIndex1_LOG.ldf
11 25 12 43861 7168 40192 C: FSWaits C:\SQLskills\FSWaits_log.ldf
12 20 13 55222 6656 49152 C: newdatabase C:\SQLskills\newdatabase_log.LDF
12 20 15 55222 2816 36165 C: LockResDemo C:\SQLskills\LockResDemo_log.LDF
14 17 15 45933 2816 30254 C: TempdbTest C:\SQLskills\TempdbTest_log.LDF
1 14 5 25543 2688 18510 C: HotSpot C:\SQLskills\HotSpot_log.ldf
40 14 39 31625 7680 29915 D: RecompileTest D:\SQLskills\RecompileTest_log.ldf
9 13 10 51858 7680 46336 D: DemoFatalCorruption1 D:\SQLskills\DemoFatalCorruption1_log.ldf
35 13 34 31625 2048 29513 D: GUIDTest D:\SQLskills\GUIDTest_log.ldf
12 13 12 16861 1024 15872 C: FNDBLogTest C:\SQLskills\FNDBLogTest_log.LDF
.
.
This is much more useful as it allows me to quickly see where the read and write hot spots are and then drill into a database to see what's going on, and if nothing out of the ordinary, ask the SAN admin to move those hot spot files to dedicated and/or faster storage.
One question I'm sure you're going to ask is "what is an acceptable read or write latency?" and the answer is a big "it depends!". It depends on the I/O subsystem and how it's been configured. The key is producing a performance baseline for when things are running acceptably well and then seeing where the results from the DMV deviate from your baseline. On well-configured storage which isn't being overloaded I'd expect to see single-digit ms for either read or write latency, but this will vary based on the rotation speed and technology of the drives (SCSI vs SATA vs SSD etc).
There are a bunch of other ways this DMV can be used in conjunction with other DMVs and I'll be blogging about these over the next few months.
Enjoy!