I/O wait is a per-CPU performance metric showing the time a CPU spends idle while there are threads on its dispatcher queue (in the sleep state) that are blocked on disk I/O. This divides CPU idle time into time spent with nothing to do and time spent blocked on disk I/O. A high rate of I/O wait per CPU shows that the disks may be a bottleneck, leaving the CPU idle while it waits on them.
I/O wait can be a very confusing metric. If another CPU-hungry process comes along, the I/O wait value can drop, because the CPUs now have something to do instead of being idle. However, the same disk I/O is still present and still blocking threads, despite the drop in the I/O wait metric. The reverse has sometimes happened: system administrators have upgraded application software, and the newer version, being more efficient and using fewer CPU cycles, has revealed I/O wait. This can make the administrator think that the upgrade caused a disk issue and made performance worse, when in fact disk performance is unchanged and CPU performance has improved.
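To make the accounting concrete, here is a toy single-CPU model in Python. It is a simplification assumed for illustration, not how any particular kernel computes I/O wait. On each clock tick the CPU counts as busy if some thread is runnable, as I/O wait if it is idle while a thread is blocked on disk I/O, and as idle otherwise. The names `account_ticks`, `app_only`, and `with_cpu_hog` are hypothetical. The model reproduces the drop described above. Adding a CPU-hungry process drives I/O wait to zero, yet the application still spends the same number of ticks blocked on disk.

```python
# Toy model (an assumption, not a real kernel's algorithm): one CPU, and each
# tick is described by (runnable_threads, threads_blocked_on_disk_io).

def account_ticks(ticks):
    """Classify each tick as busy, iowait, or idle and count them."""
    counts = {"busy": 0, "iowait": 0, "idle": 0}
    for runnable, blocked_on_io in ticks:
        if runnable > 0:
            counts["busy"] += 1        # CPU has work to run
        elif blocked_on_io > 0:
            counts["iowait"] += 1      # CPU idle, but a thread waits on disk
        else:
            counts["idle"] += 1        # CPU idle with nothing pending
    return counts

# Hypothetical application: 2 ticks computing, then 8 ticks blocked on disk.
app_only = [(1, 0)] * 2 + [(0, 1)] * 8

# Same application plus a CPU-hungry process that is runnable on every tick.
with_cpu_hog = [(runnable + 1, blocked) for runnable, blocked in app_only]

before = account_ticks(app_only)
after = account_ticks(with_cpu_hog)
print(before)  # {'busy': 2, 'iowait': 8, 'idle': 0}
print(after)   # {'busy': 10, 'iowait': 0, 'idle': 0}

# The application's blocked-on-disk ticks are identical in both runs.
print(sum(b for _, b in app_only), sum(b for _, b in with_cpu_hog))  # 8 8
```

The I/O wait count falls from 8 to 0 only because the idle ticks it was measured in have disappeared. The disk blocking it was meant to reveal has not changed.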
There are also some subtle issues with how I/O wait was being calculated on Solaris. For the Solaris 10 release, the I/O wait metric was deprecated and hardwired to zero for tools that still needed to display it (for compatibility).
A more reliable metric may be the time that application threads are blocked on disk I/O. This captures the pain endured by application threads caused by disk I/O, regardless of what other work the CPUs may be doing. This metric can be measured using static or dynamic tracing.
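The per-thread blocked-time metric suggested above can be sketched as a post-processing step over tracing output. The Python below assumes a hypothetical event stream of `(timestamp_ns, thread_id, kind)` tuples, where `kind` is either `"block"` (a thread starts waiting on disk I/O) or `"wake"` (it resumes). Capturing those events with a real static or dynamic tracer is outside the scope of this sketch, and the event format and function name are illustrative only. The sum is taken per thread, so the result does not change when other processes keep the CPUs busy. I/O wait, by contrast, does change in that situation.

```python
from collections import defaultdict

# Hypothetical trace format: (timestamp_ns, thread_id, kind), kind in {"block", "wake"}.

def blocked_time_per_thread(events):
    """Sum, per thread, the nanoseconds spent blocked on disk I/O."""
    blocked_since = {}          # thread_id -> timestamp of its pending "block"
    totals = defaultdict(int)   # thread_id -> accumulated blocked time (ns)
    for ts, tid, kind in sorted(events):
        if kind == "block":
            blocked_since[tid] = ts
        elif kind == "wake" and tid in blocked_since:
            # Close the interval opened by this thread's last "block" event.
            totals[tid] += ts - blocked_since.pop(tid)
    return dict(totals)

events = [
    (1_000, 101, "block"), (4_000, 101, "wake"),
    (2_000, 202, "block"), (2_500, 202, "wake"),
    (5_000, 101, "block"), (9_000, 101, "wake"),
]
print(blocked_time_per_thread(events))  # {202: 500, 101: 7000}
```

Thread 101 is blocked for 3,000 ns and then 4,000 ns, so its total is 7,000 ns. Thread 202 is blocked once, for 500 ns. Neither figure depends on how busy the CPUs were at the time.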
I/O wait is still a popular metric on Linux systems, and despite its confusing nature, it is used successfully to identify a type of disk bottleneck: disks busy, CPUs idle. One way to interpret it is to treat any I/O wait as a sign of a system bottleneck and then tune the system to minimize it, even if the I/O is still occurring concurrently with CPU utilization. Concurrent I/O is more likely to be non-blocking I/O and less likely to cause a direct issue. Nonconcurrent I/O, as identified by I/O wait, is more likely to be I/O that blocks the application, and therefore a bottleneck.
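On Linux, the I/O wait counter is the fifth value on the `cpu` lines of `/proc/stat`, measured in cumulative clock ticks since boot. Tools derive a percentage by differencing two samples. The Python sketch below does this for the aggregate `cpu` line; the per-CPU `cpu0`, `cpu1`, ... lines share the same layout. It follows the proc(5) field order. As an assumption, it leaves `guest` and `guest_nice` out of the total, because the kernel already counts that time in `user` and `nice`. The function names are illustrative. Note that proc(5) itself cautions that the iowait value is not reliable, which echoes the caveats above.

```python
import os
import time

# Field order of the "cpu" lines in /proc/stat, per proc(5). Older kernels
# may print fewer trailing fields; zip() below simply stops at the shorter list.
FIELDS = ["user", "nice", "system", "idle", "iowait",
          "irq", "softirq", "steal", "guest", "guest_nice"]

def parse_cpu_line(line):
    """Turn a 'cpu ...' or 'cpuN ...' line into a {field: ticks} dict."""
    parts = line.split()
    if not parts or not parts[0].startswith("cpu"):
        raise ValueError(f"not a cpu line: {line!r}")
    return dict(zip(FIELDS, map(int, parts[1:])))

def iowait_percent(before, after):
    """Percentage of elapsed CPU ticks spent in iowait between two samples."""
    # Exclude guest/guest_nice: they are already included in user/nice.
    keys = [k for k in FIELDS[:8] if k in before and k in after]
    delta = {k: after[k] - before[k] for k in keys}
    total = sum(delta.values())
    return 100.0 * delta.get("iowait", 0) / total if total else 0.0

def read_cpu(path="/proc/stat"):
    with open(path) as f:
        return parse_cpu_line(f.readline())  # first line is the aggregate "cpu"

if __name__ == "__main__" and os.path.exists("/proc/stat"):
    first = read_cpu()
    time.sleep(1)
    second = read_cpu()
    print(f"iowait: {iowait_percent(first, second):.1f}%")
```

Sampling over an interval matters here. The raw counters are totals since boot, so a single reading says nothing about the current rate.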
Excerpted from Systems Performance: Enterprise and the Cloud
Reposted from: https://www.cnblogs.com/wuhuiyuan/p/4769082.html