Wednesday, June 19, 2013

Using esxtop to identify VMware storage performance issues

Configuring monitoring using esxtop
To monitor storage performance per HBA:
  1. Start esxtop by typing esxtop at the command line.
  2. Press d to switch to disk view (HBA mode).
  3. Press f to modify the fields that are displayed.
  4. To view the entire device name, press SHIFT + L and enter 36 at the "Change the name field size" prompt.
  5. Press b, c, d, e, h, and j to toggle the fields and press Enter.
  6. Press s, enter 2 to set the update interval to 2 seconds, and press Enter.
  7. See Analyzing esxtop columns for a description of relevant columns.
Note: The following options are only available in VMware ESX 3.5 and later.
To monitor storage performance on a per-LUN basis:
  1. Start esxtop by typing esxtop at the command line.
  2. Press u to switch to disk view (LUN mode).
  3. Press f to modify the fields that are displayed.
  4. Press b, c, f, and h to toggle the fields and press Enter.
  5. Press s, enter 2 to set the update interval to 2 seconds, and press Enter.
  6. See Analyzing esxtop columns for a description of relevant columns.

To increase the width of the device field in esxtop to show the complete NAA ID:

  1. Start esxtop by typing esxtop at the command line.
  2. Press u to switch to the disk device display.
  3. Press L to change the name field size.
    Note: Be sure to use an uppercase L (SHIFT + L).
  4. Enter the value 36 to display the complete NAA identifier.
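Why 36? A worked example helps. The sketch below assumes the common NAA-6 ("registered extended") format: the literal prefix "naa." followed by a 16-byte identifier written as 32 hexadecimal digits. The device ID in it is made up for illustration, and the helper name naa_display_width is hypothetical.

```python
# Assumption: NAA-6 IDs are "naa." + 16 bytes shown as 32 hex digits;
# NAA-5 IDs are shorter (8 bytes, 16 hex digits).
NAA_PREFIX = "naa."


def naa_display_width(identifier_bytes: int) -> int:
    """Characters needed to show 'naa.' plus the identifier in hex."""
    hex_digits = identifier_bytes * 2  # each byte is two hex digits
    return len(NAA_PREFIX) + hex_digits


# Hypothetical device ID, constructed only to have the right length.
example_id = NAA_PREFIX + "6" + "0" * 31

print(len(example_id))        # 36
print(naa_display_width(16))  # 36  (NAA-6, 16 bytes)
print(naa_display_width(8))   # 20  (NAA-5, 8 bytes)
```

So a field width of 36 is enough for the longest of these identifiers; shorter ones simply leave trailing space.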

To monitor storage performance on a per-virtual machine basis:
  1. Start esxtop by typing esxtop at the command line.
  2. Press v to switch to disk view (virtual machine mode).
  3. Press f to modify the fields that are displayed.
  4. Press b, d, e, h, and j to toggle the fields and press Enter.
  5. Press s, enter 2 to set the update interval to 2 seconds, and press Enter.
  6. See Analyzing esxtop columns for a description of relevant columns.

Analyzing esxtop columns:

CMDS/s: This is the total number of commands per second. It includes IOPS (Input/Output Operations Per Second) and other SCSI commands, such as SCSI reservations, locks, vendor string requests, and unit attention commands, being sent to or coming from the device or virtual machine being monitored. In most cases CMDS/s equals IOPS, unless there are many metadata operations (such as SCSI reservations).
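To see how much of CMDS/s is not ordinary I/O, one rough approach is to subtract the read and write rates. This is a sketch under the assumption that IOPS = READS/s + WRITES/s (esxtop's read and write rate columns); the function name is hypothetical and the numbers are invented sample values, not measurements.

```python
def non_io_commands_per_sec(cmds_per_sec: float,
                            reads_per_sec: float,
                            writes_per_sec: float) -> float:
    """Estimate SCSI commands per second that are not reads or writes.

    Assumes IOPS = READS/s + WRITES/s, so the remainder of CMDS/s is
    metadata traffic such as SCSI reservations.
    """
    iops = reads_per_sec + writes_per_sec
    # Clamp at zero in case rounded display values make IOPS look
    # slightly larger than CMDS/s.
    return max(cmds_per_sec - iops, 0.0)


# Hypothetical sample values, not real measurements.
print(non_io_commands_per_sec(1250.0, 800.0, 400.0))  # 50.0
```

A result close to zero matches the "CMDS/s equals IOPS" case; a large remainder suggests heavy metadata activity.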

DAVG/cmd: This is the average response time, in milliseconds, per command sent to the device.

KAVG/cmd: This is the average time, in milliseconds, that the command spends in the VMkernel.

GAVG/cmd: This is the response time as perceived by the guest operating system. It is calculated as GAVG = DAVG + KAVG.
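The relationship between the three latency columns can be written as a small sketch. The class name LatencySample is hypothetical, and the values are invented examples in milliseconds.

```python
from dataclasses import dataclass


@dataclass
class LatencySample:
    """One hypothetical reading of esxtop latency columns, in ms."""
    davg_ms: float  # DAVG/cmd: time per command at the device
    kavg_ms: float  # KAVG/cmd: time per command spent in the VMkernel

    @property
    def gavg_ms(self) -> float:
        # GAVG = DAVG + KAVG: the latency the guest OS perceives
        return self.davg_ms + self.kavg_ms


sample = LatencySample(davg_ms=8.0, kavg_ms=0.5)
print(sample.gavg_ms)  # 8.5
```

Because GAVG is a simple sum, whichever term dominates shows where the time is going: a high DAVG points at the device side, while a high KAVG points at time spent in the VMkernel.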

If the response time exceeds 5000 ms (5 seconds), VMware ESX times out the command and aborts the operation. These events are logged; abort messages and other SCSI errors can be reviewed in the following logs:

  • ESX 3.5 and 4.x - /var/log/vmkernel
  • ESXi 3.5 and 4.x - /var/log/messages
  • ESXi 5.x - /var/log/vmkernel.log
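The timeout rule and the log locations above can be combined into a small triage helper. This is a sketch only: the function names are hypothetical, the log table simply restates the list above, and the abort filter assumes that relevant log lines contain the word "abort", since the exact message wording is not given here.

```python
ABORT_THRESHOLD_MS = 5 * 1000  # 5 seconds, per the text


def exceeds_abort_threshold(response_time_ms: float) -> bool:
    """True if a response time is over the 5000 ms limit."""
    return response_time_ms > ABORT_THRESHOLD_MS


# Log locations as listed above, keyed by (product, version).
SCSI_ERROR_LOGS = {
    ("ESX", "3.5"): "/var/log/vmkernel",
    ("ESX", "4.x"): "/var/log/vmkernel",
    ("ESXi", "3.5"): "/var/log/messages",
    ("ESXi", "4.x"): "/var/log/messages",
    ("ESXi", "5.x"): "/var/log/vmkernel.log",
}


def scsi_log_path(product: str, version: str) -> str:
    """Return the log to check for abort messages and other SCSI errors."""
    try:
        return SCSI_ERROR_LOGS[(product, version)]
    except KeyError:
        raise ValueError(f"no log listed for {product} {version}") from None


def lines_mentioning_abort(lines):
    """Hypothetical filter: keep lines containing 'abort' (any case).

    ASSUMPTION: exact ESX abort message wording is not documented here;
    matching on the word 'abort' is only a rough first pass.
    """
    return [line for line in lines if "abort" in line.lower()]


print(exceeds_abort_threshold(5200.0))  # True
print(scsi_log_path("ESXi", "5.x"))     # /var/log/vmkernel.log
```

On a host, the filter could be applied to an open file, for example by passing the file object returned by open(scsi_log_path("ESXi", "5.x")) to lines_mentioning_abort.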
