Monday, November 28, 2016

vSphere (and Some Other Products) Upgrade Notes



Recently, a lot of my customers are planning, doing, or have just done a vSphere upgrade, mostly because vSphere 5.1 reached the end of general support on 24 August 2016. Technical guidance will still be provided for vSphere 5.1 until 24 August 2018 (for the complete list of important dates in your product's support phases, please check this VMware product lifecycle matrix), but please note that no more security patches or bug fixes will be released for vSphere 5.1, unless stated otherwise. In addition, during the technical guidance phase, support requests are handled only for low-severity issues on supported configurations, as stated in the VMware lifecycle policies. These are my personal notes on some information that can help in planning a VMware environment upgrade.

Monday, November 14, 2016

Why Is the Guest OS Task Manager Showing a Different Value Compared to the vSphere Performance Monitor?

Demystifying CPU States in vCPU World


Have you experienced a situation where your guest OS task manager shows a different value compared to the vSphere performance monitor? Or have you received a request for additional vCPUs from the application team using your VM, because they see the VM utilizing almost all of its vCPUs, but when you check that VM's vCPU usage in your vSphere Web Client, it shows only low utilization? Is there something wrong? Before you conclude that something is wrong with the vSphere performance monitor, read this article to understand what causes that situation.

Figure 1. Windows task manager shows ~100% CPU Utilization
Before going further, let me describe the situation more clearly. Figures 1 and 2 come from the same virtual machine, perf-worker-01b. The first figure shows Windows Task Manager, where CPU utilization is hitting 100% for most of the last 4 minutes. The second figure shows the vSphere performance monitor, captured at about the same time as Figure 1, and it reveals that for the last 4 minutes the VM's CPU usage is only around 50%. FYI, I actually ran a CPU benchmark tool on perf-worker-01b for about 30 minutes, and for most of that period Windows Task Manager showed 100% CPU utilization while the vSphere performance monitor showed around 50% CPU usage. Why did the vSphere performance monitor show only 50% CPU usage when Windows Task Manager showed ~100%?

Monday, November 7, 2016

Quickly Identify Whether My Virtual Machines Get All CPU Resources They Need

Background

One of the capabilities brought by virtualization is the ability to run several virtual machines on one physical machine. This ability may lead to something called overprovisioning, where we provision more resources to VMs than we actually have in the physical layer. For instance, we can create 20 VMs with 4 vCPUs each, for a total of 80 vCPUs provisioned, while the server we use has only 20 CPU cores. Wait.. wait.... If we only have 20, how can we give 80? How can we give more than what we actually have? The answer is one of the reasons virtualization rose in the first place: most of our servers have, on average, low CPU utilization, and each server experiences its peak and low utilization at different times. VMware vSphere manages how VMs get their turn on the physical CPU resources in an efficient and fair manner through a component called the CPU Scheduler. In simple words, the CPU Scheduler is like a traffic light. It rules who may go, or in this case who may use the physical CPU resources, and who needs to stop and wait. More about the CPU Scheduler can be found in this CPU Scheduler Technical Whitepaper.
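To make the arithmetic in the example concrete, here is a minimal sketch in Python. The function name `vcpu_to_core_ratio` is my own and not part of any VMware tool; it only reproduces the 20 VMs x 4 vCPUs on 20 cores calculation from the paragraph above.

```python
def vcpu_to_core_ratio(vm_count, vcpus_per_vm, physical_cores):
    """Return (total provisioned vCPUs, vCPU-to-core ratio).

    Hypothetical helper for illustration only; it simply multiplies
    and divides the numbers used in the example.
    """
    total_vcpus = vm_count * vcpus_per_vm
    return total_vcpus, total_vcpus / physical_cores


# The example from the text: 20 VMs, 4 vCPUs each, 20 physical cores.
total, ratio = vcpu_to_core_ratio(vm_count=20, vcpus_per_vm=4, physical_cores=20)
print(f"{total} vCPUs on 20 cores is a {ratio:.1f}:1 ratio")
# prints: 80 vCPUs on 20 cores is a 4.0:1 ratio
```

A ratio above 1:1 does not by itself mean there is a problem; it only says the VMs could ask for more CPU than exists if they all became busy at the same time.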

Using the traffic light analogy, we know that at any one time the number of cars that can go is defined by the number of lanes available. If the road has 4 lanes, then a maximum of 4 cars can pass at the same time, and the other cars queue behind the first row. The same is true in virtualization: even though we can overprovision, what the CPU Scheduler can schedule at a time is limited by how many logical CPUs are available on the physical server. This means that if, at any one time, several VMs whose total vCPUs exceed the available logical CPUs all ask for their share of the logical CPUs, then some of those VMs will need to queue. Having to queue means it takes longer for a VM to finish its job. Now the challenge is how to identify this queue, and furthermore how to keep that queue within an acceptable timeframe. This article will try to answer the first part, while the latter will be discussed in a future article.
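The lanes-and-cars picture can be sketched as a toy model in Python. This is only an illustration of the analogy, not how the real CPU Scheduler works (the real one considers much more than arrival order); the function name `split_run_and_wait` and the vCPU labels are made up for this example.

```python
def split_run_and_wait(ready_vcpus, logical_cpus):
    """Toy model of one scheduling moment.

    ready_vcpus:  list of vCPU labels that want to run right now,
                  in the order they arrived (the "cars").
    logical_cpus: number of logical CPUs on the host (the "lanes").

    Returns (running, waiting). Assumption: first come, first served,
    which is a simplification made only for this illustration.
    """
    running = ready_vcpus[:logical_cpus]
    waiting = ready_vcpus[logical_cpus:]
    return running, waiting


# Three 2-vCPU VMs all busy at once on a host with 4 logical CPUs.
ready = [
    "vm1-vcpu0", "vm1-vcpu1",
    "vm2-vcpu0", "vm2-vcpu1",
    "vm3-vcpu0", "vm3-vcpu1",
]
running, waiting = split_run_and_wait(ready, logical_cpus=4)
print("running:", running)
print("waiting:", waiting)
```

With 6 vCPUs ready and only 4 lanes, the two vCPUs of the last VM have to wait for the next turn, and that waiting time is the queue this article is trying to identify.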