SharePoint Virtualization Performance (and Why Less Is More)

SharePoint virtualization performance is not a new topic, but sizing the virtual resources can be an interesting task.  I was recently working with a client on their SharePoint 2010 environment and we were preparing it to take on an additional workload by installing Project Server.  One of the key discoveries was that their virtual machines were sized to only use one CPU core, and our performance testing indicated that this was a performance bottleneck.

The Microsoft hardware and software requirements for SharePoint 2010 are quite clear that the recommended minimum is 4 CPU cores.  They are so clear that they also state that an environment that does not meet the recommended minimum requirements is in an unsupported state.  Despite this very explicit guidance, the client’s infrastructure team was opposed to utilising 4 CPU cores, and only wanted to allocate 2 CPU cores.

The infrastructure team’s main reasoning was that adding cores would cause performance issues in their production environment.  The rationale was that with four cores, the hypervisor (in this case VMware ESX) would have to wait for four cores to become available before any CPU tasks could be executed, effectively degrading performance of the virtual machines.  (Their team referred to an article entitled Virtual CPUs – The overprovisioning penalty of vCPU to pCPU ratios.)  From a SharePoint perspective, this left us in a dilemma, as we wanted the environment to be in a supported state.  Our other concern was overall SharePoint performance, so we conducted some benchmarking exercises in the client’s test environment.

 

SharePoint Performance Testing

SharePoint performance testing can be a bit of an art form.  I do not want to detract from the main focus on virtual machine sizing, so I will just summarise the key pieces of information about their environment:

  • SharePoint topology
    • 2 x SharePoint servers accepting web requests
    • 2 x SharePoint application tier servers
    • 1 x SQL database server, in process of re-engineering
  • From their existing environment (primarily a publishing intranet), it was determined that they had up to 200 unique users per hour, and this was our initial testing aim for concurrency – 200 users per SharePoint web server.  This also served as a basis for the modelling of the testing scenarios.
  • There were multiple scenarios tested, but for simplicity I am going to focus on side-by-side comparisons of 1 vCPU, 2 vCPU and 4 vCPU configurations of the SharePoint servers responding to web requests.
  • This client has a number of new initiatives in the pipeline (eg collaboration sites, document management).  Their usage is very difficult to estimate, so they were not considered in our tests.
  • Other testing conducted with 4 vCPUs found 350 users per web server to be the sweet spot before performance degrades (700 concurrent users in total).  This equates to about 40% concurrency for their expected staffing levels.
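The concurrency figures in the last bullet can be checked with some quick arithmetic.  The sketch below is only an illustration: the 350 users, 2 web servers and 40% figures come from the list above, while the implied staffing level is derived from them rather than taken from the client, so treat it as an approximation.

```python
# Figures taken from the bullets above.
USERS_PER_WEB_SERVER = 350
WEB_SERVERS = 2
CONCURRENCY_RATIO = 0.40  # "about 40%", so the derived headcount is approximate


def total_concurrent_users(users_per_server: int, servers: int) -> int:
    """Total concurrent users the web tier handled at the sweet spot."""
    return users_per_server * servers


def implied_headcount(concurrent_users: int, ratio: float) -> float:
    """Staffing level implied by a concurrency ratio (derived, not measured)."""
    return concurrent_users / ratio


total = total_concurrent_users(USERS_PER_WEB_SERVER, WEB_SERVERS)
headcount = implied_headcount(total, CONCURRENCY_RATIO)

print(f"Total concurrent users: {total}")
print(f"Implied staffing level: about {headcount:.0f}")
```

The same two functions can be reused to see how many web servers a different concurrency target would need.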

 

Performance Testing Metrics

Performance testing can mean a lot of things to a lot of different people.  The testing we had been performing was around two main criteria:

  1. Is the system available to do my task?
  2. How long does it take to perform my task?

From this we devised three simple metrics to rate how the SharePoint environment was performing:

  • Requests per second as a measure of system throughput (we were still interested in system performance).
  • Average response time as a measure of the system-calculated end user experience.
  • Observed performance, how the system actually performed while under load.

The first two metrics are machine-calculated, empirical results of the testing conducted.  The third metric was measured by a human timing page loads while the system was under the simulated load.
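To make the first two metrics concrete, here is a minimal sketch of how they can be derived from raw request timings.  It is not how the load testing tool calculates them internally; the RequestSample type, the function names and the sample values are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class RequestSample:
    """One request recorded during a load test (hypothetical structure)."""
    start_s: float      # when the request was sent, in seconds from test start
    duration_ms: float  # time until the response completed, in milliseconds


def requests_per_second(samples: list[RequestSample], test_duration_s: float) -> float:
    """Metric 1: system throughput over the whole test window."""
    return len(samples) / test_duration_s


def average_response_ms(samples: list[RequestSample]) -> float:
    """Metric 2: mean time to serve a single request."""
    return sum(s.duration_ms for s in samples) / len(samples)


# Made-up samples, not results from the client's environment.
samples = [
    RequestSample(0.0, 200.0),
    RequestSample(0.5, 300.0),
    RequestSample(1.0, 250.0),
    RequestSample(1.5, 250.0),
]

print(f"Requests per second: {requests_per_second(samples, 2.0):.1f}")
print(f"Average response time: {average_response_ms(samples):.0f}ms")
```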

In the test cases below, we used our initial testing aim of 200 users per server to compare the variation with CPU cores.

The first two metrics were calculated from the Web Capacity Analysis Toolkit and the third metric was measured using Fiddler.

 

Metric 1 – Requests per second (system throughput)

Figure: SharePoint Virtualisation Requests per Second

The system throughput shows remarkable differences between 1, 2 and 4 CPU core configurations.  Jumping from 1 to 2 CPU cores is a 47% improvement in throughput, and from 2 to 4 cores is another 53% improvement.  Another observation was that VMware reported 100% CPU usage with a single core.  When reviewing the system under load with 1 CPU core, it was clear the system was struggling.  Not only was the IIS worker process maxing out, but the SharePoint timer service was trying to execute on a regular basis and even the operating system was struggling to schedule tasks for processing.

On the other hand, with 2 or 4 CPU cores, the CPU usage did not max out at 100%.  The CPU levels barely rose above 70%, a strong indicator that an absolute minimum of 2 CPU cores is required.  This gets even more interesting when we start looking at response times.

 

Metric 2 – Average response time (calculated end user experience)

Figure: SharePoint Virtualisation Average Response Time

System throughput is nice to measure; for an engineer, it provides some numbers to show whether a system has improved or declined in performance.  The only problem is that end users are not concerned with these numbers.  A more humane way to think about system performance is with response times: the time it takes to request an object from SharePoint.  The results shown here are more profound than the first metric of requests per second.

With a single CPU core, the system calculated an average response time of 2000ms.  That equates to 2 seconds per request.  This may sound relatively quick, so let’s put that into perspective.

  • Average response time is: 2 seconds
  • Number of objects on a page (assuming an empty client cache): 88
  • Average time to load the page: 2 minutes 56 seconds

Three minutes to load a page!  That is definitely not an indication of good performance.  Let’s do that exercise again for 2 and 4 cores with the same number of objects on the page:

  • 2 CPU cores
    • Average response time: 985ms
    • Average time to load the page: 1 minute 26 seconds
  • 4 CPU cores
    • Average response time: 255ms
    • Average time to load the page: 22 seconds

Using 2 CPU cores is a vast improvement, and 4 cores makes a huge difference; however, 22 seconds as a best case is still a long time to load a page.  In the example page I’ve presented, with a populated cache the number of objects to retrieve drops to fewer than 10, so in practice the page should load much more quickly.  A full page load on an empty cache is an edge case, but not an unreasonable one to expect from time to time.
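The page load estimates above are simply the average response time multiplied by the number of objects on the page.  Here is a small sketch of that calculation, using the response times and the 88 objects quoted in this post.  It assumes the objects are fetched one after another, which is the simplification the estimate relies on; the function names are my own.

```python
OBJECTS_PER_PAGE = 88  # empty client cache, as in the example page

# Average response time in milliseconds, keyed by number of CPU cores.
AVG_RESPONSE_MS = {1: 2000, 2: 985, 4: 255}


def page_load_seconds(avg_response_ms: float, objects: int) -> float:
    """Estimated page load time, assuming objects are requested sequentially."""
    return avg_response_ms * objects / 1000


def format_duration(seconds: float) -> str:
    """Render whole seconds as minutes and seconds (fractions are dropped)."""
    minutes, secs = divmod(int(seconds), 60)
    if minutes:
        return f"{minutes} min {secs} s"
    return f"{secs} s"


estimates = {
    cores: page_load_seconds(ms, OBJECTS_PER_PAGE)
    for cores, ms in AVG_RESPONSE_MS.items()
}

for cores, seconds in estimates.items():
    print(f"{cores} CPU core(s): {format_duration(seconds)}")
```

Changing OBJECTS_PER_PAGE to 10 gives a rough feel for the populated cache case.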

This metric gives a good estimate on the performance to expect from the perspective of a machine, so now it is time to look at how the system actually performed from the point of view of a user.

 

Metric 3 – Observed performance

Figure: SharePoint Virtualisation Total Execution Time

Automated load tests are a great way to gauge and measure performance.  The only issue is that these are machine-perceived metrics, not human-perceived metrics.  To gauge how the system would actually perform under load, the same steps in the scenario were carried out in the browser.  Then, using Fiddler, it was possible to measure how long it took to perform each task.

In the model for these performance tests, loading all 9 steps without any other activity on the server should take about 15 seconds.  This is represented by the green line.  This time is the elapsed time from starting to load a page to the end of the last object loaded.  Human interaction and wait times have been removed from this metric.

With a single CPU, the time to load the complete scenario was 3 minutes 52 seconds, an average of 25 seconds per task (load a page, open a PDF, perform a search etc).  I would classify this as an unusable system.

For a 2 CPU environment, the total time was 1 minute 45 seconds, an average of 12 seconds per task.  A big improvement.  If this was the time during peak concurrency, it may be somewhat acceptable, but probably still frustrating.

With 4 CPUs configured, the total time drops down to 23 seconds, an average of 2.5 seconds per task. With this configuration, the system should be relatively quick to use.
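The per-task averages above are the total scenario time divided by the 9 steps.  The sketch below repeats that division and also compares each total against the 15 second unloaded baseline.  The slowdown ratio is my own addition for illustration and was not part of the original test report.

```python
STEPS_IN_SCENARIO = 9
BASELINE_SECONDS = 15  # time to load all steps with no other load on the server

# Total observed scenario time in seconds, keyed by number of CPU cores.
TOTAL_SECONDS = {
    1: 3 * 60 + 52,  # 3 minutes 52 seconds
    2: 1 * 60 + 45,  # 1 minute 45 seconds
    4: 23,
}


def seconds_per_task(total_seconds: float, steps: int) -> float:
    """Average time per step in the scenario."""
    return total_seconds / steps


def slowdown(total_seconds: float, baseline_seconds: float) -> float:
    """How many times slower than the unloaded baseline the scenario ran."""
    return total_seconds / baseline_seconds


for cores, total in TOTAL_SECONDS.items():
    per_task = seconds_per_task(total, STEPS_IN_SCENARIO)
    ratio = slowdown(total, BASELINE_SECONDS)
    print(f"{cores} CPU core(s): {per_task:.1f}s per task, {ratio:.1f}x baseline")
```

The per-task values come out within a second of the rounded averages quoted above.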

 

Conclusions from Performance Testing

The results give us some confidence that using 4 CPU cores on each server is justified.  Looking purely at the numbers, it is fairly clear that 4 CPU cores is the way to go, as Microsoft intended.

I should also point out that when looking at the performance metrics from VMware, I didn’t notice any issues around RAM, disk or network for any of the servers in the farm.  I expected that the dedicated SQL server would be a bottleneck, but it was predominantly CPU processing on the SharePoint web servers that stood out as an issue.

 

But…

I have access to a number of virtualization specialists in my organisation, so I emailed them about the work I was doing and asked for their opinions on 2 vs 4 CPU cores.  What I received was an education on how hypervisors like VMware and Hyper-V manage CPU resources.  Here is a summary of this wisdom:

  • Practical experience of the virtualization specialists in other systems shows that performance can be increased by decreasing 4 CPU core systems down to 2 CPU core systems, as the virtual machine does not need to wait until 4 CPU cores are allocated to it.
  • Using more than 1 CPU core will not provide any benefit unless you are running a multi-threaded application (and SharePoint certainly takes advantage of that).
  • The approach to take is to right-size the VMs from the outset, with the potential to scale up (add virtual resources) or scale out (add SharePoint servers).  Unlike a physical server, a virtual machine’s requirements can change over time.  It is likely to have evolving resource requirements during its life-cycle.
  • Testing and benchmarking can give an estimate of performance behaviour, but even this will not be accurate.  It should be noted that the tests were performed in a non-production environment where there was no CPU contention, unlike a production environment.
  • Maybe it is time to take a reality check on Microsoft’s minimum specifications, but it is difficult to ignore what is prescribed in black and white.  Others in our team and I have been bitten by “non-compliance” in the past.  If you have a serious issue and you need to call Microsoft, you will need to have your SharePoint servers set with a minimum of 4 CPU cores.
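The "waiting for cores" argument in the first bullet can be illustrated with a deliberately crude model.  The sketch below assumes strict co-scheduling (a VM only runs when all of its vCPUs can be placed at once) and that each physical core is busy independently with a fixed probability.  Both are simplifying assumptions of mine, and real hypervisor schedulers are considerably more sophisticated.  The host size and busy probability are hypothetical values, not measurements from the client.

```python
from math import comb


def prob_enough_free_cores(physical_cores: int, vcpus: int, busy_prob: float) -> float:
    """Probability that at least `vcpus` physical cores are free at one instant.

    Toy model: each core is busy independently with probability `busy_prob`,
    so the number of free cores follows a binomial distribution.
    """
    free_prob = 1.0 - busy_prob
    return sum(
        comb(physical_cores, k) * free_prob**k * busy_prob ** (physical_cores - k)
        for k in range(vcpus, physical_cores + 1)
    )


HOST_CORES = 8   # hypothetical host size
BUSY_PROB = 0.7  # hypothetical chance that any one core is busy

for vcpus in (1, 2, 4):
    p = prob_enough_free_cores(HOST_CORES, vcpus, BUSY_PROB)
    print(f"{vcpus} vCPU VM can be scheduled immediately {p:.0%} of the time")
```

On a contended host the chance of finding enough free cores drops as the VM gets wider, which is the effect the specialists described.  On an idle host (a busy probability near zero) the difference all but disappears, which matches the lack of contention in the test environment.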

Essentially everything I learnt confirmed the client’s belief that 2 CPU cores will perform better than 4 cores.  It was a complete mind shift from everything I previously thought and understood about SharePoint sizing and virtualisation.

 

Client Implementation

With all the new knowledge on virtualisation and the performance results in hand, I had a sit-down meeting with the client’s infrastructure manager to discuss how to move forward.  Their SharePoint team and I were convinced that 4 CPU cores would be needed at some stage in the future, and their infrastructure manager did not dispute that either.  This client is showing low usage patterns presently, and adoption is expected to increase over the next six months.  Ultimately we all agreed that 2 CPU cores was better than 1 CPU core, and that we would assign 2 CPU cores to each of the SharePoint servers.  This was on the proviso that when we start to see measurable performance issues, we would revisit this topic and review the need to scale up (add CPU cores) or scale out (add an additional SharePoint web server).

Before I finish up, I just want to call out that the above scenarios were conducted for a particular client with a particular workload modelled against their particular environment.  We expect the performance results to change when their workload or virtual infrastructure changes.  If you are using the performance results for any guidance, expect them to differ depending on your own workload and usage patterns.  I hope to explore more on the topic of performance benchmarking in a future post.