Develop an Availability and Capacity Management Plan

Manage capacity to increase uptime and reduce costs.

  • It is crucial for capacity managers to provide capacity in advance of need to maximize availability.
  • In an effort to ensure maximum uptime, organizations are overprovisioning by an average of 59% for compute and 48% for storage. With budget pressure mounting (especially on the capital side), the cost of this approach can’t be ignored.
  • Half of organizations have experienced capacity-related downtime, and almost 60% wait more than three months for additional capacity.

Our Advice

Critical Insight

  • All too often capacity management is left as an afterthought. The best capacity managers bake capacity management into their organization’s business processes, becoming drivers of value.
  • Communication is key. Build bridges between your organization’s silos, and involve business stakeholders in a dialog about capacity requirements.

Impact and Result

  • Map business metrics to infrastructure component usage, and use your organization’s own data to forecast demand.
  • Project future needs in line with your hardware lifecycle. Never suffer availability issues as a result of a lack of capacity again.
  • Establish infrastructure as a driver of business value, not a “black hole” cost center.

Develop an Availability and Capacity Management Plan Research & Tools

Start here – read the Executive Brief

Read our concise Executive Brief to find out why you should build a capacity management plan, review Info-Tech’s methodology, and understand the four ways we can support you in completing this project.

1. Conduct a business impact analysis

Determine the most critical business services to ensure availability.

2. Establish visibility into core systems

Craft a monitoring strategy to gather usage data.

3. Solicit and incorporate business needs

Integrate business stakeholders into the capacity management process.

4. Identify and mitigate risks

Identify and mitigate risks to your capacity and availability.


Member Testimonials

After each Info-Tech experience, we ask our members to quantify the real-time savings, monetary impact, and project improvements our research helped them achieve. See our top member experiences for this blueprint and what our clients have to say.

Overall Impact: 8.0/10
Average $ Saved: $2,840
Average Days Saved: 10

Client | Experience | Impact | $ Saved | Days Saved
Cork County Council | Guided Implementation | 8/10 | $2,840 | 10
BlueAlly Technology Solutions, LLC | Guided Implementation | 10/10 | $10,000 | 5
Randolph Brooks Federal Credit Union | Guided Implementation | 8/10 | $2,231 | 2

Availability & Capacity Management

Please note: This course will be updated in July 2023.

Maximize the benefits of infrastructure monitoring investments by diagnosing & assessing transaction performance, from network to server to end-user interface.

This course makes up part of the Infrastructure & Operations Certificate.

  • Course Modules: 4
  • Estimated Completion Time: 2-2.5 hours
  • Featured Analysts:
  • Darin Stahl, Sr. Research Director, Infrastructure & Operations Practice
  • Gord Harrison, SVP of Research and Advisory

Workshop: Develop an Availability and Capacity Management Plan

Workshops offer an easy way to accelerate your project. If you are unable to do the project yourself, and a Guided Implementation isn't enough, we offer low-cost delivery of our project workshops. We take you through every phase of your project and ensure that you have a roadmap in place to complete your project successfully.

Module 1: Conduct a Business Impact Analysis

The Purpose

  • Determine the most important IT services for the business.

Key Benefits Achieved

  • Understand which services to prioritize for ensuring availability.

Activities | Outputs
1.1 Create a scale to measure different levels of impact. | RTOs/RPOs
1.2 Evaluate each service by its potential impact. | List of gold systems
1.3 Assign a criticality rating based on the costs of downtime. | Criticality matrix

Module 2: Establish Visibility Into Core Systems

The Purpose

  • Monitor and measure usage metrics of key systems.

Key Benefits Achieved

  • Capture and correlate data on business activity with infrastructure capacity usage.

Activities | Outputs
2.1 Define your monitoring strategy. | RACI chart
2.2 Implement your monitoring tool/aggregator. | Capacity/availability monitoring strategy

Module 3: Develop a Plan to Project Future Needs

The Purpose

  • Determine how to project future capacity usage needs for your organization.

Key Benefits Achieved

  • Data-based, systematic projection of future capacity usage needs.

Activities | Outputs
3.1 Analyze historical usage trends. | Plan for soliciting future needs
3.2 Interface with the business to determine needs. |
3.3 Develop a plan to combine these two sources of truth. | Future needs

Module 4: Identify and Mitigate Risks

The Purpose

  • Identify potential risks to capacity and availability.
  • Develop strategies to ameliorate potential risks.

Key Benefits Achieved

  • Proactive approach to capacity that addresses potential risks before they impact availability.

Activities | Outputs
4.1 Identify capacity and availability risks. | List of risks
4.2 Determine strategies to address risks. | List of strategies to address risks
4.3 Populate and review completed capacity plan. | Completed capacity plan

ANALYST PERSPECTIVE

The cloud changes the capacity manager’s job, but it doesn’t eliminate it.

"Nobody doubts the cloud’s transformative power. But will its ascent render “capacity manager” an archaic term to be carved into the walls of datacenters everywhere for future archaeologists to puzzle over? No. While it is true that the cloud has fundamentally changed how capacity managers do their jobs, the process is more important than ever. Managing capacity – and, by extension, availability – means minimizing costs while maximizing uptime. The cloud era is the era of unlimited capacity – and of infinite potential costs. If you put the infinity symbol on a purchase order… well, it’s probably not a good idea. Manage demand. Manage your capacity. Manage your availability. And, most importantly, keep your stakeholders happy. You won’t regret it."

Jeremy Roberts,

Consulting Analyst, Infrastructure Practice

Info-Tech Research Group

Availability and capacity management transcend IT

This Research Is Designed For:

    ✓ CIOs who want to increase uptime and reduce costs

    ✓ Infrastructure managers who want to deliver increased value to the business

    ✓ Enterprise architects who want to ensure stability of core IT services

    ✓ Dedicated capacity managers

This Research Will Help You:

    ✓ Develop a list of core services

    ✓ Establish visibility into your system

    ✓ Solicit business needs

    ✓ Project future demand

    ✓ Set SLAs

    ✓ Increase uptime

    ✓ Optimize spend

This Research Will Also Assist:

    ✓ Project managers

    ✓ Service desk staff

This Research Will Help Them:

    ✓ Plan IT projects

    ✓ Better manage availability incidents caused by lack of capacity

Executive summary

Situation

  • IT infrastructure leaders are responsible for ensuring that the business has access to the technology needed to keep the organization humming along. This requires managing capacity and availability.
  • Dependencies go undocumented. Services are provided on an ad hoc basis, and capacity/availability are managed reactively.

Complication

  • Organizations are overprovisioning by an average of 59% for compute and 48% for storage. This is expensive. With budget pressure mounting, the cost of this approach can’t be ignored.
  • Lead time to respond to demand is long. Half of organizations have experienced capacity-related downtime, and almost 60% wait 3+ months for additional capacity. (451 Research, 3)

Resolution

  • Conduct a business impact analysis to determine which of your services are most critical and therefore warrant active capacity management, where the benefits outweigh the costs.
  • Establish visibility into your system. You can’t track what you can’t see, and you can’t see when you don’t have proper monitoring tools in place.
  • Develop an understanding of business needs. Use a combination of historical trend analyses and consultation with line of business and project managers to separate wants from needs. Overprovisioning used to be necessary, but is no longer required.
  • Project future needs in line with your hardware lifecycle. Never suffer availability issues as a result of a lack of capacity again.

Info-Tech Insight

  1. Components are critical. The business doesn’t care about components. You, however, are not so lucky…
  2. Ask what the business is working on, not what they need. If you ask them what they need, they’ll tell you – and it won’t be cheap. Find out what they’re going to do, and use your expertise to service those needs.
  3. Cloud shmoud. The role of the capacity manager is changing with the cloud, but capacity management is as important as ever.

Save money and drive efficiency with an effective availability and capacity management plan

Overprovisioning happens because of the old style of infrastructure provisioning (hardware refresh cycles) and because capacity managers don’t know how much capacity they need (as a result of either inaccurate or nonexistent information).

According to 451 Research, 59% of enterprises have had to wait 3+ months for new capacity. It is little wonder, then, that so many opt to overprovision. Capacity management is about ensuring that IT services are available, and with lead times like that, overprovisioning can be more attractive than the alternative. Fortunately, there is hope. An effective availability and capacity management plan can help you:

  • Identify your gold systems
  • Establish visibility into them
  • Project your future capacity needs

Balancing overprovisioning and spending is the capacity manager’s struggle.
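
The trade-off between lead time and overprovisioning can be made concrete with a rough runway calculation. The following is a minimal sketch, assuming linear growth in usage; the function names and the figures in the example are hypothetical and are not part of the blueprint.

```python
import math


def months_until_exhaustion(capacity, used, monthly_growth):
    """Months until usage reaches capacity, assuming linear growth (an assumption)."""
    if monthly_growth <= 0:
        return math.inf  # flat or shrinking usage never exhausts capacity
    return max(0.0, (capacity - used) / monthly_growth)


def must_order_now(capacity, used, monthly_growth, lead_time_months):
    """True when the remaining runway is no longer than the procurement lead time."""
    runway = months_until_exhaustion(capacity, used, monthly_growth)
    return runway <= lead_time_months


# Hypothetical figures: 100 TB installed, 70 TB used, growing 5 TB per month.
runway = months_until_exhaustion(capacity=100.0, used=70.0, monthly_growth=5.0)
order = must_order_now(100.0, 70.0, 5.0, lead_time_months=3)
print(runway, order)
```

With a three-month lead time, the order only needs to go in once the runway shrinks to three months, rather than buying years of headroom up front.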

Availability and capacity management go together like boots and feet

Availability and capacity are not the same, but they are related and can be effectively managed together as part of a single process.

If an IT department is unable to meet demand due to insufficient capacity, users will experience downtime or a degradation in service. To be clear, capacity is not the only factor in availability – reliability, serviceability, etc. are significant as well. But no organization can effectively manage availability without paying sufficient attention to capacity.

"Availability Management is concerned with the design, implementation, measurement and management of IT services to ensure that the stated business requirements for availability are consistently met."

– OGC, Best Practice for Service Delivery, 12

"Capacity management aims to balance supply and demand [of IT storage and computing services] cost-effectively…"

– OGC, Business Perspective, 90

Integrate the three levels of capacity management

Successful capacity management involves a holistic approach that incorporates all three levels.

Business: The highest level of capacity management, business capacity management, involves predicting changes in the business’ needs and developing requirements so that IT can adapt to those needs. Example: an influx of new clients from a failed competitor.
Service: Service capacity management focuses on monitoring IT services to determine whether they are meeting predetermined SLAs. The data gathered here can be used for incident and problem management. Example: increased website traffic.
Component: Component capacity management involves tracking the functionality, utilization, and performance of specific components (servers, hard drives, etc.) and making predictions about future concerns. Example: insufficient web server compute.

The C-suite cares about business capacity as part of the organization’s strategic planning. Service leads care about their assigned services. IT infrastructure is concerned with components, but not for their own sake. Components underpin services, which are ultimately designed to facilitate business.
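
The three levels can be linked by translating a business event into service load and then into component requirements. The sketch below follows the examples above (new clients, website traffic, web server compute). The function name, the per-client request rate, the per-server throughput, and the 70% target utilization are all illustrative assumptions.

```python
import math


def servers_needed(new_clients, requests_per_client_per_hour,
                   requests_per_server_per_hour, target_utilization=0.7):
    """Translate a business-level change into a component-level requirement.

    Business level:  new_clients (e.g. an influx from a failed competitor).
    Service level:   the extra website traffic those clients generate.
    Component level: the web servers needed to absorb that traffic while
                     staying at or below the target utilization.
    """
    extra_requests = new_clients * requests_per_client_per_hour
    usable_per_server = requests_per_server_per_hour * target_utilization
    return math.ceil(extra_requests / usable_per_server)


# Hypothetical figures: 2,000 new clients at 30 requests/hour each,
# with each web server handling 10,000 requests/hour at full load.
extra_servers = servers_needed(2000, 30, 10000)
print(extra_servers)
```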

A healthcare organization practiced poor capacity management and suffered availability issues as a result

CASE STUDY

Industry: Healthcare

Source: Interview

New functionalities require new infrastructure

There was a project to implement an Elasticsearch-based search feature. It had to correlate all the organization’s member data from an Oracle data source and their own data warehouse, and pool it into an Elasticsearch index so that it could be used by the provider portal search function. In estimating the amount of space needed, the infrastructure team assumed that all the data would be stored in a single place. They didn’t account for the architecture of Elasticsearch, in which an index is split into shards that are distributed across multiple nodes.

Beware underestimating demand and hardware sourcing lead times

As a result, they vastly underestimated the amount of space that was needed and ended up short by a terabyte. The infrastructure team frantically sourced more hardware, but the rush hardware order arrived physically damaged and had to be returned to the vendor.
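
A back-of-the-envelope estimate shows how a distributed index can need far more space than the source data. This is a rough sketch, not the team's actual method: it assumes each replica is a full copy of the primary shards, and the overhead and headroom multipliers are hypothetical placeholders to be replaced with measured values.

```python
def estimated_index_storage_gb(source_gb, replicas=1, index_overhead=1.1,
                               free_space_headroom=0.15):
    """Rough storage estimate for a sharded, replicated search index.

    All default multipliers are illustrative assumptions:
      index_overhead      - on-disk index size relative to the source data
      replicas            - number of full replica copies of the primary shards
      free_space_headroom - fraction of disk to keep free on the nodes
    """
    primary_gb = source_gb * index_overhead
    total_data_gb = primary_gb * (1 + replicas)
    return total_data_gb / (1 - free_space_headroom)


# Hypothetical example: 500 GB of source data with one replica.
print(round(estimated_index_storage_gb(500), 1))
```

Under these assumptions, 500 GB of source data calls for roughly 1.3 TB of provisioned disk, which is the kind of gap a single-copy estimate misses.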

Sufficient budget won’t ensure success without capacity planning

The project’s budget had been more than sufficient to pay for the extra necessary capacity, but because a lack of understanding of the infrastructure impact resulted in improper forecasting, the project ended up stuck in a standstill.

Manage availability and keep your stakeholders happy

If you run out of capacity, you will inevitably encounter availability issues like downtime and performance degradation. End users do not like downtime, and neither do their managers.

There are three variables that are monitored, measured, and analyzed as part of availability management more generally (Valentic).

  1. Uptime: The availability of a system is the percentage of time the system is “up” (and not degraded), which can be calculated using the following formula: uptime / (uptime + downtime) × 100%. As a rule, the more components there are in a system, the lower the availability.

  2. Reliability: The length of time a component/service can go before there is an outage that brings it down, typically measured in hours.

  3. Maintainability: The amount of time it takes for a component/service to be restored in the event of an outage, also typically measured in hours.
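
The three variables above can be worked through in a few lines. The first function is the uptime formula as stated. The second treats reliability as mean time between failures and maintainability as mean time to restore, which is a common convention but an assumption here. The third assumes a system fails when any one of its independent components fails, which is why more components generally means lower availability.

```python
from math import prod


def availability(uptime_hours, downtime_hours):
    """Availability as a percentage: uptime / (uptime + downtime) x 100."""
    return uptime_hours / (uptime_hours + downtime_hours) * 100


def availability_from_reliability(mtbf_hours, mttr_hours):
    """Same ratio, expressed with reliability (MTBF) and maintainability (MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours) * 100


def serial_availability(component_availabilities):
    """Availability of a chain of independent components that must all be up."""
    return prod(a / 100 for a in component_availabilities) * 100


print(availability(99, 1))                      # one hour down in every hundred
print(availability_from_reliability(999, 1))    # outage every 999 h, 1 h to restore
print(serial_availability([99.9, 99.9, 99.9]))  # three 99.9% components in series
```

Three components that are each 99.9% available yield a system that is only about 99.7% available.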

Enter the cloud: changes in the capacity manager role

There can be no doubt – the rise of the public cloud has fundamentally changed the nature of capacity management.

Features of the public cloud | Implications for capacity management
Instant, or near-instant, instantiation | Lead times drop; capacity management is less about ensuring equipment arrives on time.
Pay-as-you-go services | Capacity no longer needs to be purchased in bulk. Pay only for what you use and shut down instances that are no longer necessary.
Essentially unlimited scalability | Potential capacity is infinite, but so are potential costs.
Offsite hosting | Redundancy, but at the price of the increasing importance of your internet connection.

Vendors will sell you the cloud as a solution to your capacity/availability problems

The image contains two graphs. The first graph on the left is titled: Reactive Management, and shows the struggling relationship between capacity and demand. The second graph on the right is titled: Cloud future (ideal), which demonstrates a manageable relationship between capacity and demand over time.

Traditionally, increases in capacity have come in bursts as a reaction to availability issues. This model inevitably results in overprovisioning, driving up costs. Access to the cloud changes the equation. On-demand capacity means that, ideally, nobody should pay for unused capacity.

Reality check: even in the cloud era, capacity management is necessary

You will likely find that vendors nurture a gap between your expectations and reality. That can be damaging.

The cloud reality does not look like the cloud ideal. Even with the ostensibly elastic cloud, vendors like the consistency that longer-term contracts offer. Enter reserved instances: in exchange for lower hourly rates, vendors offer the option to pay a fee for a reserved instance. Usage beyond the reserved capacity is billed at a higher hourly rate. To determine where that line should be drawn, you should engage in detailed capacity planning. Unfortunately, even when done right, this process will result in some overprovisioning, though it does provide convenience from an accounting perspective. The key is to use spot instances where demand is exceptional and bounded. Example: A university registration server that experiences exceptional demand at the start of term but at no other time.

The image contains a graph showing how the cloud reality does not match the cloud ideal. The graph is split horizontally by a dotted line labelled: Reserved instance ceiling. The top half is red; the bottom half is green and contains a curving line.

Use best practices to optimize your cloud resources

The image contains two graphs. The graph on the left is labelled: Ineffective reserve capacity. At the top of the graph is a dotted line labelled: Reserved Instance ceiling. The graph is measuring capacity requirements over time. There is a curved line on the graph that suddenly spikes and comes back down. The spike is labelled unused capacity. The graph on the right is labelled: Effective reserve capacity. The reserved instance ceiling is about halfway down this graph, and it is comparing capacity requirements over time. This graph has a curved line on it, also has a spike and is labelled: spot instance.

Even in the era of elasticity, capacity planning is crucial. Capacity purchased beyond the reservation to cover spikes (labelled “spot instance” in the graph above) is billed at a higher hourly rate, but if your capacity needs vary substantially, reserving instances for all of the capacity you need can cost even more. Efficient capacity planning will help you draw this line.
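
Where to draw the reserved instance ceiling can be explored with a small cost model. This is a simplified sketch, assuming whole-number demand per hour, one flat hourly rate for reserved units (paid whether used or not), and a higher flat rate for anything above the ceiling. The rates and the demand series are hypothetical and do not reflect any vendor's pricing.

```python
def total_cost(hourly_demand, reserved_units, reserved_rate, overflow_rate):
    """Cost of a demand series for a given reserved instance ceiling."""
    reserved_cost = reserved_units * reserved_rate * len(hourly_demand)
    overflow_units = sum(max(0, d - reserved_units) for d in hourly_demand)
    return reserved_cost + overflow_units * overflow_rate


def best_reserved_ceiling(hourly_demand, reserved_rate, overflow_rate):
    """Try every ceiling from zero to peak demand and keep the cheapest."""
    candidates = range(0, max(hourly_demand) + 1)
    return min(
        candidates,
        key=lambda r: total_cost(hourly_demand, r, reserved_rate, overflow_rate),
    )


# Hypothetical demand: steady at 4 units with one spike to 10.
demand = [4, 4, 4, 4, 4, 4, 4, 4, 4, 10]
ceiling = best_reserved_ceiling(demand, reserved_rate=1.0, overflow_rate=3.0)
print(ceiling, total_cost(demand, ceiling, 1.0, 3.0), total_cost(demand, 10, 1.0, 3.0))
```

In this example, reserving for the steady load and paying the higher rate for the spike is cheaper than reserving for the peak.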

Evaluate business impact; not all systems are created equal

Limited resources are a reality. Detailed visibility into every single system is often not feasible and could be too much information.

Simple and effective. Sometimes a simple display can convey all of the information necessary to manage critical systems. In cars it is important to know your speed, how much fuel is in the tank, and whether or not you need to change your oil/check your engine.

Where to begin?! Specialized information is sometimes necessary, but it can be difficult to navigate.

Take advantage of a business impact analysis to define and understand your critical services

Ideally, downtime would be minimal. In reality, though, downtime is a part of IT life. It is important to have realistic expectations about its nature and likelihood.

STEP 1: Record applications and dependencies

Utilize your asset management records and document the applications and systems that IT is responsible for managing and recovering during a disaster.

STEP 2: Define impact scoring scale

Ensure an objective analysis of application criticality by establishing a business impact scale that applies to all applications.

STEP 3: Estimate impact of downtime

Leverage the scoring criteria from the previous step and establish an estimated impact of downtime for each application.

STEP 4: Identify desired RTO and RPO

Define what the RTOs/RPOs should be based on the impact of a business interruption and the tolerance for downtime and data loss.

STEP 5: Determine current RTO/RPO

Conduct tabletop planning and create a flowchart of your current capabilities. Compare your current state to the desired state from the previous step.
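
Steps 2 and 3 can be sketched as a simple scoring exercise. The impact categories, the 0 to 4 scale, the tier thresholds, and the example services below are all hypothetical. Only the idea of a shared scale and the "gold" label for the most critical systems come from this blueprint; "silver" and "bronze" are assumed names for the lower tiers.

```python
from dataclasses import dataclass


@dataclass
class Service:
    name: str
    revenue_impact: int     # 0 (none) to 4 (severe); illustrative scale
    goodwill_impact: int    # same scale
    compliance_impact: int  # same scale


def impact_score(service):
    """Sum the category scores so every service is judged on one scale."""
    return (service.revenue_impact + service.goodwill_impact
            + service.compliance_impact)


def tier(score):
    """Map a score to a criticality tier (thresholds are assumptions)."""
    if score >= 9:
        return "gold"
    if score >= 5:
        return "silver"
    return "bronze"


services = [
    Service("Provider portal", 4, 3, 3),
    Service("Intranet wiki", 1, 1, 0),
    Service("Email", 2, 3, 1),
]
ranked = sorted(services, key=impact_score, reverse=True)
for s in ranked:
    print(s.name, impact_score(s), tier(impact_score(s)))
```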

Info-Tech Insight

According to end users, every system is critical and downtime is intolerable. Of course, once they see how much totally eliminating downtime can cost, they might change their tune. It is important to have this discussion to separate the critical from the less critical – but still important – services.

Establish visibility into critical systems

You may have seen “If you can’t measure it, you can’t manage it” or a variation thereof floating around the internet. This adage is consumable and makes sense…doesn’t it?

"It is wrong to suppose that if you can’t measure it, you can’t manage it – a costly myth."

– W. Edwards Deming, statistician and management consultant, author of The New Economics

While it is true that total monitoring is not absolutely necessary for management, when it comes to availability and capacity – objectively quantifiable service characteristics – a monitoring strategy is unavoidable. Capturing fluctuations in demand, and adjusting for those fluctuations, is among the most important functions of a capacity manager, even if hovering over employees with a stopwatch is poor management.
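
As a small illustration of what capturing fluctuations can mean in practice, the sketch below summarizes utilization samples with a nearest-rank percentile and reports the remaining headroom. The sample values, the 95th percentile, and the function names are assumptions chosen for the example; a monitoring tool would normally supply these figures.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a list of utilization samples."""
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def headroom(samples, capacity, pct=95):
    """Capacity left over once the chosen percentile of demand is met."""
    return capacity - percentile(samples, pct)


# Hypothetical CPU utilization samples (percent) for one system.
samples = [40, 42, 45, 50, 55, 60, 62, 65, 70, 90]
print(percentile(samples, 50), percentile(samples, 95), headroom(samples, 100))
```

Planning against a high percentile rather than the average keeps occasional peaks from turning into availability incidents.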

Solicit needs from line of business managers

Unless you head the world’s most involved IT department (kudos if you do) you’re going to have to determine your needs from the business.

Do

✓ Develop a positive relationship with business leaders responsible for making decisions.

✓ Make yourself aware of ongoing and upcoming projects.

✓ Develop expertise in organization-specific technology.

✓ Make the business aware of your expenses through chargebacks or showbacks.

✓ Use your understanding of business projects to predict business needs; do not rely on business leaders’ technical requests alone.

Do not

X Be reactive.

X Accept capacity/availability demands uncritically.

X Ask line of business managers for specific computing requirements unless they have the technical expertise to make informed judgments.

X Treat IT as an opaque entity where requests go in and services come out (this can lead to irresponsible requests).

Demand: manage or be managed

You might think you can get away with uncritically accepting your users’ demands, but this is not best practice. If you provide it, they will use it.

The company meeting

“I don’t need this much RAM,” the application developer said, implausibly. Titters wafted above the assembled crowd as her IT colleagues muttered their surprise. Heads shook, eyes widened. In fact, as she sat pondering her utterance, the developer wasn’t so sure she believed it herself. Noticing her consternation, the infrastructure manager cut in and offered the RAM anyway, forestalling the inevitable crisis that occurs when seismic internal shifts rock fragile self-conceptions. Until next time, he thought.

"Work expands as to fill the resources available for its completion…"

– C. Northcote Parkinson, quoted in Klimek et al.

Combine historical data with the needs you’ve solicited to holistically project your future needs

Predicting the future is difficult, but when it comes to capacity management, foresight is necessary.

Critical inputs

In order to project your future needs, the following inputs are necessary.

  1. Usage trends: While it is true that past performance is no indication of future demand, trends are still a good way to validate requests from the business.
  2. Line of business requests: An understanding of the projects the business has in the pipeline is important for projecting future demand.
  3. Institutional knowledge: Read between the lines. As experts on information technology, the IT department is well-equipped to translate needs into requirements.

The image contains a graph that is labelled: Projected demand, and graphs demand over time. There is a curved line that passes through a vertical line labelled present. There is a box on top of the graph that contains the text: Note: confidence in demand estimates will vary by service and by stakeholder.
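
One way to combine the first two inputs is to extend the historical trend and then layer known project demand on top. The following is a minimal sketch, assuming evenly spaced observations and a linear trend; the function names, the example history, and the planned project figure are hypothetical. Institutional knowledge still has to be applied by a person reviewing the result.

```python
def linear_trend(history):
    """Least-squares slope and intercept for evenly spaced observations."""
    n = len(history)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(history) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, history))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x


def project_demand(history, periods_ahead, planned_project_demand=None):
    """Extend the trend and add demand from known business projects.

    planned_project_demand maps a future period (1 = next period) to the extra
    demand that starts in that period and persists afterwards.
    """
    slope, intercept = linear_trend(history)
    planned = planned_project_demand or {}
    projection, extra = [], 0.0
    for step in range(1, periods_ahead + 1):
        extra += planned.get(step, 0.0)
        x = len(history) - 1 + step
        projection.append(intercept + slope * x + extra)
    return projection


# Hypothetical history (TB used per quarter) and a project landing in two quarters.
forecast = project_demand([10, 12, 14, 16], 3, planned_project_demand={2: 5.0})
print(forecast)
```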

Follow best practice guidelines to maximize the efficiency of your availability and capacity management process

The image contains Info-Tech's IT Management & Governance Framework. The framework displays many of Info-Tech's research to help optimize and improve core IT processes. The name of this blueprint is under the Infrastructure & Operations section, and has been circled to point out where it is in the framework.

About Info-Tech

Info-Tech Research Group is the world’s fastest-growing information technology research and advisory company, proudly serving over 30,000 IT professionals.

We produce unbiased and highly relevant research to help CIOs and IT leaders make strategic, timely, and well-informed decisions. We partner closely with IT teams to provide everything they need, from actionable tools to analyst guidance, ensuring they deliver measurable results for their organizations.

What Is a Blueprint?

A blueprint is designed to be a roadmap, containing a methodology and the tools and templates you need to solve your IT problems.

Each blueprint can be accompanied by a Guided Implementation that provides you access to our world-class analysts to help you get through the project.


Get the help you need in this 4-phase advisory process. You'll receive 6 touchpoints with our researchers, all included in your membership.

Guided Implementation 1: Conduct a business impact analysis
  • Call 1: Conduct a business impact analysis.

Guided Implementation 2: Establish visibility into core systems
  • Call 1: Discuss your monitoring strategy.

Guided Implementation 3: Solicit and incorporate business needs
  • Call 1: Develop a plan to gather historical data; set up plan to solicit business needs
  • Call 2: Evaluate data sources

Guided Implementation 4: Identify and mitigate risks
  • Call 1: Discuss possible risks and strategies for risk mitigation
  • Call 2: Review your capacity management plan

Authors

John Annand

Jeremy Roberts

Derek Shank

Contributors

  • Adrian Blant, Network and Capacity Authority, Vodafone
  • Brett Johnstone, Capacity Manager Cloud Services DCSG, Datacom
  • James Zhang, Senior Manager Disaster Recovery, AIG Technology
  • Mayank Banerjee, CTO, Global Supply Chain Management, HelloFresh
  • Mike Lynch, Capacity Manager, Telefónica
  • Paul Waguespack, Manager of Application Systems Engineering, Tufts Health Plan
  • Richie Mendoza, IT Consultant, SMITS Inc.
  • Rob Thompson, President, IT Tools & Process
  • Todd Evans, Capacity and Performance Management SME, IBM