Data Migration Project Checklist: A Template for Effective Data Migration Planning

Data Migration Checklist: The Definitive Guide to Planning Your Next Data Migration

Coming up with a checklist for your data migration project is one of the most challenging tasks, particularly for the uninitiated.

To help, I’ve compiled a list of ‘must-do’ activities that I’ve found to be essential to successful migrations.

It’s not a definitive list; you will almost certainly need to add more points, but it’s a great starting point.

Please critique it, extend it using the comments below, share it, but above all use it to ensure that you are fully prepared for the challenging road ahead.

TIP: Data Quality plays a pivotal role in this checklist, so be sure to check out Data Quality Pro, our sister site with the largest collection of hands-on tutorials, data quality guides and expert support for Data Quality on the internet.

Get a free checklist kit: Project Planner Spreadsheet + MindMap

Serious about delivering a successful data migration?

Download the same checklist kit I use on client engagements and learn advanced tactics for data migration planning.

  • Project Planning Spreadsheet (for Excel/Google Sheets)
  • Interactive Online MindMap (great for navigation)

Phase 1: Pre-Migration Planning

Have you assessed the viability of your migration with a pre-migration impact assessment?

Most data migration projects go barreling headlong into the main project without considering whether the migration is viable, how long it will take, what technology it will require and what dangers lie ahead.

It is advisable to perform a pre-migration impact assessment to verify the cost and likely outcome of the migration. The later you plan on doing this, the greater the risk, so score accordingly.

Have you based project estimates on guesswork or a more accurate assessment?

Don’t worry, you’re not alone, most projects are based on previous project estimates at best or optimistic guesswork at worst.

Once again, your pre-migration impact assessment should provide a far more accurate analysis of cost and resource requirements, so if you have tight deadlines, a complex migration and limited resources, make sure you perform a migration impact assessment as soon as possible.

Have you made the business and IT communities aware of their involvement?

It makes perfect sense to inform the relevant data stakeholders and technical teams of their forthcoming commitments before the migration kicks off.

It can be very difficult to drag a subject matter expert out of their day job for a 2-3 hour analysis session once a week if their seniors are not on board. By identifying what resources are required in advance, you will also eliminate the risk of gaps in your legacy or target skillset.

In addition, there are numerous aspects of the migration that require business sign-off and commitment. Get in front of sponsors and stakeholders well in advance and ensure they understand AND agree to what their involvement will be.

Have you formally agreed the security restrictions for your project?

I have wonderful memories of one migration where we thought everything was in place, so we kicked off the project and were promptly shut down on the very first day.

We had assumed that the security measures we had agreed with the client project manager were sufficient; however, we did not reckon on the corporate security team getting in on the action and demanding a far more stringent set of controls, which caused 8 weeks of project delay.

Don’t make the same mistake: obtain a formal agreement from the relevant security governance teams in advance. Simply putting your head in the sand and hoping you won’t get caught out is unprofessional and highly risky given the recent losses of data in many organisations.

Have you identified your key data migration project resources and when they are required?

Don’t start your project hoping that the missing resources you need will magically be provisioned.

I met a company several months ago that decided they did not require a lead data migration analyst because the “project plan was so well defined”. Suffice to say, they’re now heading for trouble as the project spins out of control, so make sure you understand precisely what roles are required on a data migration.

Also ensure you have a plan for bringing those roles into the project at the right time.

For example, there is a tendency to launch a project with a full contingent of developers armed with tools and raring to go. This is both costly and unnecessary. A small bunch of data migration, data quality and business analysts can perform the bulk of the migration discovery and mapping well before the developers get involved, often creating a far more successful migration.

So the lesson is to understand the key migration activities and dependencies then plan to have the right resources available when required.

Have you determined the optimal project delivery structure?

Data migrations do not suit a waterfall approach yet the vast majority of data migration plans I have witnessed nearly always resemble a classic waterfall design.

Agile, iterative project planning with highly focused delivery drops are far more effective so ensure that your overall plan is flexible enough to cope with the likely change events that will occur.

In addition, does your project plan have sufficient contingency? 84% of migrations fail or experience delay; are you confident that yours won’t suffer the same consequences?

Ensure you have sufficient capacity in your plan to cope with the highly likely occurrence of delay.

Do you have a well defined set of job descriptions so each member will understand their roles?

Project initiation will be coming at you like a freight train soon so ensure that all your resources know what is expected of them.

If you don’t have an accurate set of tasks and responsibilities already defined it means that you don’t know what your team is expected to deliver and in what order. Clearly not an ideal situation.

Map out the sequence of tasks, deliverables and dependencies you expect to be required and then assign roles to each activity. Check your resource list: do you have the right resources to complete those tasks?

This is an area that most projects struggle with, so clearly understanding what your resources need to accomplish will help you be fully prepared for the project initiation phase.

Have you created a structured task workflow so each member will understand what tasks are expected and in which sequence?

This is an extension of the previous point but is extremely important.

Most project plans will have some vague drop dates or timelines indicating when the business or technical teams require a specific release or activity to be completed.

What this will not show you is the precise workflow that will get you to those points. Ideally, this needs to be defined before project inception so that there is no confusion as you move into the initiation phase.

It will also help you identify gaps in your resourcing model where the necessary skills or budgets are lacking.

Have you created the appropriate training documentation and designed a training plan?

Data migration projects typically require a lot of additional tools and project support platforms to function smoothly.

Ensure that all your training materials and education tools are tested and in place prior to project inception.

Ideally you would want all the resources to be fully trained in advance of the project but if this isn’t possible at least ensure that training and education is factored into the plan.

Do you have a configuration management policy and software in place?

Data migration projects create a lot of resource materials. Profiling results, data quality issues, mapping specifications, interface specifications – the list is endless.

Ensure that you have a well-defined and tested configuration management approach in place before project inception. You don’t want to be stumbling through project initiation trying to make things work, so test everything in advance and create the necessary training materials.

Have you planned for a secure, collaborative working environment to be in place?

If your project is likely to involve 3rd parties and cross-organisational support it pays to use a dedicated product for managing all the communications, materials, planning and coordination on the project.

It will also make your project run smoother if this is configured and ready prior to project initiation.

Have you created an agreed set of data migration policy documents?

How will project staff be expected to handle data securely? Who will be responsible for signing off data quality rules? What escalation procedures will be in place?

There is a multitude of different policies required for a typical migration to run smoothly, and it pays to agree these in advance of the migration so that the project initiation phase runs effortlessly.

Phase 2: Project Initiation

Have you created a stakeholder communication plan and stakeholder register?

During this phase you need to formalise how each stakeholder will be informed. We may well have created an overall policy beforehand but now we need to instantiate it with each individual stakeholder.

Don’t create an anxiety gap in your project: determine what level of reporting you will deliver for each type of stakeholder and get agreement with them on the format and frequency. Dropping them an email six months into the project to say you’re headed for an 8-week delay will not win you any favours.

Communicating with stakeholders obviously assumes you know who they are and how to contact them! Record all the stakeholder types and individuals who will require contact throughout the project.

Have you tweaked and published your project policies?

Now is the time to get your policies completed and circulated across the team and new recruits.

Any policies that define how the business will be involved during the project also need to be circulated and signed off.

Don’t assume that everyone knows what is expected of them so get people used to learning about and signing off project policies early in the lifecycle.

Have you created a high-level first-cut project plan?

If you have followed best practice and implemented a pre-migration impact assessment, you should have a reasonable level of detail for your project plan. If not, simply complete as much as possible with an agreed caveat that the data will drive the project. I would still recommend carrying out a migration impact assessment during the initiation phase, irrespective of the analysis activities which will take place in the next phase.

You cannot create accurate timelines for your project plan until you have analysed the data.

For example, simply creating an arbitrary 8-week window for “data cleansing activities” is meaningless if the data is found to be truly abysmal. It is also vital that you understand the dependencies in a data migration project: you can’t code the mappings until you have discovered the relationships, and you can’t do that until the analysis and discovery phase has completed.

Also, don’t simply rely on a carbon copy of a previous data migration project plan; your plan will be dictated by the conditions found on the ground and the wider programme commitments of your particular project.

Have you set up your project collaboration platform?

This should ideally have been created before project initiation, but if it hasn’t, now is the time to get it in place.

There are some great examples of these tools listed over at our sister community site here:

5 Simple Techniques To Differentiate Your Data Quality Service

Have you created your standard project documents?

During this phase you must create your typical project documentation such as risk register, issue register, acceptance criteria, project controls, job descriptions, project progress report, change management report, RACI etc.

They do not need to be complete but they do need to be formalised with a process that everyone is aware of.

Have you defined and formalised your 3rd Party supplier agreements and requirements?

Project initiation is a great starting point to determine what additional expertise is required.

Don’t leave assumptions in place when engaging with external resources; there should be clear instructions on what exactly needs to be delivered. Don’t leave this too late.

Have you scheduled your next phase tasks adequately?

At this phase you should be meticulously planning your next phase activities so ensure that the business and IT communities are aware of the workshops they will be involved in.

Have you resolved any security issues and gained approved access to the legacy datasets?

Don’t assume that because your project has been signed off you will automatically be granted access to the data.

Get approvals from security representatives (before this phase if possible) and consult with IT on how you will be able to analyse the legacy and source systems without impacting the business. Full extracts of data on a secure, independent analysis platform are the best option, but you may have to compromise.

It is advisable to create a security policy for the project so that everyone is aware of their responsibilities and the professional approach you will be taking on the project.

Have you defined the hardware and software requirements for the later phases?

What machines will the team run on? What software will they need? What licenses will you require at each phase? It sounds obvious, but not to one recent project manager who completely forgot to put the order in and had to watch 7 members of his team sitting idly by as the purchase order crawled through procurement. Don’t make the same mistake: look at each phase of the project and determine what will be required.

Model re-engineering tools? Data quality profiling tools? Data cleansing tools? Project management software? Presentation software? Reporting software? Issue tracking software? ETL tools?

You will also need to determine what operating systems, hardware and licensing is required to build your analysis, test, QA and production servers. It can often take weeks to procure this kind of equipment so you ideally need to have done this even before project initiation.

Phase 3: Landscape Analysis

Have you created a detailed data dictionary?

A data dictionary can mean many things to many people, but it is advisable to create a simple catalogue of all the information you have retrieved on the data under assessment. Make this catalogue easy to search and accessible, with role-based security in place where required. A project wiki is a useful tool in this respect.
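
As an illustration of the level of detail worth capturing, here is a minimal sketch of a catalogue entry expressed as a Python structure; the field names and the example CRM attribute are hypothetical assumptions, not a prescribed format.

```python
from dataclasses import dataclass, field

@dataclass
class DataDictionaryEntry:
    """One catalogue record for an attribute discovered during landscape analysis."""
    system: str                    # legacy or target system the attribute lives in
    entity: str                    # table / object name
    attribute: str                 # column / field name
    data_type: str                 # as found in the source, e.g. VARCHAR(10)
    description: str               # business meaning agreed with the data owner
    owner: str                     # accountable data steward
    sensitivity: str = "internal"  # drives role-based access to the entry
    known_issues: list = field(default_factory=list)  # links to data quality findings

# Example entry recorded while profiling a hypothetical legacy CRM system
entry = DataDictionaryEntry(
    system="Legacy CRM",
    entity="CUSTOMER",
    attribute="DATE_OF_BIRTH",
    data_type="VARCHAR(10)",
    description="Customer date of birth, stored as text in DD/MM/YYYY",
    owner="Customer Services data steward",
    sensitivity="restricted",
    known_issues=["~4% of values fail date parsing"],
)
print(entry.entity, entry.attribute, entry.sensitivity)
```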

Have you created a high-level source to target mapping specification?

At this stage you will not have a complete source-to-target specification but you should have identified the high-level objects and relationships that will be linked during the migration. These will be further analysed in the later design phase.

Have you determined high-level volumetrics and created a high-level scoping report?

It is important that you do not fall foul of the load-rate bottleneck problem, so ensure that you fully assess the scope and volume of data to be migrated.

Focus on pruning data that is historical or surplus to requirements (see here for advice). Create a final scoping report detailing what will be in scope for the migration and get the business to sign this off.
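
To make the load-rate bottleneck concrete, here is a minimal sketch of the arithmetic, assuming illustrative record counts, an assumed load throughput and an assumed cutover window; replace these with your own profiling and test figures.

```python
# Rough load-window estimate from high-level volumetrics.
# All figures below are illustrative assumptions, not real project numbers.

volumes = {                     # records in scope after pruning historical data
    "customers": 1_200_000,
    "accounts": 3_500_000,
    "transactions": 90_000_000,
}
load_rate_per_hour = 2_000_000  # assumed/measured target load throughput (records per hour)
available_window_hours = 48     # agreed cutover window

total_records = sum(volumes.values())
estimated_hours = total_records / load_rate_per_hour

print(f"Records in scope: {total_records:,}")
print(f"Estimated load time: {estimated_hours:.1f} h against a {available_window_hours} h window")
if estimated_hours > available_window_hours:
    print("Load-rate bottleneck: prune further, raise throughput, or phase the migration.")
```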

Has the risk management process been shared with the team and have they updated the risk register?

There will be many risks discovered during this phase, so make it easy for risks to be recorded. Create a simple online form where anyone can add risks during their analysis; you can filter them later, but for now the aim is to gather as many as possible and see where any major issues are coming from.

Have you created a data quality management process and impact report?

If you’ve been following our online coaching calls you will know that without a robust data quality rules management process your project will almost certainly fail or experience delays.

Understand the concept of data quality rules discovery, management and resolution so you deliver a migration that is fit for purpose.

The data quality process is not a one-off effort; it will continue throughout the project. At this phase, however, we are concerned with discovering the impact of the data so that decisions can be made that could affect project timescales, deliverables, budget, resourcing and so on.

Have you created and shared a first-cut system retirement strategy?

Now is the time to begin warming up the business to the fact that their beloved systems will be decommissioned post-migration. Ensure that they are briefed on the aims of the project and start the process of discovering what is required to terminate the legacy systems. Better to approach this now than to leave it until later in the project when politics may prevent progress.

Have you created conceptual/logical/physical and common models?

These models are incredibly important for communicating and defining the structure of the legacy and target environments.

The reason we have so many modelling layers is so that we understand all aspects of the migration, from the deeply technical through to how the business community runs operations today and how they wish to run operations in the future. We will be discussing the project with various business and IT groups, so the different models help us convey meaning to the appropriate community.

Creating conceptual and logical models also helps us to identify gaps in thinking or design between the source and target environments far earlier in the project, so we can make corrections to the solution design.

Have you refined your project estimates?

Most projects start with some vague notion of how long each phase will take. Use your landscape analysis phase to determine the likely timescales based on data quality, complexity, resources available, technology constraints and a host of other factors that will help you determine how to estimate the project timelines.

Phase 4: Solution Design

Have you created a detailed mapping design specification?

By the end of this phase you should have a thorough specification of how the source and target objects will be mapped, down to attribute level. This needs to be at a sufficient level to be passed to a developer for implementation in a data migration tool.

Note that we do not progress immediately into build following landscape analysis. It is far more cost-effective to map out the migration using specifications as opposed to coding, which can prove expensive and more complex to redesign if issues are discovered.
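
The format of the specification matters less than the level of detail. As a purely illustrative sketch, attribute-level mappings might be captured as structured data like the Python below; the system, table and rule names are hypothetical.

```python
# Hypothetical attribute-level source-to-target mappings.
# Each entry should be detailed enough for a developer to implement without guesswork.

mappings = [
    {
        "source": "LEGACY_CRM.CUSTOMER.CUST_NAME",
        "target": "TARGET.PARTY.FULL_NAME",
        "rule": "Trim whitespace; title-case; single-field copy with no concatenation.",
        "data_quality_rule": "Must not be null or 'UNKNOWN'.",
    },
    {
        "source": "LEGACY_CRM.CUSTOMER.DOB",
        "target": "TARGET.PARTY.BIRTH_DATE",
        "rule": "Parse DD/MM/YYYY text into an ISO date; reject values outside 1900-today.",
        "data_quality_rule": "Unparseable dates go to the fallout report.",
    },
]

for m in mappings:
    print(f"{m['source']} -> {m['target']}: {m['rule']}")
```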

Have you created an interface design specification?

At the end of this stage you should have a firm design for any interfaces that are required to extract the data from your legacy systems or to load the data into the target systems. For example, some migrations require change data capture functionality, so this needs to be designed and prototyped during this phase.

Have you created a data quality management specification?

This will define how you plan to manage the various data quality issues discovered during the landscape analysis phase. These may fall into certain categories such as:

  • Ignore
  • Cleanse in source
  • Cleanse in staging process
  • Cleanse in-flight using coding logic
  • Cleanse on target

The following article by John Platten of Vivamex gives a better understanding on how to manage cleansing requirements: Cleanse Prioritisation for Data Migration Projects – Easy as ABC?
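
As a simple illustration of how the specification might tie each data quality rule to one of the categories above, here is a hedged sketch in Python; the rules and chosen resolutions are made-up examples.

```python
# Hypothetical data quality rules tagged with the cleansing category chosen for each.
RESOLUTIONS = {"ignore", "cleanse_in_source", "cleanse_in_staging",
               "cleanse_in_flight", "cleanse_on_target"}

dq_rules = [
    {"rule": "Customer email must match a valid pattern", "resolution": "cleanse_in_staging"},
    {"rule": "Account status must map to a target code",  "resolution": "cleanse_in_flight"},
    {"rule": "Obsolete fax numbers",                       "resolution": "ignore"},
]

for r in dq_rules:
    # Guard against resolutions that are not part of the agreed categories
    assert r["resolution"] in RESOLUTIONS, f"Unknown resolution for: {r['rule']}"
    print(f"{r['rule']}: {r['resolution']}")
```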

Have you defined your production hardware requirements?

At this stage you should have a much firmer idea of what technology will be required in the production environment.

The volumetrics and interface throughput performance should be known so you should be able to specify the appropriate equipment, RAID configurations, operating system etc.

Have you agreed the service level agreements for the migration?

At this phase it is advisable to agree with the business sponsors what your migration will deliver, by when and to what quality.

Quality, cost and time are variables that need to be agreed upon prior to the build phase so ensure that your sponsors are aware of the design limitations of the migration and exactly what that will mean to the business services they plan to launch on the target platform.

Phase 5: Build & Test

Has your build team documented the migration logic?

The team managing the migration execution may not be the team responsible for coding the migration logic.

It is therefore essential that the transformations and rules that were used to map the legacy and target environments are accurately published. This will allow the execution team to analyse the root-cause of any subsequent issues discovered.

Have you tested the migration with a mirror of the live environment?

It is advisable to test the migration with data from the production environment, not a smaller sample set. If you limit your test data sample, you will almost certainly encounter conditions within the live data at runtime that your tests never exercised, causing defects in your migration.

Have you developed an independent migration validation engine?

Many projects base the success of migration on how many “fall-outs” they witness during the process. This is typically where an item of data cannot be migrated due to some constraint or rule violation in the target or transformation data stores. They then go on to resolve these fall-outs and when no more loading issues are found carry out some basic volumetric testing.

“We had 10,000 customers in our legacy system and we now have 10,000 customers in our target, job done”.

We recently took a call from a community member based in Oman. Their hospital had subcontracted a data migration to a company which had since completed the project. Several months after the migration project they discovered that many thousands of patients now had incomplete records, missing attributes and generally sub-standard data quality.

It is advisable to devise a solution that will independently assess the success of the execution phase. Do not rely on the reports and stats coming back from your migration tool as a basis for how successful the migration was.

I advise clients to vet the migration independently, using a completely different supplier where budgets permit. Once the migration project has officially terminated and those specialist resources have left for new projects, it can be incredibly difficult to resolve serious issues, so start to build a method of validating the migration during this phase. Don’t leave it until project execution; it will be too late.
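
A minimal sketch of such an independent check, assuming you can extract comparable record sets from both legacy and target into a separate environment: it reconciles record counts, keys and attribute completeness, which is exactly what a purely volumetric test misses. The keys and attribute names are hypothetical.

```python
# Independent reconciliation sketch: compare record counts, keys and attribute
# completeness between legacy and target extracts, outside the migration tool itself.

def completeness(rows, attribute):
    """Proportion of rows with a non-empty value for the given attribute."""
    if not rows:
        return 0.0
    populated = sum(1 for r in rows if r.get(attribute) not in (None, "", "UNKNOWN"))
    return populated / len(rows)

def reconcile(legacy_rows, target_rows, key, attributes, tolerance=0.01):
    findings = []
    if len(legacy_rows) != len(target_rows):
        findings.append(f"Record count mismatch: {len(legacy_rows)} legacy vs {len(target_rows)} target")
    missing_keys = {r[key] for r in legacy_rows} - {r[key] for r in target_rows}
    if missing_keys:
        findings.append(f"{len(missing_keys)} legacy keys missing from target")
    for attr in attributes:
        drop = completeness(legacy_rows, attr) - completeness(target_rows, attr)
        if drop > tolerance:
            findings.append(f"Completeness of '{attr}' dropped by {drop:.1%} after migration")
    return findings

# Tiny illustrative data set (in practice, full extracts or large samples)
legacy = [{"id": 1, "dob": "1970-01-01"}, {"id": 2, "dob": "1985-06-30"}]
target = [{"id": 1, "dob": "1970-01-01"}, {"id": 2, "dob": None}]
for issue in reconcile(legacy, target, key="id", attributes=["dob"]):
    print(issue)
```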

Have you defined your reporting strategy and associated technology?

Following on from the previous point, you need to create a robust reporting strategy so that the various roles involved in the project execution can see progress in a format that suits them.

For example, a migration manager may wish to see daily statistics, a migration operator will need to see runtime statistics and a business sponsor may wish to see weekly performance etc.

If you have created service level agreements for migration success these need to be incorporated into the reporting strategy so that you can track and verify progress against each SLA.

Have you defined an ongoing data quality monitoring solution?

Data quality management is continuous and should certainly not cease when the migration has been delivered, as a range of insidious, previously undetected data defects can be lurking in the migrated data.

In addition, the new users of the system may well introduce errors through inexperience so plan for this now by building an ongoing data quality monitoring environment for the target platform.

A useful tool here is any data quality product that allows you to create specific data quality rules, provides matching functionality and has a dashboard element.
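
Whatever product you choose, the underlying mechanism is a set of rule checks rerun on a schedule, with pass rates compared against agreed thresholds and fed to a dashboard. The sketch below is a tool-agnostic illustration with made-up rules and thresholds.

```python
# Tool-agnostic sketch of ongoing data quality monitoring: each rule is rerun on a
# schedule and its pass rate compared with an agreed threshold for alerting.

def pass_rate(rows, predicate):
    return sum(1 for r in rows if predicate(r)) / len(rows) if rows else 1.0

monitoring_rules = [
    ("Email populated", lambda r: bool(r.get("email")), 0.98),
    ("Status is a valid target code", lambda r: r.get("status") in {"ACTIVE", "CLOSED"}, 1.00),
]

def run_checks(rows):
    for name, predicate, threshold in monitoring_rules:
        rate = pass_rate(rows, predicate)
        status = "OK" if rate >= threshold else "ALERT"
        print(f"{status}: {name} at {rate:.1%} (threshold {threshold:.0%})")

# Example run against a tiny, made-up extract from the target system
run_checks([
    {"email": "a@example.com", "status": "ACTIVE"},
    {"email": "", "status": "PENDING"},
])
```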

Have you created a migration fallback policy?

What if the migration fails? How will you rollback? What needs to be done to facilitate this?

Hope for the best but plan for the worst-case scenario, which is a failed migration. This can often be incredibly complex and require cross-organisation support, so plan well in advance of execution.

Have you confirmed your legacy decommission strategy?

By now you should have a clear approach, with full agreement, of how you will decommission the legacy environment following the migration execution.

Have you completed any relevant execution training?

The team running the execution phase may differ from the team on the build phase. It goes without saying that the migration execution can be complex, so ensure that the relevant training materials are planned for and delivered by the end of this phase.

Have you obtained sign-off for anticipated data quality levels in the target?

It is rare that all data defects can be resolved but at this stage you should certainly know what they are and what impact they will cause.

The data is not your responsibility, however; it belongs to the business, so ensure they sign off any anticipated issues and are fully aware of the limitations the data presents.

Have you defined the data migration execution strategy?

Some migrations can take a few hours, some can run into years.

You will need to create a very detailed plan for how the migration execution will take place. This will include sections such as what data will be moved, who will sign off each phase, what tests will be carried out, what data quality levels are anticipated, when the business will be able to use the data, and what transition measures need to be taken.

This can become quite a considerable activity so as ever, plan well in advance.

Have you created a gap-analysis process for measuring actual vs planned progress?

This is particularly appropriate on larger scale migrations.

If you have indicated to the business that you will be executing the migration over an 8-week period and that specific deliverables will be created, you can map that out in an Excel chart with time points and anticipated volumetrics.

As your migration executes you can then chart actual vs estimated so you can identify any gaps.
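
A minimal sketch of that gap analysis, assuming weekly cumulative volumetrics; all the figures are illustrative placeholders.

```python
# Planned vs actual cumulative volumetrics across an assumed 8-week execution window.
planned_per_week = [0.5, 1.5, 3.0, 4.5, 6.0, 7.5, 9.0, 10.0]   # millions of records
actual_per_week  = [0.4, 1.2, 2.4, 3.9]                         # execution to date

for week, planned in enumerate(planned_per_week, start=1):
    if week > len(actual_per_week):
        break
    actual = actual_per_week[week - 1]
    gap = planned - actual
    flag = " <-- falling behind plan" if gap > 0.5 else ""
    print(f"Week {week}: planned {planned:.1f}M, actual {actual:.1f}M, gap {gap:.1f}M{flag}")
```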

Phase 6: Execute & Validate

Have you kept an accurate log of SLA progress?

You will need to demonstrate to the business sponsors and independent auditors that your migration has been compliant. How you do this varies, but if you have agreed SLAs in advance these need to be reported against.

Have you independently validated the migration?

We have already covered this, but it is worth stressing again that you cannot rely on your migration architecture to validate the migration. An independent process must be used to ensure that the migration has delivered the data to a sufficient quality level to support the target services.

Phase 7: Decommission & Monitor

Have you completed your system retirement validation?

There will typically be a number of pre-conditions that need to be met before a system can be terminated.

Ensure that these are fully documented and agreed (this should have been done earlier) so you can begin confirming that the migration has met these conditions.

Have you handed over ownership of the data quality monitoring environment?

Close down your project by passing over the process and technology adopted to measure data quality during the project.

Please note that this list is not exhaustive, there are many more activities that could be added here but it should provide you with a reasonable starting point.

You may also find that many of these activities are not required for your type of migration but are included for completeness. As ever, your migration is unique and will require specific actions that are not on this list.



Uptime vs. TIA-942: Introduction, why this series of articles?

Published on May 8, 2017

Edward van Leent
Chairman & CEO at EPI Group of Companies

Article 1 | Uptime vs. TIA-942: Introduction, why this series of articles?

During a recent one-month tour throughout the USA and Asia I had the pleasure of meeting numerous data centre owners/operators, consultants and end-users to talk about data centre trends and the challenges they are facing. During those conversations, we also discussed quality benchmarks for data centre facilities, including the various standards and guidelines.

I started spotting a clear trend: there is a lot of misperception about data centre facilities benchmarking in relation to ANSI/TIA-942 vs. Uptime. Some of those misperceptions are based on outdated information, as some customers haven’t kept up with developments in that space; others are based on deception created by parties not representing the facts truthfully, either through ignorance or intentionally for commercial reasons.

It was clear to me that the market needs to be updated on what is happening, including the true facts of the matter. That’s what brought me to the idea of writing a few articles on this subject to ensure the market gets appropriate, fact-based and up-to-date information. I will address a variety of aspects in a series of articles, and I hope this will contribute to a clearer, fact-based picture of the current situation and answer any questions you might have regarding this subject. If you have any suggestions in terms of topics to be covered, please feel free to drop me a note at;

Article 2 | Uptime vs. TIA-942: A short history

Before getting into the details of Uptime vs. TIA-942, I thought it would be a good idea to provide a bit of background so that some of the matters that will be discussed in upcoming articles can be seen in the light of the bigger scheme of things.

Uptime (UTI) came up with a data centre classification scheme based on four (4) different levels, which, as probably all readers of this article know, are indicated by the term “Tier”. It was first released in 1995 with the title “Tier Classifications Define Site Infrastructure Performance”. In 2005, this title was updated to “Tier Standard Topology”, also referred to as TST.

In the early 2000s the TR42 committee of TIA decided to create a telecommunication standard for data centres. UTI and TIA got in touch with each other, and UTI gave TIA the legal right to use the Tier philosophy it had developed for inclusion into what ultimately became the ANSI/TIA-942 (TIA-942) standard. There were a few key differences, such as that TIA-942 did not only address Electrical and Mechanical, as defined at a high level in the TST, but also included many other factors in two additional sections, Architectural and Telecommunication (I will expand more on some of the key (technical) differences in another article). Both UTI and TIA were using the term Tier to indicate the four different levels of design principles. UTI was, and still is, using Roman numerals (I, II, III, IV) whereas TIA was using Arabic numerals (1, 2, 3, 4).

TIA released the ANSI/TIA-942 standard in 2005. The standard very quickly became popular for a variety of reasons. This was amplified when a number of organizations started to perform conformity assessments based on ANSI/TIA-942, which clearly created a much more competitive environment in a market place where previously UTI was pretty much the sole player. There was also some level of confusion in the market when organizations talked about having a Tier-X data centre without providing a reference as to whether this claim was based on UTI-TST or ANSI/TIA-942. These issues slowly became more and more of an irritation point, and in 2013 UTI approached TIA with the request for TIA to drop the term ‘Tier’ from the ANSI/TIA-942 standard.

TIA, being a non-profit organization, had no issues with that, and as such it was mutually agreed that TIA would strike the term ‘Tier’ from the ANSI/TIA-942 standard and replace it with the term Rated/Rating in the 2014 version of the Standard. In an upcoming article, I will discuss in more detail the rights of using the term Tier and/or Rated/Rating, as there are unfortunately some misperceptions about the legal rights with respect to the usage of the term ‘Tier’.

The above episode basically ended the relationship between UTI and TIA, and each party is now working independently on the current and future versions of its own documents.


Article 3 | Uptime vs. TIA-942: Standard or guideline?

There have been many debates on the internet about this topic, including confusion about its relation to codes and arguments about using a capital letter to indicate the term Standard. I think it is good to go back to one of the first definitions (as far back as 1667), which defined a Standard as ‘a specified principle, example or measure used for comparison to a level of quality or attainment’. A guideline was defined as ‘a non-specific rule or principle that provides direction to action, behaviour or outcome’. These definitions of course still leave some room for interpretation, even to the point that some would argue that both terms can be used for the very same thing. I would argue that a Standard has a few important characteristics:

  1. Standards are developed by an accredited SDO (Standard Development Organization). This title is awarded by any of the three key members of the WSC (World Standards Cooperation) or their regional or national members who have been given the authority to accredit SDOs. At a regional level you have, for example, CEN, the European standards body issuing EN standards. At the country level you have, for example, ANSI in the USA, BSI in the UK, SPRING in Singapore, etc. Virtually every country in the world has its own.
  2. The development of the Standard follows a transparent development process as laid down by the organization governing the SDO’s development efforts. This typically includes key points such as that the process should be documented and available to others, that the members involved should be balanced, etc.
  3. SDOs are typically non-profit organizations
  4. SDOs do not perform audits, nor do they provide certification
  5. All requirements of the standard are transparent, i.e. ALL requirements are available to those who wish to have insight into the standard and, before you even ask the question: NO, this does not mean that the standard should be available for free.
  6. The Standard must be reviewed on a regular basis, not to exceed 5 years. The outcome of that review will yield one of three options: reaffirm, revise or withdraw.
  7. The intellectual property (IP) extends only to the standard itself and not to its use. This means that parties other than the SDO can use the material for various purposes, such as developing a service or product that uses the IP of the Standard.

There is a variant of the above, typically called de-facto or semi-standards, which are defined as specifications that are accepted through their relatively widespread usage.

So how can one make sure that a standard is a real Standard? One can review it from a “legal” perspective, or one could just apply the following logic:

  1. First of all, a real Standard would bear the prefix of the organization that accredited the SDO. For example, the long description of TIA-942 is ANSI/TIA-942, which means that ANSI is overseeing TIA as an SDO to ensure that whatever it develops follows due process. Just to be clear, ANSI does not validate the content of the standard, as this rests with the SDO and its technical committee of SMEs (Subject Matter Experts).
  2. A real Standard (typically) has a numeric indicator, e.g. ISO-9001, TIA-942
  3. A real Standard is a document which provides a clear description of all audit criteria

Coming back to the main question and based on the explanation provided, I believe it is very clear, and nobody can seriously argue otherwise, that ANSI/TIA-942 is a real Standard. UTI-TST is not a Standard but a guideline. At best, and with a fair amount of imagination, you could consider calling it a de-facto standard, but anything beyond that statement is clearly a misrepresentation of the facts and of the intent of how the WSC and its members define and recognize an SDO and a Standard.

Article 4 | Uptime vs. TIA-942: What is within the scope?

One of the key differences between UTI:TST and ANSI/TIA-942 is the scope. For the TST topology guideline of UTI the scope is very clear, as it only covers the mechanical and electrical infrastructure. This is often seen as inadequate by data centre owners. As one data centre consultant once said to me, “you could build a data centre in a wooden hut next to a railroad track and a nuclear power plant, with no fire suppression and the doors wide open, and still be a Tier-IV data centre based on UTI:TST”. As ridiculous as it might sound, the reality is that nobody could argue with this consultant, as UTI:TST only covers electrical and mechanical, full stop. Although electrical and mechanical systems are very important, it doesn’t make any sense to ignore all other aspects that contribute to a reliable, secure and safe data centre.

For ANSI/TIA-942 the situation is slightly more complicated. Officially the standard is called “Telecommunications Infrastructure Standard for Data Centers”. There are a number of annexes in ANSI/TIA-942 which describe additional criteria such as site location, building construction, electrical and mechanical infrastructure, physical security, safety, fire detection and suppression, etc. So, one could easily figure out that ANSI/TIA-942 clearly covers all aspects of a data center. So what is the issue?

There is a theoretical and a practical side to this. Let’s start with the theoretical side first. The standard indicates in the introduction that the 8 annexes are not part of the requirements of the standard, and as such the annexes start with the term ‘informative’. However, a few sentences later it states “It is intended for use by designers who need a comprehensive understanding of the data center design, including the facility planning, the cabling system, and the network design”. This indicates that the Technical Committee who put the standard together had a clear intent to cover the whole data centre and not just the network infrastructure alone. Furthermore, the standard also states that “Failsafe power, environmental controls and fire suppression, and system redundancy and security are also common requirements to facilities that serve both the private and public domain”. In addition to this, Annex F states “This Standard includes four ratings relating to various levels of resiliency of the data center facility infrastructure”. Given the continual references to the relationship between telecommunications and facilities infrastructure, it is hard to argue that this should not be taken as an overall design standard rather than one for telecommunications alone.

Then we have the practical side of the matter, which is that any data centre taking ANSI/TIA-942 as its reference point does so by referring to Tier/Rating levels. I have never seen any data centre declare conformity to ANSI/TIA-942 while ignoring all the annexes; by that logic one could just pull the approved network cables in the right way and forget about all other aspects such as electrical and mechanical systems. The reality is that data centre operators/owners who use the ANSI/TIA-942 standard as their reference point are using its full content, including the annexes and rating systems.

So, the conclusion is very simple. No matter how much confusion some parties try to throw into the mix, the reality is that data centre designers/operators/owners take the full document as their reference for designing and building a reliable, secure, efficient and safe data centre. Anybody who says that ANSI/TIA-942 is only used for telecommunications is either ignoring what is happening in the real world or is just oblivious to the facts of how ANSI/TIA-942 is written and used.

Article 5 | Uptime vs. TIA-942: Outcome based or checklist or can it be both?

In this article, in the series of articles about Uptime vs. TIA-942, I will address a statement often used in favour of Uptime over TIA-942. Consultants favouring Uptime typically use the argument that they are not using a checklist but are assessing designs based on the desired outcome. The claim is that ANSI/TIA-942 is not flexible and prevents innovation in designs as it uses a checklist, i.e. a tick-in-the-box approach. So, let’s examine the true facts of these statements.

Checklists based:

First of all, UTI does have a checklist. However, it is an internal checklist used by their own engineers to go through designs in a systematic way. This checklist is not shared with the general public, even though it would be helpful for everybody to have it in order to get a better understanding of the details of the UTI demonstration/test criteria. This goes back to one of my previous articles about what real standards are, i.e. open and transparent.

ANSI/TIA-942 is a combination of descriptions of what needs to be achieved to meet defined rating levels as well as supplemental annexes that provide guidance on how to achieve this. However, make no mistake: purely applying the table of Annex F as a checklist for conformity, without considering the rest of the standard, will give you an ugly surprise during an audit, as the table is a supporting element to the standard; it is not intended to be a complete checklist of all requirements of the standard. This is a classic mistake made by inexperienced consultants/auditors who offer consulting/audit services, proudly pull out a copy of the table, put a tick in every box and then declare a site to conform to ANSI/TIA-942. These consultants/auditors have clearly not understood the standard and/or do not understand how audits should be conducted. Unfortunately, in EPI we have seen data centre owners in “tears” when during an audit we found major non-conformities which were overlooked by these kinds of consultants. Be careful whom you choose for consulting and audit engagements and make sure they apply ANSI/TIA-942 appropriately.

Outcome based:

This applies to both UTI and ANSI/TIA-942. Don’t forget that ultimately the description of what constitutes a Tier I-II-III-IV is exactly the same as what ANSI/TIA-942 describes as Rated 1-2-3-4. For example, UTI:TST defines Tier-III as a Concurrently Maintainable (CM) data centre, just as ANSI/TIA-942 defines Rated-3 as a Concurrently Maintainable data centre. However, from the previous article you have learned that UTI:TST only covers Electrical and Mechanical (cooling) whereas ANSI/TIA-942 also includes requirements for Telecommunications to meet the CM requirement. A key difference is of course that ANSI/TIA-942 provides much more transparency and guidance on how this could be achieved, by giving clear indications of what one should/could do to achieve it.

Here is an example of how it works in the real world, which is very different from what is portrayed by consultants favouring Uptime. ANSI/TIA-942 states that for a Rated-3 data centre there should be 2 utility feeds, which can come from a single substation. What if you have only one utility feed? You could still meet Rated-3 if you can prove that you meet the overarching statement of being CM. So, if you have generators and you can prove that during planned maintenance you can switch to the generators, then you could still be meeting Rated-3 requirements. Of course, there will be a number of other criteria you will need to address to ensure that the generator is capable of continuously supporting the load over an extended period of time, etc., but in essence you certainly can meet Rated-3 despite not having followed the exact wording in the table that says you need two utility feeds. Inexperienced consultants/auditors do not understand this; certified consultants/auditors will, so make sure you put your design work in capable hands. Consultants favouring Uptime will often try to scare the customer with “you will never be able to comply with ANSI/TIA-942 as the table tells you that you must have XYZ”. It is kind of hilarious to see that the same type of consultants declare that the annexes are not part of the ANSI/TIA-942 standard, yet try to scare a customer about not meeting the items listed in the very same table they say is not part of the standard…

So, coming back to the question of “Outcome based or checklist or can it be both?”  UTI is outcome based and does not provide practical guidance on how to achieve it. ANSI/TIA-942 is also outcome based but provides guidance by means of clear descriptions in various annexes and a supplemental table for you to use. Those with more advanced technical skills can still use the flexibility to implement the design differently as long as they meet the outcome objectives and guidance of the annexes. Therefore, there is absolutely no truth to the statement that ANSI/TIA-942 hinders innovation when designing data centres.

Feel free to share this range of articles to other LinkedIn groups, friends and other social media.

In my next article, I will address the often-heard misconception of “Uptime is easy, TIA is hard”



Article 6 | Uptime vs. TIA-942: Uptime certification is easy, ANSI/TIA-942 certification is difficult

Following up on my previous article, we will now have a closer look at a statement which consultants favouring Uptime tend to throw at data centre operators/owners when trying to convince them to go for Uptime certification instead of ANSI/TIA-942 certification. One of the famous statements is “Uptime certification is easier to achieve compared to TIA-942”.

When hearing this, the first thought that comes to my mind is what my late father always said, ‘If something is too easy to achieve, it is probably not worth it’. Having said that, I don’t think that Uptime certification is all that easy to comply with. So why is it that such statements are being made?

At the core of such statements is, of course, a commercial interest in trying to scare data centre owners/operators away from pursuing ANSI/TIA-942 certification. So, what are their justifications, and are these true or false? The arguments usually brought up in those conversations are:

  1. UTI:TST only reviews electrical and mechanical (cooling) systems whereas ANSI/TIA-942 is too complicated as it covers everything including telecommunications, physical security etc.
  2. ANSI/TIA-942 is prescriptive and has many strict requirements in the Table which are hard to implement
  3. If you fail to meet one of the ANSI/TIA-942 requirements then you cannot get certified

Let’s have a look at each of these statements one by one to decipher the truth:


Argument-1: UTI:TST only reviews electrical and mechanical

There are two items to be examined here:

  1. The scope of the audit, in this case being electrical and mechanical
  2. The difficulty in meeting the criteria for the defined scope

As for the scope, yes, it is true that the smaller the scope, the less will be assessed and therefore potentially fewer issues might be discovered. But to me that sounds like saying you have a safe car because your seatbelts are certified, even though you didn’t look at the tires, the structural strength of the car and other factors that have an impact on overall safety. Similarly for a data centre: you can review only electrical and mechanical systems like UTI:TST, but if you don’t review the network, physical security and other factors then you still have a very large risk at hand from an overall data centre reliability perspective. So, as a business manager running an enterprise or commercial data centre, or as a user of a commercial data centre, would you be happy to know that a certificate does not cover all aspects that potentially pose a business risk to you? A broader scope like that of ANSI/TIA-942 will ensure that all potential physical risks are evaluated.

As for the difficulty of meeting the criteria for the defined scope, in this respect UTI is certainly not easier compared to ANSI/TIA-942. In fact, some of the requirements from UTI are considered to be more difficult; an example is the requirement for prime generators, whereas ANSI/TIA-942 allows standby generators. Another example: UTI is more stringent on ambient conditions, as it looks at the most extreme condition over a 20-year history. These two facts alone have left many engineers (and business owners) baffled, as they add greatly to the cost, and one often wonders why go to these extremes. ANSI/TIA-942 is in that sense more practical, yet allows you to go to these extremes if you wish to do so and are willing to pay the incremental cost. This gives the business an option to choose a well-balanced risk vs. investment model.


Argument-2: ANSI/TIA-942 is prescriptive and has many strict requirements in the Table which are hard to implement

This argument is baseless and is aimed at those who do not understand how audits really work. As indicated in one of my previous articles, “Outcome based or checklist or can it be both?”, the table supports the overarching requirement for each rating level. So, if something does not meet the exact description in the table, it does not mean that you don’t meet the requirements of the standard. Read the article I wrote about this subject here: “Outcome based or checklist or can it be both?”


Argument-3: If you fail to meet one of the ANSI/TIA-942 requirements then you cannot get certified

This argument pretty much follows the same “logic” as the previous statement. There is NO such thing as not being able to get certified if you miss out on meeting a particular description in the table. Furthermore, in auditing based on ISO, there are Cat-1 and Cat-2 non-conformities. I will explain the difference in a future article, but for now it is sufficient to say that if a site has one (or multiple) Cat-2 non-conformities, that does not automatically mean the site cannot be certified.


The conclusion is that UTI:TST is portrayed as easier based on a narrow scope, but that leaves business owners at risk of an incomplete assessment of all the important factors that make up a reliable data centre infrastructure. If you compare the same scope for UTI:TST vs. ANSI/TIA-942 then both have the same overarching goals, such as concurrent maintainability and fault tolerance, whereby in fact UTI can turn out to be more costly due to some requirements which some data centre operators/owners consider to be overkill.

In our next article, I will address the usage of the term Tier and Rating. Stay tuned.


Top 10 data center operating procedures

Every data center needs to define its policies, procedures, and operational processes.

An ideal set of documentation goes beyond technical details about application configuration and notification matrices.

These top 10 areas should be part of your data center’s standard operating procedures manuals.

    1. Change control. In addition to defining the formal change control process, include a roster of change control board members and forms for change control requests, plans and logs.
    2. Facilities. Injury prevention program information is a good idea, as well as documentation regarding power and cooling emergency shut-off processes; fire suppression system information; unsafe condition reporting forms; new employee safety training information, logs and attendance records; illness or injury reporting forms; and visitor policies.
    3. Human resources. Include policies regarding technology training, as well as acceptable use policies, working hours and shift schedules, workplace violence policies, employee emergency contact update forms, vacation schedules, and anti-harassment and discrimination policies.
    4. Security. This is a critical area for most organizations. Getting all staff access to the security policies of your organization is half the battle. An IT organization should implement policies regarding third-party or customer system access, security violations, auditing, classification of sensitive resources, confidentiality, physical security, passwords, information control, encryption and system access controls.
    5. Templates. Providing templates for regularly used documentation types makes it easier to accurately capture the data you need in a format familiar to your staff. Templates to consider include policies, processes, logs, user guides and test/report forms.
    6. Crisis management. Having a crisis response scripted out in advance goes a long way toward reducing the stress of a bad situation. Consider including crisis management documentation around definitions; a roster of crisis response team members; crisis planning; an escalation and notification matrix; a crisis checklist; guidelines for communications; situation update forms, policies, and processes; and post-mortem processes and policies.
    7. Deployment. Repeatable processes are the key to speedy and successful workload deployments. Provide your staff with activation checklists, installation procedures, deployment plans, location of server baseline loads or images, revision history of past loads or images and activation testing processes.
    8. Materials management. Controlling your inventory of IT equipment pays off. Consider including these items in your organization’s documentation library: policies governing requesting, ordering, receiving and use of equipment for testing; procedures for handling, storing, inventorying, and securing hardware and software; and forms for requesting and borrowing hardware for testing.
    9. Internal communications. Interactions with other divisions and departments within your organization may be straightforward, but it is almost always helpful to provide a contact list of all employees in each department, with their work phone numbers and e-mail addresses. Keep a list of services and functions provided by each department, and scenarios in which it may be necessary to contact these other departments for assistance.
    10. Engineering standards. Testing, reviewing and implementing new technology in the data center is important for every organization. Consider adding these items to your organization’s standard operating procedures manuals: new technology request forms, technology evaluation forms and reports, descriptions of standards, testing processes, standards review and change processes, and test equipment policies.

About the author
Kackie Cohen is a Silicon Valley-based consultant providing data center planning and operations management to government and private sector clients. Kackie is the author of Windows 2000 Routing and Remote Access Service and co-author of Windows XP Networking.


Data Center Migrations & Consolidations

Business Challenge:

A major academic-based health system in Philadelphia required migration services consisting of relocating nearly 600 servers and related technology equipment from the primary data center to the new “designated data center”. ComSource needed to develop a data center migration plan which allowed the move to occur in phases, enabling the IT staff to concentrate on one critical factor at a time and minimizing the danger of excessive downtime at any point during the move.

The Solution:

ComSource, in conjunction with our migration partner, worked closely with the client’s project management office, attending pre-move meetings and planning sessions to develop a “playbook” based on a selected “move event” approach and timeline. Key to the plan’s success was our move methodology, which was based on application priorities and dependencies. Once these “logical” dependencies were determined, a hardware or physical dependency check was performed. This helped put the servers into various groups and identify which ones needed an asset-swap, parallel, forklift or similar type of approach. The data migration took place during 13 move events, utilizing one truck per move. ComSource also provided relocation of the customer’s IT equipment, including packing and crating, loading, transporting, unloading and uncrating. As part of this data center migration it was imperative that customer manufacturer hardware agreements and warranty coverage remained valid throughout the relocation event. ComSource provided consulting services which included suggesting best practices for relocating, inventorying, communicating, change management and other measures associated with the data center migration.


ComSource successfully completed a nearly 600-server and technology infrastructure move over a 13-weekend period, on schedule, working within the customer’s timeline and budget and with minimal downtime to ongoing business operations.

Business Continuity, Recovery Services & Co-Location

Business Challenge:

A rapidly growing U.S.-based retail corporation maintained a production data center in New York State. While they had always cut daily incremental backup tapes and weekly “fulls” and sent them offsite to a secure location, they never had a contracted warm or hot site facility from which to recover their critical applications in the event of a disaster at the main production facility. Several years ago, the lack of a contract with a hot site recovery facility never seemed a major issue for a small retail company, just a potential minor inconvenience. As the company grew, in fact doubled and tripled in size, it became apparent, and indeed critical, to come up with a more effective and comprehensive business recovery plan.

The Solution:

The ComSource sales and support team went to work immediately. First and foremost, ComSource and their business recovery experts worked with the company to examine all workloads and applications with the intent of prioritizing which applications absolutely needed to be up and running in hours versus days in the event of a “disaster”. The company’s key applications were hosted both on IBM’s Power family with OS/400 applications and on several Dell x86 servers. Once ComSource and their recovery expert team completed a full audit of all hardware platforms, all critical and non-critical applications and all current backup and recovery infrastructure, they jointly selected one of ComSource’s elite recovery site locations in northern Georgia as the hot site facility. In this case, the long-time ComSource affiliate, a true “best in class” disaster recovery services organization, provided the customer with the best overall top-to-bottom recovery option, with a secure facility, redundant components, extensive equipment inventory and staff expertise across all of the end-user platforms.


Dedicated platforms were selected and deployed, and processes were implemented to ensure that in the event of a disaster at the production facility this rapidly growing retail organization could recover its mission-critical applications quickly and efficiently. This long-time valued ComSource customer has continued to utilize this premier disaster recovery organization and has performed many complete recovery tests over several years. The end user’s executive team can now “sleep at night” knowing that in the event of a disaster, almost any disaster, the company can bring up and run all selected applications in a timely fashion, with a highly skilled support team working closely with them throughout the recovery process.

Information Technology in the Healthcare Sector

Business Challenge:

A 528-bed tertiary care facility in western New York needed to successfully implement an EMR solution. ComSource, along with our partner affiliate, competed with top IT healthcare solution providers and consultants to win this major project, which required significant pre-implementation planning, management and support to help deploy the mission-critical EPIC software.

The Solution:

Due to timeline sensitivity and federal mandates, ComSource and our partner affiliate were tasked to successfully implement the EPIC software by providing the key services listed below:

  • Implementation planning and pre-planning
  • Systems analysis
  • Change management
  • Screen/report design
  • Tailoring/configuration
  • Integration testing
  • Training
  • Activation planning
  • Post implementation review


ComSource was able to assist this tertiary care facility in achieving their targeted deadlines and obtaining full funding of the project. The facility realized significant cost savings by choosing our ComSource partner affiliate over other alternative IT healthcare systems integrators. This successful EPIC implementation helped the client attain meaningful use objectives in a cost effective manner. In addition, the doctors and hospitals were able to report required quality measures that demonstrate outcomes, such as:

  • Improved process efficiencies
  • Maximized use of human resources
  • Improved “return on investment” on the technology purchase
  • Employee satisfaction
  • Physician satisfaction
  • Improved clinical quality outcomes
  • Increased case flow
  • Improved profitability
  • Improved patient care and safety

Mobile Technology and Logistical Solutions

Business Challenge:

A leading freight and logistics provider needed to reduce their use of paper through the full delivery cycle, speed up their customers’ online visibility of deliveries for payment, and increase efficiency among drivers, IT support staff and employees completing back-office procedures. The “partial” paper-based system being used by this company was creating inefficiencies such as data loss, lack of quality control and wasted driver time.

The Solution:

ComSource and our partner affiliate coordinated with all levels of the corporate structure to create a new solution. This interactive process allowed the employees to see how the new processes directly affected their jobs and incorporated their requested features in the new system. A complete mobility solution was implemented to allow the company to automate their entire delivery and collection process in real time. Key elements of this solution include:

  • Drivers were able to scan items both within and outside the depot.
  • Consignments were manifested electronically.
  • “Sign-on glass” allowed the company to collect proof of delivery as well as accept and complete pickups in the field.
  • Information was instantly transferred to back office systems which increased functionality for staff in regard to schedules,
    deliveries, collections and depot operations.
  • Handheld remote mobile hardware and software assets allow support staff to access the device to assist the courier
    as needed. If a device is stolen it can be wiped of any sensitive customer information or corporate data remotely.


This provider benefited from the new mobility solution in the following ways:

  • Significant cost savings on ongoing maintenance, processing infrastructure and equipment repairs, and an improved “rate of return”.
  • Improved speed and efficiency in receiving deliveries, creating invoices and meeting the increasing demands of customers.

Information Technology Assessments

Business Challenge:

A nationally recognized retail corporation selected ComSource as a “check and balance” to evaluate the performance of their current network and propose an architectural strategy that was both redundant and secure while requiring less maintenance. This company had a fast-growing retail business and needed to ensure that their environment could support their current rate of growth.

The Solution:

ComSource assessed the current network design with an onsite CCIE engineer and an array of tools. The network design elements assessed were:

  • IP Addressing Strategy
  • VLAN Strategy
  • Access Layer Switching Strategy
  • Distribution Layer Switching Strategy
  • Core Layer Switching Strategy
  • Wide Area Network Strategy
  • Internet Access Strategy

The infrastructure elements assessed were:

  • Cabling Infrastructure Strategy
  • System Security Strategy
  • Production Network Management Strategy

These assessments led to recommendations from our CCIE engineer, including:

  • Compressing large image files instead of just adding bandwidth.
  • MPLS for larger sites.
  • Utilizing QoS with VPNs.
  • Establishing manual OSPF router IDs on loopback interfaces for stability.
  • Increasing MTU size on remote sites to cut down on TCP fragmentation.
  • Filtering with a dedicated firewall.
  • Eliminating single points of failure and simplifying cabling by collapsing all switches within the datacenter, excluding top-of-rack switches, to two core switches.
  • Deploying a network management solution to take configuration backups of all devices at regular intervals and push out mass configuration changes (a sketch of this follows the list).
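As a rough illustration of that last recommendation, the sketch below shows one way scheduled configuration backups might be scripted. It assumes the open-source netmiko library and Cisco IOS-style devices; the hostnames, credentials and backup directory are placeholders, not details from this engagement.

```python
# Minimal sketch: periodic configuration backups for a list of network devices.
# Assumes the netmiko library (pip install netmiko) and Cisco IOS-style devices.
# Hostnames, credentials and paths below are placeholders.
from datetime import datetime
from pathlib import Path

from netmiko import ConnectHandler

DEVICES = [
    {"device_type": "cisco_ios", "host": "core-sw-1.example.net",
     "username": "backup", "password": "changeme"},
    {"device_type": "cisco_ios", "host": "core-sw-2.example.net",
     "username": "backup", "password": "changeme"},
]
BACKUP_DIR = Path("config-backups")


def backup_all() -> None:
    """Pull the running configuration from each device and save it with a timestamp."""
    BACKUP_DIR.mkdir(exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    for device in DEVICES:
        conn = ConnectHandler(**device)
        try:
            config = conn.send_command("show running-config")
        finally:
            conn.disconnect()
        outfile = BACKUP_DIR / f"{device['host']}-{stamp}.cfg"
        outfile.write_text(config)
        print(f"Saved {outfile}")


if __name__ == "__main__":
    backup_all()  # run from cron or a scheduler to get regular intervals
```

In practice the same script (or a commercial network management tool) would also be the vehicle for pushing out mass configuration changes.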


At the end of the assessment the customer had a clear road map as to how their network should continue to grow effectively in concert with their rapidly growing business enterprise. Strategic implementation of the recommended solutions increased throughput, functionality and security in conjunction with the expanding company.

3rd Party Maintenance and Support, Non OEM

Business Challenge:

A Fortune 1500 privately held cosmetic company was tasked by senior management executives to reduce costs in their data center. Knowing that IT maintenance contracts are subject to frequent annual price increases, often associated with renewals, this company reached out to ComSource for strategies on maintenance cost reduction.

The Solution:

ComSource, along with our trusted and recognized 3rd party maintenance service provider, looked at two corporate datacenter locations for this cosmetic company that had expiring IBM and Dell maintenance contracts and was able to help the company save over 40% on support in the first twelve months. Due to the cost savings from just one year of using 3rd party maintenance with ComSource, this company expanded their portfolio and not only renewed the contracts for the same IBM and Dell equipment, but also added additional IBM, Dell and Brocade equipment to the existing contracts. The service levels provided to this cosmetic company included: a 3rd party maintenance coordinator to track expiration dates and adds/deletes, 7x24x365 hardware maintenance, local service depots, call-home, and an online portal for asset management and incident tracking. This online portal allows our customers to see contracts in place with our 3rd party maintenance provider across all platforms and gives the customer the ability to upload maintenance contracts that are held with other maintenance providers as well.


ComSource and our 3rd party maintenance provider allow our customers to show a cost savings across multiple platforms and all major manufacturers. Where a typical OEM increases maintenance costs, we are able to decrease (or maintain at a lower price point) the costs as the equipment ages. We work with our customers to keep the equipment on the floor instead of trying to “end of life” the equipment as so many OEMs tend to do. In this specific case, utilizing our 3rd party maintenance solution, this cosmetic company saved approximately 40% on their maintenance contract costs across their expanded IT portfolio.


Tier 3 data center specifications checklist

This section of our two-part series on tier 3 data center specifications deals with the power supply aspects.

Because the data center is one of the most critical parts of the business, an organization needs to ensure the highest possible availability for it. Building a data center according to tier 3 data center specifications ensures a certain assured level of availability, or uptime.

A data center built according to tier 3 data center specifications should satisfy two key requirements: redundancy and concurrent maintainability. It requires at least N+1 redundancy as well as concurrent maintainability for all power and cooling components and distribution systems. A component’s lack of availability due to failure (or maintenance) should not affect the infrastructure’s normal functioning.

These specifications have to be met only for the power, cooling and building infrastructure, down to the server rack level; tier 3 data center specifications do not specify requirements at the IT architecture level. By following the stages below, your data center’s power supply infrastructure can meet the tier 3 data center specifications.

Stage 1: Power supply from utility service provider

The Uptime Institute regards electricity from utility service providers as an unreliable source of power. Therefore, tier 3 data center specifications require that the data center should have diesel generators as a backup for the utility power supply.

An automatic transfer switch (ATS) automatically switches over to the backup generator if the utility power supply goes down. While many organizations have just a single ATS connecting a backup generator and the power supply from the utility service provider, the tier 3 data center specifications mandate two ATSs connected in parallel to ensure redundancy and concurrent maintainability. The specifications, however, don’t call for the two ATSs to be powered by different utility service providers.

Stage 2: Backup generators

Tier 3 data center specifications require the diesel generators to have a minimum of 12 hours of fuel supply as reserves. Redundancy can be achieved by having two tanks, each with 12 hours of fuel. In this case, concurrent maintainability can be ensured using two or more fuel pipes for the tanks. Fuel pipes can then be maintained without affecting flow of fuel to the generators.

Stage 3: Power distribution panel

The power distribution panel distributes power to the IT load (such as servers and networks) via the UPS. It also provides power for non-IT loads (air conditioning and other infrastructure systems).

Redundancy and concurrent maintainability can be achieved using separate power distribution panels for each ATS, because connecting two ATSs to a single panel would necessitate bringing down both ATS units during panel maintenance or replacement. In addition, the tier 3 data center specifications require two or more power lines between each ATS and its power distribution panel to ensure redundancy and concurrent maintainability. Similarly, each power distribution panel and UPS should also have two or more lines between them for the same purpose.

Stage 4: UPS

Power from the distribution panel is used by the UPS and supplied to the power distribution boxes for server racks as well as network infrastructure. For example, if 20 kVA of UPS capacity is required for a data center, redundancy can be achieved by deploying two 20 kVA UPS units or four 7 kVA UPS units (three 7 kVA units cover the 20 kVA load; the fourth is the spare). Redundancy can even be achieved with five 5 kVA UPS units.
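A quick way to sanity-check that arithmetic is to compute the N+1 module count directly. The sketch below reproduces the 20 kVA example in plain Python; it is a back-of-the-envelope check only, not a substitute for a proper electrical design.

```python
import math


def n_plus_one_modules(load_kva: float, module_kva: float) -> int:
    """Number of UPS modules needed to carry load_kva with one redundant module (N+1)."""
    n = math.ceil(load_kva / module_kva)  # modules required to carry the load (N)
    return n + 1                          # plus one spare


# Reproducing the article's 20 kVA example:
for module in (20, 7, 5):
    print(f"{module} kVA modules -> {n_plus_one_modules(20, module)} units")
# 20 kVA modules -> 2 units, 7 kVA modules -> 4 units, 5 kVA modules -> 5 units
```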

The tier 3 data center specifications require that each UPS be connected to just a single distribution box for redundancy and concurrent maintainability. This ensures that only a single power distribution circuit goes down, in case of a UPS failure or maintenance.

Stage 5: Server racks

Each server rack must have two power distribution boxes in order to conform to tier 3 data center specifications. The servers in each rack should have dual power supply features so that they can connect to the power distribution boxes.

A static switch can be used for devices which lack dual power mode features. This switch takes in supply from both power distribution boxes and gives a single output. In case of a failure, the static switch can transfer from one power distribution box to the other within a few milliseconds.

About the author: Mahalingam Ramasamy is the managing director of 4T technology consulting, a company specializing in data center design, implementation and certification. He is an accredited tier designer (ATD) from The Uptime Institute, USA and the first one from India to get this certification.

Redundancy: N+1, N+2 vs. 2N vs. 2N+1

A typical definition of redundancy in relation to engineering is: “the duplication of critical components or functions of a system with the intention of increasing reliability of the system, usually in the case of a backup or fail-safe.” When it comes to datacenters, the need for redundancy focuses on how much extra or spare power the data center can offer its customers as a backup during a power outage. Unexpected power outages are by far the most common cause of datacenter downtime.*


Photo courtesy of the Ponemon Institute

According to the industry-leading Ponemon Institute’s 2013 Study on Data Center Outages (or “downtime” – a four-letter word in the data center industry), which surveyed 584 individuals in U.S. organizations who have responsibility for data center operations in some capacity, from the “rank and file” to the C-level, 85% of participants reported that their organizations experienced a loss of primary utility power in the past 24 months. Of that 85%, 91% reported their organizations had an unplanned outage. That means that most data centers experienced downtime in the last 24 months. During these outages respondents averaged two complete data center shutdowns, with an average downtime of 91 minutes per failure.

The entire study also speaks of the implementation and the impact of DCIM (Data Center Infrastructure Management) – and how it was used to fix or correct the root cause of the outages.*

The most common causes are weather-related, but outages can also occur from simple equipment failure or even the accidental cutting of a power line by a backhoe. No matter what the reason, an unplanned outage can cost a company a lot of money, especially if its revenues depend on internet sales.

For example, if you’re Amazon and you go down, you lose a mind-blowing amount of money: an estimated $1,104 in sales for every second of downtime. The “average” U.S. data center loses $138,000 for one hour of data center downtime per year. So, applying Ponemon’s 91-minute average downtime per year, that’s an approximate loss of $207,000 for each organization accessing the data center.
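The arithmetic behind that estimate is simple enough to reproduce. The sketch below uses the per-hour figure quoted above; note that 91 minutes at that rate works out to roughly $209,000, so the article’s $207,000 appears to correspond to a rounded 90-minute outage.

```python
# Back-of-the-envelope downtime cost, using the figures quoted in the article.
COST_PER_HOUR = 138_000    # average US data center loss per hour of downtime ($)
AVG_OUTAGE_MINUTES = 91    # Ponemon 2013 average downtime per failure

cost_per_minute = COST_PER_HOUR / 60
annual_loss = cost_per_minute * AVG_OUTAGE_MINUTES
print(f"${cost_per_minute:,.0f} per minute -> ${annual_loss:,.0f} per 91-minute outage")
# ~$2,300 per minute and ~$209,000 per outage; the article rounds this to roughly $207,000.
```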

What does this all mean? Downtime matters, and downtime prevention matters, so redundancy matters.

Preferably, large businesses and corporations have their servers set up at either Tier 3 or Tier 4 data centers because they offer a sufficient amount of redundancy in case of an unforeseen power outage. With this in mind, not all data centers’ redundant power systems are created equal. Some offer N+1, 2N, and 2N+1 redundancy systems.

What’s the Difference Between N+1, 2N and 2N+1?

The simple way to look at N+1 is to think of it in terms of throwing a birthday party for your child or yourself, because who doesn’t love cupcakes? Say you have ten guests and need ten cupcakes, but just in case that “unexpected” guest shows up, you order eleven cupcakes. “N” represents the exact amount of cupcakes you need, and the extra cupcake represents the +1. Therefore you have N+1 cupcakes for the party. In the world of datacenters, an N+1 system, also called parallel redundancy, is a safeguard to ensure that an uninterruptible power supply (UPS) system is always available. N+1 stands for the number of UPS modules that are required to handle an adequate supply of power for essential connected systems, plus one more: 11 cupcakes for 10 people, and less chance of downtime.

Although an N+1 system contains redundant equipment, it is not, however, a fully redundant system and can still fail because the system is run on  common circuitry or feeds at one or more points rather than two completely separate feeds.

Back at the birthday party! If you plan a birthday party with a 2N redundancy system in place, then you would have the ten cupcakes you need for the ten guests, plus an additional ten cupcakes, so 20 cupcakes in total. 2N is simply two times, or double, the amount of cupcakes you need. At a data center, a 2N system contains double the amount of equipment needed, run separately with no single points of failure. These 2N systems are far more reliable than an N+1 system because they offer a fully redundant system that can be maintained on a regular basis without losing any power to downstream systems. In the event of an extended power outage, a 2N system will still keep things up and running. Some data centers offer 2N+1, which is double the amount needed plus one extra piece of equipment, so back at the party you’ll have 21 cupcakes: two per guest plus one spare.
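To make the cupcake math concrete, the short sketch below works out the unit counts for each scheme, given the number of units (N) the load actually requires.

```python
def redundancy_units(n_required: int) -> dict:
    """Total units deployed under each common redundancy scheme."""
    return {
        "N":    n_required,          # no spare capacity
        "N+1":  n_required + 1,      # one spare unit
        "2N":   2 * n_required,      # a full duplicate set
        "2N+1": 2 * n_required + 1,  # a full duplicate set plus one spare
    }


print(redundancy_units(10))
# {'N': 10, 'N+1': 11, '2N': 20, '2N+1': 21}  -- the 10-guest cupcake party
```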

For more information on redundancy, N+1, 2N, 2N+1, and the differences between them, as well as the different Tier levels offered by datacenters around the world, call us at (877) 406-2248.

*Source: Ponemon Institute, 2013 Study on Data Center Outages, sponsored by Emerson Network Power. The full study is also an interesting read on how data center employees view their data center’s structure and superiors.

Survey: UPS Issues Are Top Cause of Outages

This chart shows the perception gap between the executive suite and data center staff on key issues.

Problems with UPS equipment and configuration are the most frequently cited cause of data center outages, according to a survey of more than 450 data center professionals. The survey by the Ponemon Institute, which was sponsored by Emerson Network Power, also highlights a disconnect between data center staff and the executive suite about uptime readiness.

The National Survey on Data Center Outages surveyed 453 individuals in U.S. organizations who have responsibility for data center operations; they were asked about the frequency and root causes of unplanned data center outages, as well as corporate efforts to avert downtime. Ninety-five percent of participants reported an unplanned data center outage in the past two years, with most citing inadequate practices and investments as factors in the downtime.

Here are the most frequently cited causes for downtime:

  • UPS battery failure (65 percent)
  • Exceeding UPS capacity (53 percent)
  • Accidental emergency power off (EPO)/human error (51 percent)
  • UPS equipment failure (49 percent)

There were signs that the ongoing focus on cost containment was being felt in the data center. Fifty-nine percent of respondents agreed with the statement that “the risk of an unplanned outage increased as a result of cost constraints inside our data center.”

“As computing demands and energy costs continue to rise amidst shrinking IT budgets, companies are seeking tactics – like cutting energy consumption – to cut costs inside the data center,” said Peter Panfil, vice president and general manager, Emerson Network Power’s AC Power business in North America. “This has led to an increased risk of unplanned downtime, with companies not fully realizing the impact these outages have on their operations.”

Perception Gap
The focus on UPS issues isn’t unexpected, given the role of uninterruptible power supplies in data center power infrastructure. It’s also consistent with Emerson’s position as a leading vendor of UPS equipment. But the survey by Ponemon, which is known for its surveys on security and privacy, also points to a perception gap between senior-level and rank-and-file respondents regarding data center outages.

Sixty percent of senior-level respondents feel senior management fully supports efforts to prevent and manage unplanned outages, compared to just 40 percent of supervisor-level employees and below. Senior-level and rank-and-file respondents also disagreed regarding how frequently their facilities experienced downtime, with 56 percent of the senior executives believing unplanned outages are infrequent, while just 45 percent of rank-and-file respondents agreed to the same statement.

“When you consider that downtime can potentially cost data centers thousands of dollars per minute, our survey shows a serious disconnect between senior-level employees and those in the data center trenches,” said Larry Ponemon, Ph.D., chairman and founder of the Ponemon Institute. “This sets up a challenge for data center management to justify to senior leadership the need to implement data center systems and best practices that increase availability and ensure the functioning of mission-critical applications. It’s imperative that these two groups be on the same page in terms of the severity of the problem and potential solutions.”



Data Center Generators

Generators are key to data center reliability. Supplementing a battery-based uninterruptible power supply (UPS) with an emergency generator should be considered by all data center operators. The question has become increasingly important as super storms such as Hurricane Sandy in the Northeast United States knocked out utility power stations and downed many power lines, resulting in days and weeks of utility power loss.

Data Center Generator Delivery

Beyond disaster protection, a backup generator becomes important when utility providers resort to summer rolling blackouts and brownouts and data center operators see reduced utility service reliability. In a rolling blackout, power to industrial facilities is often shut down first. New data center managers should check their utility contract to see if the data center is subject to such utility disconnects.

Studies show generators played a role in between 45 and 65 percent of outages in data centers with an N+1 configuration (with one spare backup generator). According to Steve Fairfax, President of MTechnology, “Generators are the most critical systems in the data center.” Mr. Fairfax was the keynote speaker at the 2011 7×24 Exchange Fall Conference in Phoenix, Arizona.

What Should You Consider Before Generator Deployment?

  • Generator Classification / Type. A data center design engineer and the client should determine whether the generator will be classified as an Optional Standby power source for the data center, a Code Required Standby power source for the data center, or an Emergency backup generator that also provides standby power to the data center.

  • Generator Size. When sizing a generator it is critical to consider the total current IT power load as well as the expected growth of that IT load. Consideration must also be made for facility supporting infrastructure (i.e. UPS load) requirements. The generator should be sized by an engineer, and specialized sizing software should be utilized (a rough illustrative sketch follows this list).
  • Fuel Type. The most common types of generators are diesel and gas. There are pros and cons to both: diesel fuel deliveries can become an issue during a natural disaster, and gas line feeds can also be impacted by natural disasters. Making the right choice for your data center generator depends on several factors. The fuel type needs to be determined based upon local environmental issues (i.e. Long Island primarily uses natural gas to protect the water aquifer under the island), availability, and the required size of the standby/emergency generator.
  • Deployment Location. Where will the generator be installed? Is it an interior or an exterior installation? An exterior installation requires the addition of an enclosure, which may be just a weather-proof type, or local building codes may require a sound-attenuated enclosure. An interior installation will usually require some form of vibration isolation and sound attenuation between the generator and the building structure.
  • Exhaust and Emissions Requirements. Today, most generator installations must meet the new Tier 4 exhaust emissions standards. This may depend upon the location of the installation (i.e. city, suburban, or out in the country).

  • Required Run-time. The run-time for the generator system needs to be determined so the fuel source can be sized (i.e. the volume of diesel or the natural gas delivery capacity to satisfy run time requirements).
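The rough sizing sketch below, referenced in the Generator Size item above, illustrates the kind of headroom calculation involved. The growth factor, support-load factor and derating value are illustrative assumptions only; actual sizing should be done by an engineer using specialized sizing software.

```python
# Rough generator sizing sketch (illustrative only -- real sizing belongs to an
# engineer using the manufacturer's sizing software). All factors are assumptions.
def estimate_generator_kw(it_load_kw: float,
                          growth_factor: float = 1.3,    # assumed IT load growth
                          support_factor: float = 1.5,   # assumed cooling/UPS/facility overhead
                          derate: float = 0.8) -> float:  # keep the generator below ~80% of rating
    """Return an approximate minimum generator rating in kW."""
    future_it_load = it_load_kw * growth_factor
    total_facility_load = future_it_load * support_factor
    return total_facility_load / derate


print(f"~{estimate_generator_kw(400):.0f} kW for a 400 kW IT load")  # ~975 kW
```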


What Should You Consider During Generator Deployment?

  • Commissioning. The commissioning of the generator system is basically the load testing of the installation plus the documentation trail for the selection of the equipment, the shop drawing approval process, the shipping documentation, and the receiving and rigging of the equipment into place. This process should also include the construction documents for the installation project.



  • Load Testing. Typically, a generator system is required to run at full load for at least four (4) hours. It will also be required to demonstrate that it can handle step load changes from 25% of its rated kilowatt capacity to 100% of its rated kilowatt capacity. If the load test can be performed with a non-linear load bank whose power factor matches the specification of the generator(s), that is the best way to load test. Typically, a non-linear load bank with a power factor between 75% and 85% is utilized (a worked example of the step arithmetic follows this list).
  • Servicing. The generator(s) should be serviced after the load test and commissioning are completed, prior to release for use.
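As a worked example of the load-test step arithmetic mentioned above, the sketch below converts each step (25% to 100% of the generator’s kilowatt rating) into the approximate kVA a load bank must present at a given power factor. The 750 kW rating and 0.80 power factor are assumptions for illustration.

```python
# Load-test step sketch: convert a percentage of the generator's kW rating into
# the load-bank kVA at an assumed power factor. Rating and pf are illustrative only.
GEN_RATING_KW = 750
POWER_FACTOR = 0.80   # within the 75-85% range mentioned above

for pct in (25, 50, 75, 100):
    kw = GEN_RATING_KW * pct / 100
    kva = kw / POWER_FACTOR          # apparent power the load bank must present
    print(f"{pct:>3}% step: {kw:,.0f} kW -> {kva:,.0f} kVA at pf {POWER_FACTOR}")
```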


What Should You Consider After Generator Deployment?

  • Service Agreement. The generator owner should have a service agreement with the local generator manufacturer’s representative.
  • Preventative Maintenance. Preventative Maintenance should be performed at least twice a year. Most generator owners who envision their generator installation as being critical to their business execute a quarterly maintenance program.
  • Monitoring. A building monitoring system should be employed to provide immediate alerts if the generator and ATS systems suffer a failure, or become active because the normal power source has failed. The normal power source is typically from the electric utility company, but it could be an internal feeder breaker inside the facility that has opened and caused an ATS to start the generator(s) in an effort to provide standby power.
  • Regular Testing. The generator should be tested weekly for proper starting, and it should be load tested monthly or quarterly to determine that it will carry the critical load plus the required standby load and any emergency loads that it is intended to support.
  • Maintenance. The generator manufacturer or third-party maintenance organization will notify the generator owner when important maintenance milestones are reached, such as minor rebuilds and major overhauls. Run hours generally determine when these milestones are reached, but other factors related to the operational characteristics of the generator(s) also help determine what needs to be done and when it needs to be done.

PTS Data Center Solutions provides generator sets for power ratings from 150 kW to 2 MW. We can develop the necessary calculations to properly size your requirement and help you with generator selection, procurement, site preparation, rigging, commissioning, and regular maintenance of your generator.

To learn more about PTS recommended data center generators, contact us.

To learn more about PTS Data Center Solutions available to support your data center electrical equipment and systems needs, contact us.

Link Source: solutions/electricalequipmentandsystems/data-center-generators/