Current issues in data centre management
In the past, managing a data centre was far less complicated than it is today. As data volumes grow and the demand for processing power increases, data centres have become more demanding and more complex to run. As Onyshkevych (2013) observes, “Complicating the situation, operational decisions at the data centre now include such factors as power, cooling, rack space and CPU availability. This is in addition to other information gleaned from IT systems, and related to the facility infrastructure components such as UPSs, PDUs chillers, HVACs generators, branch circuits, etc.” The main issues are reviewed below.
1 Energy Efficiency
The more data centres grow, the more power they consume. A study by Koomey (2011) shows that electricity use by data centres increased by about 36% in the United States between 2005 and 2010 (about 56% worldwide), and that in 2010 data centres accounted for between 1.1% and 1.5% of global electricity consumption. Yet only 1 in 5 data centres is highly efficient. Ruth (2009) also states that more than half of all IT power consumption goes to data centres. In short, data centres are heavy consumers of energy. According to Nielsen & Bouley (2012), “historically, data centres design and operations have been focused on reliability and capacity. This has led to the unfortunate situation where data centres have not been optimised for efficiency.” Energy costs continue to rise, and studies have shown that in some cases they exceed the cost of the IT hardware itself (U.S. Department of Energy 2011). As a result, organisations today face the challenge of reducing the energy consumption of their data centres while maintaining optimal performance.
2 Performance management
Managing infrastructure is becoming increasingly difficult because of the growing interdependency and complexity between infrastructure, applications and the functions required to deliver services. Compounding the problem, organisational demands on and expectations of data centres continue to grow while budgets remain flat and staffing levels static (Ptak 2008). As a result, even after devoting considerable resources to predicting performance risks, data centre managers still often fail to anticipate them.
3 Capacity Planning
Many data centre managers today are unable to determine whether their data centres are operating at full capacity. As Onyshkevych (2013) notes, “Traditionally, operators have left plenty of room for error so uptime isn’t interrupted- a strategy known as capacity safety gap”, also called over-provisioning. Because of poor planning and a lack of information, data centre administrators often find themselves reacting to events rather than planning ahead when it comes to purchasing new infrastructure and allocating resources. Such practices significantly increase spending on unused equipment and on the power it consumes.
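To make the “capacity safety gap” concrete, the sketch below compares provisioned capacity with observed peak demand and flags resources whose unused headroom exceeds a target. It is a minimal illustration only: the `CapacityReading` type, the 20% target headroom and the example figures are hypothetical assumptions, not values taken from Onyshkevych (2013) or from any particular DCIM product.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class CapacityReading:
    """Provisioned capacity versus measured peak demand for one resource (hypothetical)."""
    resource: str
    provisioned: float  # capacity installed, e.g. kW of power or cooling
    peak_used: float    # highest demand observed over the measurement window


def safety_gap(reading: CapacityReading) -> float:
    """Unused headroom: provisioned capacity minus observed peak demand."""
    return reading.provisioned - reading.peak_used


def utilisation(reading: CapacityReading) -> float:
    """Peak utilisation as a fraction of provisioned capacity."""
    return reading.peak_used / reading.provisioned


def over_provisioned(
    readings: Iterable[CapacityReading], target_headroom: float = 0.2
) -> List[Tuple[str, float]]:
    """Return (resource, headroom fraction) for resources above the assumed target."""
    flagged = []
    for r in readings:
        headroom_fraction = safety_gap(r) / r.provisioned
        if headroom_fraction > target_headroom:
            flagged.append((r.resource, round(headroom_fraction, 2)))
    return flagged
```

For example, a hypothetical facility with 500 kW of power provisioned but a measured peak of 210 kW would be flagged with 58% headroom, whereas cooling at 540 kW of a provisioned 600 kW (10% headroom) would not. The point is simply that measuring actual demand, rather than guessing, is what allows planning to become proactive.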
4 System Monitoring
Availability is critical in data centres, so effective system monitoring is essential. Data centres face many day-to-day failures, and there is a strong need for proactive and rapid responses to potential threats to uptime. AccelOps (2013) states that “To meet today’s business demand for greater IT efficiency and responsiveness, IT organizations must be able to see and manage all aspects of security, performance, availability and change in their entire IT infrastructure”. Furthermore, data centre staff often spend unnecessary time inside the data centre trying to find where an error has occurred or where a server is physically located.
Data Centre Infrastructure Management (DCIM)
As data centres grow larger and move from a single hardware architecture to a highly integrated and modularised one, the need for management systems grows. Wei (2013) states that “In order to save the manpower management costs by improving system management software efficiency, enterprises in recent years have been actively introducing all kinds of automated management software and system monitoring software”. Some of the key features of such systems are presented below.
1 Asset management and analysis
Asset management and analysis is important because it helps data centre administrators identify future infrastructure needs and quickly analyse and allocate new equipment in order to optimise the operational configuration (Schirmacher 2013). Asset management tools also make the data centre more transparent by recording the physical location of infrastructure such as servers and switches.
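The two functions described above, recording where each asset physically sits and analysing remaining space for future needs, can be sketched with a small in-memory registry. This is an illustrative assumption rather than how Schirmacher (2013) or any DCIM product models assets: the `Asset` fields, the room and rack names, and the 42U rack height are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict

RACK_HEIGHT_U = 42  # assumed rack height in rack units (U)


@dataclass(frozen=True)
class Asset:
    """One piece of equipment and its physical position (hypothetical schema)."""
    asset_id: str
    kind: str        # e.g. "server", "switch", "pdu"
    room: str
    rack: str
    position_u: int  # lowest rack unit the device occupies
    height_u: int    # number of rack units the device occupies


class AssetRegistry:
    """Minimal in-memory asset register keyed by asset identifier."""

    def __init__(self) -> None:
        self._assets: Dict[str, Asset] = {}

    def add(self, asset: Asset) -> None:
        self._assets[asset.asset_id] = asset

    def locate(self, asset_id: str) -> str:
        """Human-readable physical location, so staff need not search the floor."""
        a = self._assets[asset_id]
        return f"{a.room} / rack {a.rack} / U{a.position_u}"

    def free_units(self, room: str, rack: str) -> int:
        """Rack units still available in one rack, for planning new equipment."""
        used = sum(
            a.height_u
            for a in self._assets.values()
            if a.room == room and a.rack == rack
        )
        return RACK_HEIGHT_U - used
```

With a 2U server at U10 and a 1U switch at U40 registered in rack A03 of a hypothetical “Hall A”, `locate("srv-001")` returns `"Hall A / rack A03 / U10"` and `free_units("Hall A", "A03")` returns 39.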
2 Environmental monitoring
Changes in the data centre environment can be critical and need to be monitored at all times. Changes related to factors such as temperature, safety or disaster prevention may have a major impact on the data centre and its operation. Wei (2013) states that “with the trend of unmanned management, the functions of monitoring environmental changes of data centres in real time and providing immediate feedback to control system for automatic response to emergency would become very important”. For example, if a rise in temperature or an electrical fault is detected, the monitoring function reports it to the control system, which responds immediately by lowering the temperature or cutting off the power.
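The detect, report and respond loop described above can be illustrated with simple threshold rules that map sensor readings to control actions. This is a minimal sketch under stated assumptions: the metric names, the 27 °C inlet-temperature limit and the action strings are hypothetical choices for illustration, and a real DCIM system would pass these actions to building-management or power-control interfaces rather than return them as text.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Reading:
    sensor: str
    metric: str   # hypothetical metric names: "temperature_c", "circuit_fault"
    value: float  # for "circuit_fault", 1.0 means a fault was detected


@dataclass
class Rule:
    metric: str
    is_breach: Callable[[float], bool]
    action: str   # hypothetical action passed on to the control system


# Assumed thresholds, for illustration only.
RULES: List[Rule] = [
    Rule("temperature_c", lambda v: v > 27.0, "increase cooling"),
    Rule("circuit_fault", lambda v: v >= 1.0, "cut power to affected circuit"),
]


def evaluate(
    readings: Sequence[Reading], rules: Sequence[Rule] = RULES
) -> List[Tuple[str, str]]:
    """Return (sensor, action) pairs for every reading that breaches a rule."""
    actions = []
    for reading in readings:
        for rule in rules:
            if rule.metric == reading.metric and rule.is_breach(reading.value):
                actions.append((reading.sensor, rule.action))
    return actions
```

Given an inlet reading of 29.5 °C at a hypothetical sensor `rack-A03-inlet`, a normal 22 °C reading elsewhere and a fault flag from `pdu-B01`, `evaluate` returns `[("rack-A03-inlet", "increase cooling"), ("pdu-B01", "cut power to affected circuit")]`. Keeping the rules as data makes it straightforward to tune thresholds without changing the evaluation logic.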
3 Energy management
The energy management function of a DCIM system is useful because it provides not only power usage monitoring and analysis but also tools for controlling energy efficiency, such as a PUE (Power Usage Effectiveness) reporting function (Wei 2013). Rouse (2009) states that “PUE is determined by dividing the amount of power entering a data centre by the power used to run the computer infrastructure within it. PUE is therefore expressed as a ratio, with overall efficiency improving as the quotient decreases toward 1”.
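The PUE definition quoted above is a single division, PUE = total facility power / IT equipment power. The sketch below implements it, plus a variant that sums energy readings over a reporting period, as a PUE report might. The function names and the sample figures are hypothetical; the sanity checks reflect the fact that IT power is part of total facility power, so PUE cannot fall below 1.

```python
from typing import Sequence


def pue(total_facility: float, it_equipment: float) -> float:
    """Power Usage Effectiveness: total facility power divided by IT equipment power."""
    if it_equipment <= 0:
        raise ValueError("IT equipment power must be positive")
    if total_facility < it_equipment:
        raise ValueError("total facility power cannot be lower than IT power")
    return total_facility / it_equipment


def pue_from_energy(
    total_facility_kwh: Sequence[float], it_kwh: Sequence[float]
) -> float:
    """PUE over a reporting period, computed from summed energy readings (kWh)."""
    return pue(sum(total_facility_kwh), sum(it_kwh))
```

For instance, a facility drawing 1,500 kW in total to run 1,000 kW of IT equipment has a PUE of 1.5, meaning every kilowatt delivered to IT equipment needs a further half kilowatt for cooling, power distribution and other overheads. Reducing that overhead to 200 kW would bring the PUE down to 1.2, closer to the ideal of 1.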
4 Remote Monitoring and Control
Remote monitoring and control allows data centre staff to oversee operations effectively without having to be physically present. Not only does this save time, but according to Colocation America (2013) it also improves thermal efficiency: on average, each person in the data centre generates over 350 BTU (British Thermal Units) per hour, which can slightly raise temperatures in parts of the data centre. Staff can also generate detailed system reports to monitor the state of the data centre and detect errors. Finally, the system provides real-time alerts in case of failure. These alerts include the physical location of the fault so that staff can attend to the problem quickly.
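An alert that carries the physical location of the fault, as described above, can be sketched by looking up the failing device in a location table when the alert is raised. This is a hypothetical illustration: the `LOCATIONS` table, the device identifier and the message format are assumptions, standing in for whatever asset database and notification channel a given DCIM system actually uses.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Optional


@dataclass(frozen=True)
class Location:
    room: str
    rack: str
    position_u: int


# Hypothetical device-to-location table, e.g. exported from an asset register.
LOCATIONS: Dict[str, Location] = {
    "srv-017": Location("Hall B", "B12", 22),
}


@dataclass
class Alert:
    device: str
    message: str
    location: Optional[Location]
    raised_at: datetime

    def summary(self) -> str:
        """One-line alert text including where staff should go."""
        if self.location is not None:
            where = (f"{self.location.room}, rack {self.location.rack}, "
                     f"U{self.location.position_u}")
        else:
            where = "location unknown"
        return f"[{self.raised_at:%H:%M:%S}] {self.device}: {self.message} ({where})"


def raise_alert(device: str, message: str,
                now: Optional[datetime] = None) -> Alert:
    """Create an alert, attaching the device's physical location if it is known."""
    return Alert(device, message, LOCATIONS.get(device),
                 now or datetime.now(timezone.utc))
```

For example, `raise_alert("srv-017", "disk failure", now=datetime(2013, 5, 1, 14, 30, tzinfo=timezone.utc)).summary()` produces `"[14:30:00] srv-017: disk failure (Hall B, rack B12, U22)"`. Devices missing from the table still raise an alert, flagged as “location unknown”, so an incomplete asset register never suppresses a failure notification.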
This is just a snapshot of a great report done by Christoforos Karagiannis while completing his final year project at the University. It’s been a pleasure to guide him in his efforts and exploration of DCIM.