Playground for Data Centers by Rainer W. Kaese, Toshiba

The author of this article, published in July 2020, is Rainer W. Kaese, senior manager, business development, storage products division, Toshiba Electronics Europe GmbH.

A playground for data centers

As the IT manager responsible for an enterprise or service provider, if you needed to support 100 new hires per year for the next 5 years, how much more capacity and performance would you plan for? How would you implement it from the starting point of your existing infrastructure? It’s not a simple, clear-cut calculation, and it will certainly need tuning over the period of its implementation, but a few numbers and a calculator are all that is needed to develop some rough low-end and high-end costs. And you probably have a broad idea of the suppliers you will turn to in order to implement it.

However, what if you were asked to develop the storage concept for an automotive supplier’s new service? They are planning on rolling out a cloud-connected autonomous driving solution. They have rough estimates for the number of cars per year and the amount of data, but the difference between best- and worst-case demand would require one of several different data center approaches if the whole thing is to be financially viable and not overprovisioned.

Now, what happens when you need to test your back-of-an-envelope approach? Do you have a spare rack, servers, HDDs and network capacity to firm up your numbers? Probably not.

While cloud storage solutions seem to be popping up everywhere, they are not the answer to every challenge and, at large scale, are not cheap. For more sensitive data or implementation approaches, it makes sense to keep that data somewhere physically secure, but this requires space, connectivity, and plenty of hardware, so it is not cheap either.

Thanks to Global Data Centers EMEA, a division of NTT Ltd., testing ideas and implementing innovative services are easier than ever before.

Their Technology Experience Labs provide space, power and protection, and also the freedom to innovate with cloud and storage technology together with the support of a vibrant community. This allows data center managers and service providers to innovate quickly and at low cost, whether to trial private or hybrid clouds or to review distributed architectures and measure their impact on IT service delivery. All of this is backed up by a community linked by webinars, boot camps, meetups and hackathons.

The main site is based in Frankfurt, Germany (Frankfurt 1), where 65,000 square meters of space is available for servers in a building designed specifically to be operated as a data center. Fed by 2 separate substations from 2 separate feeds and protected by 2 separate uninterruptible power supplies and redundant diesel generator power systems, the energy supply is secured. Physical security and access control ensure that systems are protected from a range of potential attack approaches, backed up by 24/7 monitoring and systems redundancy. Connectivity of up to 10Gb/s can be provided from a carrier mix of over 350 suppliers, ranging from local to tier-1. This particular campus is also linked by fiber optic connection to NTT’s data center in Rüsselsheim, Germany (Frankfurt 3), enabling multi-site implementations.

To date, a range of use cases has been trialed, from hybrid cloud, storage and big data, through DevOps and app management, to high performance and cognitive computing. Even studies into hardware approaches, such as the testing of water-cooling with waste heat recovery, have been undertaken.

Coming back to the examples highlighted earlier, many IT and data center managers and service providers will be looking to continue scaling up (vertical scaling) their existing systems, adding more hardware and drives to boost performance and capacity. If this seems like the correct solution to the challenges faced, then NTT provides an environment in which such approaches can be evaluated.

This was demonstrated with a high-performance storage server targeting high reliability and performance needs that could be used for various iSCSI targets with sizes ranging from 10TB to 40TB. The system, installed in 2017, utilized Supermicro X10 series servers in 2U format, featuring 2 Intel Xeon CPUs and 128GB RAM. This was coupled with a Microsemi ASR8885, 10Gb/s NICs, and a 60-bay dual-expander top-loader JBOD from Celestica, Inc. The storage was implemented using Toshiba’s MG04SCA40EA 4TB enterprise HDDs with a 12Gb/s SAS interface and 7,200rpm rotational speed in a 3.5-inch form factor.

The system was built upon the Open-E JovianDSS, a Linux-based storage software using the ZFS file system that can be used to architect storage on iSCSI, FC, NFS, and SMB (CIFS) protocols. Thanks to its Linux basis it has a high level of hardware compatibility and is well suited to virtualized storage environments. While providing high data integrity and protection using data and metadata check-summing and self-healing to detect and correct errors, it can also be configured as part of an active-active dual-controller cluster.

The final solution delivered 108TB of user storage with a Zpool capacity efficiency of 50% using a 2-way mirror storage redundancy type. The Zpool was laid out in 30 groups of data/parity disk pairs, providing 240TB of gross unformatted capacity, and 120TB net. The final usable storage of 108TB provided a Zpool read performance rating of 12.9x single disk, with Zpool write performance attaining 8.5x single disk. Save for a single planned shutdown for a software update, the system has been operational without downtime or disk failure since August 2017.
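The capacity figures above follow from simple arithmetic, sketched here in Python. The total of 60 disks is inferred from the 30 mirror pairs, and the function name is purely illustrative. The article does not explain the gap between the 120TB net and the 108TB usable figures, so the sketch only reports the ratio and makes no claim about its cause.

```python
def mirror_pool_capacity(groups: int, disks_per_group: int, disk_tb: float) -> dict:
    """Gross and net capacity (in TB) of a pool built from n-way mirror groups."""
    gross_tb = groups * disks_per_group * disk_tb
    # Each mirror group holds only one disk's worth of unique data.
    net_tb = groups * disk_tb
    return {
        "gross_tb": gross_tb,
        "net_tb": net_tb,
        "efficiency": net_tb / gross_tb,
    }


# 30 two-way mirror groups of 4TB disks, as described in the article.
pool = mirror_pool_capacity(groups=30, disks_per_group=2, disk_tb=4)
usable_tb = 108  # usable figure reported in the article

print(pool)
print(f"usable share of net capacity: {usable_tb / pool['net_tb']:.0%}")
```

With the article's values this gives 240TB gross, 120TB net and a 50% capacity efficiency, matching the figures quoted.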

As well as mirror groups, dual and triple parity groups are also supported. The triple parity approach is recommended with high capacity disks of 10TB or larger. At this parity level the malfunction of 3 disks per data group can be tolerated. Additionally, the Open-E JovianDSS supports a storage self-backup approach with versioning possible up to every minute. The backup feature creates rotational auto-snapshots of a volume according to user-defined retention-interval plans, asynchronously replicating snapshot deltas to local (on-site) or remote (off-site) storage.
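To illustrate the trade-off between these redundancy types, the following sketch compares capacity efficiency and tolerated disk failures per group. The group widths (8 data disks per parity group) are hypothetical values chosen for illustration; the article does not state which widths were used or recommended.

```python
def group_profile(data_disks: int, redundancy_disks: int) -> tuple[float, int]:
    """Return (capacity efficiency, disk failures tolerated) for one group."""
    total = data_disks + redundancy_disks
    return data_disks / total, redundancy_disks


# Hypothetical layouts: (data disks, redundancy disks) per group.
layouts = {
    "2-way mirror": (1, 1),
    "dual parity": (8, 2),
    "triple parity": (8, 3),
}

for name, (data, redundancy) in layouts.items():
    efficiency, tolerated = group_profile(data, redundancy)
    print(f"{name}: {efficiency:.0%} efficient, "
          f"tolerates {tolerated} failed disk(s) per group")
```

Under these assumptions, parity groups give noticeably better capacity efficiency than mirrors while tolerating more failures per group, which is one reason they are attractive for large disks with long rebuild times.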

The backup application is very lightweight, which is why it can run 24/7 without affecting production. It can be used for regular backup and restore purposes or for instant disaster recovery (DR).
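The idea of a retention-interval plan can be sketched in a few lines of Python. This is a simplified model for illustration only, not Open-E JovianDSS's actual algorithm or configuration format: the plan values and the `keep_snapshot` helper are assumptions, and intervals are assumed to divide a day evenly.

```python
from datetime import datetime, timedelta

# Hypothetical plan: (snapshot interval, how long snapshots on that grid are kept).
PLAN = [
    (timedelta(minutes=1), timedelta(hours=1)),
    (timedelta(hours=1), timedelta(days=1)),
    (timedelta(days=1), timedelta(days=30)),
]


def keep_snapshot(taken: datetime, now: datetime, plan=PLAN) -> bool:
    """True if at least one rule in the plan still covers this snapshot."""
    age = now - taken
    seconds_since_midnight = taken.hour * 3600 + taken.minute * 60 + taken.second
    for interval, retention in plan:
        # A snapshot belongs to a rule if it falls on that rule's time grid.
        on_grid = seconds_since_midnight % int(interval.total_seconds()) == 0
        if on_grid and age <= retention:
            return True
    return False


now = datetime(2020, 7, 1, 12, 0)
for taken in (
    datetime(2020, 7, 1, 11, 30),  # minute-grid snapshot, 30 minutes old
    datetime(2020, 7, 1, 9, 30),   # minute-grid snapshot, older than 1 hour
    datetime(2020, 7, 1, 9, 0),    # hour-grid snapshot, 3 hours old
    datetime(2020, 6, 20, 0, 0),   # day-grid snapshot, under 30 days old
):
    print(taken, "keep" if keep_snapshot(taken, now) else "rotate out")
```

Fine-grained snapshots are rotated out quickly while coarser ones are kept longer, which keeps the number of retained versions bounded even with per-minute versioning.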

Beyond being a scale-up storage PoC activity, this cluster also serves as the virtual storage infrastructure of the Technology Experience Lab for all other innovators. This makes it a platform to prove its functional and performance advantages daily under realistic operating conditions.

By adding more HDDs in JBOD enclosures, this cluster has the potential to scale-up further as storage needs increase. The upper limit of this scaling is determined by the available rack space, reach via SAS cabling, the number of SAS ports on the controller(s) and the compute power of the controlling server. Scaling up this particular JovianDSS cluster in the Technology Experience Lab would probably be possible up to the low, single-digit petabyte range but, due to the mentioned limitations, not beyond this capacity.
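A rough feel for this ceiling can be obtained with a back-of-the-envelope calculation like the one below. The 60-bay enclosure matches the JBOD described earlier, but the number of enclosures a single head can drive, the 10TB disk size and the 50% efficiency are hypothetical inputs; the real limit depends on the SAS ports, cabling reach and compute power mentioned above.

```python
def scale_up_net_pb(jbods: int, bays_per_jbod: int, disk_tb: float,
                    efficiency: float) -> float:
    """Net capacity in PB for a single-head system with the given enclosures."""
    return jbods * bays_per_jbod * disk_tb * efficiency / 1000


# Hypothetical enclosure counts; 60 bays per JBOD as in the system described.
for jbods in (1, 4, 8):
    net_pb = scale_up_net_pb(jbods, bays_per_jbod=60, disk_tb=10, efficiency=0.5)
    print(f"{jbods} JBOD(s): {net_pb:.1f} PB net")
```

Even with generous assumptions the result stays within a few petabytes, consistent with the low, single-digit petabyte estimate given for this cluster.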

Of course, if your future growth and user base are unknown, as in the automotive example given, a different approach is required that scales as needed and has no real upper limit on capacity. This requires horizontal scalability, also known as scale-out storage.

Scale-out storage uses a network-based approach that allows new nodes to be added, almost without limit, to existing clusters as need dictates. Each cluster consists of a number of servers, drives and networking, with the cluster nodes linked together by high-speed networking or a backplane. Thanks to this approach, there is no need for overprovisioning, as capacity and performance can be added as required. The entire storage solution appears to users as a single entity and can be administered via a single interface, even when parts of the hardware implementation are located in other cities or on other continents.

To ensure that this collection of hardware both operates as a single storage entity and provides for expansion, failover, and robustness, suitable software also needs to be selected. NTT’s environment is again suited to testing combinations of hardware and software to learn how they interact. This could be to determine what works best for individual needs, such as performing a partial or full system recovery after a ransomware attack in a realistic operational environment.

One such system utilizes PetaSAN, a scale-out SAN solution built on Ceph (https://ceph.io). Ceph is an open source storage platform that aims to offer object-, block- and file-level storage using a distributed computing cluster.

Additionally, it offers scalable storage capacity and replicates data, making it fault-tolerant as well as self-healing and self-managing. One of the challenges is that it requires a Linux administrator to set up and maintain, an issue that the open source PetaSAN project aims to solve. PetaSAN wraps the power and complexity of Ceph in a ‘single pane-of-glass’ management interface while still providing access to the powerful Linux command line when needed. The goal is to provide highly available clustered iSCSI disks, with each iSCSI disk mapped to all physical disks in the system. This means that in, for example, a clustered hypervisor environment, the concurrent transactions of multiple VMs can be supported without any noticeable performance impact.

The platform was built using hardware from Starline Computer GmbH, consisting of Areca Technology Corp.’s RAID cards in pass-through mode coupled with Cavium’s remote direct memory access (RDMA) 10GbE adapters (https://www.cisco.com/c/dam/en/us/products/collateral/servers-unified-computing/ucs-c-series-rack-servers/cavium-product.pdf).

The PetaSAN implementation provides 220TB of highly available clustered iSCSI disks thanks to Toshiba’s 10TB enterprise HDDs. The resulting multi-path disks could be identified by virtual IP addresses and offered fast I/O with link redundancy.

Additionally, this system provides the virtual storage resources for the Technology Experience Labs, especially for proof-of-concept activities where its scale-out features are required.

Of course, the final intent may not be to operate and maintain the hardware for a storage solution, or the implementation may need to make use of a mix of own hardware and cloud resources. For such situations the use of Amazon’s Simple Storage Service (S3) may be an option to consider. Thanks to the work of Starline, this test platform integrates an S3 gateway via the command line that should acquire GUI configuration support in the future.

Determining the optimal approach for a data center is challenging, especially with demands on performance and capacity growing continuously but not necessarily by known and plannable amounts. Environments such as that provided by NTT, with its professional physical space and supportive community, allow room for experimentation using systems of significant scale. In turn, real-life solutions to daily challenges can be not only assessed but also quantified under non-system-critical conditions.