Pittsburgh Supercomputing Center Deploys Disk-Based Repository
Data Supercell, with major advantages over traditional tape-based archiving for large-scale datasets.
This is a Press Release edited by StorageNewsletter.com on September 5, 2012 at 2:58 pm
The Pittsburgh Supercomputing Center
(PSC) has developed and deployed a disk-based file
repository and data-management system called the Data Supercell.
The Pittsburgh Supercomputing Center is a joint effort of Carnegie Mellon
University and the University of Pittsburgh together with Westinghouse Electric
Company, LLC.
Established in 1986, PSC is supported by several federal agencies, the
Commonwealth of Pennsylvania and private industry, and is a service provider in
the National Science Foundation XSEDE (Extreme Science and
Engineering Discovery Environment) program.
This innovative technology, developed by a PSC team of scientists, provides
major advantages over traditional tape-based archiving for large-scale datasets.
The PSC team exploited the increasing cost-effectiveness of commodity
disk technologies and adapted sophisticated PSC-developed file system software
(called SLASH2) to create a new class of integrated storage services. A patent
application is under review.
The Data Supercell is intended to serve users of large scientific
datasets, including users of XSEDE, the National Science Foundation
cyberinfrastructure program, the world’s largest collection of integrated digital resources and services.
"The Data Supercell is a
unique technology, building on the increasing cost-effectiveness of disk
storage and the capabilities of PSC’s SLASH2 file system," said
Michael Levine and Ralph Roskies, PSC co-scientific directors. "It will go far to enable more efficient, flexible
analyses of very large-scale datasets."
Initial capacity of the Data Supercell is 4PB, and it is designed
to allow added capacity as needed. In comparison with cumbersome tape-based
archiving, sometimes referred to as ‘write once, read never,’ the
Data Supercell’s disk-based technology facilitates much faster data transfer
(latency 10,000 times lower than tape and bandwidth 24 times higher than PSC’s
previous tape archiving system). It also incorporates reliability
and security features for data replication, safety and
movement.
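To illustrate what the latency and bandwidth ratios quoted above could mean in practice, here is a minimal sketch of a retrieval-time model (fixed access latency plus streaming time). Only the two ratios, 10,000 and 24, come from this release. The tape baseline figures, the function names and the model itself are hypothetical and chosen purely for illustration; they are not PSC measurements.

```python
# Ratios quoted in the press release.
LATENCY_IMPROVEMENT = 10_000   # disk latency is 10,000 times lower than tape
BANDWIDTH_IMPROVEMENT = 24     # disk bandwidth is 24 times the previous tape archive

# HYPOTHETICAL baseline figures, assumed only for illustration (not from PSC).
ASSUMED_TAPE_LATENCY_S = 60.0          # assumed time to mount and position a tape
ASSUMED_TAPE_BANDWIDTH_MB_S = 100.0    # assumed sustained tape throughput


def retrieval_time_s(size_mb: float, latency_s: float, bandwidth_mb_s: float) -> float:
    """Simple model: fixed access latency plus time to stream the data."""
    return latency_s + size_mb / bandwidth_mb_s


def compare(size_mb: float) -> tuple[float, float]:
    """Return (tape_seconds, disk_seconds) for one file of the given size."""
    tape = retrieval_time_s(
        size_mb, ASSUMED_TAPE_LATENCY_S, ASSUMED_TAPE_BANDWIDTH_MB_S
    )
    disk = retrieval_time_s(
        size_mb,
        ASSUMED_TAPE_LATENCY_S / LATENCY_IMPROVEMENT,
        ASSUMED_TAPE_BANDWIDTH_MB_S * BANDWIDTH_IMPROVEMENT,
    )
    return tape, disk


if __name__ == "__main__":
    for size_mb in (1.0, 1_000.0, 1_000_000.0):
        tape, disk = compare(size_mb)
        print(
            f"{size_mb:>12,.0f} MB: tape {tape:,.2f} s, "
            f"disk {disk:,.2f} s, speedup {tape / disk:,.0f}x"
        )
```

Under these assumptions, small files are dominated by access latency, so their speedup is close to the latency ratio, while very large files are dominated by streaming time, so their speedup approaches the bandwidth ratio of 24.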
Deployment of the Data Supercell aims to meet the expanded data-storage
needs posed by the rapid evolution, frequently termed ‘big data’, toward ever
larger quantities of data stored and transferred in many kinds of
applications, including astrophysics, genomics and vast amounts of
Internet data that can be ‘mined’ for commercial purposes.
Various departments at the University of Pittsburgh, Carnegie Mellon
University and Drexel University are using the Data Supercell. Researchers with
large genomic datasets, produced through Galaxy, a web-based platform for
bioinformatics research at Penn State, are currently using 470TB of Data
Supercell storage.
The Data Supercell was developed by this team of PSC scientists:
Paul Nowoczynski, Jared Yanovich, Zhihui Zhang, Jason Sommerfield, J. Ray
Scott, and Michael Levine.