All Firms in De-Dupe

In data reduction, the first technology was, and continues to be, lossless compression, used at least since 1990 for tape, HDD or LAN/WAN transmission, with software or chips reducing the size of files. De-dupe followed from around 2000.

We do not cover here the algorithms used to compress sound, images and video.

Data Compression
According to wolframscience.com, modern work on data compression began in the late 1940s with the development of information theory. In 1949 Claude Shannon and Robert Fano devised a systematic way to assign codewords based on probabilities of blocks. An optimal method for doing this was then found by David Huffman in 1951. In the mid-1970s, the idea emerged of dynamically updating codewords for Huffman encoding, based on the actual data encountered. And in the late 1970s, with online storage of text files becoming common, software compression programs began to be developed, almost all based on adaptive Huffman coding.

In 1977 Abraham Lempel and Jacob Ziv suggested the basic idea of pointer-based encoding LZ (Lempel–Ziv). In the mid-1980s, following work by Terry Welch, the so-called LZW (Lempel–Ziv–Welch) algorithm rapidly became the method of choice for most general-purpose compression systems. It was used in programs such as PKZIP, as well as in hardware devices such as modems. Also noteworthy are the LZR (LZ–Renau) methods, which serve as the basis of the standard Zip method.

Among the first companies involved, in 1990, were InfoChip Systems in Santa Clara, CA and Hardware Architecture in Moscow, ID. One of the leaders at that time was Stac Electronics in Carlsbad, CA. There were also some proprietary methods to reduce data on tapes (HP DCLZ for QIC and DAT, IBM IDRC for 3480 cartridges, etc.).

Compression averages no more than a 2X reduction. De-dupe has completely changed the storage world, with ratios of 10X to 100X depending on the data. Note that de-dupe and compression can be used together.
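As a rough illustration of how the two techniques can be combined, the sketch below removes duplicate fixed-size chunks and then compresses the remaining unique chunks with zlib. The chunk size, the synthetic "backup" data and the function name reduction_ratios are assumptions made for this example only. Real ratios depend entirely on the data, and this repetitive sample compresses far better than typical files.

```python
import hashlib
import zlib

CHUNK_SIZE = 4096  # assumed fixed chunk size, for illustration only


def reduction_ratios(data: bytes, chunk_size: int = CHUNK_SIZE) -> dict:
    """Return size ratios for de-dupe alone and de-dupe followed by compression."""
    unique = {}
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        # Keep only the first chunk seen for each fingerprint.
        unique.setdefault(hashlib.sha256(chunk).digest(), chunk)
    deduped_size = sum(len(c) for c in unique.values())
    compressed_size = sum(len(zlib.compress(c)) for c in unique.values())
    return {
        "dedupe": len(data) / deduped_size,
        "dedupe_plus_compression": len(data) / compressed_size,
    }


# Hypothetical workload: ten identical "backups" of the same 64 KiB of text-like data.
base = b"".join(b"record %06d: status=OK\n" % i for i in range(3000))[:65536]
backups = base * 10
ratios = reduction_ratios(backups)
print(ratios)
```

With ten identical copies aligned on chunk boundaries, de-dupe alone keeps one copy out of ten, and compression then shrinks the unique chunks further.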

Who invented de-dupe?
That’s a difficult question. We have never heard of a company claiming to be the first.

The pioneers seem to be Data Domain, Diligent, ExaGrid, FilePool, Permabit, Riverbed and Rocksoft at the beginning of the century.

Data Domain was born in 2001 and conceived a D2D de-dupe appliance. After receiving $41 million in funding, it raised $111 million in an IPO in 2007 and was then acquired by EMC for a huge $2.2 billion in 2009.

Israeli start-up Diligent, in secondary de-dupe, was acquired by IBM in May 2008 for $200 million.

ExaGrid Systems in Westborough, MA, was born in 2002. Formerly Inspection Systems, it was created by former employees of HighGround Systems and now has 1,200 customers and 4,000 installed systems.

Belgian firm FilePool (formerly Wave Research), co-founded by Paul Carpentier, now CTO of Caringo, was without question the pioneer in CAS software. The start-up was taken over in May 2001 for $50 million by EMC to build the Centera, with content-derived addresses that permit only one protected copy of content to be stored no matter how many times it is used. We found a patent filed by Carpentier and others as early as 1998.

Permabit (Cambridge, MA) was created in 2000 and continues to exist, with OEMs such as HDS, LSI, Overland, StoneFly and Violin Memory.

Riverbed was founded in May 2002 in order to design an appliance for WAN optimization.

Born in 2002 in Adelaide, Australia, Rocksoft, a small start-up in de-dupe software, was bought by ADIC in 2006 for $63 million. Quantum then got the technology through its acquisition of ADIC. In fact, Quantum made that acquisition mainly to gain a tape business. But de-dupe is now a flagship technology for the company, which was one of the first major players in D2D backup subsystems. Quantum says it has been issued 9 U.S. patents on de-dupe, with 42 more pending.

De-Dupe Process
In the de-dupe process, unique chunks of data, or byte patterns, are identified and stored during a process of analysis. De-dupe may occur in-line, as data is flowing, or post-process, after the data has been written to disk. The operation can be done on blocks or files, through software or, faster, through a dedicated hardware appliance.

(Source: Citrix/NetApp)
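The process described above can be sketched as a toy in-line, block-level store. This is a minimal illustration, not any vendor's algorithm: the class name DedupeStore, the fixed chunk size and the choice of SHA-256 as fingerprint are all assumptions. Commercial products typically use their own chunking and indexing schemes.

```python
import hashlib


class DedupeStore:
    """Toy in-line, block-level de-dupe store (illustrative only)."""

    def __init__(self, chunk_size: int = 4096):
        self.chunk_size = chunk_size
        self.chunks = {}   # fingerprint -> unique chunk bytes
        self.files = {}    # file name -> ordered list of fingerprints

    def write(self, name: str, data: bytes) -> None:
        recipe = []
        for offset in range(0, len(data), self.chunk_size):
            chunk = data[offset:offset + self.chunk_size]
            fingerprint = hashlib.sha256(chunk).hexdigest()
            # Store the chunk only the first time this byte pattern is seen.
            self.chunks.setdefault(fingerprint, chunk)
            recipe.append(fingerprint)
        self.files[name] = recipe

    def read(self, name: str) -> bytes:
        # Rebuild the file by following its list of fingerprints.
        return b"".join(self.chunks[fp] for fp in self.files[name])

    def stored_bytes(self) -> int:
        return sum(len(c) for c in self.chunks.values())


store = DedupeStore(chunk_size=4)
store.write("monday.bak", b"AAAABBBBCCCC")
store.write("tuesday.bak", b"AAAABBBBDDDD")  # only "DDDD" is new
print(store.stored_bytes())  # 16 bytes kept for 24 bytes written
```

A post-process variant would write the raw data first and run the same fingerprint comparison later, trading temporary disk space for less work in the write path.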
The basic idea is simple: when you transfer data between two points, check which chunks have already been transmitted and replace them with a small index. In practice, it’s more complicated. Each firm has its own algorithm. There is no standardization, so de-dupe is perfect for backup but risky for archiving, where data may have to be read back years later with the same proprietary algorithm.
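The transfer case can be sketched as a sender that remembers which chunks it has already sent and a receiver that keeps them in the same order. The message format ("data" or "ref" tuples) and the function names encode_stream and decode_stream are hypothetical, chosen only to show the principle. Real WAN optimization products use their own protocols.

```python
import hashlib


def encode_stream(chunks, sent_index):
    """Sender side: yield ("ref", id) for chunks already sent, else ("data", chunk)."""
    for chunk in chunks:
        fingerprint = hashlib.sha256(chunk).digest()
        if fingerprint in sent_index:
            # Already transmitted: send only its small index.
            yield ("ref", sent_index[fingerprint])
        else:
            sent_index[fingerprint] = len(sent_index)
            yield ("data", chunk)


def decode_stream(messages, received):
    """Receiver side: rebuild the chunks from literal data and references."""
    for kind, payload in messages:
        if kind == "data":
            received.append(payload)  # position in this list matches the sender's id
            yield payload
        else:
            yield received[payload]


sent_index, received = {}, []
chunks = [b"header", b"payload-1", b"header", b"payload-1", b"payload-2"]
wire = list(encode_stream(chunks, sent_index))
rebuilt = list(decode_stream(wire, received))
print([kind for kind, _ in wire])
```

Both sides must keep their indexes in step. This is one reason the approach is harder in practice than the basic idea suggests.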

In the list below, we cannot guarantee that all companies are using their own algorithms, and some have only patents and no products.

Today the question is less “Which companies are involved?” than “Which storage companies do not have de-dupe?”. The latter sign OEM contracts with others to implement de-dupe, a technology absolutely necessary today to sell backup or VTL and even WAN solutions, and probably primary storage systems in the future, so that users can reduce their number of HDDs and more costly SSDs.

Note: after the name of a firm, a “/” precedes the company or companies acquired for de-dupe.

(ABOUT) ALL COMPANIES IN DE-DUPE

3Qube Technologies
3X Systems
AC&NC
Ace Data
Acronis
Acrosync
Actifio
AetherStore
ArcMail Technology
Altaro
AltDrive
American Megatrends India
Artisan Infrastructure
Ascava
Astute Networks
Atempo
Atlantis Computing
Attix5
Bacula
balesio
Barracuda Networks
Bitcasa
BitSpeed
Bluelock
BluPointe
BridgeSTOR
Brocade (patent)
CA/XOsoft
Caminosoft
Cavium
Clearpace Software
CloudBerry
CloudFounders
Cobalt Iron
Code 24 Software
Cofio
Commvault
Convirture
Copiun
Corus360
Ctera Networks
Data Storage Group
Datacastle
dataStor
Dell/Ocarina/AppAssure
Digitili
Druva Software
Dynamic Solutions International
EMC/Data Domain/Avamar/XtremIO/Dell
Eversync
Exablox
ExaGrid
Exar/Hifn
FalconStor
Fujitsu
Genie9
GFI Software
GreenBytes
Hitachi (patent)
HGST (sTec, WD)
HPE
IASO
iB3
IceWEB
IBM/Tivoli/Storwize/Diligent
id7
Idealstor
Imation/Nine Technology
Infineta Systems
Infortrend
InQuinox
InterCloud Systems/VaultLogix
Iron Mountain
Ixilix
IzumoBASE
Lortu Software
Kaminario
KeepItSafe
KineticD
KVMBackup
Luminex
Maginatics
Maxta
Maxtronix
Metalogix Software
Microsoft
Morro Systems
Nakivo
Navisite
NEC
NetApp
Netgear
NetJapan
NetLogic
Nexenta
Nexsan
Nimble Storage
Nimbus Data
Nine Technology
NovaStor
Nubisio
OnApp
Open-E
Opendedup (open source)
Oracle/Sun ZFS
Overland/Tavata Software
Panzura
Parsec Labs
Permabit
PHD Virtual
Pixel8 Networks (patent)
Pure Storage
Qsan Technology
Quadric Software
QUADStor Systems
Quantum/ADIC/Rocksoft
Quest Software/BakBone
QuorumSoft
RainStor
Rebit
Revinetix
Riverbed
ROBObak
SEP
Sepaton
SGI/Copan/Dell
Silver Peak
SimpliVity
Skyera
SolidFire/NetApp
SoftNAS
Spectra Logic
Sterling Data Storage
StoneFly
Storagedata
Storageflex
StoredIQ
StorSimple
Tandberg Data (dataStor)/Sphere 3D
Tegile
Teradata
Tilana
TwinStrata
UltraBac Software
Unitrends
Vaulten
Vaultize
Veeam Software
VeloBit
Venyu
Veritas
WhipTail Tech
Yottabyte
Zetta