SubmitEdge Articles Directory
    How to Evaluate Your Business Need for Data Deduplication
    Author: Jay Livens
    Website: http://www.sepaton.com/
    Added: Sat, 28 Feb 2009 09:03:18 +0000
    Category: Computer & Internet

    The volume of data generated by companies today is growing explosively. More powerful computing technology and the evolution to an information-based economy are causing companies to generate more data than ever before. To deal with this overwhelming data growth and related storage requirements, many companies are evaluating the use of data deduplication technology.

    Put simply, data deduplication technology is software that compares data in new backup streams with data that has already been stored in order to identify and remove duplicates. Deduplication has become an essential tool for data managers trying to control exponential data growth in the backup environment. However, the methods used to accomplish it vary widely, as do the levels of capacity optimization they provide.

    Disk-based backup shows why this matters. Virtual tape libraries (VTLs), for example, provide a level of performance and reliability that traditional physical tape systems cannot approximate. VTLs enable companies to back up data many times faster than tape, restore data quickly and eliminate a variety of time-consuming manual tasks. However, without data deduplication, the cost of disk is higher than that of tape, forcing companies to use disk space carefully by keeping online retention times short and moving data to tape archive as quickly as possible.

    To evaluate data deduplication, it is vital to understand how the approaches differ. Deduplication technologies generally operate in one of two ways: hash-based comparison, or the ContentAware™ comparison method used in the SEPATON DeltaStor® deduplication software on an S2100®-ES2 virtual tape library (VTL).

    The hash-based approach runs incoming data through an algorithm that assigns an effectively unique number (called a hash) to every chunk of data; identical chunks always produce the same hash. It then compares the new hashes to those that have already been stored in a lookup table. If a new hash does not match any entry, the system stores the corresponding chunk of data and adds the new hash to the lookup table. If the new hash does match an entry, the system does not write the corresponding data to disk; instead, it records the duplicate in the lookup table so that the data can be reconstituted for restores.
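The hash-based steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not any vendor's implementation: the class name `HashDedupStore`, the fixed 4-byte chunk size, and the choice of SHA-256 are all hypothetical and chosen to keep the example small.

```python
import hashlib

CHUNK_SIZE = 4  # bytes; unrealistically small so the example is easy to follow


class HashDedupStore:
    """Toy hash-based deduplicating store (illustrative assumption only)."""

    def __init__(self):
        self.chunks = {}     # hash -> chunk bytes (lookup table plus stored data)
        self.use_count = {}  # hash -> how many times the chunk is referenced

    def write(self, data: bytes) -> list:
        """Store a backup stream and return the list of hashes needed to restore it."""
        recipe = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.chunks:
                self.chunks[digest] = chunk  # new data: write it once
            # New or duplicate, record the reference so restores can find it.
            self.use_count[digest] = self.use_count.get(digest, 0) + 1
            recipe.append(digest)
        return recipe

    def restore(self, recipe: list) -> bytes:
        """Reassemble ("reconstitute") a backup from its chunk hashes."""
        return b"".join(self.chunks[d] for d in recipe)


store = HashDedupStore()
monday = b"AAAABBBBCCCC"
tuesday = b"AAAABBBBDDDD"  # shares its first two chunks with Monday's backup
r1 = store.write(monday)
r2 = store.write(tuesday)
stored_bytes = sum(len(c) for c in store.chunks.values())
```

In this sketch the two 12-byte backups share two chunks, so only four distinct chunks are written to the store. Note that every restore has to walk the hash list and reassemble the pieces.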

    Meanwhile, the ContentAware approach reads the data that is in the backup and identifies commonalities and relationships between the objects/documents (e.g., Microsoft® Word document to Word document or Oracle® database to Oracle database) to narrow the search for duplicates.  It then compares data in these objects at the byte level for maximum capacity reduction.

    As described above, hash-based technologies start by breaking data into chunks and assigning each chunk a number called a hash. New data is stored, and duplicate data is simply recorded in a “use count” tally. Each new backup is broken into more pieces that have to be identified, compiled and reassembled for a restore. As a result, the more data stored on the system, the more pieces it has to track. In contrast, the ContentAware approach uses the most recent (newest) backup as the reference data set and compares previously stored data to it to identify duplicates.
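The idea of keeping the newest backup intact and describing older data relative to it can be illustrated as follows. This is a rough sketch and an assumption throughout: it uses Python's standard `difflib.SequenceMatcher` as a stand-in for byte-level comparison and is not the DeltaStor algorithm. The helper names `make_delta` and `apply_delta` and the sample data are hypothetical.

```python
from difflib import SequenceMatcher


def make_delta(reference: bytes, older: bytes) -> list:
    """Describe `older` as copy ranges from `reference` plus literal bytes."""
    ops = []
    matcher = SequenceMatcher(None, reference, older, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))        # bytes shared with the reference
        elif tag in ("replace", "insert"):
            ops.append(("data", older[j1:j2]))  # bytes found only in the older copy
        # "delete": bytes found only in the reference; nothing to record
    return ops


def apply_delta(reference: bytes, delta: list) -> bytes:
    """Rebuild the older backup from the intact reference and its delta."""
    out = bytearray()
    for op in delta:
        if op[0] == "copy":
            out += reference[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)


newest = b"quarterly report v2: revenue up 12%"  # kept intact as the reference
older = b"quarterly report v1: revenue up 9%"    # stored only as a delta
delta = make_delta(newest, older)
literal_bytes = sum(len(op[1]) for op in delta if op[0] == "data")
```

In this arrangement the newest backup needs no reassembly at restore time, and only older backups are rebuilt from the reference plus their recorded differences.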

    Another distinction between deduplication technologies is whether they deduplicate a given backup set inline, as part of the backup process, or concurrently with the backup process. Inline deduplication aligns well with hash-based comparison technologies and provides a cost-effective way for small to medium-sized organizations to reduce their data center capacity needs. The concurrent method begins the deduplication process as soon as the first backup job completes, which gives it several distinct advantages for larger backup volumes. The VTL can load-balance the backup and deduplication processes across multiple nodes, enabling it to complete both processes faster than an inline system. It also stores the most recent backup in its intact form, enabling it to perform a data integrity check before any duplicate data is replaced with pointers.

    Many deduplication technologies cannot scale backup performance or deduplication processing across multiple processing nodes. As a result, you have to add multiple individually managed boxes (see Capacity and Performance Scalability below) or tolerate significantly slower backup times. With the Scale-Out Deduplication™ capability of a SEPATON VTL with DeltaStor software, you can add capacity or performance to back up and deduplicate petabytes of data in a single system.

    It’s important to note that most deduplication technologies are “all or nothing,” requiring you to deduplicate all of your backup data and to do so with the same algorithm. This method is adequate for small backup environments. However, in an enterprise, being able to fine-tune deduplication to your needs, data types and business objectives is essential. The efficiencies to be gained through deduplication depend on a number of factors, including (but not limited to):

    • The amount of duplicate data in the backup stream
    • The data application type (Exchange, Oracle, Word, etc.)
    • The required online data retention period (longer retention times result in greater deduplication efficiency)
    • The number of times per week that full backups are performed
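To see how retention period and full-backup frequency interact, a back-of-the-envelope estimate can help. The model below is a deliberately simplified assumption, not a vendor formula: it supposes every full backup is the same size, that a fixed fraction of the data (`change_rate`) is new in each successive full, and that deduplication stores only that new fraction after the first full. The function name and all figures are hypothetical.

```python
def estimated_dedup_ratio(full_size_tb, fulls_per_week, retention_weeks, change_rate):
    """Rough ratio of logical (pre-dedup) to physical (post-dedup) capacity."""
    fulls_retained = fulls_per_week * retention_weeks
    logical_tb = full_size_tb * fulls_retained
    # The first full is stored whole; each later full adds only its changed data.
    physical_tb = full_size_tb * (1 + change_rate * (fulls_retained - 1))
    return logical_tb / physical_tb


# 10 TB fulls, one full per week, 5% change between fulls (all assumed values)
short_retention = estimated_dedup_ratio(10, 1, 4, 0.05)
long_retention = estimated_dedup_ratio(10, 1, 12, 0.05)
print(round(short_retention, 2), round(long_retention, 2))  # 3.48 7.74
```

Under these assumptions, tripling the retention period from 4 to 12 weeks roughly doubles the estimated ratio. Real results depend on the data types and change patterns in your own backup stream.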

    When considering a data deduplication solution for your business, be sure to evaluate how each potential solution meets your needs, which may include:

    •    Backup Performance and Time to Protection - Be sure to understand how a data deduplication technology will affect your backup and how quickly your data will be moved to the protection of a VTL. If you have full backups of more than 10 TB, you should consider an enterprise-optimized deduplication technology like DeltaStor® software.
    •    Restore Performance - Choose a technology based on three key characteristics of your file restore needs: how often you need to restore files; the age of the files you typically restore (e.g., how often files are more than 30 days old); and how quickly you need to complete file restores. If restore time is a priority for you, choose a system that uses forward differencing to ensure that restores can be performed instantly, without “reconstitution”.
    •    Deduplication Efficiency - It makes sense that the more duplicate data you have in your backup stream, the more beneficial a deduplication technology will be in your environment. Understand what level of deduplication efficiency is realistic in your environment and whether that is sufficient to offset your data growth.
    •    Risk to Data Integrity - Consider a solution that keeps an intact copy of your most recent backup and performs a second-level data integrity check.
    •    Capacity and Performance Scalability - Before choosing a technology, understand the implications of outgrowing your capacity and performance. Will adding capacity and performance mean maintaining numerous “silos of storage” or require a forklift upgrade to a new system?

    For more information on choosing the technology that is right for you, and on SEPATON’s DeltaStor® deduplication software specifically, visit www.sepaton.com.

    About the Author:

    Jay Livens is the author of this article on backup solutions.

    Copyright 2005 SubmitEdge Articles Directory. All Rights Reserved.

