ECommerceTimes.com

EXPERT ADVICE
Maintaining Integrity and Security in a Data Migration


Data migrations are complex projects, but no IT department can escape them. Tools are available to expedite the process, but it's the owner of the data who must take the reins and make sure data integrity and security are preserved. The most important point of a data migration is a good backup -- not only in the event of corruption, but also to allow validation post migration.


Data migrations have become a necessary evil in every IT environment. The rapid rate at which hardware and software become outdated, coupled with the need to save costs by taking old assets off the books as soon as possible, means that data migrations are something no one can avoid.

Vendors have tried to provide tools to make this task as seamless as possible. However, at the end of the day, the data owner is the final authority on the migration and its end result. No matter what kind of tool is used, it behooves the data owner to ensure that data integrity and security are maintained during and after the migration.

First, an understanding of three terms is essential:

Data migration: This is the act of moving data from one location to another. It could be as simple as moving a set of files from one drive to another on the same computer, or as involved as moving several terabytes of data from one data center to another. The word "data" here does not have any boundaries, and the term "migration" literally means a one-time movement of any data to change its permanent resting location.

Data integrity: This means that the structure of the data stays consistent with the way the data needs to be maintained and accessed. Generally speaking, data integrity cannot be considered in a vacuum, as it is intimately tied to the manner in which the data is accessed and the layer at which the intelligence for this access resides. When data integrity is compromised, it is called "data corruption." For example, from a filesystem perspective, data integrity may be intact -- i.e., there are no file access errors -- but the application accessing these files, such as a database, may think these files contain corrupt data.

Data security: Every "chunk" of data has security attributes associated with it. The layer at which this data is accessed and the manner in which it is accessed determine the type of attributes that are applied at that layer. Moreover, data security itself is like a layered cake -- there is security at every access tier, and each layer is important. For example, when data is accessed via a shared SAN, it is important to ensure that host access security (via zoning and LUN masking) is maintained. However, that does not mean that data compromises will not occur at the higher levels -- such as filesystems, file attributes, etc. Then there is user-level security, network-level security, and so on.

How They Relate

How do data migration, data integrity and data security impact each other? In a generic sense, a data migration involves a process of moving data from one location to another. This generally happens via some kind of an "engine" that reads data from the source, performs an internal mapping of this data and then writes it to the target. This engine can be of any form and can reside anywhere in the access stack.

For example, it could be software running on a host, in a dedicated appliance, embedded in the network or in the storage array, or simply a copy/paste tool that a user controls with a mouse. Similarly, the source and target locations for this data can be local -- i.e., the same server or array -- or geographically distant. The internal mapping in the engine has the intelligence to ensure that the translation or copy of this data (blocks, files, etc.) maintains the attributes of the data at the level where it's read. As mentioned above, these attributes -- when considered holistically -- ensure the security and integrity of this data.

However, the big "if" in this situation surrounds the intelligence in the engine that is so critical to the migration. In most cases, a tried-and-tested engine will function as promised. However, that does not mean it should be used in a data migration without proper testing. Therefore, practically all data migration projects need to include a data validation phase, when various teams check to see if these attributes are the same between the source and target.

Tools are available that can probe data at different levels and provide a report on any missing or corrupted files. Similarly, there are tools that can verify whether all the security attributes are intact on the target location when the data copy is complete.
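
As an illustration of the file-level validation described above, here is a minimal sketch in Python. It assumes the source and target are both reachable as directory trees, and it treats a file as "corrupted" when its SHA-256 digest differs between the two. The function names (file_digest, snapshot, validate_copy) are hypothetical and are not taken from any particular migration tool.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def snapshot(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its content digest."""
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def validate_copy(source: Path, target: Path) -> dict[str, list[str]]:
    """Report files that are missing, changed, or unexpected on the target."""
    src, dst = snapshot(source), snapshot(target)
    common = src.keys() & dst.keys()
    return {
        "missing": sorted(src.keys() - dst.keys()),
        "corrupted": sorted(k for k in common if src[k] != dst[k]),
        "unexpected": sorted(dst.keys() - src.keys()),
    }
```

A report with three empty lists suggests the file contents survived the copy. It says nothing about permissions or ownership, and nothing about whether an application such as a database considers the files consistent, so those layers need their own checks.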

Application-Based or Agnostic

In most modern-day data migrations, data is either migrated from within the application itself or in a manner that is agnostic (and transparent) to the application. The benefit of the former -- i.e., application based -- is that the application itself ensures that its own data security and integrity attributes are maintained during the migration. For example, Oracle Data Guard is an application-level utility that can be used for database migrations.

The benefit of the latter is similar. Since the migration occurs at a lower level in the IO stack, all data is treated the same way, regardless of its type, thereby ensuring that all attributes are carried over as is. These types of migrations -- in which the migration method is agnostic to the type of data and the manner in which it is accessed, and the migration occurs at a lower layer in the IO stack -- are known as "block level migrations."
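
To make the idea concrete, here is a minimal Python sketch of a block-level copy loop. It is an illustration only: real block-level migrations run in volume managers, appliances or storage arrays rather than in a script, and the block size and the copy_blocks name are assumptions. The point it shows is that the loop never interprets the data, so whatever is stored in the blocks is carried over byte for byte.

```python
import hashlib
from typing import BinaryIO

BLOCK_SIZE = 4096  # assumed block size, chosen only for illustration


def copy_blocks(source: BinaryIO, target: BinaryIO,
                block_size: int = BLOCK_SIZE) -> tuple[int, str]:
    """Copy source to target one block at a time.

    Returns the number of blocks written and the SHA-256 digest of the
    bytes read, which can later be compared with a digest of the target.
    """
    h = hashlib.sha256()
    blocks = 0
    while True:
        block = source.read(block_size)
        if not block:  # end of the source
            break
        target.write(block)  # written as is, with no knowledge of its type
        h.update(block)
        blocks += 1
    return blocks, h.hexdigest()
```

Any two binary streams will do for a trial run, for example two files opened in "rb" and "wb" mode.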

Migrating data at the file level or changing the manner in which this data is accessed can present its own set of challenges. For example, when data is copied between different vendors' network attached storage arrays, preserving security and integrity attributes during and after the migration can be a nightmare. This is mostly because of interoperability issues.

The same can hold true if data is copied from a Unix server to a Windows server, or if the access mechanism is changed from NFS (Network File System) to CIFS (Common Internet File System). Of course, there are tools available that can make the migration easier or minimize issues with integrity and security, but they are not perfect. These migrations, therefore, tend to take a long time.

Good Backup Is Critical

In other types of migrations, the data is actually moved instead of copied. In other words, there is a point after which the migration cannot be cancelled or reverted. These migrations often require an intermediate go/no-go checkpoint, during which a preliminary data validation is performed. Only if everything checks out does the process of moving the data begin. If something goes wrong after the checkpoint, the only recourse is to restore from backups.
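
The checkpoint logic can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the checks are hypothetical callables supplied by the migration team (for example, "backup verified" or "target has enough free space"), and shutil.move stands in for whatever tool actually moves the data.

```python
import shutil
from pathlib import Path
from typing import Callable

Check = Callable[[], bool]  # a named pre-migration check that returns True on success


def failed_checks(checks: dict[str, Check]) -> list[str]:
    """Run every check and return the names of those that did not pass."""
    return [name for name, check in checks.items() if not check()]


def move_after_checkpoint(source: Path, target: Path,
                          checks: dict[str, Check]) -> list[str]:
    """Move source to target only if every checkpoint check passes.

    Returns the list of failed checks; an empty list means the move ran.
    """
    failures = failed_checks(checks)
    if failures:
        return failures  # no-go: the source has not been touched
    shutil.move(str(source), str(target))  # point of no return
    return []
```

Running every check before deciding, rather than stopping at the first failure, gives the team a complete list of what blocked the "go" decision.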

The most important part of a data migration is a good backup. A good backup is critical in a migration not only in the event of data corruption, but also to allow data validation to occur post migration. For example, if a problem with user-level permissions is discovered after the data has been migrated, the affected permissions can be compared with the attributes of the backed-up data and fixed on a case-by-case basis. Most backup software does a good job of backing up all standard data attributes for security and data integrity.
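
The permissions comparison mentioned above might look like the following Python sketch. It assumes the backup has been restored to a directory tree on a POSIX system and compares only the mode bits, owner ID and group ID of each path. ACLs, extended attributes and NAS- or Windows-specific attributes are outside its scope, and the function names are hypothetical.

```python
import stat
from pathlib import Path

Attrs = tuple[str, int, int]  # (mode string such as "-rw-r--r--", uid, gid)


def security_attributes(root: Path) -> dict[str, Attrs]:
    """Collect mode, owner and group for every path under root."""
    attrs = {}
    for p in sorted(root.rglob("*")):
        st = p.lstat()  # do not follow symlinks
        attrs[p.relative_to(root).as_posix()] = (
            stat.filemode(st.st_mode), st.st_uid, st.st_gid)
    return attrs


def permission_drift(backup: Path, migrated: Path) -> dict[str, tuple[Attrs, Attrs]]:
    """Return paths whose attributes differ, as (backup, migrated) pairs."""
    ref = security_attributes(backup)
    cur = security_attributes(migrated)
    return {
        path: (ref[path], cur[path])
        for path in sorted(ref.keys() & cur.keys())
        if ref[path] != cur[path]
    }
```

Each entry in the result shows the backed-up attributes next to the migrated ones, which is the information needed to fix permissions case by case.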

Data migrations are complex projects. Maintaining the validity of data is one of the most important but unwritten assumptions of a migration. No one really talks about it, but everyone always assumes that the data will maintain all of its properties post migration. No one likes to be told that something happened during the migration that violated this assumption. That's called a failed data migration.


Ashish Nadkarni is a principal consultant at GlassHouse Technologies.

