What are you looking for ?
Advertise with us
RAIDON

Google: How to Migrate Cloud Storage Data from Multi-Region to Regional?

Tool for job is Storage Transfer Service which uses parameterization to bulk migrate files.

By Remy Welch, data engineer, and Harrison Sweeney, data engineer, Google Cloud

There are many considerations to take into account when choosing the location type of your Cloud Storage bucket.

Google Cloud Storage 2302

However, as business needs change, you may find that regional storage offers lower cost and/or better performance than multi-region or dual-region storage. By design, once your data is already stored in a bucket, the location type of that bucket cannot be changed. The path forward is clear: you must create new, regional buckets and move your existing data into them.

Migrating from multi-region to regional storage
The tool for this job is the Storage Transfer Service (STS), which uses parameterization to bulk migrate files. The basic steps are as follows:

  1. Create new buckets in the region you desire.
  2. Use STS to transfer the objects from the original multi-region buckets to the new regional ones.
  3. Test the objects (e.g., using Cloud Storage Insights) in the new buckets and if the test is passed, delete the old buckets.

While there is no charge for use of the STS itself, performing a migration will incur Cloud Storage charges associated with the move – including storage charges for the data in the source and destination until you delete the source bucket; for the Class A and B operations involved in listing, reading, and writing the objects; for egress charges for moving the data across the network; and retrieval and/or early delete fees associated with migrating Nearline, Coldline and Archive objects. See the STS pricing documentation for more information.

Though we have focused on a multi-region to regional Cloud Storage migration, in the steps that follow, the considerations and process for any other type of location change will be much the same – for example, you might want to migrate from multi-region to dual-region, which could be a good middle ground between the options, or even migrate a regional bucket from one location to a different regional location.

Planning migration
The first determination will be which buckets to migrate. There could be a number of reasons why you would choose not to migrate certain buckets, for example, the data inside might be stale and/or not needed anymore, or it might serve a workload that is a better fit for multi-region, for example, an image hosting service for an international user base.

If you’re transferring massive amounts of data, it is also important to consider the time it will take to complete the transfer. To prevent any one customer from overloading the service, the STS has queries per second and bandwidth limitations at a project level. If you’re planning a massive migration (say over 100PB or 1 billion objects) you should notify your Google Cloud sales team or create a support ticket to ensure that the required capacity is available in the region where you’re doing the transfer. Your sales team can also help you calculate the time the transfer will take, which is a complex process that involves many factors.

To determine if you need to be worried about how long the transfer could take, consider the following data points: A bucket with 11PB of data and 70 million objects should take around 24h to transfer. A bucket with 11PB of data and 84 billion objects could take 3 years to transfer if jobs are not executed concurrently. In general, if the number of objects you need to transfer is over a billion, the transfer could take prohibitively long, so you will need to work with Google Cloud technicians to reduce the transfer time by parallelizing the transfer. Note that these metrics are for cloud to cloud transfers, not HTTP transfers.

There may also be metadata that you want to transfer from your old buckets to your new buckets. Some metadata, like user-created custom fields, are automatically transferred by the STS, whereas other fields, like storage classes or CMEK, must be manually enabled via the STS API. The API or gcloud CLI must also be used if you wish to transfer all versions of your objects, as opposed to just the latest one. If you are using Cloud Storage Autoclass in the destination bucket (it must be enabled at bucket creation time), all of your objects will start out as a Standard storage class after the transfer. Refer to the Transfer between Cloud Storage buckets documentation for guidance on handling all complexities you may have to account for.

Your final decision point will be whether you want to keep the exact same names for your buckets, or whether you can work with new bucket names (e.g., no application changes with the same bucket name). As you will see in the next section, the migration plan will require an additional step if you need to keep the original names.

Steps for migration
The diagram below shows how the migration process will unfold for a single bucket.

Google Migrate Cloud Storage Scheme 2302

You may decide that in order to avoid having to recode the names of the buckets in every downstream application, you want your regional buckets to have the exact same names as your original multi-region buckets did. Since bucket names are as immutable as their location types, and the names need to be globally unique, this requires transferring your data twice: once to temporary intermediate buckets, then to the new target buckets that were created after the source buckets had been deleted. While this will obviously take additional time, it should be noted that the second transfer to the new target buckets will take approximately a tenth of the time of the first transfer because you are doing a simple copy within a region.

Be sure to account for the fact that there will be downtime for your services while you are switching them to the new buckets. Also keep in mind that when you delete the original multi-region buckets, you should create the regional buckets with the same name immediately afterwards. Once you’ve deleted them, theoretically anyone can claim their names. 

If you are aiming to transfer multiple buckets, you can run multiple jobs simultaneously to decrease the overall migration time. STS supports around 200 concurrent jobs per project. Additionally, if you have very large buckets, either by size or number of objects, it is possible that the job may take several days to fully transfer the data in the bucket, as each job will copy one object at a time. In these cases, you can run multiple jobs per bucket and configure each job to filter objects by prefix. If configured correctly, this can significantly reduce the overall migration time for very large buckets. This library can help with managing your STS jobs, and testing the objects that have been transferred.

What’s next?
With great flexibility of storage options comes great responsibility. To determine whether a migration is necessary, you will need to do a careful examination of your data, and the workloads that use it. You will also need to consider what data and metadata should be transferred to the buckets of the new location type. Luckily, once you’ve made the decisions, Cloud Storage and the STS make it easy to migrate your data. Once your data is transferred, there are other ways to optimize your usage of Cloud Storage, such as leveraging customizable monitoring dashboards. If you’re not using the STS, perhaps for smaller transfers or analytical workloads where you’re downloading and uploading data to a VM, consider using the gcloud storage CLI.

Articles_bottom
ExaGrid
AIC
ATTOtarget="_blank"
OPEN-E