Google Cloud Storage is great because it lets you store large datasets at minimal cost. Moreover, if you do your processing in the cloud, it’s one of the best providers. Datasets are stored in what’s called a bucket. Sometimes, though, it becomes necessary to transfer datasets from one bucket to another. This tutorial shows how to transfer data between two buckets in different accounts using the List of object URLs option.
- Log in to your bucket. Inside the bucket, you can see different tabs. Click the Permissions tab, then click the Add members button.
- Under New members, add the email address of the user who will access the bucket. Then, under Role, scroll to Storage and assign the Storage Admin role.
- For the List of object URLs option, you need what is called a tab-separated values (TSV) file. The file looks like this:
TsvHttpData-1.0
https://example.com/buckets/obj1	1357	wHENa08V36iPYAsOa2JAdw==
https://example.com/buckets/obj2	2468	R9acAaveoPd2y8nniLUYbw==
where the first line, TsvHttpData-1.0, is required. Each of the other lines contains the HTTP or HTTPS link to the file, the size of the file in bytes, and the Base64-encoded MD5 checksum of the file, separated by tabs. More information on the TSV is here.
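As a rough illustration of that layout, the file can be assembled from a list of entries. This is only a sketch: the function name `build_url_list` is made up for this example, and the URLs, sizes, and checksums are the placeholder values from the sample above.

```python
def build_url_list(entries):
    """Return the text of a TsvHttpData-1.0 URL list.

    entries: iterable of (url, size_in_bytes, base64_md5) tuples.
    build_url_list is a hypothetical helper, not part of any Google tool.
    """
    lines = ["TsvHttpData-1.0"]  # required first line
    for url, size, md5_b64 in entries:
        # One object per line, with the three fields separated by tabs.
        lines.append(f"{url}\t{size}\t{md5_b64}")
    return "\n".join(lines) + "\n"


# Placeholder entries taken from the sample file above.
tsv_text = build_url_list([
    ("https://example.com/buckets/obj1", 1357, "wHENa08V36iPYAsOa2JAdw=="),
    ("https://example.com/buckets/obj2", 2468, "R9acAaveoPd2y8nniLUYbw=="),
])
print(tsv_text, end="")
```

Writing `tsv_text` to a file gives you something you can host as the URL list.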
- To get the size of the file in bytes and the Base64-encoded MD5 checksum, you can use the gsutil tool. It lets you access Google Cloud Storage files from the command line without actually having to download them. Install gsutil on your machine using the instructions provided here.
- Once you have installed gsutil and set up your profile using
gcloud init
you can get the file size in bytes using
gsutil du gs://{path-to-the-file-at-bucket}
and the MD5 hash using
gsutil hash -m gs://{path-to-the-file-at-bucket}
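If you happen to have a local copy of a file, you can cross-check both values without gsutil. The sketch below is not what gsutil does internally; it only computes the same two quantities (the byte count and the Base64-encoded MD5 digest) with the Python standard library. The helper name `size_and_md5_b64` and the in-memory sample data are assumptions for illustration.

```python
import base64
import hashlib
import io


def size_and_md5_b64(stream, chunk_size=1024 * 1024):
    """Return (size_in_bytes, base64_md5) for a binary file-like object.

    Hypothetical helper: reads in chunks so large files need not fit in memory.
    """
    md5 = hashlib.md5()
    size = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        md5.update(chunk)
        size += len(chunk)
    # The TSV expects the raw 16-byte digest in Base64, not the hex string.
    return size, base64.b64encode(md5.digest()).decode("ascii")


# In-memory stand-in for a real file; use open(path, "rb") for files on disk.
size, md5_b64 = size_and_md5_b64(io.BytesIO(b"hello world"))
print(size, md5_b64)
```

The values should match what gsutil reports for the same object, provided the local copy is byte-for-byte identical.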
- The finished file then looks like the example above, with the header line followed by one line per object.
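Before publishing the file, it can help to sanity-check it. The following is a minimal sketch that checks only the layout described in this tutorial (header line, three tab-separated fields, HTTP or HTTPS URL, numeric size, 16-byte MD5 in Base64). The function name `check_url_list` is hypothetical, and the rules are an assumption drawn from the description above rather than the service's full validation.

```python
import base64
import binascii
from urllib.parse import urlparse

HEADER = "TsvHttpData-1.0"


def check_url_list(text):
    """Return a list of problems found in a URL list (empty if none)."""
    problems = []
    lines = text.splitlines()
    if not lines or lines[0] != HEADER:
        problems.append(f"line 1: expected header {HEADER!r}")
    for number, line in enumerate(lines[1:], start=2):
        if not line.strip():
            continue  # assumption: blank lines are simply skipped here
        fields = line.split("\t")
        if len(fields) != 3:
            problems.append(
                f"line {number}: expected 3 tab-separated fields, found {len(fields)}"
            )
            continue
        url, size, md5_b64 = fields
        if urlparse(url).scheme not in ("http", "https"):
            problems.append(f"line {number}: URL is not HTTP or HTTPS")
        if not (size.isascii() and size.isdigit()):
            problems.append(f"line {number}: size is not a whole number of bytes")
        try:
            digest = base64.b64decode(md5_b64, validate=True)
        except (binascii.Error, ValueError):
            digest = b""
        if len(digest) != 16:  # an MD5 digest is always 16 bytes
            problems.append(f"line {number}: checksum is not a Base64-encoded MD5")
    return problems


# Placeholder content matching the sample file shown earlier.
sample = (
    "TsvHttpData-1.0\n"
    "https://example.com/buckets/obj1\t1357\twHENa08V36iPYAsOa2JAdw==\n"
    "https://example.com/buckets/obj2\t2468\tR9acAaveoPd2y8nniLUYbw==\n"
)
print(check_url_list(sample))
```

A common mistake this catches is fields separated by spaces instead of tabs, which shows up as a single-field line.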
- Upload the TSV file to a publicly accessible location and make a note of its URL.
- Next, log in as the user who wants to access the bucket. Then, in the navigation menu, click Transfer and then click Create transfer.
- Then choose List of object URLs under Select source. Paste in the URL of the TSV file that you noted above. Then click Continue.
- In step 2, click Browse to select the bucket you want to transfer the files to. Make sure to read the Transfer options and select what suits you best. Click Continue.
- You can schedule the transfer to recur, like a cron job, by selecting Run daily at . . . , or run it just once by selecting Run now. Give the transfer a unique Description to help identify the task while it is running. Click Create to create the transfer.
- Once the transfer completes, its status is shown on the Transfer page.
Header Photo Credit: Google Developers