Using S3 without an SDK
Introduction
S3 is the de facto standard for cloud-based object storage, even if you don't use AWS as your storage provider. AWS provides SDKs for JavaScript, Python, PHP, .NET, Ruby, Java, Go, Node.js, and C++. What do you do if you're using a language without an existing S3 SDK? I'll demonstrate how I got my Pascal-based app to interface with S3.
AWS CLI
An SDK is one way to interface with S3, but certainly not the only way. AWS also provides its CLI (command-line interface) tools for Windows, macOS, and Linux. Common operations are easily handled with the CLI. Here are some examples:
- list buckets:
aws s3 ls
- list objects within a bucket:
aws s3 ls s3://my-bucket
- create bucket:
aws s3 mb s3://my-new-bucket
- delete bucket:
aws s3 rb s3://my-old-bucket
- retrieve object:
aws s3 cp s3://my-bucket/my-file.zip /home/sue/some-file.zip
- delete object:
aws s3 rm s3://my-bucket/old-file.zip
- upload object:
aws s3 cp my-new-file.zip s3://my-bucket/my-new-file.zip
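All of the examples above default to AWS's own endpoints. Because many S3-compatible providers exist, the CLI also accepts an --endpoint-url option that points these same commands at another service. For example, listing buckets on Wasabi (the same endpoint used later in this post) might look like:
aws s3 ls --endpoint-url=https://s3.us-central-1.wasabisys.com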
Shell script templates
Looking at the CLI examples, one can quickly see that the bucket names and file names change, but the rest of each command stays the same. I decided to create generic shell scripts, one for each type of S3 operation, and to parameterize them by treating each one as a template. You could also just pass command-line arguments to the shell scripts; I'll show later why I prefer the template approach with placeholders.
Here is my templated script file (s3-get-object.sh) to retrieve an object:
#!/bin/sh
export BUCKET_NAME="%%BUCKET_NAME%%"
export OBJECT_NAME="%%OBJECT_NAME%%"
export S3_ENDPOINT_URL="%%S3_ENDPOINT_URL%%"
aws s3 cp --endpoint-url=$S3_ENDPOINT_URL s3://$BUCKET_NAME/$OBJECT_NAME %%OUTPUT_FILE%%
I have four placeholders to populate for this script: bucket name, object name, endpoint URL, and output file path. Since I treat this script file as a template, my application first copies it to exec-{template-file-name}.sh. In the example of retrieving an object, my app copies s3-get-object.sh to exec-s3-get-object.sh. Next, my app does a generic search and replace of all placeholders. For example, after running my search and replace, the contents of exec-s3-get-object.sh will be:
#!/bin/sh
export BUCKET_NAME="pjd-b-artist-songs"
export OBJECT_NAME="Bad-Company--10-From-6--Rock-N-Roll-Fantasy.flac"
export S3_ENDPOINT_URL="https://s3.us-central-1.wasabisys.com"
aws s3 cp --endpoint-url=$S3_ENDPOINT_URL s3://$BUCKET_NAME/$OBJECT_NAME /home/sue/Rock-N-Roll-Fantasy.flac
Finally, I just execute this shell script.
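Here is a minimal Free Pascal sketch of that fill-in step, assuming a small helper built on TStringList and StringReplace (the routine name and parameters here are illustrative, not the exact code from my app):

uses
  Classes, SysUtils;

{ Copy a template script to its exec- counterpart, replacing
  each %%PLACEHOLDER%% with its corresponding value. }
procedure PopulateTemplate(const TemplateFile, ExecFile: string;
                           const Keys, Values: array of string);
var
  Contents: TStringList;
  Text: string;
  i: Integer;
begin
  Contents := TStringList.Create;
  try
    Contents.LoadFromFile(TemplateFile);  // read the template
    Text := Contents.Text;
    for i := Low(Keys) to High(Keys) do   // replace all placeholders
      Text := StringReplace(Text, Keys[i], Values[i], [rfReplaceAll]);
    Contents.Text := Text;
    Contents.SaveToFile(ExecFile);        // write the populated copy
  finally
    Contents.Free;
  end;
end;

For the example above, the call would look like:

PopulateTemplate('s3-get-object.sh', 'exec-s3-get-object.sh',
  ['%%BUCKET_NAME%%', '%%OBJECT_NAME%%', '%%S3_ENDPOINT_URL%%', '%%OUTPUT_FILE%%'],
  ['pjd-b-artist-songs', 'Bad-Company--10-From-6--Rock-N-Roll-Fantasy.flac',
   'https://s3.us-central-1.wasabisys.com', '/home/sue/Rock-N-Roll-Fantasy.flac']);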
Why not just pass arguments to the script?
I could also just have accepted those same values as command-line arguments so that I could invoke the script with:
s3-get-object.sh pjd-b-artist-songs Bad-Company--10-From-6--Rock-N-Roll-Fantasy.flac https://s3.us-central-1.wasabisys.com /home/sue/Rock-N-Roll-Fantasy.flac
I prefer having a fully populated template file because it's easy to test and debug. If something isn't working correctly, you have the exact data used for the script invocation, and you can examine and rerun the script outside of your application.
App integration
In this approach, the app is simply automating the execution of some generic shell scripts. Your app needs some utility routines (a Free Pascal sketch of the execution piece follows the list) to:
- read a file (text and binary)
- write a file (text and binary)
- perform search and replace of key-value pairs
- execute an external program (shell script)
- capture standard output and standard error
- copy a file
- delete a file
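As a sketch of the execution and capture steps, Free Pascal's Process unit provides TProcess; something like the following runs a populated script and captures both output streams (simplified; my actual utility routines are in JBSysUtils.pas):

uses
  Classes, SysUtils, Process;

{ Run a shell script and capture its stdout and stderr.
  Fine for small outputs; a very chatty script would need
  incremental pipe reads instead of poWaitOnExit. }
function RunScript(const ScriptPath: string;
                   out StdOut, StdErr: string): Boolean;
var
  Proc: TProcess;
  Lines: TStringList;
begin
  Proc := TProcess.Create(nil);
  Lines := TStringList.Create;
  try
    Proc.Executable := '/bin/sh';
    Proc.Parameters.Add(ScriptPath);
    Proc.Options := [poWaitOnExit, poUsePipes];
    Proc.Execute;
    Lines.LoadFromStream(Proc.Output);   // captured stdout
    StdOut := Lines.Text;
    Lines.LoadFromStream(Proc.Stderr);   // captured stderr
    StdErr := Lines.Text;
    Result := (Proc.ExitStatus = 0);
  finally
    Lines.Free;
    Proc.Free;
  end;
end;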
My app code
I used this approach in the Free Pascal version of my cloud music player.
Repo: https://github.com/pauldardeau/pascal-cloud-jukebox
- S3 integration code: S3ExtStorageSystem.pas
- Utilities code: JBSysUtils.pas
Although I focus on shell scripts in this post, my app also works on Windows using this same approach (just .bat files instead of .sh files).