First Cloud Music Library

24 Apr, 2023

Introduction

The year was 2014 and my entire digital music library lived in Apple's iTunes. I was all in on the Apple ecosystem: Mac, iPhone, and iPad. Everything seemed great, except that each new release of iTunes (and OS X) felt like a step backwards in usability. It was a trend, and my frustration had reached the point where it merited a switch to something that I would control.

An interview assignment

At the same time, I was interviewing for a job with (large maker of HDDs and SSDs). As part of the interview, the hiring manager gave me an assignment: take a look at OpenStack Swift, Ceph, and HDFS, then pick one and build something useful with it. He didn't give me a specific timeline, but I figured that one week was what I should target.

Starting with OpenStack Swift

I picked OpenStack Swift (cloud object storage) because it was better documented than the other two, easier to work with, and possibly just what I needed for my transition away from iTunes. OpenStack Swift is implemented in Python and has a Python client library, so I wrote my assignment in Python. It didn't take much imagination to see that each song file would be an 'object' stored in Swift. The bigger challenge was figuring out how to store the metadata of my music collection. What would capture the fact that these eight specific song files together make up an album, with the tracks in a defined order? How would I keep a master list of all the songs in my collection without having to enumerate every object?

SQLite for metadata

I knew that I needed a simple database, yet I didn't want to set up a database server. How about using SQLite and storing the SQLite file within OpenStack Swift along with all of the song files? Seems plausible. OpenStack Swift, like AWS S3, does not have any mechanism for storing hierarchies. Instead, objects are stored in a 'container' (S3 calls them 'buckets'). You can create however many containers you want, but containers cannot be nested. I decided that I would store my SQLite file in its own container ('music-metadata').
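To make the metadata idea concrete, here is a minimal sketch of what such a SQLite file could contain. The table names, column names, and helper functions are made up for illustration and are not necessarily what the project uses. The point is only that a track-number column answers the album-ordering question and a single song table answers the master-list question.

```python
import sqlite3

# Hypothetical schema: every table and column name here is illustrative only.
SCHEMA = """
CREATE TABLE album (
    album_id INTEGER PRIMARY KEY,
    artist   TEXT NOT NULL,
    title    TEXT NOT NULL
);
CREATE TABLE song (
    object_name  TEXT PRIMARY KEY,  -- name of the object in cloud storage
    container    TEXT NOT NULL,     -- container that holds the object
    album_id     INTEGER REFERENCES album(album_id),
    track_number INTEGER,           -- position of the song within its album
    title        TEXT NOT NULL
);
"""


def create_metadata_db(path=":memory:"):
    """Create the metadata database (in memory by default) and return it."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def album_tracks(conn, album_id):
    """Return the song titles of one album in track order."""
    rows = conn.execute(
        "SELECT title FROM song WHERE album_id = ? ORDER BY track_number",
        (album_id,),
    )
    return [row[0] for row in rows]


def all_songs(conn):
    """Return (container, object_name) for every song, sorted by object name.

    This is the master list; no storage container has to be enumerated.
    """
    return list(
        conn.execute(
            "SELECT container, object_name FROM song ORDER BY object_name"
        )
    )
```

With a layout like this, the database file itself is just one more object to upload, in this case to the 'music-metadata' container.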

Container structure

Being brand new to cloud object storage systems, it didn't feel right to store all of my song files in a single container. I thought back to the days when I bought real vinyl albums or physical CDs. The stores grouped the albums and CDs alphabetically and had separators for each letter of the alphabet. It worked well enough for them and would probably suffice for me too. My plan was to create 26 containers, one per letter of the alphabet, and to file each song by the first letter of the artist's name. I named the containers 'a-artist-songs', 'b-artist-songs', and so on, through 'z-artist-songs'.

Addressing a few wrinkles

Then I noticed that some artists in my collection didn't fit into this naming scheme -- artists such as 10,000 Maniacs and 38 Special. Okay, it's easy enough to extend my scheme to also have '0-artist-songs' through '9-artist-songs'. Next, I wondered whether any artists in my collection still wouldn't fit. A quick check verified that the extended naming scheme would work for my entire collection.

There was one other wrinkle that I had to address. Which container would hold 'A Flock of Seagulls' songs or those from 'The Doors'? I decided that for an artist whose name begins with 'A ' or 'The ', the first letter of the next word would decide where it goes. Hence, songs from 'A Flock of Seagulls' would go in 'f-artist-songs' and songs from 'The Doors' would go in 'd-artist-songs'. What about artists who go by a person's name, like 'Jimmy Buffett' or 'Billy Squier'? Should Jimmy Buffett songs go in 'j-artist-songs' or 'b-artist-songs'? I'm overthinking this! It doesn't matter, just pick a convention and stick with it. Alright, no need to do anything special, just put Jimmy Buffett songs in 'j-artist-songs'.
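The container rules described above fit in a few lines of Python. This is a sketch rather than the project's actual code: the function name is invented, and matching the 'A ' and 'The ' prefixes case-insensitively is an assumption on my part.

```python
def container_for_artist(artist_name):
    """Return the container name for an artist (hypothetical helper).

    Rules from the text: skip a leading 'A ' or 'The ', then use the first
    character (a letter or a digit) of what remains.
    """
    name = artist_name.strip()
    for prefix in ("A ", "The "):
        # Assumption: the prefix check ignores case.
        if name.lower().startswith(prefix.lower()) and len(name) > len(prefix):
            name = name[len(prefix):].lstrip()
            break
    if not name:
        raise ValueError("artist name is empty")
    first = name[0].lower()
    if not (first.isascii() and first.isalnum()):
        raise ValueError(f"no container defined for artist {artist_name!r}")
    return f"{first}-artist-songs"
```

Calling it with 'A Flock of Seagulls' gives 'f-artist-songs', 'The Doors' gives 'd-artist-songs', and '38 Special' gives '3-artist-songs'. Any name that starts with something other than an ASCII letter or digit raises an error instead of silently picking a container.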

File naming

I decided early on that I wanted my own naming convention for song files. My convention is Artist-Name--Album-Name--Song-Name.encoding-type, where encoding-type is one of 'mp3', 'm4a', or 'flac'. I have been asked several times why I don't just rely on ID3 tags. The answer is simple -- not all MP3 files contain ID3 tags, and FLAC and M4A files normally carry their metadata in other formats (Vorbis comments and MP4 metadata atoms, respectively) rather than ID3.
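Here is a small sketch of building and parsing names in this convention. The function names are invented for illustration. It assumes that spaces inside each component are replaced by single hyphens and that no component contains a double hyphen; neither detail is spelled out above.

```python
import os

SUPPORTED_EXTENSIONS = (".mp3", ".m4a", ".flac")


def encode_component(text):
    """Assumption: runs of whitespace inside a component become one hyphen."""
    return "-".join(text.split())


def song_object_name(artist, album, song, extension):
    """Build 'Artist-Name--Album-Name--Song-Name.ext' (hypothetical helper)."""
    if extension not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported encoding type: {extension!r}")
    parts = [encode_component(part) for part in (artist, album, song)]
    return "--".join(parts) + extension


def parse_song_object_name(object_name):
    """Split an object name back into (artist, album, song, extension).

    Decoding is lossy: a hyphen that was part of the original name comes
    back as a space.
    """
    stem, extension = os.path.splitext(object_name)
    parts = stem.split("--")
    if len(parts) != 3 or extension not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"not a valid song object name: {object_name!r}")
    artist, album, song = (part.replace("-", " ") for part in parts)
    return artist, album, song, extension
```

For example, `song_object_name("The Doors", "Some Album", "Some Song", ".mp3")` produces `The-Doors--Some-Album--Some-Song.mp3`, and parsing that string recovers the three names and the extension.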

The audio player

The other decision was what to use for playing the songs. Since I don't have any special knowledge of how to write an audio player from scratch, I needed to rely on existing ones. I decided to use Apple's afplay, which ships with OS X/macOS, mplayer on Linux, and MPC-HC on Windows.
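Picking a player per platform can be sketched as follows. The function names are invented, and so is the Windows executable name `mpc-hc64.exe`, which depends on how MPC-HC was installed. The sketch also assumes each player is on the PATH and accepts the file path as its only argument.

```python
import platform
import subprocess

# Keys are the values returned by platform.system().
PLAYER_COMMANDS = {
    "Darwin": ["afplay"],          # ships with OS X/macOS
    "Linux": ["mplayer"],
    "Windows": ["mpc-hc64.exe"],   # assumption: executable name may differ
}


def player_command(song_path, system=None):
    """Return the command line used to play song_path on the given system."""
    system = system or platform.system()
    try:
        base = PLAYER_COMMANDS[system]
    except KeyError:
        raise RuntimeError(f"no audio player configured for {system!r}") from None
    return base + [song_path]


def play(song_path):
    """Run the platform's player, wait for it to exit, return its exit code."""
    return subprocess.call(player_command(song_path))
```

Building the command separately from running it makes the selection logic easy to check without actually playing anything.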

Mix of cloud storage services

From the very beginning I knew that I didn't want to limit my cloud storage to OpenStack Swift. I abstracted the common functions that I needed and also added support for AWS S3. Azure Blob Storage would be easy enough to add as well, but I didn't implement it.
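One way to picture that abstraction is a small interface that each storage service implements. The sketch below is illustrative only: the class and method names are hypothetical and are not taken from the project, and an in-memory backend stands in for Swift or S3 so the example runs without credentials.

```python
from abc import ABC, abstractmethod


class StorageSystem(ABC):
    """Hypothetical common interface for object storage backends."""

    @abstractmethod
    def create_container(self, container):
        """Create a container (a 'bucket' in S3 terms)."""

    @abstractmethod
    def put_object(self, container, name, data):
        """Store bytes under the given object name."""

    @abstractmethod
    def get_object(self, container, name):
        """Return the bytes stored under the given object name."""

    @abstractmethod
    def list_objects(self, container):
        """Return the sorted object names in a container."""


class InMemoryStorage(StorageSystem):
    """Stand-in backend for illustration; a real one would call Swift or S3."""

    def __init__(self):
        self._containers = {}

    def create_container(self, container):
        # Creating an existing container is a no-op.
        self._containers.setdefault(container, {})

    def put_object(self, container, name, data):
        self._containers[container][name] = bytes(data)

    def get_object(self, container, name):
        return self._containers[container][name]

    def list_objects(self, container):
        return sorted(self._containers[container])
```

Code that uploads or plays songs would depend only on `StorageSystem`, so adding another service means writing one more subclass rather than touching the rest of the program.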

Retrospective

Looking back, I'm satisfied with the approach that I took. Although I still consider it a work in progress (and might forever), it has met my expectations quite well. I appreciate it most when I'm traveling. As long as I have a decent internet connection, I have the same music listening experience that I have at home.

Where's the Code?

GitHub repo: https://github.com/pauldardeau/cloud-jukebox

License: BSD

#music #cloud storage #openstack swift #object storage #open source #python