Last year in June, we sunset Artdoxa, an online arts platform I worked on more or less from its humble beginnings and have been the sole developer of for the last 11 years or so. We took it offline because, while a couple of users were still active, we never managed to get it to pay for itself, and its owner (and quite frankly I as well) didn’t want to maintain and operate it at a loss.

When we finally took it offline, we agreed that I would spend a bit of time trying to create an archived version, fully functional in read-only mode. This is what I am currently working on.

Your first instinct might be to just create a static version, and I still have a bit of time reserved this week for trying that, but the site is “Web 2.0”-ish enough to get complicated, with a ton of views that load subviews via jQuery-backed Ajax requests.

So, the question is: how do you archive a Rails application in a way that you can hopefully start it up 20, 50, maybe 100 years from now? (Not that I personally believe that computers in their current form will make it into the 2100s.)

While something like Docker would be an obvious choice, it is kind of problematic. It is still a bit of a moving target, so regardless of whether I use Docker under the hood to make the app run, I would probably still need to seal Docker into a VM that pins the Docker version.

Virtual Machines, but how?

My second idea would be something like VirtualBox, which would be good, because there are virtual machine image formats that can be exchanged between different VM hosts. But VirtualBox on its own has a very specific issue: It is CPU-dependent. Does that mean that I simply create two images, one for Intel, one for Arm? What happens if both of these CPU architectures are obsolete 20 years from now? Given how bullish some people are on open source CPUs like RISC-V, I don’t think it’s completely unwarranted to worry about that.

It seems to me that right now, targeting qemu is the best bet. We can use qemu-kvm on Intel machines, which quite frankly will probably be around for a long time, and actual emulation on other CPU architectures, where the emulation tax will probably be less and less of an issue.
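To make the "KVM where possible, emulation everywhere else" idea concrete, here is a minimal sketch of how a launcher script could pick the accelerator. It is not what I actually run: the function name, the image name `artdoxa.vmdk` and the memory size are made up for illustration, and it assumes an x86_64 guest. The qemu options used (`-accel`, `-m`, `-drive`) are standard ones.

```python
import os
import platform


def qemu_command(disk_image, guest_arch="x86_64", memory_mb=2048,
                 host_arch=None, kvm_available=None):
    """Build a qemu command line for booting the archived disk image.

    host_arch and kvm_available are detected from the running machine
    unless given explicitly (handy for trying out other hosts).
    """
    if host_arch is None:
        host_arch = platform.machine()
    if kvm_available is None:
        # On Linux, KVM shows up as the /dev/kvm device node.
        kvm_available = os.path.exists("/dev/kvm")

    # Different systems report the same architecture under different names.
    aliases = {"amd64": "x86_64", "arm64": "aarch64"}
    host = aliases.get(host_arch.lower(), host_arch.lower())

    # KVM only helps when host and guest share an architecture.
    # Otherwise fall back to TCG, qemu's pure software emulation.
    accel = "kvm" if (host == guest_arch and kvm_available) else "tcg"

    return [
        f"qemu-system-{guest_arch}",
        "-accel", accel,
        "-m", str(memory_mb),
        "-drive", f"file={disk_image},format=vmdk",
    ]


if __name__ == "__main__":
    print(" ".join(qemu_command("artdoxa.vmdk")))
```

On an Intel Linux box with KVM this yields `-accel kvm`; on a hypothetical future RISC-V machine the same image would be started with `-accel tcg` and simply run slower.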

Today, because I am more familiar with it, I started out with a VirtualBox VM and made the web application run on that. I did choose vmdk as the virtual hard drive file format, though, as it is the most compatible and should be usable in qemu.


Now, the other thing that’s kind of an issue is the actual web browser. While most browsers are reasonably backwards compatible, given the current state of the web browser landscape, I don’t think it’s reasonable to bet on that for the next 100 years. So the idea is to make the image actually boot into a graphical user interface and start up Firefox. There is the slight issue of expiring and self-signed certificates, but I guess that can be part of the documentation.

The challenges ahead

An interesting problem is that we’ve hosted the images of the artworks shared on Artdoxa on Amazon S3, as one does, but since we were trying to cut costs, we have migrated the bucket to Glacier Deep Archive. Instead, I am working with a local copy of the files, a copy that is roughly half a terabyte in size. That makes for a somewhat unwieldy VM disk image, especially if I want to store it on a disk that is supposed to be readable on as many future and past operating systems as possible. If you’re currently thinking “oh no, he isn’t…”: Yeah, I am running this off of a FAT disk. Luckily, vmdk has a dedicated FAT mode where it splits up the disk file into chunks that are 2GB in size at most. I am currently copying the files into the VM using SCP and it is fun to see the image files growing and growing, but let me tell you one thing: A FAT-backed vmdk is not exactly the fastest hard drive emulation you can imagine. So I’ll let this run overnight, I guess.
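For a feeling of what that means on disk, here is a back-of-the-envelope calculation. It assumes the image ends up at exactly 500 GB (decimal) and that every chunk is filled to a full 2 GiB, both of which are simplifications; the real chunk sizes and file count will differ a bit. The one hard fact in there is that FAT32 cannot store a single file of 4 GiB or more, which is why a split format is needed in the first place.

```python
GIB = 2**30

# FAT32 stores file sizes in 32 bits, so a single file tops out at 4 GiB - 1 byte.
FAT32_MAX_FILE_BYTES = 4 * GIB - 1


def extent_count(disk_bytes, max_extent_bytes=2 * GIB):
    """Number of chunk files needed if each holds at most max_extent_bytes."""
    # Integer ceiling division, avoids floating point rounding.
    return -(-disk_bytes // max_extent_bytes)


if __name__ == "__main__":
    image_bytes = 500 * 10**9  # assumption: "roughly half a terabyte"
    print(extent_count(image_bytes), "chunk files")
    print("fits FAT32 per-file limit:", 2 * GIB <= FAT32_MAX_FILE_BYTES)
```

So the directory holding the image will contain a couple of hundred chunk files rather than one giant one.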

The Artdoxa homepage had a fun little gizmo: The artwork of the day, calculated by a simple algorithm that would include views and likes, I think. As an homage to our sunset, I decided it would display the artwork of the day of the 21st of June forever, but I need to clean up the database a bit to remove the few days the background job kept running after I switched off the website. I don’t want to know how many forgotten servers run cron jobs for applications that are long gone and happily send out emails to irritated users :)
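The cleanup itself boils down to "ignore every pick made after the cutoff and show the last one that remains". A minimal sketch of that logic, not the actual Rails code: the function name and the `(day, artwork_id)` tuples are invented for illustration, and the year in the example is a placeholder.

```python
from datetime import date


def pinned_artwork_of_the_day(picks, cutoff):
    """Return the artwork id of the latest pick on or before cutoff.

    picks is a list of (day, artwork_id) tuples, one per day the
    background job ran. Returns None if nothing is left.
    """
    kept = [(day, artwork_id) for day, artwork_id in picks if day <= cutoff]
    if not kept:
        return None
    # The most recent surviving pick is the one to show forever.
    return max(kept, key=lambda pick: pick[0])[1]


if __name__ == "__main__":
    year = 2000  # placeholder, not the real year
    picks = [
        (date(year, 6, 20), 101),
        (date(year, 6, 21), 102),
        (date(year, 6, 22), 103),  # job still ran after the site went dark
    ]
    print(pinned_artwork_of_the_day(picks, date(year, 6, 21)))
```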

I am still debating if I should try to make this archive available publicly. This would mean that I would need to scrub personal details out of the database, though, so I will most definitely only do this if I can find some extra time. I guess a static version would lend itself much more easily to that.

Tomorrow I will try to turn this VirtualBox image into something I can run in qemu or, failing that, take the lessons I learned today and do everything again.
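One route I might try, sketched here under assumptions: qemu ships a tool called `qemu-img` that can convert between disk image formats. The file names below are hypothetical, and whether it copes with my particular split vmdk is something I have yet to find out.

```python
import subprocess


def convert_command(source, target, target_format="qcow2"):
    """Build a qemu-img call that converts a vmdk image to another format."""
    return [
        "qemu-img", "convert",
        "-f", "vmdk",          # format of the existing image
        "-O", target_format,   # format of the image to be written
        source, target,
    ]


def convert(source, target, target_format="qcow2"):
    """Run the conversion; raises CalledProcessError if qemu-img fails."""
    subprocess.run(convert_command(source, target, target_format), check=True)


if __name__ == "__main__":
    # Only prints the command; call convert() to actually run it.
    print(" ".join(convert_command("artdoxa.vmdk", "artdoxa.qcow2")))
```

One caveat: a single qcow2 file of that size would no longer fit on a FAT disk, so a conversion like this is only useful for experimenting. For the archive itself it would be nicer if qemu could boot the split vmdk directly.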

It is a fun challenge and I am quite happy that I am able to spend some time on this to do it right.