This article also appears at https://www.linkedin.com/pulse/problem-solid-state-storage-suggested-solution-martin-houston so if you are on linked-in you can choose to comment there instead..
There are several computers inside an Ocado Warehouse robot, each controlling some aspect of the operation. The single ARM CPU Linux based computer within each robot I was working on had two jobs, one to manage the in situ upgrading and if needed downgrading (change back-out) for the firmware on the other control computers. This was only occasionally used so the relatively meagre performance of the hardware was not an issue. The other role was considerably more demanding. That of collecting all the low level debug information logs from the rest of the robot and delivering them for analysis when uploaded every time the robot docked for battery charging. While working out on the warehouse grid the robots were controlled by WiFi signals but there was not enough bandwidth available to send back a fire-hose of detailed status & debug information in real time. While the robots were under active development the volume of debug data grew and grew and the only place to put it until it could be uploaded was the micro SD card inside the control computer board. This micro SD card had a maximum write speed of about 10MB/s. Having spent several months looking after hundreds of computers behaving in this way I have a new respect for the reliability and longevity of standard consumer grade microSD cards but nothing was going to change the fact that they were too slow for the job of handling a potentially limitless appetite for debug information from developers trying to nail illusive problems.
Writing the logs to memory was much, much faster than to SD card but the control computer had only been specified with 512MB of RAM, as it was never envisaged that such a large volume of data would need to be collected during robot development. I did some research and found that with the fallocate system call it is also possible to punch holes in a file at the beginning as well as the usual file truncation action of freeing blocks at the end. What you are left with is a ‘sparse’ file where sections that have nothing stored on disk read as full of nulls. I found that if you truncate the beginning of a log file the processes writing that file simply do not care. It is still possible to append and also to ‘tail’ the file to read back the most recent contents. The file can grow and grow to whatever the size limit is of the underlying file-system but only occupy a small amount of actual space. This discovery allowed me to collect logs into ram-disk instead of direct to the slow SD card. I used inotify system calls to watch a whole tree of log files being written alerting a process which collected all but the last few KB of each file and produced a compressed multiplexed version of all of them. The actions of compressing and multiplexing increased the effective write rate of the SD card enough to cope with much higher rates of logging activity, in effect kicking the can far enough down the road that the developers could have what logging they liked. The way SD cards work is that if 4MB of data is all written at once in a single file stream the write is much more efficient as that is the size of the erase block in the SD technology. I thought I had a prefect solution! However when fully stress tested I found it was missing one component, and one that would be non trivial to write.
I was able to run tests with simulated log data writing into the ram-disk that eventually overran the ability of the inotify driven background process to catch up. It took many minutes but slowly the ram-disk would fill up completely forcing the writing of log-file data to fail – implying missing, possibly vital, log information. What would be nice, I thought, was if the system calls writing data to a file-system that was in danger of getting full could be slowed in some way, just by a few microseconds, that would give the background process time to catch up.
An old mechanical hard disk would in a way do this. The furious search for free blocks would increase IO wait time so that writing blocks to a nearly full file-system would indeed take longer. However regressing back to mechanical disks is no solution as the forced head movements would also hamper processes that would be reading and consuming the data too!
What the Linux kernel needs is some way to simulate the slowing effect of writes that a nearly full file-system has on processes wanting to make the situation worse, and in a gradually increasing severity, but no corresponding penalty to readers (and removers) of data. I knew this would solve my immediate problem and then realised that it would have highly beneficial effects for data storage in the Enterprise too. Having file-systems and filers which exhibited this behaviour would get an early warning on file systems filling up, but most importantly a way to delay the inevitable full file-system crisis. With fancy enough monitoring it would be possible to isolate the issue to a single “way too chatty” application. The rate of log writing just for that process could be slowed so that the team responsible would have time to sort out what is going wrong. The fallocate trick I had found for dealing with the robot logs would also come in handy here – if a log has been discovered that has been growing for months or even years (a failure to implement log rotation), then a fallocate punched hole could be used to archive or just dispose off all data too old to be interesting without having to disrupt the running process at all.
Even if the rate for a single process had to be slowed to effectively a full stop it is definitely “the lesser of two evils” than allowing the file-system it is writing to to fill up. That would more likely have collateral damage to other well behaved parts of the infrastructure that were using that portion of storage space responsibly. The normal panic mode thing that system admins have to do in such a situation, on filers which have that luxury, is to give that file-system more (expensive) storage. This is a costly way to “fix” the problem and it does nothing to address the reasons why that file-system got full in the first place.
This was several months ago now and at the time I did a search so see if any such feature already exists in the kernel but drew a blank. As I had seen enough email circulars of how Ocado was keen on maximising their IP I put forward a proposal to do this seemingly missing part of Linux kernel capability as a project. My request was turned down (even though it was the missing piece needed to solve the log collecting issue with the robots). I was told I was welcome to do the work “in my own time”. Now I no longer work there here is my chance to ask people 1. does this technology already exist or 2. anyone fancy giving me a hand in writing it?