3D VR for a pound!

I just popped into my local Poundland store today for some crisps and I saw these for sale

3D Viewer for £1!

I have seen relatively cheap(ish) “VR headsets” of the sort that take your mobile phone on sale for a while now but, like a lot of people, did not want to spend out on something I might not have any long-term use for. The packaging says “Smartphone not included” – come on Poundland, you disappoint me 😉

The viewer is mostly cardboard, plus some foam, Velcro and a couple of plastic magnifying lenses held in place by the three-layer cardboard sandwich. The instructions were clear.

What you get in the bag, considerable assembly required!

And about 10 minutes later I had the finished article.

The completed 3D glasses. The phone fits in the box.

I have a Samsung Galaxy S4 phone that fitted snugly in the box. If you have a phone considerably larger than this then you may need to look for something more upmarket to house it.

The viewer claims to be compatible with Google Cardboard, this being a standard for base-level VR hardware and the software that works with it. However, the one aspect of the official Google design that is missing is any way to press a button, so unless you want to spend a little more money on a fancier handheld controller like this one, you are limited to experiences that involve just changing your view (i.e. turning your head) rather than interacting (like playing a game).

There is, I am sure you will be pleased to hear, a very simple and cheap solution to this. If you have a USB mouse and one of those USB OTG adapters handy – the sort that lets you plug full-size USB keys, keyboards etc. into a mobile phone – you can use it with Cardboard. Android treats a mouse button click the same as a touch event on the screen, and Cardboard does not care where on the screen the touch event happens, just that there is one, so a mouse connected to the phone and held in the hand does the job! If you have a Bluetooth mouse, or a keyboard with a track-pad, then you do not even have the trailing wire coming out of your phone – a slight damage risk to your expensive phone if you are blundering about with a VR headset on. However, if you have to buy something you may as well get one of the purpose-made Bluetooth VR controllers, designed to be used by touch alone and only a few pounds.

Apart from the lack of a button to press, the ultra-cheap headset works well. I am short-sighted but without my glasses the screen image through the magnifier is clear, although with the magnifying lens it is unavoidably slightly pixelated. Looking at the specs of more expensive headsets, what you get is the ability to adjust focus, but for me at least the default seemed fine. Getting a less pixelated picture would require a higher pixel density on the phone screen itself. In fact this application is arguably the only real practical use for super-high resolutions on a phone screen.

I do not know if prolonged use would result in headaches. It is all rather unnatural for our brains, so some common sense about how long you use this for in one session would be needed!

There is already quite a bit of software available for Google Cardboard in the app store, as it is a simple baseline for VR expecting just a single-click control. I was rather limited by not having any way to interact but still managed to enjoy a simulated roller-coaster ride and watching fish swim by in an aquarium. I could tell my now three-generations-old Galaxy S4 was struggling a little with the graphics. VR is an area where an up-to-date phone would show its extra value.

I did feel rather vulnerable to real-world dangers (like tripping over) while using the headset. Attempting to use it standing up, I felt my knees buckling slightly as my balance and sense of motion were being messed with. To feel secure while experiencing a VR world, the safest arrangement would be a sturdy swivel chair positioned well away from desks or any other obstructions. For people with more space (and money) something like this would be great fun!

I have just seen this story on BBC News that 3D TV is pretty much a bust now – interest from manufacturers has all but vanished; the public just did not go mad for it. VR shares some characteristics with 3D cinema in that the trick of giving each eye a slightly offset image fools our brain into perceiving depth in a scene. However, this whole ‘binocular vision’ thing evolved for predators catching prey – requiring very keen spatial awareness in the space directly in front of them.

The problem with 3D cinema is that this binocular vision trick is the sum total of what can be provided. The far stronger effect is that if we turn or nod our head the scene changes in a way we have evolved to expect. This gives our brains information about which objects are nearer than others, without even needing binocular vision. A prepared film cannot do this; the entire frame moves as one. It is like going through life with your head locked into position by a neck brace. The binocular effect on its own is not very satisfying.

The big experiential value of VR through a headset like this is that the image occupies almost all of the available field of view and responds to head movement. Without the latter it is nothing more than a very big 3D cinema screen you can carry with you, and so in the end just as unsatisfying. The fact that you have to lose visual contact with the ‘real’ world limits the places where it can be used. Maybe what is needed is a harness so that the person wearing the headset can either stand up without being able to wander into danger, or float freely away from any obstructions?


Docker thoughts

What is Docker?

A feature that has been part of Unix systems pretty much forever is the concept of the ‘chroot’. This is a special system call that makes the process, from then on, see the directory chroot was run on as the root of the file-system. If the files software expects, such as libraries, are present in this smaller view of the world then it can still run. Chroot is a very useful feature. First there are the obvious security applications, such as running potentially exploitable network services inside ‘chroot jails’ – deliberately limited environments that contain just enough files for a service to run without it having any access to the wider machine. Another useful trick with chroot that many Linux system owners may be in need of from time to time is that a system can be recovered and run even if the bootstrap or kernel it is configured to use is corrupt in some way. See my article Xtra-PC in depth for a practical example of how this is used.

This system call is (well, is supposed to be) irreversible for the life of the process and any of its children. The only way out of a ‘chroot jail’ is supposed to be for all the jailed processes to die.
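As a rough illustration, this is all it takes from the shell (the target directory here is hypothetical and must already contain a shell plus the libraries it needs):

# run a shell inside a minimal directory tree; from then on /srv/minimal-root
# appears to the shell (and its children) as / – needs root privileges
sudo chroot /srv/minimal-root /bin/sh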

Cgroups is a feature that has been in Linux kernels for nearly a decade now and extends the old chroot idea in lots of other directions. As well as only seeing part of the file-system, cgroups (Control Groups) allow groups of processes to be set up so that they cannot see each other, have limits placed on their memory usage or share of CPU time, and can treat shareable things such as network cards (physical and ‘virtual’) as if they were the only ones that could see them. If this seems like virtualisation to you then yes, it is very close. The big difference, however, is that only a single Linux kernel is in place underneath it all. This means that this ‘containerised’ approach to partitioning off resources is much lighter weight than full-blown virtualisation. With full virtualisation a complete kernel (operating system) has to be set up on top of virtual hardware. This imposes costs in terms of memory and startup time, but does mean you can run e.g. Windows programs under a Linux host (or vice versa).
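You can poke at the raw interface yourself. A minimal sketch, assuming a cgroup v1 layout under /sys/fs/cgroup (the ‘demo’ group name and the 256MB limit are made up for the example):

# create a cgroup with a memory limit and put the current shell into it;
# anything started from this shell is then subject to the limit
sudo mkdir /sys/fs/cgroup/memory/demo
echo $((256*1024*1024)) | sudo tee /sys/fs/cgroup/memory/demo/memory.limit_in_bytes
echo $$ | sudo tee /sys/fs/cgroup/memory/demo/cgroup.procs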

Docker is one of a number of technologies that build on the cgroups idea and further extend it, providing the tools to build “do one job well” microservices that scale over multiple physical machines.

The problem that Docker addresses is that, in today’s Internet-enabled software architecture, the sensible way to build big things is out of smaller things, until the individual components get down to a size where they can be easily understood, tested, debugged and generally inspire confidence. What is not wanted when building a large and complex piece of software is side effects between those components. Many of the components in a modern system are built using languages like Java, Python, Ruby and Perl that themselves rely on a myriad of other libraries from various sources.

Although programmers try not to break existing programs when improving their software, it often does happen. A bug may need to be fixed for package A to work reliably, but forcing that change on package B may cause it to behave in strange ways. The Docker approach is to start from an appropriate base image (in effect a chroot tree of all the libraries and other support files expected in, for example, a base distribution of Debian Jessie) and then use a build script known as a Dockerfile to install either exact or just ‘latest’ versions of only what that specific container needs.
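As a minimal sketch of what that looks like in practice (the base image, package choice and image name are just illustrative):

# write a tiny Dockerfile and build an image from it
cat > Dockerfile <<'EOF'
FROM debian:jessie
RUN apt-get update && apt-get install -y --no-install-recommends curl
CMD ["/bin/bash"]
EOF
docker build -t my-curl-image .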

Docker avoids what used to be called “DLL Hell” on Windows computers, where it was impossible to have two different applications installed on the same machine because their shared library needs were incompatible. Linux machines have traditionally solved this issue by having a custom compiled version of the Perl or Python or whatever the fussy application needs installed in a non-standard location, and overriding the default PATH & LD_LIBRARY_PATH settings for that application. This is a little messy and error prone. Docker improves on it because within each container everything can live in the expected place.
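That traditional workaround looks something like this (the install location and application name are hypothetical):

# point one fussy application at a privately built interpreter and its libraries
export PATH=/opt/legacy-python/bin:$PATH
export LD_LIBRARY_PATH=/opt/legacy-python/lib
fussy-app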

Splitting the individual parts of the system up into microservice chunks forces the whole design to use only communication methods that are visible and explicit. Normally this would be via TCP or UDP on either internal or external network interfaces, but Docker also allows areas of the host filesystem to be shared between one or more of the containerised services, either read/write or read-only.

How does all this not result in huge amounts of storage usage? The answer is Docker’s use of another Unix/Linux technology that has been around a long time: the union filesystem. One of the earliest practical uses of the union filesystem in Linux was the 1995 distribution Linux-FT. This was a ‘Live CD’ that cached any programs actually used to a much smaller file living on the host Windows hard disk. This allowed the system to start fast, get faster as it was used and use only an absolute minimum of the then very expensive hard disk space. At that time a 600MB CD was much bigger than most available hard disks! This trick was all done using a union filesystem of the read-only CD and a writeable filesystem inside a single file on the Windows disk.

Docker takes the use of union filesystems to a whole new level. Each action taken to customise a Docker image into a new image results in a new filesystem layer. These layers are only as large as they need to be to contain the files that the new operation has changed, e.g. the installation of a specific version of a package. Even images downloaded from the Hub may consist of dozens of these layers – as can be seen from the download process.

This extreme stratification gives opportunities for layer caching and reuse and provides an audit trail that should keep the security team happy.
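You can see those layers for yourself with docker history – for example, against the alpine image pulled later in this article:

# one line per layer: the layer ID, the command that created it and its size
docker history alpine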

A handy crib of Docker commands

Installing…

Installing Docker depends on which flavour of Linux you are using. For example, for CentOS 7 the following needs to go into /etc/yum.repos.d/docker.repo:

[dockerrepo]
name=Docker Repository
baseurl=https://yum.dockerproject.org/repo/main/centos/7/
enabled=1
gpgcheck=1
gpgkey=https://yum.dockerproject.org/gpg

Then install & start it with

sudo yum install docker-engine
sudo systemctl enable docker.service
sudo systemctl start docker

Note that only the root user is permitted to start Docker containers by default. To enable other users (they will need to log out and back in before the new group membership takes effect):

sudo groupadd docker
sudo usermod -aG docker yourusername

Basic Container Management

docker ps

Shows any currently running Docker containers; the extra -l option shows just the latest created container (two lines of output at most).

docker ps -a

Shows all Docker containers, both active and stopped.

docker stop

Stops a running container by sending its main process a SIGTERM (followed by a SIGKILL after a grace period). If you want to literally suspend a container without stopping it, docker pause does that – one of the neat tricks that cgroups (the freezer controller) gives you.

docker start

Starts a stopped container again. Its main process is launched afresh, but the changes in the container’s writable layer are preserved.

docker rm

Removes a stopped container and forgets about it (its writable layer is discarded).

docker rm -f

Removes a container even while it is running! Think of the equally powerful rm -f command.


docker inspect image or container

This spits out details about the configuration of the image or container as a block of JSON. This has the advantage that it is both easy for humans to read and immediately usable from any programming language that has a JSON parser.
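You can also pull individual fields out directly with a Go template via the --format option (the container name ‘web’ and the field chosen are just examples):

# print just the IP address of a running container called 'web'
docker inspect --format '{{ .NetworkSettings.IPAddress }}' web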

Okay but how do I get containers onto my computer?

Docker images come from a centralised repository called the Docker Hub. If you have registered for a Docker ID you will be able to contribute your own images to this resource, but without that you can still consume other people’s images (and add more layers to them to make them your own).

docker search term

Looks on the Docker Hub for images matching your search term. For example, Alpine Linux is a very simple and minimal Linux layout ideally suited for use inside containers. Core container images officially supported by Docker have names of just one word. Images contributed by third parties are in the form author/container. Puppet uses a very similar convention for its Forge too.

docker search -f is-official=true alpine
NAME DESCRIPTION STARS OFFICIAL AUTOMATED
alpine A minimal Docker image based on Alpine Lin... 1759 [OK]

The -f is-official=true filter we are using here limits the output to just the base container that Docker officially sanctions. Other matches mentioning ‘alpine’ are most probably based on this, but with the hard work of adding other software expressed as those nice union filesystem layers. An image is a collection of such layers – an image that has other images derived from it cannot be removed unless those images are also removed.

docker run

docker run -it alpine /bin/sh
Unable to find image 'alpine:latest' locally
latest: Pulling from library/alpine
0a8490d0dfd3: Pull complete 
Digest: sha256:dfbd4a3a8ebca874ebd2474f044a0b33600d4523d03b0df76e5c5986cb02d7e8
Status: Downloaded newer image for alpine:latest
/ #

What this is doing is running a chosen command (/bin/sh) within the image called ‘alpine’ using an interactive terminal. If the alpine image is not already downloaded, or the one on the Hub is newer and we have asked for ‘latest’ rather than a specific version, then it is downloaded. If several union layers make up the image they all get downloaded in parallel.

We can show this by repeating the run command for one of the other alpine-based images on the Hub:

docker run -it mritd/alpine   /bin/sh
Unable to find image 'mritd/alpine:latest' locally
latest: Pulling from mritd/alpine
0a8490d0dfd3: Already exists 
f22eacae62c9: Pull complete 
Digest: sha256:98023aa60f8398432f5d62ccd16cea02279bb9efc109244d93d441dd09100e18
Status: Downloaded newer image for mritd/alpine:latest
/ # 

This shows clearly that ‘mritd/alpine’ is based on the alpine files we have already downloaded, plus an extra layer of some customisation or other adaptation that the contributor felt worth sharing (not important for the purposes of this discussion).

Chunks of storage called volumes on the host can be added (mounted in) to the container with the -v argument to run. If a path in the container is mentioned on its own then the attached storage will be allocated by Docker – but it still exists outside of the union filesystem ‘layer cake’ of the image. If a colon-separated pair of paths is used, the host path is joined into the image at the destination path.

docker run -v `pwd`:/tmp/x -it alpine /bin/sh

Will run that super simple Alpine Linux image, but if in the shell you cd to /tmp/x you will find whatever was in the current directory on the host. If you would like the container to be able to look but not touch the data you are sharing, add a :ro to the end.

-v `pwd`:/tmp/x:ro

Adding volume mounts means that the data lives outside of the life-cycle of that particular Docker container, and is not subject to the size constraints of the container’s original filesystem – usually only a few GB. Volumes can also be created using the docker volume create command and can live on shared storage systems, making them independent of any one host. See the Flocker documentation for how this opens Docker up to ‘swarms’ operating over large numbers of hosts. This allows a software service to be rolled out in the modern fault-tolerant “shared nothing” architecture without needing lots of dedicated physical machines, or even dedicated virtual machines.
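A minimal named-volume sketch (the volume name and mount point are arbitrary):

# create a named volume managed by Docker and mount it into a container
docker volume create mydata
docker run -v mydata:/data -it alpine /bin/sh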

Going in the other direction it is possible to narrow sharing down to an individual file level too:

docker run --rm -it -v ~/.bash_history:/root/.bash_history ubuntu /bin/bash

This example allows the shell within an Ubuntu container to share your bash history, and even add commands to it, while running as root within that container. A handy trick while you are still experimenting with what needs to go into that Dockerfile…

Lastly, if a container has mounted volumes you can create a new container that mounts the same volumes – in effect sharing the same data areas – by using the --volumes-from flag to run. This, together with the shared storage paradigm, allows large groups of distributed services all sharing the same underlying data to be created.
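For example (the container names are made up for the sketch):

# a throwaway container whose only job is to own the /shared volume
docker run --name datastore -v /shared alpine true
# a second container that mounts the same volume
docker run --volumes-from datastore -it alpine /bin/sh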

docker images

This shows what images we have locally (contrast this with docker ps for running containers and docker ps -a for stopped ones too). It covers both images that have been downloaded from the Hub and those we have made ourselves by applying a Dockerfile to an existing base image.

docker rmi

Rmi stands for remove image. Eventually you could end up with lots and lots of images you no longer have any use for. The rmi command will remove any image that is not referenced by another. For the case above, if we want to docker rmi alpine we need to add the -f (force) flag because we have another image dependent on it:

docker rmi -f alpine
Untagged: alpine:latest
Untagged: alpine@sha256:dfbd4a3a8ebca874ebd2474f044a0b33600d4523d03b0df76e5c5986cb02d7e8
Deleted: sha256:88e169ea8f46ff0d0df784b1b254a15ecfaf045aee1856dca1ec242fdd231ddd

If the dependencies are more complex it is not possible to just use -f – some scripting will be needed to find all the dependent images and remove them first.
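A rough sketch of that sort of scripting, walking the Parent field that docker inspect exposes (this only finds direct children, and the field may well be empty for images that were pulled from the Hub rather than built locally):

# list local images whose recorded parent is the alpine image
TARGET=$(docker inspect --format '{{.Id}}' alpine)
for img in $(docker images -q); do
    parent=$(docker inspect --format '{{.Parent}}' "$img")
    [ "$parent" = "$TARGET" ] && echo "$img is based on alpine"
done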

docker build

If you want to make custom changes to an image, create a directory containing a Dockerfile text file plus any custom content you wish to go into the image (e.g. content for a web server). You can give the resulting image any name you like that is not already in use by another image. With the Dockerfile and command-line options you can control aspects such as which network ports are accessible within the image, how much memory and CPU time it is permitted to use, and which parts of the host filesystem are exposed to it. Volume mappings, as explained above under docker run, can also be specified in the Dockerfile. When the built images are run with docker run it is common for ports in the image to be remapped so that they are visible to the outside world. For example, you can have several web services each configured within their container to operate on port 80 but accessible from the outside world on ports 8081, 8082 etc. Whole sets of micro-services can be orchestrated using Docker Compose, in a similar manner to how Amazon CloudFormation or OpenStack Heat templates work.
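A typical build-and-run cycle looks like this (the image name and ports are arbitrary examples):

# build an image from the Dockerfile in the current directory...
docker build -t mywebthing .
# ...and run it with the container's port 80 exposed as 8081 on the host
docker run -d -p 8081:80 mywebthing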

docker login

If you have signed up for a Hub account you can use those credentials to log in and then docker push images you have created with docker build. This allows containerised infrastructure to be developed and then stored centrally for deployment.
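For example (the Docker ID and tag here are placeholders):

docker login
# images pushed to the Hub are named <your Docker ID>/<image>:<tag>
docker tag mywebthing yourdockerid/mywebthing:1.0
docker push yourdockerid/mywebthing:1.0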

This is not an exhaustive list of what Docker can do; what I have tried to give is a useful summary and pointers for further study.


Some musings on minimal program size and startup overhead

I have been playing a little with Google’s Go language – the report that they are working on a way of translating CPython code into Go, for better ability to run in massively parallel systems, got my interest. The project is called Grumpy.

I had never used Go, so, intrigued by the introductory video available at the Go site, I installed it on my CentOS 7 Linux machine.

Here is that ubiquitous “Hello World” program in Go:

package main

import "fmt"

func main() {
    fmt.Printf("hello, world\n")
}

Unlike languages like Python and Perl, Go is very fussy about where your code lives before it will compile it for you.

In order to use the go compiler (simply), you need to set a shell variable called GOPATH to where you are working. You also need to create a src directory under this, then a directory for your project, which finally contains the above code in a hello.go file.

go install hello

Will, if there are no errors, quickly create a bin/hello program under the directory specified by $GOPATH.

NOTE! You will need to be in the directory you have defined as GOPATH for this to work – not the src sub-directory or the directory containing hello.go itself. This took a little getting used to.
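Putting that together, the workspace looks something like this (the workspace path is an assumption; this is the pre-module GOPATH workflow described above):

export GOPATH=$HOME/go-work
mkdir -p $GOPATH/src/hello
# hello.go goes into $GOPATH/src/hello, then build from the workspace root:
cd $GOPATH
go install hello
# the compiled binary appears as $GOPATH/bin/hello
$GOPATH/bin/hello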

This post is not about how to program in Go, however. What interested me enough to take time out from my own learning of Go to write this is the comparative size and startup speed of different languages.

Google’s motivation for undertaking the work of running existing Python code under Go is speed. Many key Google sites such as YouTube rely heavily on Python code, so a faster and more scalable way to run it (free from the Global Interpreter Lock) is obviously of great value to them.

The super simple Go program weighs in at 1.6MB in size! When stripped it goes down to just under 1MB. I have been programming long enough to have considered this wildly extravagant in my youth 😉

The equivalent absolutely minimal C program:

#include <stdio.h>

int main() {
    puts("hello world\n");
}

when similarly stripped weighs in at just 6240 bytes. This figure may differ slightly depending on compiler version and options but it will always be a LOT less than 1MB.

OK, what gives? The first thing I tested was whether this program is genuinely compiled or has any outside dependencies. I did this by temporarily renaming the whole tree that the Go compiler had been installed into. The compiled program still ran without any issues – proving Go is a true compiler. The compiled program has everything it needs to function within that 1MB code budget and could easily be deployed to another computer and run.

Is there any speed penalty for carrying around this code?

Well, the super simple C version executes in 0.002 seconds – the resolution of the Linux time command only goes down to 0.001 seconds, so this 0.002 seconds is close to the absolute overhead for loading any program from a standing start.

The Go version? Only an extra thousandth of a second! That is not bad considering the program is so much larger. In terms of Linux system calls invoked (which can be counted with the strace program), the Go version makes 146 syscalls compared to just 30 for the minimal C version.
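For reference, these measurements were taken with commands along these lines (binary path as per the GOPATH layout above; the strace | wc -l count is only approximate, as the program’s own output gets counted too):

time $GOPATH/bin/hello
strace $GOPATH/bin/hello 2>&1 | wc -l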

How do other languages do?

Perl has long been used for quick-fix scripts and its “hello world” is pretty impressive too!

time perl -e 'print "Hello World\n"'
Hello World

real 0m0.006s
user 0m0.002s
sys 0m0.004s

So only twice as slow as the compiled Go version – and the program itself is only about 30 bytes long.

234 syscalls were required, which goes some of the way to explaining why it is slower.

Python?

time python -c 'print "Hello World\n"'
Hello World


real 0m0.070s
user 0m0.063s
sys 0m0.007s

Considerably slower! Let’s look at the number of syscalls needed:

strace python -c 'print "Hello World\n"' 2>&1 | wc -l
 1144

Well, that’s more than four times as many syscalls as Perl needed, which explains why things take longer.

These numbers will vary depending on just how much work each language runtime has to do on startup. It may be that much more code is available to the Python than to the Perl on my machine – search paths and modules that have to be initialised as the program starts to run.

Lastly Ruby.

Ruby is a pure interpreter. This gives the flexibility to write Ruby code on the fly within the program as it executes. That power does, however, have a price to pay in raw startup speed:

time ruby -e 'puts "Hello World\n"'
Hello World

real	0m0.100s
user	0m0.085s
sys	0m0.015s

A whole tenth of a second. The number of syscalls used (1106) is actually lower than for Python, so the slowness cannot be blamed on latency in the OS.

Conclusion

Which language is best for writing “Hello World”? 😉

The Ruby version is some 50 times slower than the pure C version. But still… a tenth of a second is fast enough! In any of these languages a program would have to get considerably more complex (such as doing lots and lots of looping) before the run time went up much. There is a relatively large penalty just for starting a new process in the first place, which is why time-critical tasks inside computer systems are served by pools of already loaded processes waiting for action.

Go seems a worthwhile language to learn. The roughly 1MB space overhead per compiled program may be an issue in some very space-tight environments, but on general purpose computers 1MB is nothing now (though remember this is more than the entire conventional memory of MS-DOS on the original IBM PC – how times have changed!)

arepingable

This is a handy bit of Perl code I wrote years ago.

It will quickly and efficiently tell you which of a group of machines you specify are ‘up’ – in the case of this code, that means they are listening on the ssh port, but it is trivial to change the code to look for something else, e.g. an http server on port 80.

The trick to the speed is that all machines are pinged at once and only then are the answers collected.

The other nice feature is that it will expand ranges: name[1..10] will ping machines called name01, name02 etc. This part can be adapted to suit whatever naming convention you have on your site. As we are using the Time::HiRes module we also get detailed stats about how long it takes to resolve the name of each machine.

#!/usr/bin/perl
# general machine existence test. Martin Houston. mhoust42@gmail.com
# checks that the ssh service is running on a set of machines
# could be adapted to test another port e.g. 80 for http.
use strict;
use warnings;
use Net::Ping;
use Time::HiRes qw( gettimeofday tv_interval );

# Take the names of the systems to ping on the command line
my (@systems) = @ARGV;
# or piped as a list
@systems = (<STDIN>) unless scalar(@systems) > 0;
# remove newlines
chomp for @systems;

@systems = expand_ranges(@systems);

my %trial = ();
my %lookuptimes = ();

# first we get how long it takes to look each system up 

for(@systems) {
    my $before = [gettimeofday];
    gethostbyname($_);
    $lookuptimes{$_} = tv_interval ($before);
}
# allow 2 sec response
my $p = Net::Ping->new('syn',2);
$p->hires();
# we check they are listening on the ssh port
$p->{port_num} = getservbyname('ssh', 'tcp');
# just send syn packets
map { $p->ping($_) } (@systems);
# collect all acks that arrive within 2 secs
while(my($host,$rtt,$ip) = $p->ack) {
    $trial{$host} = "'$ip' : $lookuptimes{$host} : $rtt";
}
# these machines have a ssh (port 22) service running
my @live = grep { defined $trial{$_} } @systems;
print join("\n", map { "$trial{$_} : '$_'," } @live), "\n";

# This works with computer names with numbers
# name[1..10] will expand to name01 name02 name03...
sub expand_ranges {
    my @res = ();
    for my $sys (@_) {
        if($sys =~ /\[/) {
            my $pre = $`;
            my $pat = '[' . $';
            # get Perl to do the hard work on pattern expansion
            my ($numbers) = eval($pat);
            for(@{$numbers}) {
                # 01 not 1, change to suit your conventions
                $_ = '0' . $_ if $_ < 10;
                push @res, "$pre$_";
            }
        }
        else {
            # not special
            push @res, $sys;
        }
    }
    return @res;
}

Please take this and adapt it to whatever suits your site. A handy adaptation is a notpingable script that instead returns just the list of systems that are not responding. This, combined with testing various ports, can be the basis of a very lightweight system monitor.

Python function decorators

I am refreshing my knowledge of Docker at the moment (which may be the subject of another post in the near future) and one of the samples in the Docker course I was doing deployed, as part of a larger collection of services, a simple web app using the Python Flask micro-framework. I have done quite a bit of Python in the past but had not yet come across a construct like the one Flask uses (or rather I have seen them before in programs but never had the need to pry):

from flask import Flask
app = Flask(__name__)    # the app object that the decorator below hangs off

@app.route("/", methods=['POST','GET'])
def somefunc():
    return "Stuff to go to the web page"

It turns out the @ sign is something called a function decorator. In this case it takes a function that just spits out text for a web page and binds it to a specific page on the site (in this case the root) for the Flask framework to serve on request. This is a neat abstraction, as it means the page content can be written once and the URL leading to that page can easily be changed in one place.

Function decorators are one of those concepts that seem rather daunting at first (a function returning a function returning a function – HELP!) but are actually not that bad once you grasp what they are trying to do.

Think of a pipeline. A function decorator is something that can take the output of another function and change it in some way – such as upper-casing any text or adding opening and closing tag pairs as used in HTML or XML, for example.

Of course you could do this the long way round by providing the output of one function as an argument to the function doing further processing, but this could get a bit messy and long-winded.

The @ sign notation is ‘syntactic sugar’ that allows such pipelines to be built up.

Decorator functions have to be defined to return a function that in turn does the actual work. This is needed because the decorator is bound to the function when it is defined, not every time it is used. Although it is ‘syntactic sugar’, this saves a lot of complex and potentially confusing coding if the decorated function is called a lot.

There is a helper function from functools called wraps, which is itself a decorator, that aids your debugging efforts by preserving the wrapped function’s name (and docstring) so that stack back traces show the original name. I have added it in this example for that purpose. If the code does not need debugging you can leave it out.

from functools import wraps
def tags(tagn,opts = None):
    ''' decorate an output producing function with a tag '''
    def tags_decorator(func):
        # optionally itself decorated
        @wraps(func)
        def func_wrapper(name):
            if opts is None:
                return "<{0}>{1}</{0}>".format(tagn, func(name))
            else:
                return "<{0} {2}>{1}</{0}>".format(tagn, func(name), opts)
        # back in tags_decorator
        return func_wrapper
    # back in tags return the function - NOT the result of calling it
    return tags_decorator
    # here we return a function(tags) that itself returns
    # a function(tags_decorator) that returns a function(func_wrapper)!

# now we use it
@tags("h1")
def h1(text):
    # note all the work done in the decorator in this case
    return text

@tags("p",'align="center"')
def p(text):
    # note all the work done in the decorator in this case
    return text

print(h1("This is a title") + p("And some centred text"))

Note that the decorated function definition spans multiple lines.

It is not valid Python syntax to put the decorator and the function definition on the same line, as in @decorator() def decorated():. Remember that in Python the choice and amount of white-space is significant; cutting & pasting example code from websites may not always work because of this – check the logic of what you end up with. Another useful point about the syntax is that comments are permitted between the decorator and the decorated function. This means that you could have several decorators and just un-comment the ones you want to use – easy trial of alternative presentation styles perhaps?

In conclusion, function decorators allow processing to be split up, pipeline fashion, to keep logic in neat compartments. The above example is very simple, but real uses could be something like substituting real values for placeholders that the inner function deals with. This makes them a good tool for imposing local configuration that the inner core of logic does not need to know about.

Linux 20 years ago – RedHat 4.1 on the PCW magazine coverdisk

20 years ago virtually nobody had fast internet. If you were outside of government, big corporations or academia your Internet experience was most likely a slow dial-up modem. This was great for exploring the early WWW or sending emails, but asking someone to download enough software to leave the confines of DOS/Windows and start to explore Linux was daunting.

As explained in my other article, vital to the growth of Linux back then was the (then relatively new) technology of computer magazines coming with first CD-ROMs and then DVDs attached to their cover. You still see cover DVDs on many magazines, but with widespread high-speed broadband it is more a convenience than a necessity nowadays.


A problem with solid state storage (and a suggested solution)

This article also appears at https://www.linkedin.com/pulse/problem-solid-state-storage-suggested-solution-martin-houston so if you are on LinkedIn you can choose to comment there instead.

There are several computers inside an Ocado warehouse robot, each controlling some aspect of its operation. The single ARM CPU, Linux-based computer within each robot that I was working on had two jobs. One was to manage the in-situ upgrading and, if needed, downgrading (change back-out) of the firmware on the other control computers. This was only occasionally used, so the relatively meagre performance of the hardware was not an issue. The other role was considerably more demanding: collecting all the low-level debug logs from the rest of the robot and delivering them for analysis, uploaded every time the robot docked for battery charging. While working out on the warehouse grid the robots were controlled over WiFi, but there was not enough bandwidth available to send back a fire-hose of detailed status & debug information in real time. While the robots were under active development the volume of debug data grew and grew, and the only place to put it until it could be uploaded was the micro SD card inside the control computer board. This micro SD card had a maximum write speed of about 10MB/s. Having spent several months looking after hundreds of computers behaving in this way I have a new respect for the reliability and longevity of standard consumer-grade microSD cards, but nothing was going to change the fact that they were too slow for the job of handling a potentially limitless appetite for debug information from developers trying to nail elusive problems.

Writing the logs to memory was much, much faster than to the SD card, but the control computer had been specified with only 512MB of RAM, as it was never envisaged that such a large volume of data would need to be collected during robot development. I did some research and found that with the fallocate system call it is also possible to punch holes in a file at the beginning, as well as the usual truncation action of freeing blocks at the end. What you are left with is a ‘sparse’ file where the sections that have nothing stored on disk read back as full of nulls. I found that if you punch out the beginning of a log file the processes writing that file simply do not care. It is still possible to append, and also to ‘tail’ the file to read back the most recent contents. The file can grow and grow to whatever the size limit of the underlying file-system is, while only occupying a small amount of actual space. This discovery allowed me to collect logs into a ram-disk instead of directly onto the slow SD card. I used inotify system calls to watch a whole tree of log files being written, alerting a process which collected all but the last few KB of each file and produced a compressed, multiplexed version of all of them. The actions of compressing and multiplexing increased the effective write rate of the SD card enough to cope with much higher rates of logging activity, in effect kicking the can far enough down the road that the developers could have whatever logging they liked. The way SD cards work means that if 4MB of data is written all at once in a single file stream the write is much more efficient, as that is the size of the erase block in the SD technology. I thought I had a perfect solution! However, when it was fully stress tested I found it was missing one component, and one that would be non-trivial to write.
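For reference, the same hole-punching trick can be driven from the shell with the fallocate(1) utility (the path and size here are made up, and the underlying file-system has to support hole punching, e.g. ext4, XFS or tmpfs):

# free the first 100MB of a growing log; the file keeps its apparent size
# and offsets, but those blocks now read back as zeros and occupy no space
fallocate --punch-hole --offset 0 --length $((100*1024*1024)) /var/log/robot/debug.log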

I was able to run tests with simulated log data writing into the ram-disk that eventually overran the ability of the inotify-driven background process to keep up. It took many minutes, but slowly the ram-disk would fill up completely, forcing the writing of log-file data to fail – implying missing, possibly vital, log information. What would be nice, I thought, was if the system calls writing data to a file-system that was in danger of getting full could be slowed in some way, just by a few microseconds, to give the background process time to catch up.

An old mechanical hard disk would, in a way, do this. The furious search for free blocks would increase IO wait time, so writing blocks to a nearly full file-system would indeed take longer. However, regressing back to mechanical disks is no solution, as the forced head movements would also hamper the processes reading and consuming the data!

What the Linux kernel needs is some way to simulate the slowing effect that a nearly full file-system has on processes wanting to make the situation worse, with gradually increasing severity, but with no corresponding penalty for readers (and removers) of data. I knew this would solve my immediate problem, and then realised that it would have highly beneficial effects for data storage in the enterprise too. File-systems and filers which exhibited this behaviour would give an early warning of file-systems filling up but, most importantly, also a way to delay the inevitable full file-system crisis. With fancy enough monitoring it would be possible to isolate the issue to a single “way too chatty” application. The rate of log writing just for that process could be slowed so that the team responsible would have time to sort out what is going wrong. The fallocate trick I had found for dealing with the robot logs would also come in handy here – if a log is discovered that has been growing for months or even years (a failure to implement log rotation), then a fallocate-punched hole could be used to archive, or just dispose of, all data too old to be interesting without having to disrupt the running process at all.

Even if the rate for a single process had to be slowed to effectively a full stop, it is definitely “the lesser of two evils” compared with allowing the file-system it is writing to to fill up. That would more than likely cause collateral damage to other well-behaved parts of the infrastructure that were using that portion of storage space responsibly. The normal panic-mode thing that system admins have to do in such a situation, on filers which have that luxury, is to give that file-system more (expensive) storage. This is a costly way to “fix” the problem and it does nothing to address the reasons why that file-system got full in the first place.

This was several months ago now, and at the time I did a search to see if any such feature already existed in the kernel, but drew a blank. As I had seen enough email circulars about how keen Ocado was on maximising their IP, I put forward a proposal to build this seemingly missing piece of Linux kernel capability as a project. My request was turned down (even though it was the missing piece needed to solve the log collecting issue with the robots). I was told I was welcome to do the work “in my own time”. Now that I no longer work there, here is my chance to ask people: 1. does this technology already exist, or 2. does anyone fancy giving me a hand in writing it?