Some musings on minimal program size and startup overhead

I have been playing a little with Google’s Go language. What got my interest was the report that they are working on a way of translating CPython code into Go so that it runs better on massively parallel systems: a project called Grumpy.

I have never used Go, so, intrigued by the introductory video available at the Go site, I installed it on my CentOS 7 Linux machine.

Here is that ubiquitous “Hello World” program in Go:

package main

import "fmt"

func main() {
    fmt.Printf("hello, world\n")
}

Unlike languages like Python and Perl, Go is very fussy about where your code lives before it will compile it for you.

In order to use the go compiler (simply) you need to set a shell variable called GOPATH to the directory where you are working. You also need to create a src directory under this, then a directory for your project inside src, which finally contains the above code in a hello.go file. Then the command

go install hello

will, if there are no errors, quickly create a bin/hello program under the directory specified by $GOPATH.

NOTE! You will need to be in the directory you have defined as GOPATH for this to work. Not the src sub directory or the directory containing the hello.go file itself. This took a little getting used to.

This blog is not about how to program with Go however. What interested me to take time out from my own learning of Go to write this is the comparative size and execution speeds of different languages.

Google’s motivation to undertake the work of running existing Python code is one of speed. Many key Google sites such as YouTube rely heavily on Python code so a faster and more scalable way to run it (free from the Global Interpreter Lock) is obviously of great value to them.

The super simple Go program weighs in at 1.6M in size! When stripped it goes down to just under 1MB. I have been programming long enough to have considered this wildly extravagant in my youth 😉

The equivalent absolutely minimal C program:

#include <stdio.h>
int main(void) {
    puts("hello world");   /* puts() appends its own newline */
}

when similarly stripped weighs in at just 6240 bytes. This figure may differ slightly depending on compiler version and options but it will always be a LOT less than 1MB.

OK what gives? The first thing I tested was whether this program is genuinely compiled or has any outside dependencies. I did this by temporarily renaming the whole tree that the Go compiler had been installed into. The compiled program still ran without any issues, which shows that it is a true self-contained binary: it has everything it needs to function within that 1MB code budget and could be deployed to another computer and run easily.

Is there any speed penalty for carrying around this code?

Well the super simple C version executes in 0.002 seconds – the resolution of the Linux time command only goes down to 0.001 second so this 0.002 seconds represents the absolute overhead for loading a program (any program) from a standing start.

The Go version? Only an extra thousandth of a second! That is not bad considering the program is so much larger. In terms of Linux system calls invoked (which can be found with the strace program) the Go version needs to make 146 syscalls compared to just 30 from the minimal C version.
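Because the time command only resolves to a thousandth of a second, one way to get a finer comparison is to run each program many times and keep the fastest run. The following is a minimal sketch of that idea in Python 3, not the method used for the figures above. The function name startup_time is my own, and the result includes the cost of Python spawning the child process, so it is only useful for comparing programs against each other.

```python
import subprocess
import sys
import time


def startup_time(argv, runs=20):
    """Return the fastest wall-clock time (seconds) to run argv to completion."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        # Discard the program's output so terminal speed does not affect timing.
        subprocess.run(argv, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=True)
        best = min(best, time.perf_counter() - start)
    return best
```

Calling startup_time(["./hello"]) for each compiled binary in turn (the path is a placeholder) gives numbers that can be compared directly.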

How do other languages do?

Perl has long been used for quick fix scripts and its “hello world” is pretty impressive too!

time perl -e 'print "Hello World\n"'
Hello World

real 0m0.006s
user 0m0.002s
sys 0m0.004s

So only twice as slow as the compiled Go version – but only 30 bytes long

234 syscalls were required which goes some of the way to explaining why it is slower.


time python -c 'print "Hello World\n"'
Hello World

real 0m0.070s
user 0m0.063s
sys 0m0.007s

Considerably slower! Let’s look at the number of syscalls needed:

strace python -c 'print "Hello World\n"' 2>&1 | wc -l

Well that’s more than 4 times as many syscalls which explains why things take longer.

These numbers will vary depending on just how much work each language system has to do on startup. It may be that much more code is available to Python than to Perl on my machine, for example search paths that have to be initialised as the program starts to run.

Lastly Ruby.

Ruby is a pure interpreter. This gives the flexibility to write Ruby code on the fly within the program as it executes. This power has a price to pay in pure speed, however:

time ruby -e 'puts "Hello World\n"'
Hello World

real	0m0.100s
user	0m0.085s
sys	0m0.015s

A whole tenth of a second. The number of syscalls used (1106) is actually lower than for Python so the slowness cannot be blamed on latency in the OS.


Which language is best for writing “Hello World” 😉

The Ruby version is some 50 times slower than the pure C version (0.100 seconds against 0.002). But still.. a 10th of a second is fast enough! In any of these languages a program would have to get considerably more complex (such as doing lots and lots of looping) before the run time would go up much. There is a relatively large penalty for starting up a new process in the first place. This is why time critical tasks inside computer systems are served by pools of processes that are already loaded and waiting for action.

Go seems a worthwhile language to learn. The 1MB space overhead per compiled program may be an issue in some environments that are very tight on space, but in general purpose computers 1MB is nothing now (but remember this was more than the entire address space of MS-DOS & the original IBM PC. How times have changed!)


This is a handy bit of Perl code I wrote years ago

It will quickly and efficiently tell you which of a group of machines you specify are ‘up’ – in the case of this code it is that they are listening on the port for ssh but it is trivial to change the code to look for something else e.g. http server on port 80.

The trick to the speed is that all machines are pinged at once and only then are the answers collected.

The other nice feature is that it will expand ranges: the form name[01..10] will ping machines called name01, name02 etc. This part can be adapted to suit whatever naming convention you have on your site. As we are using the Time::HiRes module we get detailed stats about how long it takes to resolve the name of each machine.

# general machine existence test. Martin Houston.
# checks that ssh service is running on a set of machines
# could be adapted to test another port e.g. 80 for http.
use Net::Ping;
use Data::Dumper;
use Time::HiRes qw( usleep 
 ualarm gettimeofday tv_interval nanosleep
 clock_gettime clock_getres clock_nanosleep clock stat );

# Take the names of the systems to ping on the command line
my (@systems) = @ARGV;
# or piped as a list
@systems = (<STDIN>) unless scalar(@systems) > 0;
# remove newlines
chomp for @systems;

@systems = expand_ranges(@systems);

my %trial = ();
my %lookuptimes = ();

# first we get how long it takes to look each system up 

for(@systems) {
    my $before = [gettimeofday];
    # the lookup being timed: this call is an assumption, the listing
    # as published has nothing between the two timestamps
    gethostbyname($_);
    $lookuptimes{$_} = tv_interval ($before);
}
# allow 2 sec response
my $p = Net::Ping->new('syn',2);
# we check they are listening on the ssh port
$p->{port_num} = getservbyname('ssh', 'tcp');
# just send syn packets
map { $p->ping($_) } (@systems);
# collect all acks that arrive within 2 secs
while(my($host,$rtt,$ip) = $p->ack) {
    $trial{$host} = "'$ip' : $lookuptimes{$host} : $rtt";
}
# these machines have a ssh (port 22) service running
my @live = grep { defined $trial{$_} } @systems;
print join("\n", map { "$trial{$_} : '$_'," } @live), "\n";

# This works with computer names with numbers
# name[1..10] will expand to name01 name02 name03...
sub expand_ranges {
    my @res = ();
    for my $sys (@_) {
        if($sys =~ /\[/) {
            my $pre = $`;
            my $pat = '[' . $';
            # get Perl to do the hard work on pattern expansion
            # (eval runs whatever is in the brackets, so only feed it trusted input)
            my ($numbers) = eval($pat);
            for(@{$numbers}) {
                # 01 not 1, change to suit your conventions
                $_ = '0' . $_ if $_ < 10;
                push @res, "$pre$_";
            }
        }
        else {
            # not special
            push @res, $sys;
        }
    }
    return @res;
}

Please take and adapt this to what suits your site. A handy adaptation is to make a notpingable script that instead returns just the list of systems that are not responding. This, combined with testing various ports can be the basis of a very lightweight system monitor.
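For anyone who would rather adapt the idea than the Perl, here is a rough Python 3 sketch of the range expansion and of the notpingable variation. It is an illustration under my own assumptions, not a port of the script: the names expand_ranges, is_listening and not_responding are chosen here, the range is parsed with a regular expression instead of eval, and each check is a full TCP connect run in a thread pool, not the SYN-only probe that Net::Ping uses.

```python
import re
import socket
from concurrent.futures import ThreadPoolExecutor

# Matches names such as web[01..10]: prefix, first number, last number.
RANGE = re.compile(r"^(.*)\[(\d+)\.\.(\d+)\]$")


def expand_ranges(names):
    """Expand 'name[01..10]' into name01 ... name10; pass other names through."""
    result = []
    for name in names:
        match = RANGE.match(name)
        if not match:
            result.append(name)
            continue
        prefix = match.group(1)
        first, last = int(match.group(2)), int(match.group(3))
        # Two-digit zero padding, the same convention as the Perl version.
        result.extend("%s%02d" % (prefix, n) for n in range(first, last + 1))
    return result


def is_listening(host, port=22, timeout=2.0):
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts and failed lookups
        return False


def not_responding(names, port=22, timeout=2.0):
    """Return the expanded hosts that are NOT accepting connections on port."""
    hosts = expand_ranges(names)
    # One worker per host so that all checks are in flight at the same time.
    with ThreadPoolExecutor(max_workers=max(1, len(hosts))) as pool:
        up = list(pool.map(lambda h: is_listening(h, port, timeout), hosts))
    return [host for host, ok in zip(hosts, up) if not ok]
```

Passing port=80 to not_responding checks for a web server instead of ssh.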

Python function decorators

I am refreshing my knowledge about Docker at the moment (may be the subject of another post in the near future) and one of the samples in the Docker course I was doing deployed, as part of a larger collection of services, a simple web app using the Python Flask micro-framework. I have done quite a bit of Python in the past but had not yet come across a construct like the one Flask uses (or rather I have seen them before in programs but never had the need to pry):

@app.route("/", methods=['POST','GET'])
def somefunc():
    return "Stuff to go to the web page"

It turns out that the @ sign introduces something called a function decorator. In this case it takes a function that just spits out text for a web page and binds it to a specific page on the site (in this case the root) for the Flask framework to serve on request. This is a neat abstraction as it means the page can be designed independently, while the URL leading to that page can be easily changed in one place.
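To make the binding idea concrete, here is a toy sketch of how a decorator can register a function against a URL. This is not how Flask itself is implemented: the TinyApp class and its dispatch method are hypothetical stand-ins written only for this illustration.

```python
class TinyApp:
    """Hypothetical stand-in for a web framework's application object."""

    def __init__(self):
        self.routes = {}  # maps (method, path) to the handler function

    def route(self, path, methods=("GET",)):
        """Return a decorator that registers a function for path."""
        def register(func):
            for method in methods:
                self.routes[(method, path)] = func
            return func  # the function itself is handed back unchanged
        return register

    def dispatch(self, method, path):
        """Call the handler registered for this method and path, if any."""
        handler = self.routes.get((method, path))
        if handler is None:
            return "404 Not Found"
        return handler()


app = TinyApp()


@app.route("/", methods=["POST", "GET"])
def somefunc():
    return "Stuff to go to the web page"
```

Here app.dispatch("GET", "/") returns the page text, and moving the page to another URL only means changing the string given to route.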

Function decorators are one of those concepts that seem rather daunting at first (a function returning a function returning a function HELP!) but are actually not that bad once you grasp what they are trying to do.

Think of a pipeline. A function decorator is something that can take the output of another function and change it in some way – such as upper casing any text or adding top & tail tag pairs such as are used in HTML or XML for example.

Of course you could do this the long way round by providing the output of one function as an argument to the function doing further processing, but this could get a bit messy and long-winded.

The @ sign notation is ‘syntactic sugar’ that allows such pipelines to be built up.

Decorator functions have to be defined to return a function that in turn does the actual work. This is needed as the decorator is bound to the function when it is defined, not every time it is used. Although it is only ‘syntactic sugar’, being able to do this saves a lot of complex and potentially confusing coding if that function is called a lot.

There is a helper function from functools called wraps which is itself a decorator that aids your debugging efforts by normalising function names in stack back traces. I have added it in this example for that purpose. If the code does not need debugging you can leave it out.

from functools import wraps
def tags(tagn, opts=None):
    ''' decorate an output producing function with a tag '''
    def tags_decorator(func):
        # optionally itself decorated
        @wraps(func)
        def func_wrapper(name):
            if opts is None:
                return "<{0}>{1}</{0}>".format(tagn, func(name))
            else:
                return "<{0} {2}>{1}</{0}>".format(tagn, func(name), opts)
        # back in tags_decorator
        return func_wrapper
    # back in tags return the function - NOT the result of calling it
    return tags_decorator
    # here we return a function(tags) that itself returns
    # a function(tags_decorator) that returns a function(func_wrapper)!

# now we use it
@tags("h1")
def h1(text):
    # note all the work done in the decorator in this case
    return text

# the opts value here is an assumed reconstruction
@tags("p", "align='center'")
def p(text):
    # note all the work done in the decorator in this case
    return text

# parentheses make this line work in both Python 2 and Python 3
print(h1("This is a title") + p("And some centred text"))

Note that the decorated function definition spans multiple lines.

It is not valid Python syntax to say @decorator() def decorated(): Remember in Python the choice and amount of white-space is significant. Cutting & pasting example code from websites may not always work because of this – check the logic of what you end up with. Another important tip on the syntax is that comments are permitted between the decorator and decorated function. This means that you could have several decorators and just un-comment the ones you want to use – easy trial of alternative presentation styles perhaps?
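A small sketch may help show what the @ lines are doing. The decorators upper and exclaim below are made up for this illustration. Stacking them with @ gives the same result as wrapping the function by hand, and a comment line sitting between the decorators is accepted.

```python
from functools import wraps


def upper(func):
    """Decorator: upper-case whatever text the wrapped function returns."""
    @wraps(func)
    def wrapper(text):
        return func(text).upper()
    return wrapper


def exclaim(func):
    """Decorator: append an exclamation mark to the returned text."""
    @wraps(func)
    def wrapper(text):
        return func(text) + "!"
    return wrapper


@upper
# a comment between the decorators and the function is allowed
@exclaim
def greet(text):
    return text


def plain(text):
    return text


# The long way round: exactly what the two @ lines above do for greet.
by_hand = upper(exclaim(plain))
```

The decorator nearest the function is applied first, so greet("hi") passes through exclaim and then upper. Thanks to wraps, greet.__name__ is still "greet" and not "wrapper".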

In conclusion function decorators allow processing to be split up pipeline fashion to keep logic in neat compartments. The above example is very simple but real uses could be something like substituting real values for placeholders that the inner function deals with. This makes it a good tool for imposing local configuration that the inner core of logic does not need to know about.
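As a hedged illustration of the placeholder idea, the sketch below uses string.Template from the standard library. The decorator name with_site_config and the settings shown are invented for the example. The inner function only knows about placeholder names, while the decorator supplies the local values.

```python
from functools import wraps
from string import Template


def with_site_config(**settings):
    """Decorator factory: fill $placeholders in the wrapped function's output."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            text = func(*args, **kwargs)
            # safe_substitute leaves unknown placeholders untouched
            return Template(text).safe_substitute(settings)
        return wrapper
    return decorator


@with_site_config(site="example.org", admin="webmaster")
def footer():
    # The core logic knows nothing about the actual site values.
    return "Served by $site, contact $admin"
```

Changing the site details then means editing only the decorator arguments, not the function body.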

Puppet commands mini crib

Just a list of useful Puppet commands and key concepts

Not intended to really be a tutorial… If you find it useful I am glad, but I am learning Puppet myself and I always find the best way to really understand something is finding a really simple way to explain it. This is my attempt to distil Puppet down to the bare essentials that need to be grasped. It is a work in progress.

puppet help

Come on this one should be obvious!
The puppet help command on its own gives a list of all available sub commands. Including one of them after the word help gives detailed help on how to use it – use that to expand on what you learn from this crib sheet! The puppet man command gives a full manual page about the command – an expansion on the summary that puppet help gives.

puppet describe --list

This gives a list of all the different sorts of things that the puppet you have installed and configured knows how to control – i.e. that puppet code has been written to manage. This of course will vary with your Puppet set-up. If you are trying to apply a manifest containing resources your Puppet has no clue how to manage you are not going to get very far are you? You may want to pipe this through your favourite pager as the list may be long!

puppet describe <something from the list>

Gives the equivalent  of a “manual page” for that particular “thing” – what options are available for controlling it. This is a definition of what should be put in the manifest for that sort of resource. For example here are the options for “mount”

Manages mounted filesystems, including putting mount
information into the mount table. The actual behavior depends
on the value of the 'ensure' parameter.
**Refresh:** `mount` resources can respond to refresh events (via
`notify`, `subscribe`, or the `~>` arrow). If a `mount` receives an event
from another resource **and** its `ensure` attribute is set to `mounted`,
Puppet will try to unmount then remount that filesystem.
**Autorequires:** If Puppet is managing any parents of a mount resource ---
that is, other mount points higher up in the filesystem --- the child
mount will autorequire them.


- **atboot**
    Whether to mount the mount at boot.  Not all platforms
    support this.

- **blockdevice**
    The device to fsck.  This is property is only valid
    on Solaris, and in most cases will default to the correct
    value.

- **device**
    The device providing the mount.  This can be whatever
    device is supporting by the mount, including network
    devices or devices specified by UUID rather than device
    path, depending on the operating system.

- **dump**
    Whether to dump the mount.  Not all platform support this.
    Valid values are `1` or `0` (or `2` on FreeBSD). Default is `0`.
    Values can match `/(0|1)/`.

- **ensure**
    Control what to do with this mount. Set this attribute to
    `unmounted` to make sure the filesystem is in the filesystem table
    but not mounted (if the filesystem is currently mounted, it will be
    unmounted).  Set it to `absent` to unmount (if necessary) and remove
    the filesystem from the fstab.  Set to `mounted` to add it to the
    fstab and mount it. Set to `present` to add to fstab but not change
    mount/unmount status.
    Valid values are `defined` (also called `present`), `unmounted`,
    `absent`, `mounted`. 

- **fstype**
    The mount type.  Valid values depend on the
    operating system.  This is a required option.

- **name**
    The mount path for the mount.

- **options**
    Mount options for the mounts, comma-separated as they would appear
    in the fstab on Linux. AIX options other than dev, nodename, or vfs may
    be defined here. If specified, AIX options of account, boot, check,
    mount, size, type, vol, log, and quota must be alphabetically sorted at
    the end of the list.

- **pass**
    The pass in which the mount is checked.

- **remounts**
    Whether the mount can be remounted  `mount -o remount`.  If
    this is false, then the filesystem will be unmounted and remounted
    manually, which is prone to failure.
    Valid values are `true`, `false`.

- **target**
    The file in which to store the mount table.  Only used by
    those providers that write to disk.


Note that the puppet man command gives a very similar detailed output for puppet commands themselves.

puppet resource

This is a way to “hand crank” Puppet: to get Puppet to set the options for a resource the same way as if a manifest was applied. As an extra bonus Puppet emits a manifest entry that would result in the same resource state (note that as we are doing something to change the system, sudo is needed):

sudo puppet resource user bogus ensure=present

Notice: /User[bogus]/ensure: created
user { 'bogus':
  ensure => 'present',
}

And we can see that the user has indeed been created:

grep bogus /etc/passwd

And remove it again:

sudo puppet resource user bogus ensure=absent
Notice: /User[bogus]/ensure: removed
user { 'bogus':
  ensure => 'absent',
}

And the user is no longer in the /etc/passwd file.

Note if you just want the side effect of emitting manifest syntax, use the --noop flag at the end of the command so that actions are not actually done. The resource command still has to be one puppet would be able to do (correct parameters for the resource type etc).

Summary is that puppet resource is an opportunity to get puppet to do individual actions. This is both a handy way to do low level code testing and gives syntactically correct code to paste into manifests for automated repetition. Puppet manifest syntax (Puppet DSL) takes a bit of getting used to as it is rather richer in meaning than it looks at first glance to be.

Without the option=value parts puppet resource can be used to query the current state of managed resources, again emitting the result as syntax suitable for a manifest file (Note sudo still required – this data may well be sensitive!):

 sudo puppet resource user root

user { 'root':
  ensure           => 'present',
  comment          => 'root',
  gid              => '0',
  home             => '/root',
  password         => '$6$Fzx4UD.89YeLcYry$ph.N6w1t.dSzmn/dycQ0sGTGL/gGsgI8JTo94nwqapffTbruqDhrSgKE.G132RmJ.q03lT.MEMLDahG7XH.',
  password_max_age => '99999',
  password_min_age => '0',
  shell            => '/bin/bash',
  uid              => '0',
}

All the above options could be supplied to recreate a root account as it exists. Many of them however are defaults – such as our ‘bogus’ user got a bash shell without having to explicitly specify it.

puppet apply

The blocks of Puppet DSL emitted by puppet resource, or new creations with the same syntax, can serve as input to the next stage up the food chain – puppet apply

Using the same example as before – create a bogus user:

sudo puppet apply --execute "user { 'bogus': ensure => 'present',}"
Notice: Compiled catalog for in environment production in 0.44 seconds
Notice: /Stage[main]/Main/User[bogus]/ensure: created
Notice: Finished catalog run in 1.82 seconds

Note that although the end effect is the same, the output messages from puppet apply are rather different and more like what you get to see on a ‘real’ puppet run. Note -e can be used as shorthand for --execute.

Using nothing more than puppet apply it is possible to develop new pieces of Puppet DSL and test them in situ before tested manifest files get installed on a puppet master.

If instead of being squashed onto 1 line for use with --execute, that DSL for creating the bogus user was simply saved in a file called bogus.pp (Puppet manifest files by convention end in .pp) then

puppet apply bogus.pp

would have the same effect of creating the user.

Now we have this action captured in a file it becomes easy to start adding parameters, like giving the user a password and other options appropriate to the user resource. Remember “puppet describe user” will give us full details on what options are possible for this resource.

puppet config print

This will print out the current config that puppet is using on this machine as a long list of name=value pairs. Changes to the config can be made with puppet config set name value. Changes made this way end up in /etc/puppet/puppet.conf for the Open Source version and /etc/puppetlabs/puppet/puppet.conf for the Puppetlabs Enterprise version. These files can also be edited by hand for more complex config (with comments). A particular limitation of puppet config set is that the variables only end up in the [main] section, so they could be overridden if the same variable has been set in a more specific section such as [agent].
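The [main] versus [agent] point can be sketched in a few lines of Python. This is only a model of the precedence described above, not Puppet's own lookup code: the sample file contents and the function name effective_setting are made up for the illustration.

```python
import configparser

# Invented puppet.conf contents, for illustration only.
SAMPLE = """
[main]
environment = development
server = puppet.example.com

[agent]
environment = production
"""


def effective_setting(conf_text, name, run_mode="agent"):
    """Return the value seen in run_mode: its own section wins over [main]."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    for section in (run_mode, "main"):
        if parser.has_option(section, name):
            return parser.get(section, name)
    return None  # not set in either section
```

With this sample the agent sees environment as production even though [main] says development, which is exactly the surprise to watch out for after a puppet config set.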

sudo puppet config print confdir will show where the configuration data for your puppet is kept (/etc/puppet for root in the Open Source version). Note that if you run this without the sudo you will get the answer $HOME/.puppet. This means that you can work with and test some aspects of Puppet without needing root; however it is a trap for the unwary as configuration will differ depending on whether root is used or not!

Enabling a development environment

If $confdir is /etc/puppet then we need to set up some environments under there where Puppet will expect them to be.

sudo mkdir -p /etc/puppet/environments/development

sudo puppet config set environment development

This is saying that the ‘bootstrap’ or ‘root’ for Puppet configuration to be applied to this machine while it is in ‘development’ status starts with a site.pp file located under the manifests directory here. This structure becomes particularly important when a machine becomes  a puppet master, storing the configuration for many different machines that are configured to take their setup from different states. Some machines may be in ‘development’ state, others in ‘preprod’ or ‘production’. In big Puppet deployments a version control system like git and a formal change control procedure are used to make sure that Puppet code does not reach critical production machines unless it has been properly tested with less critical ones.

puppet module generate

In order to tailor puppet to our system in non trivial ways we will need to generate one or more modules:

puppet module generate yourname-modname --skip-interview

This generates the whole expected boilerplate structure for a module in the current directory under a directory called yourname-modname. Your first action should be to rename that directory to just modname as we are just using self created modules to describe the setup of this site at a high level. The yourname bit is just to give some namespace separation if you later decide to share your modules, or are making modules to be shared, so in this case is simply not needed, the top level config is really going to be what is unique about your site.

The two really critical files here are manifests/init.pp and tests/init.pp. The former defines the meat of the module, you will probably want several .pp files but init.pp is the entry point (in the same way that site.pp is the entry point for an entire puppet run) and the latter exercises it for test purposes. Modules have to be invoked from somewhere in order to do anything – the tests section gives an opportunity for unit testing of the Puppet code being written.

Here is a simple example of this:

Go to the directory you have defined as development (e.g. /etc/puppet/environments/development)

Create modules directory and change to it.

puppet module generate nevermind-donothing --skip-interview

mv nevermind-donothing donothing

Now edit donothing/manifests/init.pp and add this to the empty donothing class:

notify { 'Applying class donothing': }

puppet apply donothing/tests/init.pp # note we have not edited this – its default behaviour of invoking the class is enough for this.
Notice: Compiled catalog for in environment development in 0.11 seconds
Notice: Applying class donothing
Notice: /Stage[main]/Donothing/Notify[Applying class donothing]/message: defined 'message' as 'Applying class donothing'
Notice: Finished catalog run in 1.76 seconds

Because we are working in the place puppet expects things to be we do not need the --modulepath=./ that would be required if you wanted to develop module code in arbitrary other places.

puppet module search & install

The other sort of modules are ones that are installed from the puppet forge. These are just tarred up archive trees the same as created originally by module generate. One of the things that makes Puppet so powerful is that when sysadmins solve a problem about how to orchestrate something with Puppet they most often contribute the code back for others to use and improve. This is the whole essence of why the Open Source idea is so powerful.

puppet module search for a named module confirms that it exists and offers possible alternatives.

puppet module search puppetlabs-apache
Notice: Searching ...
puppetlabs-apache Installs, configur... @puppetlabs web ssl 
dploeger-pagespeed Installs the Apach... @dploeger 
mayflower-php Generic PHP module... @mayflower php fpm 
maestrodev-rvm A puppet module fo... @maestrodev ruby rvm 
jgazeley-django Deploy a Django ap... @jgazeley 
hetzner-roundcube Roundcube webmail ... @hetzner webmail 
puppet-puppetboard Install and config... @puppet puppetdb 
saw-reviewboard Install and contro... @saw trac 
jhoblitt-awstats Manages the AWStat... @jhoblitt 
nibalizer-puppetboard Install and config... @nibalizer redhat 
spotify-puppetexplorer Manage the Puppet ... @spotify puppetdb 
pulp-pulp This module can be... @pulp 
pltraining-dockeragent Manages Puppet Lab... @pltraining 
landcareresearch-amazon_s3 Manages mounting S... @landcareresearch 
thejandroman-grafana Installs and confi... @thejandroman 
thejandroman-kibana3 Installs and confi... @thejandroman kibana 
gajdaw-symfony Puppet module to i... @gajdaw php app 
jgazeley-mod_auth_cas Configure mod_auth... @jgazeley httpd cas 
42ways-railsapp Basic Rails Server... @42ways rails ruby
jgazeley-speedtest Install Ookla Spee... @jgazeley

puppet module install actually installs it e.g.

puppet module install puppetlabs-apache
Notice: Preparing to install into /etc/puppet/environments/development/modules ...
Notice: Downloading from ...
Notice: Installing -- do not interrupt ...
└─┬ puppetlabs-apache (v1.11.0)
  ├── puppetlabs-concat (v2.2.0)
  └── puppetlabs-stdlib (v4.14.0)

Note as with creating your own modules a --modulepath= option will allow you to install modules somewhere other than the standard place Puppet is going to be looking (under the development branch in our case). Note that there is dependency following going on here, just like with Linux packages. As well as the puppetlabs-apache we wanted we also get two modules it in turn needs.

These modules also get installed under


so you can see the rest of how modules need to be constructed by looking at other people’s examples. Lower level modules will tend to have actual Ruby code as well as Puppet DSL declarations.

Fun with facts

The facter command without parameters will list all the facts available to Puppet on this system. If names of facts are listed on the command line only those will be returned, as name => value pairs. If an unknown fact is asked for like ‘wibble’ it gets mapped to nil.

facter is_virtual operatingsystem wibble
is_virtual => true
operatingsystem => CentOS
wibble => nil

It is possible to define custom facts, so that anything Puppet needs in order to make decisions about the configuration of systems can be made available to it.
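If you want to make use of facts from a script of your own, the name => value output shown above is easy to pick apart. This is a small Python sketch under simple assumptions: one fact per line, and nil standing for a missing fact. The function name parse_facter is my own, and structured or multi-line fact values are not handled.

```python
def parse_facter(output):
    """Turn 'name => value' lines into a dict; 'nil' becomes None."""
    facts = {}
    for line in output.splitlines():
        name, sep, value = line.partition(" => ")
        if not sep:
            continue  # skip blank lines or anything not in name => value form
        value = value.strip()
        facts[name.strip()] = None if value == "nil" else value
    return facts


sample = """is_virtual => true
operatingsystem => CentOS
wibble => nil"""

facts = parse_facter(sample)
```

Note that every value comes back as a string, so is_virtual is the text "true" and not a boolean.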

There are 2 places facts are used within the Puppet DSL. Firstly within Templates – the .erb files. A fact can be embedded using this syntax:

 <%= @operatingsystem %>

Actually facts are not really local variables (this same syntax would be used for showing those in a template too), so it is a little more correct, and safer if you have the misfortune to have a local variable name conflicting with a fact, to write:

<%= scope['::operatingsystem'] %>

Within Puppet DSL in the .pp files a $:: prefix is used (normal local variables just have a $ prefix within .pp files).

$OS = $::operatingsystem

Classes and defined types

We have already seen resource types that are built into puppet, e.g. the user.


You can define your own types however with default values for any parameters that can then be overridden. Remember that donothing class we created earlier? Extend it like this:

class donothing {
  notify { 'Applying class donothing': }
  bogus { 'suspect':
    flavour => 'suspect',
  }

  bogus { 'plain': }
}

define bogus ($flavour='stinky') {
  notify { "Seen a $flavour bogus type": }
}

puppet apply -e "include donothing" 
Notice: Compiled catalog for in environment development in 0.08 seconds
Notice: Seen a stinky bogus type
Notice: /Stage[main]/Donothing/Bogus[plain]/Notify[Seen a stinky bogus type]/message: defined 'message' as 'Seen a stinky bogus type'
Notice: Seen a suspect bogus type
Notice: /Stage[main]/Donothing/Bogus[suspect]/Notify[Seen a suspect bogus type]/message: defined 'message' as 'Seen a suspect bogus type'
Notice: Applying class donothing
Notice: /Stage[main]/Donothing/Notify[Applying class donothing]/message: defined 'message' as 'Applying class donothing'
Notice: Finished catalog run in 1.82 seconds

Note a class does not have this ability to be declared several times under different titles. Classes are intended to be top level containers: a file with the same name as the class and ending in .pp contains the class.

To include such a file (found on the search path) within another class we just use include – like

include apache

or as we did above:

puppet apply -e "include donothing"

This assumes that the donothing class can be found on your search path.
Add --modulepath=./ if you need to set it explicitly for a test.

Note that if a defined type is specific to a particular module the convention is to prefix it with the module name and double colon e.g. apache::vhost

Classes can also have parameters – which allows some aspects of how that class is used to be changed when the class is invoked. The syntax is just a little different to defined types:

In a file withparams.pp:
class withparams($param_one='this',$param_two='other') {
# body does not matter...
}

# and on the command line
puppet apply -e  "class { 'withparams': param_one => 'that' }"

This will result in the ‘withparams’ class getting used but with the $param_one variable set to ‘that’ and $param_two with the default value of ‘other’. Note that we are being absolutely explicit here that a class is being called with non standard arguments. Calling a class without overriding any of the arguments is done with just ‘include’ as we have seen before. Remember puppet is NOT procedural. A class calling another class is just establishing a relationship of things that have to be done. The order they are done in needs to be separately controlled if indeed it is important. Remember also that, apart from init.pp which is the special top level, the class name must match the name of the file it is in, as this is how classes are found when other classes request to include them.

puppet apply -e "class { 'withparams':}"

is exactly the same as

puppet apply -e "include withparams"

require/notify and the arrow notation -> ~>

The Puppet DSL lets you state that other things have to be in place (a configuration file, for example) before a service will run or another resource can be created. This is done with the require keyword.

This code:

file {'myconffile':
# commands to populate the file e.g. from a template
}
service {'myservice':
  require => File['myconffile'],
  # other stuff
}

Could instead be written as:

file {'myconffile':
  # attributes to populate the file e.g. from a template
} ->
service {'myservice':
  # other stuff
}

There is an equivalent for when you have the file and want to say which service is affected by it, which is called notify. The other way to state this is with a ~> arrow (with the tilde character). Having -> and ~> means that you can add the appropriate require or notify without having to change the DSL source of the other component, which is handy as it might be a contributed module written by someone else.

Note that the -> or ~> is used as a joiner between the two resource blocks. For require in particular, long chains can be built to say that the first thing needs to be done first, then the second, then the third, e.g. create a directory, then create a sub directory, then create some files in it. Using explicit requires in this case would look messy.
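
A chain like the one just described might look roughly like this. It is only a sketch: the paths under /tmp/demo, the file content and the service name ‘myservice’ are placeholders of mine, so the last step will only apply cleanly on a machine that really has a service of that name.

```puppet
# Ordering chain: directory, then sub directory, then a file inside it
file { '/tmp/demo':
  ensure => directory,
} ->
file { '/tmp/demo/sub':
  ensure => directory,
} ->
file { '/tmp/demo/sub/app.conf':
  ensure  => file,
  content => "setting = 1\n",
} ~>
# The ~> arrow also refreshes (restarts) the service when the file changes
service { 'myservice':
  ensure => running,
}
```

The first two arrows only express order. The final ~> expresses order plus notification, which is the arrow form of notify.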

Puppet master

Although puppet code can be developed and run directly on any host, a complex system where servers interrelate with each other requires one machine to be set up as a puppet master. If the puppet master is down, all that is lost is the ability to make changes to a deployed configuration. Therefore there is no special need for the puppet master to be designed as fault tolerant. Even production networks can tolerate short outages of the puppet master. What is VITAL however is that the puppet master is secure, as gaining root on it is potentially as compromising as root access to every one of the machines controlled by that master. This is because the master machine controls exactly what happens to the agent machines by deploying puppet code to them. This is why it is called Puppet.

Systems to be controlled register with the puppet master by using puppet agent mode. Communication is authenticated using SSL, so each agent needs to generate an SSL certificate request and have it signed by the master. Signing of certificates by the master can be automated, but if security is important it is best left as a manual step, as decisions also have to be made about new systems: which puppet environment (development or production) is controlling them and what role they have within that environment. This determines how that machine is to be configured by the puppet master e.g. as a web server.

For trivial testing and small deployments there is a simple, single threaded web server built directly into puppet so that the puppet master can be easily started and stopped and debugged etc.

puppet master --debug --verbose --no-daemonize --logdest console

This will run in the foreground and emit verbose output direct to the console (have a nice big scrollable terminal to run it in). For larger deployments, however, the puppet master should be used in conjunction with a proper multi-threaded web server like Apache. Alternatively there is a puppetserver from puppetlabs which runs on JRuby (and therefore under the Java JVM) for performance. This will usually be available as a package called ‘puppetserver’.

“bootstrapping” machines to use a puppet master

As stated earlier it is easy just to use puppet locally and invoke specific pieces of the Puppet DSL by hand using puppet apply to test them. However the whole point of puppet is to enable the control of complex setups that span dozens to thousands of machines. In order for this to happen there needs to be a central place where all the machine configurations are decided and actions issued to bring machine configurations into line with what they should be.

In order for a machine to be joined into the Puppet infrastructure these things need to happen:

  1. Enough of puppet code needs to be present on the machine to allow the puppet agent function to be performed. This would be the responsibility of initial system build from a technology like kickstart or cloning a basic system image when a new VM is created.
  2. That machine needs to know where to find the puppetmaster – this will either be an alias in whatever DNS domain the newly created machine has been given (possibly handed out by a DHCP server) or, more simply an entry in the local hosts file put there by whatever initial build process is in place.
  3. Run puppet config set certname with the FQDN that you want this node to be known to the puppet master as. This name is vital as it must correspond to an entry in the manifests/site.pp file on the puppetmaster.
  4. Run puppet agent -t to present this certificate to the puppet master for signing.
  5. Unless the puppet master has been set to auto-sign all certificate requests (unwise) the sysadmin must now take action to sign the certificate.
  6. A further puppet agent command will start regular background communication with the master. Note that the master does not itself need 100% availability. Loss of puppet connectivity only affects the ability to change configuration, not the actual running of the machine itself.
  7. Now communication is established it is possible to use the node classifier on the master to assign this node into whatever group is desired. The site.pp within that group then acts as a starting point for the Puppet configuration and therefore the role that machine will then take.
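
As a rough sketch of the site.pp entry that step 3 refers to, a node definition on the master might look something like the following. The node name web01.example.com is purely hypothetical and stands in for whatever certname was set on the agent. The apache class is borrowed from the earlier include example and is assumed to be available on the master.

```puppet
# manifests/site.pp on the puppet master (illustrative only)

# Matches the certname the agent registered with
node 'web01.example.com' {
  include apache
}

# Fallback for any node without an entry of its own
node default {
}
```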


How to make your computer faster

My that is a super vague title isn’t it?

Everyone would like a faster computer wouldn’t they? For some things, such as live broadcasting, a computer that is not quite fast enough can be next to useless. Nobody enjoys watching badly lagging and stuttering video and audio. However a screamingly fast computer is not all advantage. They consume more power so cost more to run over their lifetime, and being “bleeding edge” they tend to be less reliable. They are also harder to keep cool so tend to generate more noise while in use. The very fastest chips that gaming enthusiasts use require water cooling to get the very most out of them, a hobby known as “Overclocking”. This eliminates some of the noise but can bring its own set of problems associated with the possibility of leakage as systems get older and worn. Water and electricity do not mix very well.

Have you ever deleted things by accident?

With a modern desktop it is likely that the files are just moved into a ‘Trash’ area rather than gone for good – at least for a while.

However the thing about a ‘trash can’ is that the directory hierarchy of the files is lost. I deleted a large chunk of my music collection in this way a while ago but was pleased to find the files all jumbled together in a trash area when I finally got round to some storage spring cleaning.

The useful thing about music file formats such as FLAC and MP3 is that the files contain within them tags with such information as Artist and Album – it is these tags that you see when you play the music with player software.

I wrote a simple Perl script to re-create a nice orderly directory tree based on Genre/Artist/Album and by reading the tags. The Linux link command is used to then put the files back into a sensible place.

A computer for OpenStack

Having used OpenStack professionally for a few months, and having attended Mirantis OpenStack Training, I thought it would be useful to be able to run my own system at home so I could do my own experiments. OpenStack’s promise of fault tolerance, with all data stored in more than one place, means that at the end of it I would also have a home environment with some quite useful features.

Practical “Real World” OpenStack deployments normally consist of one or more racks of noisy server class computers, intended to be operated 24/7 and consuming many kilowatts of power. Not really a suitable environment for home. I wanted something quiet, and as it was only going to be used occasionally, affordable! Computer hardware depreciates fiercely so having computers sitting doing nothing makes little sense. It is in fact this logic that has driven adoption of “The Cloud” where computing ability is rented by the hour rather than companies having to commission and maintain enough physical hardware to cope with the largest expected peak, even if that peak only comes once a year.