For the last couple of months, both at the Ubuntu Developer Summit in Mountain View and on the #upstart IRC channel, we’ve been discussing the changes we want to make to
upstart for the Feisty Fawn release of Ubuntu.
This will ship with a version of upstart based on the 0.3 series (it may end up being called 0.5 before release); the primary goal for this series is to have an init system that is suitable for general standalone use in any Linux distribution.
I’ll be giving a talk at
linux.conf.au 2007 in Sydney with that aim; I hope to persuade at least one other major Linux distribution that it’s the right solution.
A complete list of the specifications and bugs being targeted for the 0.3 release can be found
in Launchpad.
The rest of this post will introduce some of the shiniest new things.
Writing Jobs
Upstart takes care of starting, supervising and stopping daemons itself; unlike in the init script system where you have to write code to do that yourself, often using a helper like
start-stop-daemon
. All you need to do is give the path to, and arguments for, the binary you wish to be started.
exec /usr/bin/dbus-daemon
Some jobs, especially quick tasks, will usually be written as shell scripts. To save having to write a separate file and invoke it, you can include shell script code directly in the job file instead of using the
exec
stanza.
script
echo /usr/share/apport/apport > /proc/sys/kernel/crashdump-helper
end script
Usually it’s not sufficient to just start a binary and wish it well; you frequently need something to be run before it is started to prepare the system, and sometimes something after it terminates to clean up again.
For these purposes, additional snippets of shell code can be given – to be run before the binary is started, and after it has finished. Unlike init scripts, these do not need to start or stop the daemon itself; that’s done automatically based on the
exec
stanza.
pre-start script
mkdir -p /var/run/dbus
chown messagebus:messagebus /var/run/dbus
end script
post-stop script
rm -f /var/run/dbus/pid
end script
For consistency, executables may be specified with
pre-start exec
and
post-stop exec
instead of shell scripts as above.
It’s sometimes useful to be able to run something after the binary has been started; for example, you may wish to attempt to connect to the daemon to determine whether it is ready to serve requests.
post-start script
or
post-start exec
can be used for this.
post-start script
# wait for listen on port 80
while ! nc -q0 localhost 80 </dev/null >/dev/null 2>&1; do
sleep 1;
done
end script
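The loop above waits forever if the daemon never starts listening. As a minimal sketch of a bounded variant, the helper below polls a command once per second and gives up after a fixed number of attempts. `wait_until` is a hypothetical name, not something upstart provides, and what upstart does when a `post-start` script exits non-zero is not covered here.

```shell
#!/bin/sh
# wait_until is a hypothetical helper, not part of upstart.
# Usage: wait_until MAX_ATTEMPTS COMMAND [ARGS...]
# Runs COMMAND up to MAX_ATTEMPTS times, sleeping one second between
# attempts. Returns 0 as soon as COMMAND succeeds, 1 if every attempt fails.
wait_until() {
    max_attempts=$1
    shift
    attempts=0
    until "$@" </dev/null >/dev/null 2>&1; do
        attempts=$((attempts + 1))
        if [ "$attempts" -ge "$max_attempts" ]; then
            return 1
        fi
        sleep 1
    done
    return 0
}

# Inside a post-start script this could be used as, for example:
#   wait_until 30 nc -q0 localhost 80
```

The probe command is passed as arguments, so the same helper could wrap any readiness check, not only the `nc` test used above.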
It’s also useful to be able to notify a daemon that it may be about to be stopped, or to delay the stop for a while.
pre-stop script
or
pre-stop exec
can be used for this.
pre-stop script
# disable the queue, wait for it to become empty
fooctl disable
while fooq >/dev/null; do
sleep 1
done
end script
Events
Events are now quite a bit more detailed than in previous versions; they’re still named with simple strings that are up to the system sending the event, but they can now include arguments and environment variables which are passed through to jobs being started or stopped as a result.
initctl emit network-interface-up eth0 -DIFADDR=00:11:D8:98:1B:37
This command will now output all of the effects of this event, and will not terminate until the event has been fully handled inside upstart.
Events such as the above can be used by jobs that examine the event arguments and environment within their script:
start on network-interface-up
script
[ "$1" = lo ] && exit 0
grep -q "$IFADDR" /etc/network/blacklist && exit 0
# etc.
end script
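To make the flow of event data concrete, here is a rough standalone sketch of the same checks written as a shell function, so it can be tried outside upstart. `handle_interface` and the `BLACKLIST` override are hypothetical names introduced for illustration, and the sketch assumes the blacklist file holds one hardware address per line.

```shell
#!/bin/sh
# handle_interface is a hypothetical name used for illustration.
# $1 is the interface name (the event argument); IFADDR is expected in the
# environment, as passed by the event. BLACKLIST defaults to the path used
# in the post and exists only so the sketch can be tested.
handle_interface() {
    iface=$1
    blacklist=${BLACKLIST:-/etc/network/blacklist}

    # Nothing to do for the loopback interface.
    [ "$iface" = lo ] && return 0

    # Skip blacklisted addresses. Assumes one address per line:
    # -F treats the address as a fixed string, -x matches whole lines,
    # -i ignores case.
    if [ -n "${IFADDR:-}" ] && [ -r "$blacklist" ] &&
        grep -qixF -- "$IFADDR" "$blacklist"; then
        return 0
    fi

    echo "configuring $iface"
}
```

Matching whole lines as fixed strings avoids treating part of one address as a match for another.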
or matched directly in the
start on
and
stop on
stanzas:
start on block-device-added sda*
The events generated by job state changes have also changed. Previously both jobs and events shared the same namespace, which not only caused confusion but actually caused some problems when one accidentally named a job after an event.
The two primary events generated are now simply called
started
and
stopped
; they inform you that a job is fully up and running, or fully shut down again. The name of the job is received as an argument to this event.
start on started dbus
The
started
event is not emitted until the
post-start
task (described above) has finished; so the
post-start
task can hold back other jobs that would not yet be able to connect to the daemon.
Likewise the
stopped
event is not emitted until after the
post-stop
task has finished.
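The ordering described so far can be walked through with a toy shell function. This is only an illustration of the sequence, not how upstart is implemented: `run_lifecycle` is a hypothetical name, `sleep 30` stands in for the daemon named by the `exec` stanza, and each `echo` marks where a script or event would occur.

```shell
#!/bin/sh
# Toy walk-through of the ordering described above; upstart does all of
# this itself. run_lifecycle is a hypothetical name and "sleep 30" merely
# stands in for the daemon named by the exec stanza.
run_lifecycle() {
    echo "pre-start"                 # prepare the system
    sleep 30 >/dev/null 2>&1 &       # the daemon is started
    pid=$!
    echo "post-start"                # e.g. wait until it accepts requests
    echo "event: started"            # only now are other jobs told
    echo "pre-stop"                  # warn the daemon, drain its queue
    kill "$pid"                      # the daemon is stopped
    wait "$pid" 2>/dev/null || true  # reap it, ignoring the signal status
    echo "post-stop"                 # clean up
    echo "event: stopped"            # only now is the job fully down
}

run_lifecycle
```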
The other two events emitted by a job are special; they are the
starting
and
stopping
events. The reason they are special is that the job is not permitted to start or stop until the event has been handled.
This means that if you have a task to perform when your database server is stopped, but before it’s actually terminated, it’s as simple as:
start on stopping mysql
exec /usr/bin/backup-db.py
MySQL won’t be terminated until the backup has finished.
This is especially useful for daemons that depend on each other. For example, HAL needs DBUS: it shouldn’t be started until DBUS is running, and DBUS should not be stopped until HAL has been terminated. All the HAL job needs is:
start on started dbus
stop on stopping dbus
Likewise, if Tomcat is installed, Apache should not be started until Tomcat is running, and Tomcat should not be stopped until Apache has been terminated. All the Tomcat job needs is:
start on starting apache
stop on stopped apache
Failure
Nothing goes smoothly all of the time; sometimes tasks the job runs will fail, or the daemon itself will die. As well as providing the ability for a crashed daemon to be automatically restarted, upstart ensures that other jobs are notified with a special
failed
argument to the
stopping
and
stopped
events.
start on stopped typo failed
script
echo "typo failed again :-(" | mail -s "typo failed" root
end script
And if any job started or stopped by an event fails, it’s possible to discover that the event itself failed.
start on network-interface-up/failed
States
While tasks such as configuring a network interface, or checking and mounting a block device, are usually performed as a result of events, services are more complicated.
Services normally need to be running while the system is in a certain state, not just when a particular event occurs. Therefore upstart allows you to describe arbitrarily complex system states by referring to events that define their changes.
For example, many services should be running only while the filesystem is mounted and at least one network device is up. We have events to indicate the changes into and out of these states; we just need to combine them:
from fhs-filesystem-mounted until fhs-filesystem-unmounted
and from network-up until network-down
The
until
operator defines a period between two events; the
and
operator ensures we’re within both of these periods.
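One way to read these two operators is as a pair of flags, each switched on and off by its events, with the job running only while both are on. The following is a toy simulation of that reading and not upstart's implementation. `state_after` is a hypothetical name; it takes a sequence of event names and reports the resulting state.

```shell
#!/bin/sh
# Toy model of:
#   from fhs-filesystem-mounted until fhs-filesystem-unmounted
#   and from network-up until network-down
# state_after is a hypothetical helper: it replays the events given as
# arguments, in order, and prints "running" or "stopped".
state_after() {
    fs=0    # 1 while inside the filesystem period
    net=0   # 1 while inside the network period
    for ev in "$@"; do
        case "$ev" in
            fhs-filesystem-mounted)   fs=1 ;;
            fhs-filesystem-unmounted) fs=0 ;;
            network-up)               net=1 ;;
            network-down)             net=0 ;;
        esac
    done
    # "and" means we must be inside both periods at once.
    if [ "$fs" -eq 1 ] && [ "$net" -eq 1 ]; then
        echo running
    else
        echo stopped
    fi
}

state_after fhs-filesystem-mounted network-up
```

Under this reading, the order in which the two periods begin does not matter; the state only becomes active once both have begun and neither has ended.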
Perhaps we need to be running while any display manager is:
from started gdm until stopping gdm
or started kdm until stopping kdm
Or maybe we only want to be run if a network interface comes up before bind9 has been started:
on network-interface-up and from startup until started bind9
These “complex event configurations” can appear in any job file; and any job file itself can serve as a reference for other jobs. They will be started and stopped at the same time as the named job:
with apache
Omitting the
exec
or
script
stanza from a job file means that it simply defines a state that can serve as a reference for others. As such, the
multiuser
state is simply a job file that defines it.
As an added bonus, these states can still have
pre-start
,
post-stop
, etc. scripts.