I’d like to start on a set of rules for running and supporting online services in a way that takes advantage of lean production principles. This carries on the thoughts stirred up by my recent exposure to ITIL, which is pretty much the IT infrastructure equivalent of the waterfall development model.
So the first pattern I came up with in that post was around keeping people involved throughout the process of planning, rolling out, and running a service, rather than having each of these things done by a different set of people, relying on the mythical “knowledge transfer” process. I’m sure there’s a lot more to say on that one, but for the moment I’d like to get another idea out.
This pattern can be called DeployOften. I’ve found that a good rule in most areas of life is: if you find something difficult to get right, do it more often. This is obvious in some contexts - it’s called practice - but in day-to-day business it goes against the grain to seek out painful tasks and repeat them more than is necessary to get through the job. The benefit is that the more you do something, the better you get at it, and the less painful it becomes.
The principle of doing difficult things more often is found throughout agile development methodologies, an obvious example being test-driven development. Testing is usually a painful and dull process, typically skipped or skimped. Agile makes it a central behaviour: the very first thing you do when coding is write automated tests.
A difficult and painful part of service management is deploying new or updated software, part of what ITIL calls the transition phase. One organisation I know of tries to release updates to their software every three months, and each time it’s a trial. Deploying the software to the server inevitably turns up surprises, and user acceptance testing drags on through multiple rounds as updated releases are built to fix the problems discovered.
This is in spite of automated nightly and iteration builds, which somehow never bring out the issues that surface even on the staging servers. Those servers use snapshots of current live data sets and mimic the live deployment environment more rigidly.
The DeployOften pattern suggests that the operations team should deploy each iteration build onto a production-like environment rather than waiting for the nearly-complete release. This will raise cries from the ops people, who don’t exactly have light workloads already. But by deploying every two weeks they will get it down to a very quick process, and also turn up deployment problems much sooner in the development cycle, which should raise the developers’ awareness of the kinds of things they need to keep in mind to make deployment easier.
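To make the idea a little more concrete, here is a minimal Python sketch of what a repeatable per-iteration deployment run might look like. Everything in it is an assumption for illustration: the names `deploy_build` and `DeployReport`, the step names, and the idea of modelling each deployment step as a plain function are mine, not part of any real tool. The point is only that each run records how long it took and what went wrong, so the team can watch both numbers shrink from one iteration to the next.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class DeployReport:
    """Outcome of one deployment run (hypothetical structure)."""
    build_id: str
    problems: List[str] = field(default_factory=list)
    seconds: float = 0.0

    @property
    def ok(self) -> bool:
        return not self.problems


def deploy_build(build_id: str,
                 steps: List[Tuple[str, Callable[[], None]]]) -> DeployReport:
    """Run the deployment steps in order and record the first failure.

    Each step is a (name, function) pair. A step signals a problem by
    raising an exception; later steps are then skipped.
    """
    report = DeployReport(build_id)
    start = time.perf_counter()
    for name, step in steps:
        try:
            step()
        except Exception as exc:
            # Keep the surprise on record so developers see it this iteration.
            report.problems.append(f"{name}: {exc}")
            break
    report.seconds = time.perf_counter() - start
    return report


if __name__ == "__main__":
    # Hypothetical steps standing in for real deployment work.
    def copy_artifacts() -> None:
        pass

    def migrate_schema() -> None:
        raise RuntimeError("migration failed against live data snapshot")

    report = deploy_build("iteration-12", [
        ("copy artifacts", copy_artifacts),
        ("migrate schema", migrate_schema),
    ])
    print(report.build_id, "ok" if report.ok else report.problems)
```

Stopping at the first failure is a deliberate choice in this sketch: a half-applied deployment is exactly the kind of surprise worth surfacing early, while the build is only two weeks old.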