kief.com

Sporadically delivered thoughts on Continuous Delivery

Infrastructure as Code - Automation Is Not Enough


Infrastructure automation has become a mainstream theme in our industry, but automation without Infrastructure as Code practices only spurs the growth of chaotic IT sprawl. Organisations that depend on IT are still held back by their inability to quickly and reliably adapt to business challenges and opportunities. IT Ops people continue to be bogged down in firefighting, with barely enough time to keep systems running, leaving little time for fundamental improvements.

Back in the Iron Age …

Virtualization and cloud (IaaS, Infrastructure as a Service, in particular) have forced the need for automation of some kind. In the old days, the “Iron Age” of IT, infrastructure growth was limited by the hardware purchasing cycle. Since it would take weeks for a new server to arrive, there was little pressure to rapidly install and configure an operating system on it. We would slot in a CD and follow our checklist, and a few days later it would be ready.

But the ability to spin up new virtual machines in minutes required us to get a lot better at automating this process. Server image templates and cloning helped get us over the hump. But now we had a new problem. Because we could, assuming enough overall capacity, spin up new VMs at the drop of a hat, we found ourselves with an ever-growing portfolio of servers. The need to keep a constantly growing and changing number of servers up to date, and to avoid Configuration Drift, spawned new tools.

Infrastructure as Code is born

CFEngine, Puppet, and Chef established a new category of infrastructure automation tool, quickly adopted by the early adopters: those nimble organisations who were taking full advantage of IaaS cloud as it emerged. These organisations, whose IT was typically built around Agile and Lean mindsets, evolved “Infrastructure as Code” practices for managing their automated infrastructure.

The essence of Infrastructure as Code is to treat the configuration of systems the same way that software source code is treated. Source code management systems, Test Driven Development (TDD), Continuous Integration (CI), refactoring, and other XP practices are especially useful for making sure that changes to infrastructure are thoroughly tested, repeatable, and transparent.
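
To make this concrete, here’s a minimal sketch (in Python, not any particular tool’s format) of what treating configuration as code can look like: a hypothetical server definition kept in version control, with automated checks that run on every commit just like unit tests for application code. The attribute names and values are illustrative assumptions, not a real tool’s schema.

    # test_server_definition.py - a minimal, hypothetical sketch of treating
    # infrastructure configuration like source code: the definition lives in
    # version control and automated checks run against it on every commit.
    import unittest

    # A made-up server definition; real tools (CFEngine, Puppet, Chef) have
    # their own formats. This is only here to illustrate the practice.
    WEB_SERVER = {
        "base_image": "ubuntu-12.04",
        "packages": ["nginx", "ntp"],
        "open_ports": [22, 80, 443],
        "root_login_enabled": False,
    }

    class TestWebServerDefinition(unittest.TestCase):
        def test_built_from_an_approved_base_image(self):
            self.assertIn(WEB_SERVER["base_image"], {"ubuntu-10.04", "ubuntu-12.04"})

        def test_root_login_is_disabled(self):
            self.assertFalse(WEB_SERVER["root_login_enabled"])

        def test_only_expected_ports_are_open(self):
            self.assertLessEqual(set(WEB_SERVER["open_ports"]), {22, 80, 443})

    if __name__ == "__main__":
        unittest.main()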

Enter the enterprise vendors - who get it… get it wrong (of course)

As more traditional organisations have adopted virtualization - generally on in-house infrastructure rather than public clouds - they’ve felt the same need for automation to manage their systems. But although some have explored the toolsets used by the early adopters, many turn to traditional vendors of so-called enterprise management toolsets, who have moved to adapt and rebrand their software to catch the latest waves in the industry (“Now with DevOps!”).

The problem is that few of these toolsets are designed to support Infrastructure as Code. Yes, they do automate things. Once you point and click your way through their GUI to create a server template, you can create identical instances to your heart’s content. But when you go back and make tweaks to your template, you don’t have a traceable, easily understood record of the change. You can’t automatically trigger testing of each change using validation tools from multiple vendors, open source projects, and in-house groups.

In short, rather than using intensive, automatically enforced extreme change management, you’re stuck with old-school, manual, “we’d do it more thoroughly if we had time” change management.

The difference is:

Infrastructure automation makes it possible to carry out actions repeatedly, across a large number of nodes. Infrastructure as code uses techniques, practices, and tools from software development to ensure those actions are thoroughly tested before being applied to business critical systems.

What to demand from your tools

Here are some guidelines for choosing configuration management tools that support Infrastructure as Code:

  • The definitions used to create and update system configurations should be externalizable in a format that can be stored in off-the-shelf version control systems such as Git, Subversion, or Perforce. This enables the adoption of a wide variety of tools for managing, validating, and testing software source code, rather than locking you into a single vendor’s toolset. It also gives you a history of every change, along with who made it and (hopefully) why, and the ability to roll back.
  • It should be possible to validate definitions at various levels of granularity, so you can apply a variation of the test pyramid: quick syntax and code style validations, followed by execution of individual units of configuration, followed by instantiation of VMs that can be validated, and so on. This offers the benefits of fast feedback and correction of changes, and is the foundation for Continuous Integration and for building a Continuous Delivery pipeline (see the sketch after this list).
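
As a rough illustration of what that layered validation might look like, here’s a sketch of a script that runs the cheap checks first and only moves on to the slower ones if they pass. The script names are placeholders for whatever linting, unit-testing, and VM-building tools a team actually uses.

    # validate_definitions.py - a hypothetical sketch of a test pyramid for
    # infrastructure definitions: fast, cheap checks run first, and a throwaway
    # VM is only built once they pass. The script paths are placeholders.
    import subprocess
    import sys

    STAGES = [
        ("syntax and style checks", ["./scripts/lint-definitions.sh"]),
        ("unit-level configuration tests", ["./scripts/run-config-unit-tests.sh"]),
        ("build and validate a throwaway VM", ["./scripts/build-and-verify-test-vm.sh"]),
    ]

    def main():
        for name, command in STAGES:
            print(f"== {name} ==")
            result = subprocess.run(command)
            if result.returncode != 0:
                # Fail fast to keep the feedback loop short.
                print(f"Failed at stage: {name}")
                sys.exit(result.returncode)
        print("All validation stages passed.")

    if __name__ == "__main__":
        main()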

Without the ability to ensure that every change is quickly and easily tested as a matter of course, we’re forced to rely on people to take the time to manually set up and run tests, even when they’re under pressure. Without visibility and openness of configuration changes, we end up locked into the limited toolset of a single vendor, and deprive ourselves of a huge ecosystem of tools for managing software changes.

Bottom line:

The defining characteristic of our move beyond the “Iron Age” into the “Cloud Era” is that infrastructure can now be treated like software. Ensuring we’re able to bring the most effective software development practices to bear is the key to getting the most value out of this shift.

Six Weeks to Provision a VM?!? #ChangeManagementFail


A while back when I was working in a global financial institution we requested two virtual machines for development and testing. Our requirements were for a standard operating system and application server from the IT organisation’s service catalog, nothing special.

The VMs were delivered six weeks later.

They were configured differently from one another.

One of them was broken - the application server failed to start due to permission errors.

Sad fact: this is the norm in our industry

Why did it take so long to deliver virtual machines, and why could they not deliver consistent, working machines? The IT department had the most expensive virtualisation platform money could buy, and used two different big-brand, top-dollar enterprise automation products for provisioning and configuring systems and software.

The organisation also had the most comprehensive, SOX certified, ITIL compliant processes you could have for change management and IT service delivery. That’s why it took six weeks; there were at least five different departments involved in creating the VM and configuring the operating system, user accounts, application server, and networking. The process involved CAB meetings, detailed service request documents, handover documents and security audits.

This is not an unusual story. I’ve worked in and spoken with dozens of enterprise IT organisations over the past few years, and this kind of thing is painfully common. In fact, it’s the norm. People in large organisations take this for granted. “Of course things take a while, and there are communication issues. We’re big and complex!”

When I suggest things don’t have to be this way, that there are some large organisations (admittedly a minority) which handle this stuff effectively by taking a different approach, they recoil. They say:

“That kind of stuff might work for websites and startups, but we’re too big and complex.”

This reaction is puzzling at first glance. Do people really think that more rigorous change management practices are not relevant to larger, older organisations?

I suspect the real root of the rejection of agile practices in large organisations is a belief that traditional change management practices work. Or at least, that they would work if properly applied. It certainly sounds like it should work: spend more time planning and checking changes, get more people to review and approve them, document everything more thoroughly, and evaluate each deviation from plan even more thoroughly, and then, surely, the outcome will be of higher quality.

Right?

But in practice, things almost never work out this way. Each handover to another person or group is an opportunity to introduce errors and deviation. More misunderstanding, more loss of context, and less ownership of the end result. Very few organisations using these practices actually achieve high levels of quality (Hint: Your company is not NASA); instead, things get done through heroics and workarounds to the process, with plenty of mistakes and disasters. (No, that’s not the organisation from my story.)

A better way

An effective IT organisation should be able to deliver a virtual machine in a matter of hours or minutes, rather than weeks. And it should be able to deliver this more reliably and consistently than a traditional process does, ensuring that the VM is fully compliant with corporate standards for security, quality, and conformity.

How could the organisation with the six-week delivery time for VMs achieve this?

  1. Create standard VM templates with the commonly required packages baked into them. When someone requests a VM, spin up an instance from the relevant template, rather than having a series of steps to configure different aspects of the system.
  2. Automate the process for updating these templates. Each time a change is needed or new versions of packages are released, apply them to the template, conduct quality checks (with as much automation as possible), and make the updated template available (see the sketch after this list).
  3. Involve the experts from the various functional specialties (OS, etc.) in creating the automation for updating templates, and for implementing the automated validation, to ensure any change complies with corporate standards.
  4. Create a self-service application, so that groups requiring VMs can use a dashboard to spin them up, entering cost centre and budget approval information as necessary.
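
To make step 2 above a bit more concrete, here’s a hedged sketch of what an automated template-update job might look like. The helper functions are placeholders for whatever image-building and testing tooling the platform provides; none of this is a real vendor’s API.

    # update_vm_template.py - a hypothetical sketch of step 2: rebuild the
    # standard VM template when something changes, run automated quality checks,
    # and only publish it to the self-service catalogue if the checks pass.
    import datetime

    def build_template(base_os, packages):
        """Pretend to bake a new template image; returns a made-up template id."""
        stamp = datetime.datetime.now().strftime("%Y%m%d%H%M")
        template_id = f"{base_os}-standard-{stamp}"
        print(f"Building {template_id} with packages: {', '.join(packages)}")
        return template_id

    def run_quality_checks(template_id):
        """Placeholder for automated security, compliance, and smoke tests."""
        print(f"Running quality checks against {template_id}")
        return True

    def publish(template_id):
        """Placeholder for making the template available for self-service use."""
        print(f"Publishing {template_id} to the template catalogue")

    def update_standard_template():
        template_id = build_template("ubuntu-12.04", ["openjdk-7-jdk", "ntp", "monitoring-agent"])
        if run_quality_checks(template_id):
            publish(template_id)
        else:
            print(f"Rejecting {template_id}: quality checks failed")

    if __name__ == "__main__":
        update_standard_template()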

There is more to it than this, of course, including how to ensure that changes to the standard template are applied to existing VMs created from earlier templates. But this is a start.

Extreme Change Management


We need to change the way we talk about change management. New technologies, practices, and commercial pressures have made traditional change management approaches difficult to apply effectively. Traditionalists view these new ways of working as irresponsible, inapplicable in an enterprise environment. Others have decided that change management is obsolete in a world where organizations need to be highly responsive to commercial realities.

Both of these are wrong.

There is no tradeoff between rapid delivery and reliable operations. New technologies such as cloud and infrastructure automation, plus agile approaches like DevOps and Continuous Delivery, allow changes to be made even more reliably, with far more rigorous control, than traditional approaches. There’s a useful parallel to Extreme Programming (XP), which takes quality assurance for software development to “the extreme”, with automated tests written and run in conjunction with the code they test, and run repeatedly and continuously throughout the development process.

The same is true with modern IT infrastructure approaches. Automation and intensive collaboration support the business to make small changes safely, thoroughly validate the safety of each change before rolling it out, immediately evaluate its impact once live, and rapidly decide and implement new changes in response to this information. The goal is to maximize both business value and operational quality.

The key is very tight loops involving very small changesets. When making a very small change, it’s easy to understand how to measure its impact on operational stability. The team can add monitoring metrics and alerts as appropriate for the change, deploy it to accurate testing environments for automated regression testing, and carry out whatever human testing and auditing can’t be automated. When this is done frequently, the work for each change is small, and the team becomes very efficient at carrying out the change process.

It’s good to validate changes before and after applying them to ensure they won’t cause operational problems. So, it must be even better to do this validation continuously, as each change is being worked out, rather than periodically (monthly or weekly).

It’s good to test that disaster recovery works correctly. So it must be even better to use disaster recovery techniques routinely as a part of normal processes for deploying changes, using Phoenix Servers or even Immutable Servers.

If it’s good to have a documented process that people should follow for making changes to systems, it must be even better to have the change process captured in a script. Unlike documentation, a script won’t fall out of date with the actual procedure; it won’t skip steps, mistype commands, or leave out the key steps that certain people “just know”.

If it’s good to be able to audit changes that are made to a system, it must be even better to know that each change is automatically logged and traceable.

If it’s useful to have handovers so that the people responsible for operations and support can review changes and make sure they understand them, it must be even better to have continuous collaboration. This ensures those people not only fully understand the changes, but have shaped the changes to meet their requirements.
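
As a small illustration of capturing a change process in a script with automatic logging, here’s a hypothetical sketch. The individual step scripts are placeholders; the point is that every run follows the same steps and leaves an audit trail without anyone having to remember to write it.

    # apply_change.py - a minimal, hypothetical sketch of a change process
    # captured as a script: the steps always run the same way, and every run
    # is automatically logged and traceable.
    import datetime
    import getpass
    import json
    import subprocess
    import sys

    AUDIT_LOG = "change-audit.log"  # made-up location for the audit trail

    def log(entry):
        entry["timestamp"] = datetime.datetime.now().isoformat()
        entry["user"] = getpass.getuser()
        with open(AUDIT_LOG, "a") as f:
            f.write(json.dumps(entry) + "\n")

    def run_step(description, command):
        result = subprocess.run(command)
        log({"step": description, "command": command, "exit_code": result.returncode})
        if result.returncode != 0:
            log({"outcome": "change aborted", "failed_step": description})
            sys.exit(result.returncode)

    def apply_change():
        # The steps are placeholders; none of them depend on knowledge that
        # lives only in someone's head.
        run_step("validate change in test environment", ["./scripts/validate-change.sh"])
        run_step("apply change to production", ["./scripts/apply-change.sh"])
        run_step("verify production health", ["./scripts/check-health.sh"])
        log({"outcome": "change completed"})

    if __name__ == "__main__":
        apply_change()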

Immutable Server Blikis


Martin Fowler has published a couple of bliki entries I wrote. The main piece is a definition of Immutable Servers, which is a term that our colleague Ben Butler-Cole coined to describe the practice of not making configuration changes to servers once they’ve been provisioned. In contrast to the better known Configuration Synchronization approach, where automated configuration updates are continuously applied using a tool like Chef or Puppet, a team that uses immutable servers makes configuration changes to the base images, and frequently destroys and rebuilds servers to keep them up to date.

The main advantage of this approach is that, by avoiding changes to a running system’s configuration, you reduce the risks that changes bring. You make changes to a base image, and can then run it through a battery of tests to make sure it’s OK before using it to create server images. This applies the principles behind Deployment Pipelines to infrastructure.
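
For illustration, here’s a rough sketch of the immutable server workflow under some obvious assumptions (the functions below are placeholders, not any cloud provider’s or image-building tool’s API): changes go into a new image, the image is tested, and running servers are replaced rather than updated in place.

    # rebuild_servers.py - a hypothetical sketch of the immutable server pattern:
    # configuration changes go into a new base image, the image is tested, and
    # running servers are replaced rather than modified in place.

    def build_image(version):
        """Bake a new server image containing the latest configuration."""
        image_id = f"app-server-image-{version}"
        print(f"Building {image_id}")
        return image_id

    def test_image(image_id):
        """Run the automated test battery against a throwaway instance of the image."""
        print(f"Testing {image_id}")
        return True

    def launch_servers(image_id, count):
        print(f"Launching {count} servers from {image_id}")
        return [f"{image_id}-instance-{n}" for n in range(count)]

    def retire_servers(servers):
        for server in servers:
            print(f"Destroying {server}")

    def roll_out(version, current_servers):
        image_id = build_image(version)
        if not test_image(image_id):
            raise RuntimeError(f"{image_id} failed testing; existing servers left untouched")
        new_servers = launch_servers(image_id, count=max(len(current_servers), 2))
        retire_servers(current_servers)  # no in-place changes: old servers simply go away
        return new_servers

    if __name__ == "__main__":
        servers = roll_out("2013-06-01", current_servers=[])
        servers = roll_out("2013-06-15", current_servers=servers)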

Ben and Peter Gillard-Moss have been evangelizing this approach within ThoughtWorks with their use of it on the Mingle SaaS project. Netflix are arguably the pioneers of this approach, and have released some open source tools to help manage AMI images on AWS for this purpose.

I’m running into increasing numbers of folks in the DevOps community who see infrastructures managed through heavily automated, continuous synchronization as too complicated and fragile.

If the chef-server, puppet-master approach to configuration management is Cloud Computing 2.0, immutable servers are the next thing. Interestingly, at least one commentator has confused this next generation of infrastructure management with pre-cloud practices. My (now-former - sniff) colleague Nic Ferrier responded to this based on a conversation with (still!) colleague Jim Gumbley.

These are truly interesting times in the world of IT infrastructure. The way we do things now is quite different from the way we did them ten years ago (albeit probably not for the majority - as with much technology, the future is not evenly distributed), and certainly different from the way we’ll do things in ten more years. It’s a blast to be involved in the shift!

CD Pipeline Implementation: Tracer Bullet (Trail Marker)


On my current project we’re developing an essentially green field application, albeit one that integrates a fair bit of data managed in existing systems, in conjunction with the implementation of a new hosting infrastructure which will be used for other applications once it is established. We want to have a solid Continuous Delivery Pipeline to support the team developing the application, as well as to support the development and maintenance of the infrastructure platform.

In order to get the team moving quickly, we’ve kicked this all off using what we’ve called a “tracer bullet” (or “trail marker”, for a less violent image). The idea is to get the simplest implementation of a pipeline in place, prioritizing a fully working skeleton that stretches across the full path to production over fully featured, final-design functionality for each stage of the pipeline.

[Image: trail marker]

Our goal is to get a “Hello World” application using our initial technology stack into a source code repository, and be able to push changes to it through the core stages of a pipeline into a placeholder production environment. This sets the stage for the design and implementation of the pipeline, infrastructure, and application itself to evolve in conjunction.

Use cases

This tracer bullet approach is clearly useful in our situation, where the application and infrastructure are both new. But it’s also very useful when starting a new application with an existing IT organization and infrastructure, since it forces everyone to come together at the start of the project to work out the process and tooling for the path to production, rather than leaving it until the end.

The tracer bullet is more difficult when creating a pipeline from scratch for an existing application and infrastructure. In these situations, both application and infrastructure may need considerable work in order to automate deployment, configuration, and testing. Even here, though, it’s probably best to take each change made and apply it to the full length of the path to production, rather than wait until the end-all be-all system has been completely implemented.

Goals

When planning and implementing the tracer bullet, we tried to keep three goals in mind as the priority for the exercise.

  1. Get the team productive. We want the team to be routinely getting properly tested functionality into the application and in front of stakeholders for review as quickly as possible.
  2. Prove the path to production. We want to understand the requirements, constraints, and challenges for getting our application live as early as possible. This means involving everyone who will be involved in going live, and using the same infrastructure, processes, and people that will be used for the actual launch, so that issues are surfaced and addressed.
  3. Put the skeleton in place. We want to have the bare bones of the application, infrastructure, and the delivery pipeline in place, so that we can evolve their design and implementation based on what we learn in actually using them.

Things can and should be made simple to start out with. Throughout the software development project changes are continuously pushed into production, multiple times every week, proving the process and identifying what needs to be added and improved. By the time the software is feature complete, there is little or no work needed to go live, other than DNS changes and publicizing the new software.

“Do’s” and “Do Not Do’s”

Do start as simply as you can

Don’t implement things that aren’t needed to get the simple, end to end pipeline in place. If you find yourself bogged down implementing some part of the tracer bullet pipeline, stop and ask yourself whether there’s something simpler you can do, coming back to that harder part once things are running. On my current project we may need a clever unattended provisioning system to frequently rebuild environments according to the PhoenixServer pattern. However, there are a number of issues around managing private keys, IP addresses, and DNS entries which make this a potential yak shave, so for our tracer bullet we’re just using the Chef knife-rackspace plugin.

Don’t take expensive shortcuts

The flip side of starting simply is not to take shortcuts which will cost you later. Each time you make a tradeoff in order to get the tracer bullet pipeline in place quickly, make sure it’s a positive tradeoff. Keep track of those tasks you’re leaving for later.

Examples of false tradeoffs are leaving out testing, basic security (e.g. leaving default vendor passwords in place), and repeatability of configuration and deployment. Oftentimes these are things which actually make your work quicker and more assured - without automated testing, every change you make may introduce problems that will cost you days to track down later on.

It’s also often the case that things which feel like they may be a lot of work are actually quite simple for a new project. For my current project, we could have manually created our pipeline environments, but decided to make sure every server can be torn down and rebuilt from scratch using Chef cookbooks. Since our environments are very simple - stock Ubuntu and a JDK install and we’re good to go - this was actually far simpler than it would have been later on, once we’ve got a more complicated platform in place.

Don’t worry too much about tool selection

Many organizations are in the habit of turning the selection of tools and technologies into complicated projects in their own right. This comes from a belief that once a tool is chosen, switching to something else will be very expensive. This is pretty clearly a self-fulfilling prophecy. Choose a reasonable set of tools to start with, ones that don’t create major barriers to getting the pipeline in place, and be ready to switch them out as you learn about how they work in the context of your project.

Do expect your design to change

Put your tracer bullet in place fully expecting that the choices you make for its architecture, technology, design, and workflow will all change. This doesn’t just apply to the pipeline, but to the infrastructure and application as well. Whatever decisions you make up front will need to be evaluated once you’ve got working software that you can test and use. Taking the attitude that these early choices will change later lowers the stakes of making those decisions, which in turn makes changing them less fraught. It’s a virtuous circle that encourages learning and adaptation.

Don’t relax the go-live constraints

It’s tempting to make it easy to get pre-live releases into the production environment, waiting until launch is close to impose the tighter restrictions required for “real” use. This is a bad idea. The sooner the real-world constraints are in place, the quicker the issues those constraints cause will become visible. Once these issues are visible, you can implement the systems, processes, and tooling to deal with those issues, ensuring that you can routinely and easily release software that is secure, compliant, and stable.

Do involve everyone from the start

Another thing often left until the end is bringing in the people who will be involved in releasing and supporting the software. This is a mistake. Even in siloed organizations, where software design and development are done by groups separate from release and support, the support people have deep insight into the requirements for making the operation and use of the software reliable and cost effective.

Involving them from the start and throughout the development process is the most effective way to build supportability into the software. When release time comes, handover becomes trivial because the support team have been supporting the application through its development.

Bringing release and support teams in just before release means their requirements are introduced when the project is nearly finished, which forces a choice between delaying the release in order to fix the issues, or else releasing software which is difficult and/or expensive to support.

Doing what’s right for the project and team

The question of what to include in the tracer bullet and what to build in once the project is up and running depends on the needs of the project and the knowledge of the team. On my current project, we found it easy to get a repeatable server build in place with Chef configuration. But we did this with a number of shortcuts.

  • We’re using the out of the box server templates from our cloud vendor (Rackspace), even though we’ll probably want to roll our own eventually.
  • We started out using chef-solo (with knife-solo), even though we planned to use chef-server. This is largely due to knowledge - I’ve done a few smaller projects with knife-solo, and have some scripts and things ready to use, but haven’t used chef-server. Now that we’re migrating to chef-server I’m thinking it would have been wiser to start with the Opscode hosted chef-server. Moving from the hosted server to our own would have been easier than moving from solo to server.

Starting out with a tracer bullet approach to our pipeline has paid off. A week after starting development we have been able to demonstrate working code to our stakeholders. This in turn has made it easier to consider user testing, and perhaps even a beta release, far sooner than had originally been considered feasible.

Quality Plus Simplicity - the Sweet Spot


There is a common belief in the software development world that a tradeoff exists between speed of delivery and quality, an idea Martin Fowler calls the Tradable Quality Hypothesis. It’s the idea that, in a pinch, you can speed up software delivery by not worrying so much about quality.

As Martin points out, people have different understandings of what quality means, but the definition that counts from a delivery point of view is that it’s the attributes that make the software easier to maintain and extend. Developers can work more quickly on code that is easy to understand and free from bugs.

So in practice, teams that prioritize speed over quality tend to achieve neither, while teams that prioritize quality, in many cases, deliver code very quickly.

The complexity axis

However, this isn’t always the case. Some teams focus on quality, but end up taking forever to deliver simple things. What’s missing from the speed vs. quality tradeoff is a second axis, completeness versus simplicity.

[Image: quadrant showing quality vs. speed and completeness vs. simplicity]

Another word for completeness on this chart would be complexity, but this quadrant represents the aspirations of a team - what the team is trying to achieve - and no team aspires to complexity. Instead, teams try to design and implement software and systems which are complete.

A team that prioritizes completeness wants a system that can cope with anything. It can meet completely new requirements through configuration rather than code, easily scale to handle any load, and tolerate any conceivable or inconceivable failure.

The problem with this is partly described by the YAGNI principle. Most of what the team builds isn’t actually going to be needed. A large proportion of the stuff that will be needed in the future is stuff that the team didn’t anticipate. But the real killer is that adding all this stuff adds more moving parts. It’s more stuff to implement, more stuff to break, and then it’s more stuff to wade through when working on the codebase.

[Image: the original quadrant with an arrow showing the slide from quality + completeness to speed-focused]

So the team sets out to build the perfect, well-engineered system, but over time the schedule comes under pressure, and the team realizes it needs to step up the pace. Elements of the design are dropped, leaving parts of the system that were already implemented unused, but still taking up space (and adding complexity) in the codebase.

There is a nearly inevitable slide into cutting corners in order to get things done, and before you know it, you’re trading off quality (“we’ll go back and clean it up later”) for speed. As we’ve seen, this leads to a quagmire of poor code quality which slows work down, made even worse because of an overcomplicated design and large amounts of unnecessary code.

High performing teams hit the sweet spot


What seems to unite high performing development teams is an obsessive focus on both quality and simplicity. Implement the simplest thing that will satisfy the actual, proven need, and implement it well. Make sure it’s clean, easy to understand, and correct. If something is wrong with it, fix it immediately.

There’s a line to tread here. I’ve seen some teams interpret this too strictly, delivering software that works correctly and is simple, but is crappy in terms of user experience. The definition of quality software must include doing an excellent job of satisfying the user’s needs, while being ruthless about limiting the needs it tries to satisfy.

Teams that get this focus right are able to reliably deliver high quality software remarkably fast.

Pat Kua’s New Book on Agile Retrospectives


ThoughtWorkers write loads of books, and I’m too lazy to make a habit out of reading, reviewing, and plugging them all. So given that I’ve gotten off my ass (erm, well, not literally of course) to tout Pat Kua’s new book, The Retrospective Handbook, you can be assured it’s not a rote act of loyalty to my colleagues.

As Pat says, if you were to pick only one agile practice to adopt, retrospectives are it. They’re the engine a team uses to identify and address ways to improve performance, so regular retrospectives become the forum to work out which other practices would be helpful, how to adjust the way they’re being used, and which ones are getting in the way or are simply unnecessary.

If you’ve tried retrospectives but not gotten as much out of them as the above bold claim suggests, Pat’s book could be for you. Everything in it is refreshingly practical and actionable for such a potentially hand-wavy, touchy-feely subject. It ranges from high level topics and techniques, through to dealing with common problems such as lack of action afterwards, to nuts and bolts details about the materials to use.

If you want a more detailed review of the book, check out our other colleague Mark Needham’s review. Then get the book itself!

And, yeah, check out the stuff our other colleagues have written as well. I may be too lazy to write them all up, but they’re quality stuff.

Organizing for Continuous Delivery - the Reading List


I presented a webinar Organizing for Continuous Delivery earlier this week, which was a lot of fun. The recording of me droning over the slides is available at that link. I mentioned a number of books that influenced my thinking on the presentation, so I’d like to share the list here, with some additional ones that I’d recommend for people interested in this stuff. (Disclaimer, these are Amazon affiliate links.)

Beyond Performance: How Great Organizations Build Ultimate Competitive Advantage by Scott Keller and Colin Price
Continuous Delivery: Reliable Software Releases through Build, Test, and Deployment Automation by Jez Humble and Dave Farley
Freedom from Command and Control: Rethinking Management for Lean Service by John Seddon
Hard Facts, Dangerous Half-Truths And Total Nonsense: Profiting From Evidence-Based Management by Jeffrey Pfeffer and Robert I. Sutton
Implementing Lean Software Development: From Concept to Cash by Mary and Tom Poppendieck
The Lean Startup: How Today’s Entrepreneurs Use Continuous Innovation to Create Radically Successful Businesses by Eric Ries
The Modern Firm: Organizational Design for Performance and Growth by John Roberts

Presenting a Webinar About Organizational Structures and Continuous Delivery


Today I’m presenting the 11th installment of ThoughtWorks’ Continuous Delivery webinar series. My talk is titled “Organizing for Continuous Delivery”, and it’s basically about the people and organizational aspects. In short, it’s intended to help think about how to answer the question, “How should we structure our people into teams to make Continuous Delivery work?”

You can either sign up (if it’s before the day), or view the recorded webinar (if you’re reading this from the future), on the ThoughtWorks website.

The Conflict Between Continuous Delivery and Traditional Agile


In working with development teams at organizations which are adopting Continuous Delivery, I have found there can be friction over practices that many developers have come to consider as the right way for Agile teams to work. I believe the root of conflicts between what I’ve come to think of as traditional agile and CD is the approach to making software “ready for release”.

Evolution of software delivery

[Image: waterfall]

A usefully simplistic view of the evolution of ideas about making software ready for release is this:

  • Waterfall believes a team should only start making its software ready for release when all of the functionality for the release has been developed (i.e. when it is “feature complete”).
  • Agile introduces the idea that the team should get their software ready for release throughout development. Many variations of agile (which I refer to as “traditional agile” in this post) believe this should be done at periodic intervals.
  • Continuous Delivery is another subset of agile, in which the team keeps its software ready for release at all times during development. It is different from “traditional” agile in that it does not involve stopping and making a special effort to create a releasable build.

Continuous Delivery is not about shorter cycles

Going from traditional Agile development to Continuous Delivery is not about adopting a shorter cycle for making the software ready for release. Making releasable builds every night is still not Continuous Delivery. CD is about moving away from making the software ready as a separate activity, and instead developing in a way that means the software is always ready for release.

Ready for release does not mean actually releasing

A common misunderstanding is that Continuous Delivery means releasing into production very frequently. This confusion is made worse by the use of organizations that release software multiple times every day as poster children for CD. Continuous Delivery doesn’t require frequent releases; it only requires ensuring software could be released with very little effort at any point during development. (See Jez Humble’s article on Continuous Delivery vs. Continuous Deployment.) Although developing this capability opens opportunities which may encourage the organization to release more often, many teams find more than enough benefit from CD practices to justify using them even when releases are fairly infrequent.

Friction points between Continuous Delivery and traditional Agile

As I mentioned, there are sometimes conflicts between Continuous Delivery and practices that development teams take for granted as being “proper” Agile.

Friction point: software with unfinished work can still be releasable

One of these points of friction is the requirement that the codebase not include incomplete stories or bugfixes at the end of the iteration. I explored this in my previous post on iterations. This requirement comes from the idea that the end of the iteration is the point where the team stops and does the extra work needed to prepare the software for release. But when a team adopts Continuous Delivery, there is no additional work needed to make the software releasable.

More to the point, the CD team ensures that their code could be released to production even when they have work in progress, using techniques such as feature toggles. This in turn means that the team can meet the requirement that they be ready for release at the end of the iteration even with unfinished stories.
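
As a minimal sketch of the feature toggle idea (the toggle file and feature name below are made up), unfinished code can sit in a releasable codebase with its entry point switched off by configuration:

    # feature_toggles.py - a minimal, hypothetical feature toggle: unfinished
    # work stays in the codebase but switched off, so every build remains
    # releasable even with stories in progress.
    import json

    def load_toggles(path="toggles.json"):
        """Toggle state lives in configuration, so it can differ per environment."""
        try:
            with open(path) as f:
                return json.load(f)
        except FileNotFoundError:
            return {}

    TOGGLES = load_toggles()

    def is_enabled(feature):
        # Unknown or unfinished features default to off.
        return TOGGLES.get(feature, False)

    def render_pricing_page():
        if is_enabled("new_pricing_calculator"):  # a made-up, half-finished feature
            return "pricing page with the new calculator"
        return "pricing page with the existing calculator"

    if __name__ == "__main__":
        print(render_pricing_page())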

This can be a bit difficult for people to swallow. The team can certainly still require all work to be complete at the iteration boundary, but this starts to feel like an arbitrary constraint that breaks the team’s flow. Continuous Delivery doesn’t require non-timeboxed iterations, but the two practices are complementary.

Friction point: snapshot/release builds

Many development teams divide software builds into two types, “snapshot” builds and “release” builds. This is not specific to Agile, but has become strongly embedded in the Java world due to the rise of Maven, which puts the snapshot/release concept at the core of its design. This approach divides the development cycle into two phases, with snapshots being used while software is in development, and a release build being created only when the software is deemed ready for release.

This division of the release cycle clearly conflicts with the Continuous Delivery philosophy that software should always be ready for release. The way CD is typically implemented involves only creating a build once, and then promoting it through multiple stages of a pipeline for testing and validation activities, which doesn’t work if software is built in two different ways as with Maven.
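
As a rough sketch of the build-once-and-promote idea (stage names and scripts are placeholders), the same artefact moves through every stage of the pipeline rather than being rebuilt as a “snapshot” during development and again as a “release” at the end:

    # promote_build.py - a hypothetical sketch of "build once, promote many
    # times": one immutable artefact is pushed through each pipeline stage,
    # rather than being rebuilt along the way.
    import subprocess
    import sys

    PIPELINE_STAGES = [
        ("automated acceptance tests", ["./scripts/run-acceptance-tests.sh"]),
        ("deploy to staging", ["./scripts/deploy.sh", "staging"]),
        ("deploy to production", ["./scripts/deploy.sh", "production"]),
    ]

    def promote(artefact):
        """Push a single artefact (e.g. myapp-1.0.473.jar) through every stage."""
        for stage_name, command in PIPELINE_STAGES:
            print(f"Promoting {artefact} into stage: {stage_name}")
            result = subprocess.run(command + [artefact])
            if result.returncode != 0:
                print(f"{artefact} rejected at stage: {stage_name}")
                sys.exit(result.returncode)
        print(f"{artefact} has passed every stage and is ready for release")

    if __name__ == "__main__":
        promote("myapp-1.0.473.jar")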

It’s entirely possible to use Maven with Continuous Delivery, for example by creating a release build for every build in the pipeline. However, this leads to friction with Maven tools and infrastructure that assume release builds are infrequent and intended for production deployment. For example, artefact repositories such as Nexus and Artifactory have housekeeping features to delete old snapshot builds, but don’t allow release builds to be deleted. So an active CD team, which may produce dozens of builds a day, can easily chew through gigabytes and terabytes of disk space on the repository.

Friction point: heavier focus on testing deployability

[Image: Nobody likes cleaning up broken builds]

A standard practice with Continuous Delivery is automatically deploying every build that passes basic Continuous Integration to an environment that emulates production as closely as possible, using the same deployment process and tooling. This is essential to proving whether the code is ready for release on every commit, but this is more rigorous than many development teams are used to having in their CI.

For example, pre-CD Continuous Integration might run automated functional tests against the application by deploying it to an embedded application server using a build tool like Ant or Maven. This is easier for developers to use and maintain, but is probably not how the application will be deployed in production.

So a CD team will typically add an automated deployment to an environment that more fully replicates production, including separated web/app/data tiers, and deployment tooling that will be used in production. However, this more production-like deployment stage is more likely to fail due to its added complexity, and may be more difficult for developers to maintain and fix, since it uses tooling more familiar to system administrators than to developers.

This can be an opportunity to work more closely with the operations team to create a more reliable, easily supported deployment process. But it is likely to be a steep curve to implement and stabilize this process, which may impact development productivity.

Is CD worth it?

Given these friction points, what makes moving from traditional Agile to Continuous Delivery worthwhile, especially for a team that is unlikely to actually release into production more often than every iteration?

  • It decreases risk by uncovering deployment issues earlier.
  • It increases flexibility by giving the organization the option to release at any point with minimal added cost or risk.
  • It involves everyone with a stake in production releases - QA, operations, and so on - in making the full process more efficient. The entire organization must identify difficult areas of the process and find ways to fix them, through automation, better collaboration, and improved working practices.
  • By continuously rehearsing the release process, the organization becomes more competent at carrying it out, so that releasing becomes autonomic, like breathing, rather than traumatic, like giving birth.
  • It improves the quality of the software, by forcing the team to fix problems as they are found rather than leaving things for later.

Dealing with the friction

The friction points I’ve described seem to come up fairly often when Continuous Delivery is being introduced. My hope is that understanding the source of this friction will be helpful in discussing it when it comes up, and in working through the issues. If developers who are initially uncomfortable with breaking from the “proper” way of doing things, or who find a CD pipeline overly complex or difficult, understand the aims and value of these practices, hopefully they will be more open to giving them a chance. Once these practices become embedded and mature in an organization, team members often find it’s difficult to go back to the old ways of doing things.

Edit: I’ve rephrased the definition of the “traditional agile” approach to making software ready for release. This definition is not meant to apply to all agile practices, but rather applies to what seems to me to be a fairly mainstream belief that agile means stopping work to make the software releasable.