CakePHP, DITA, and continuous integration

For my last two years at IBM, I led a team developing a continuous integration build system for DITA based builds.  We chose to base our application on the CakePHP rapid development framework with an IBM DB2 database to store the build definitions, build results, and other metadata. The build execution is handled by ANT and various internal tools.


The project began with the need to provide a consistent build infrastructure for the different teams within our organization. Prior to starting this project, teams were building their own ANT build scripts in dramatically different ways of varying complexity using what I liked to call our “not-so-common build utilities.”  We had custom targets to facilitate processing, but it was too open ended and every implementation became hard coded and customized to a level that only the creator could maintain. We needed simplification, consistency, and standardization.

Our organization had too many people supporting custom code for small groups for no other reason than it evolved that way. The primary build output at IBM is Eclipse plugins containing XHTML output for use within Eclipse help systems running in “information center” mode, which are primarily hosted on Our secondary output is PDF files. There are some additional outputs, but those two cover at least 95% of our builds.  Sounds relatively easy right? If you are familiar with the DITA Open Toolkit or IBM’s tooling, you will know that there are many variations that can go into building XHTML output from DITA. You can have different headers, footers/ XSL overrides, ditavals, file extensions, not to mention source file locations, naming conventions, or navigation architectures.

The goal of the project was to greatly simplify and speed up the processes associated with DITA builds by providing a web application to handle the entire build life cycle.

Design choices

We chose PHP as the language of choice for our application because of its lower barriers to learning. A good number of our writers have some level of experience with PHP which meant that when the time came for me to move on, that others within the organization could take over development. Our information development teams typically do not get  much in the way of programming support so it was important that the project could be maintained by our own personnel.

The CakePHP rapid development framework significantly reduced the amount of utility code that we needed to write, such as a logic for the database communication layer or email. The framework also helped reduce the amount of complex  SQL code that we needed to write.  The vibrant community of CakePHP developers meant that customized plugins and other CakePHP extensions were availble to help further speed up development.

We opted for jQuery and jQuery UI instead of CakePHP’s default Prototype library due to jQuery’s ease of use and also very active plugin community.  Replacing the Protoype-based Ajax helper with a jQuery-based helper was a piece of cake (pun intended).

The application data is stored within a DB2 version 9.7 database. The application uses a PHP port of the DelayedJob library with the CakeDjjob plugin to handle delayed and remote execution of jobs. Our build servers run as many workers as the server’s capacity allows or targets workers to certain tasks such as information center hosting tasks. The builds are executed as Apache ANT build jobs and use an internal library for working with IBM information deliverables.  The CakePHP application dynamically assembles the ANT scripts and places the build into the job queue for a build worker to execute.

The experience

We created a UI experience that puts as much control into our writer’s hands as possible and to customize the application to them individually. They have a homepage that lists all of their current build projects and a news feed that displays recent events or problems.  They can dive straight into troubleshooting build problems directly from their homepage or quickly scan the results of a project build. When a writer needs a build, they can simply launch an on-demand build, which typically finishes in one to two minutes depending on the size of their deliverable.

Current state of the project

When I left IBM at the end of March 2012, we had somewhere around 130 build projects on the system, 150 active users, and 80 test information centers running. The project has been a huge success and far surpasses our original goals.  Upon my departure the team was testing our next major release, which will feature the first delivery that takes full advantage of the build farm architecture.

This was a very ambitious project when we first launched it and was a testament to how much a small team can accomplish through persistance and cohesiveness. The project was challenging and enjoyable to work on. It made my decision to leave IBM for Google very difficult because I was enjoying my work so much, but couldn’t pass up the opportunity that Google presented.