CakePHP, DITA, and continuous integration

For my last two years at IBM, I led a team developing a continuous integration build system for DITA-based builds. We chose to base our application on the CakePHP rapid development framework, with an IBM DB2 database to store the build definitions, build results, and other metadata. The build execution is handled by ANT and various internal tools.


The project began with the need to provide a consistent build infrastructure for the different teams within our organization. Prior to starting this project, teams were building their own ANT build scripts in dramatically different ways and of varying complexity, using what I liked to call our “not-so-common build utilities.” We had custom targets to facilitate processing, but the approach was too open-ended, and every implementation became hard-coded and customized to a level that only its creator could maintain. We needed simplification, consistency, and standardization.

Our organization had too many people supporting custom code for small groups for no other reason than it evolved that way. The primary build output at IBM is Eclipse plugins containing XHTML output for use within Eclipse help systems running in “information center” mode. Our secondary output is PDF files. There are some additional outputs, but those two cover at least 95% of our builds. Sounds relatively easy, right? If you are familiar with the DITA Open Toolkit or IBM’s tooling, you will know that many variations can go into building XHTML output from DITA. You can have different headers, footers, XSL overrides, ditavals, and file extensions, not to mention source file locations, naming conventions, or navigation architectures.

The goal of the project was to greatly simplify and speed up the processes associated with DITA builds by providing a web application to handle the entire build life cycle.

Design choices

We chose PHP as the language for our application because of its lower barriers to learning. A good number of our writers have some level of experience with PHP, which meant that when the time came for me to move on, others within the organization could take over development. Our information development teams typically do not get much in the way of programming support, so it was important that the project could be maintained by our own personnel.

The CakePHP rapid development framework significantly reduced the amount of utility code that we needed to write, such as logic for the database communication layer or email. The framework also helped reduce the amount of complex SQL code that we needed to write. The vibrant community of CakePHP developers meant that customized plugins and other CakePHP extensions were available to help further speed up development.

We opted for jQuery and jQuery UI instead of CakePHP’s default Prototype library because of jQuery’s ease of use and its very active plugin community. Replacing the Prototype-based Ajax helper with a jQuery-based helper was a piece of cake (pun intended).

The application data is stored within a DB2 version 9.7 database. The application uses a PHP port of the DelayedJob library with the CakeDjjob plugin to handle delayed and remote execution of jobs. Our build servers run as many workers as the server’s capacity allows, or dedicate workers to certain tasks such as information center hosting. The builds are executed as Apache ANT build jobs and use an internal library for working with IBM information deliverables. The CakePHP application dynamically assembles the ANT scripts and places the build into the job queue for a build worker to execute.
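
To make the "assemble a script, then queue it" flow concrete, here is a minimal sketch in Python rather than PHP. It is not the code we ran at IBM: the function names, the shape of the build definition, and the in-memory queue are all assumptions made for illustration, and the real system used CakeDjjob workers and an internal build library that are not shown here.

```python
import queue
import xml.etree.ElementTree as ET

# Stand-in for the real job queue (hypothetical; the actual system
# stored jobs in the database for DelayedJob-style workers).
job_queue = queue.Queue()


def assemble_ant_script(definition):
    """Turn a stored build definition into an Ant project file (as a string)."""
    project = ET.Element("project", name=definition["name"], default="build")
    # Each stored setting becomes an Ant <property>.
    for key, value in sorted(definition["properties"].items()):
        ET.SubElement(project, "property", name=key, value=value)
    target = ET.SubElement(project, "target", name="build")
    # Assumption: the build delegates to a toolkit build file under ${dita.dir}.
    ET.SubElement(target, "ant", antfile="${dita.dir}/build.xml")
    return ET.tostring(project, encoding="unicode")


def enqueue_build(definition):
    """Assemble the script and place the job where a worker can pick it up."""
    script = assemble_ant_script(definition)
    job_queue.put({"project": definition["name"], "ant_script": script})
    return script
```

A worker would then pull a job off the queue, write the script to disk, and invoke Ant on it. Generating the script from stored data, instead of letting each team hand-edit its own, is what kept the builds consistent.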

The experience

We created a UI experience that puts as much control into our writers’ hands as possible and tailors the application to each of them individually. They have a homepage that lists all of their current build projects and a news feed that displays recent events or problems. They can dive straight into troubleshooting build problems directly from their homepage or quickly scan the results of a project build. When a writer needs a build, they can simply launch an on-demand build, which typically finishes in one to two minutes depending on the size of their deliverable.

Current state of the project

When I left IBM at the end of March 2012, we had somewhere around 130 build projects on the system, 150 active users, and 80 test information centers running. The project has been a huge success and far surpasses our original goals.  Upon my departure the team was testing our next major release, which will feature the first delivery that takes full advantage of the build farm architecture.

This was a very ambitious project when we first launched it and was a testament to how much a small team can accomplish through persistence and cohesiveness. The project was challenging and enjoyable to work on. It made my decision to leave IBM for Google very difficult because I was enjoying my work so much, but I couldn’t pass up the opportunity that Google presented.

Dita4Hudson project on SourceForge

A recent comment by Yucheng on my Dita4Hudson preview post finally motivated me to upload my starter code to SourceForge so that others can contribute to it and make use of it.

The plugin right now is mostly front-end code and does not yet run the DITA Open Toolkit transforms. The intent of the plugin is simple: provide text fields in which a user can enter any applicable transform parameters for a given output type. In the background, those values would be passed to the toolkit in the appropriate form, essentially as a command-line call.
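
As a rough illustration of that idea (written in Python for brevity, not taken from the plugin code), the sketch below turns a set of form fields into an Ant command line. The function name and the default build file name are assumptions; the `-D` property syntax is standard Ant, and `args.input` and `transtype` are the usual toolkit parameters.

```python
def build_ant_command(fields, buildfile="build.xml"):
    """Convert form fields into an Ant invocation, skipping blank fields."""
    command = ["ant", "-f", buildfile]
    for name, value in fields.items():
        value = value.strip()
        if value:
            # Each filled-in field becomes an Ant property override.
            command.append(f"-D{name}={value}")
    return command


# Example form input; the blank field is left out of the command.
fields = {"args.input": "map.ditamap", "transtype": "pdf", "args.draft": " "}
print(" ".join(build_ant_command(fields)))
```

The resulting list could be handed to something like `subprocess.run` to execute the build; the plugin itself would do the equivalent from within Hudson.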

I would love to see others pick up where I left off. I hope to be able to continue working on it again in the future, but for now I have too much else going on. I would be happy to help someone out with getting going with it. Please comment or use the contact form to get in touch.

Preview of Dita4Hudson plugin for the Hudson continuous integration server

I am progressing a bit further on my plugin that will allow for easier DITA builds within the Hudson continuous integration server.  Hudson is a quite handy build management system that offers you a lot of additional value on top of your existing builds and processes.

You can use your existing DITA-OT Ant scripts to build your output within Hudson. Many DITA users are not Ant experts, and Ant is often one of the limiting factors in DITA adoption by smaller teams. I hope this plugin will help further promote DITA and provide another simple tool for DITA builds.

All feedback and suggestions are welcome.

DITA builds with WinAnt Echidna

I recently stumbled across the WinAnt Echidna project on SourceForge while browsing the DITA Open Toolkit project forums. While DITA as a markup language is not difficult to learn and start creating content with, it can be difficult to see the fruits of those labors if you are just getting started with the DITA-OT. Most information developers are not experts in XSL or ANT. The WinAnt tool can help them get started with producing DITA-sourced content very quickly.

WinAnt is generally going to appeal to writers and teams on smaller projects. Larger and more complex projects and deliverables likely have requirements that go beyond what WinAnt provides.

DITA as a wiki format?

Wikis for documentation make sense for many reasons, including low cost of implementation, ease of publishing, and collaboration possibilities. DITA has become a popular XML format for semantic markup of information and is generally used for documentation. Wiki content is generally authored or stored as wikitext, which is a non-standardized markup format. Should DITA be used as a markup format for wikis instead of wikitext or HTML? I believe the answer is that DITA in the authoring environment of wikis is impractical and does not work for general audiences.