CakePHP, DITA, and continuous integration

For my last two years at IBM, I led a team developing a continuous integration build system for DITA-based builds. We chose to base our application on the CakePHP rapid development framework, with an IBM DB2 database to store the build definitions, build results, and other metadata. The build execution is handled by ANT and various internal tools.


The project began with the need to provide a consistent build infrastructure for the different teams within our organization. Prior to starting this project, teams were building their own ANT build scripts in dramatically different ways of varying complexity using what I liked to call our “not-so-common build utilities.” We had custom targets to facilitate processing, but the approach was too open-ended, and every implementation became hard-coded and customized to a level that only its creator could maintain. We needed simplification, consistency, and standardization.

Our organization had too many people supporting custom code for small groups for no other reason than that it had evolved that way. The primary build output at IBM is Eclipse plugins containing XHTML output for use within Eclipse help systems running in “information center” mode, which are primarily hosted online. Our secondary output is PDF files. There are some additional outputs, but those two cover at least 95% of our builds. Sounds relatively easy, right? If you are familiar with the DITA Open Toolkit or IBM’s tooling, you will know that many variations can go into building XHTML output from DITA. You can have different headers, footers, XSL overrides, ditavals, and file extensions, not to mention source file locations, naming conventions, and navigation architectures.

The goal of the project was to greatly simplify and speed up the processes associated with DITA builds by providing a web application to handle the entire build life cycle.

Design choices

We chose PHP as the language for our application because of its lower barriers to learning. A good number of our writers have some level of experience with PHP, which meant that when the time came for me to move on, others within the organization could take over development. Our information development teams typically do not get much in the way of programming support, so it was important that the project could be maintained by our own personnel.

The CakePHP rapid development framework significantly reduced the amount of utility code that we needed to write, such as logic for the database communication layer or email. The framework also helped reduce the amount of complex SQL code that we needed to write. The vibrant community of CakePHP developers meant that customized plugins and other CakePHP extensions were available to help further speed up development.

We opted for jQuery and jQuery UI instead of CakePHP’s default Prototype library because of jQuery’s ease of use and its very active plugin community. Replacing the Prototype-based Ajax helper with a jQuery-based helper was a piece of cake (pun intended).

The application data is stored within a DB2 version 9.7 database. The application uses a PHP port of the DelayedJob library with the CakeDjjob plugin to handle delayed and remote execution of jobs. Each build server runs as many workers as its capacity allows, and workers can be targeted at certain tasks, such as information center hosting tasks. The builds are executed as Apache ANT build jobs and use an internal library for working with IBM information deliverables. The CakePHP application dynamically assembles the ANT scripts and places the build into the job queue for a build worker to execute.
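To make the "assemble a script, then queue it" flow concrete, here is a minimal sketch in Python rather than the PHP we actually used. Everything in it is an assumption for illustration: the function names, the shape of the build definition, and the in-memory queue standing in for the real job queue are all hypothetical, and the generated project simply delegates to a shared build file through Ant's standard `<ant>` task.

```python
import xml.etree.ElementTree as ET
from collections import deque

# Hypothetical in-memory stand-in for the real job queue.
job_queue = deque()


def assemble_ant_script(definition):
    """Build a small Ant project (as an XML string) from a build definition."""
    project = ET.Element("project", name=definition["name"], default="build")
    # Each stored build parameter becomes an Ant <property>.
    for key, value in sorted(definition["properties"].items()):
        ET.SubElement(project, "property", name=key, value=value)
    target = ET.SubElement(project, "target", name="build")
    # Delegate the real work to a shared build file (hypothetical name).
    ET.SubElement(target, "ant", antfile=definition["antfile"])
    return ET.tostring(project, encoding="unicode")


def enqueue_build(definition):
    """Assemble the script and place the job on the queue for a worker."""
    script = assemble_ant_script(definition)
    job_queue.append({"project": definition["name"], "script": script})
    return script


if __name__ == "__main__":
    example = {
        "name": "sample-guide",
        "properties": {"transtype": "xhtml", "args.input": "guide.ditamap"},
        "antfile": "common-build.xml",
    }
    print(enqueue_build(example))
```

A worker in this sketch would pop a job from `job_queue`, write the script to disk, and run Ant against it; keeping script assembly separate from execution is what lets the web application stay responsive while builds run elsewhere.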

The experience

We created a UI that puts as much control as possible into our writers’ hands and tailors the application to each of them individually. They have a homepage that lists all of their current build projects and a news feed that displays recent events or problems. They can dive straight into troubleshooting build problems directly from their homepage or quickly scan the results of a project build. When a writer needs a build, they can simply launch an on-demand build, which typically finishes in one to two minutes depending on the size of their deliverable.

Current state of the project

When I left IBM at the end of March 2012, we had somewhere around 130 build projects on the system, 150 active users, and 80 test information centers running. The project has been a huge success and far surpasses our original goals.  Upon my departure the team was testing our next major release, which will feature the first delivery that takes full advantage of the build farm architecture.

This was a very ambitious project when we first launched it and was a testament to how much a small team can accomplish through persistence and cohesiveness. The project was challenging and enjoyable to work on. It made my decision to leave IBM for Google very difficult because I was enjoying my work so much, but I couldn’t pass up the opportunity that Google presented.

Dita4Hudson project on SourceForge

A recent comment by Yucheng on my Dita4Hudson preview post finally motivated me to upload my starter code to SourceForge so that others can contribute to it and make use of it.

The plugin right now is mostly front-end code and does not yet run the DITA Open Toolkit transforms. The intent of the plugin is to provide simple text fields in which a user can enter any and all applicable transform parameters for a given output type. In the background, those parameters would be passed to the toolkit in the appropriate form, essentially as a command line call.
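As a rough illustration of that idea, here is a sketch of turning form fields into a command line. It is written in Python for brevity and is not the plugin's code; the function name and the default build file name are hypothetical. The sample parameter names (`transtype`, `args.input`, `output.dir`) are standard DITA-OT Ant properties, passed with Ant's `-D` option.

```python
def build_ant_command(form_fields, build_file="build.xml"):
    """Turn user-entered transform parameters into an Ant argument list.

    form_fields maps a DITA-OT parameter name to the text the user typed.
    Fields left blank are skipped so the toolkit defaults apply.
    """
    command = ["ant", "-f", build_file]
    for name, value in form_fields.items():
        value = value.strip()
        if value:
            command.append(f"-D{name}={value}")
    return command


if __name__ == "__main__":
    fields = {
        "transtype": "xhtml",
        "args.input": "guide.ditamap",
        "output.dir": "",  # left blank by the user
    }
    print(" ".join(build_ant_command(fields)))
```

Returning a list rather than one string means the result can be handed to a process launcher without any shell quoting.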

I would love to see others pick up where I left off. I hope to be able to continue working on it again in the future, but for now I have too much else going on. I would be happy to help someone out with getting going with it. Please comment or use the contact form to get in touch.

Preview of Dita4Hudson plugin for the Hudson continuous integration server

I am progressing a bit further on my plugin that will allow for easier DITA builds within the Hudson continuous integration server.  Hudson is a quite handy build management system that offers you a lot of additional value on top of your existing builds and processes.

You can use your existing DITA-OT Ant scripts to build your output within Hudson. Many DITA users are not Ant experts, and Ant is often one of the limiting factors in DITA adoption by smaller teams. I hope this plugin will help further promote DITA and provide another simple tool for DITA builds.

All feedback and suggestions are welcome.

Patent #7,627,854 – Graphical Aid for Generating Object Setup Scripts

I discovered today that, nearly four years after we filed our initial patent paperwork, our US patent was finally issued. A few months back, we received a Chinese patent for the same submission (#ZL200710001513.4).

You can read more about the work that went into this submission and also read the patent itself.

Recently, the tool that inspired this patent was bundled as an embedded feature of IBM InfoSphere Replication Server Version 9.7, which shipped to customers this past June.

Photo of the actual Chinese patent:

Creating dynamic documentation plugins for the Eclipse Help System using Java

Eclipse documentation plug-ins can contain Java for creating dynamic content. You can extend the Eclipse help system by taking advantage of the extension point. The examples in this article demonstrate how to create a simple template that is parsed at run time to display dynamic text.


Link checking your topics in an Eclipse Help System/Infocenter

Links can be painful to deal with in help systems, especially in multi-author environments or with larger numbers of topics. Maintaining links over time can also be difficult. If you are authoring your help in DITA and using relationship tables, you might find they aren’t as easy to maintain as many claim. The Eclipse help system can make link validation difficult because of its frameset and because most link checkers have difficulty finding all of your pages.
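One way around the frameset problem, sketched here as an assumption rather than as the approach the full post describes, is to skip crawling entirely and check the generated HTML files on disk. The minimal Python sketch below walks an output folder, collects every `href`, and reports relative links whose target file does not exist. The function and class names are hypothetical, and external and server-absolute links are deliberately ignored.

```python
import os
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def find_broken_links(root_dir):
    """Return (page, link) pairs for relative links with no file behind them."""
    broken = []
    for folder, _dirs, files in os.walk(root_dir):
        for filename in sorted(files):
            if not filename.endswith((".html", ".htm")):
                continue
            page = os.path.join(folder, filename)
            collector = LinkCollector()
            with open(page, encoding="utf-8") as handle:
                collector.feed(handle.read())
            for link in collector.links:
                parsed = urlparse(link)
                # Skip external links, fragment-only links, and
                # server-absolute paths; only relative files are checked.
                if parsed.scheme or parsed.netloc or not parsed.path:
                    continue
                if parsed.path.startswith("/"):
                    continue
                target = os.path.normpath(os.path.join(folder, parsed.path))
                if not os.path.exists(target):
                    broken.append((page, link))
    return broken


if __name__ == "__main__":
    for page, link in find_broken_links("."):
        print(f"{page}: broken link to {link}")
```

Because every file in the folder is visited, pages that no navigation frame links to are still checked, which is exactly where a crawler tends to fall short.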

DITA builds with WinAnt Echidna

I recently stumbled across the WinAnt Echidna project on SourceForge while browsing the DITA Open Toolkit project forums. While DITA as a markup language is not difficult to learn and start creating content with, it can be difficult to see the fruits of those labors if you are just getting started with the DITA-OT. Most information developers are not experts in XSL or ANT. The WinAnt tool can help them get started with producing DITA-sourced content very quickly.

WinAnt is generally going to appeal to writers and teams on smaller projects. Larger and more complex projects and deliverables likely have requirements that go beyond what WinAnt provides.

DITA as a wiki format?

Wikis for documentation make sense for many reasons, including low cost of implementation, ease of publishing, and collaboration possibilities. DITA has become a popular XML format for semantic markup of information and is generally used for documentation. Wiki content is generally authored or stored as wikitext, which is a non-standardized markup format. Should DITA be used as a markup format for wikis instead of wikitext or HTML? I believe the answer is that DITA in the authoring environment of wikis is impractical and does not work for general audiences.

New technologies must always be applied to recipes first

It always baffles me why the first place that people often try to apply new technologies is the kitchen, specifically recipes. This practice reminds me of starting with Hello World for programming and Lorem Ipsum Dolor for design. Basically, that says “we have some cool tools but we really don’t yet know what to do with them.” The next logical step is somehow making this tool apply to recipes, right? Then you have a tool and a purpose.