I'd Rather Be Writing Podcast
The complexities of translation and the need for dynamic variables in the build process
I mentioned in previous posts that I was tackling translation with static site generators, and that I would circle back to this topic to provide more detail (see "Will the docs-as-code approach scale?"). Translation is a complex undertaking. In Andrew Etter's Modern Technical Writing, he says translation projects are time-consuming and costly. To quote:

"Internationalization, the process of translating documentation to other languages, is a nightmare. If you ever think you need to do it, interface with management and perform a careful cost-benefit analysis, because the process is expensive, time-consuming, error-prone, and tedious. Once you've arrived at what you believe is an accurate estimate for company costs, triple it. Now you have a realistic estimate."

Etter briefly describes his translation workflow using a static site generator, Sphinx. The workflow involves using gettext scripts to convert the content into POT (Portable Object Template) files, which he sends to a translation company. The translation company converts them to PO (Portable Object) files (these file formats basically separate the text into strings that translators can manage) and, after finishing the translation, sends the files back. He commits them to a repo, converts the PO files to MO (Machine Object) files, and builds them again in his Sphinx project.

There are quite a few different tools, formats, workflows, and approaches for doing translation. For example, here's how one group handles translation with Middleman, another static site generator. Their process is quite different. They set environment variables in their configuration files that provide information to the server about which language to build. Their process involves Codeship, Heroku, submodules in Git repositories, webhooks, custom Rack apps, and more. My scenario is a lot simpler.
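To make the POT/PO idea in Etter's workflow a bit more concrete, here is a minimal Python sketch of what "separating the text into strings" looks like. This is not the real gettext toolchain or Etter's actual scripts; the function names `make_pot` and `parse_po` are hypothetical, and the sketch assumes single-line strings with no quotes or backslashes. It only shows the shape of the round trip: source strings go out with empty translations, and come back filled in.

```python
import re

# Matches one simplified entry: a msgid line followed by a msgstr line.
ENTRY = re.compile(r'msgid "(.*)"\nmsgstr "(.*)"')


def make_pot(strings):
    """Return POT-style text: one msgid per source string, with an empty msgstr."""
    entries = []
    for s in strings:
        # Assumption for this sketch: no characters that would need escaping.
        if '"' in s or "\\" in s or "\n" in s:
            raise ValueError(f"unsupported character in string: {s!r}")
        entries.append(f'msgid "{s}"\nmsgstr ""')
    return "\n\n".join(entries) + "\n"


def parse_po(text):
    """Read simplified PO-style text back into a {source: translation} dict."""
    return {msgid: msgstr for msgid, msgstr in ENTRY.findall(text)}


# The template that would go to the translation company.
pot = make_pot(["Hello", "Save file"])

# Simulate the translator filling in one entry.
po = pot.replace('msgid "Hello"\nmsgstr ""', 'msgid "Hello"\nmsgstr "Hola"')

catalog = parse_po(po)
```

In a real project the gettext tools handle escaping, multi-line strings, plural forms, and the compiled MO format, none of which this sketch attempts.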
For some projects, we send out files to translation agencies. One translation agency requires the content to be either HTML or Microsoft Word, while another accepts Markdown, HTML, or Word. I'm not sure what the agency does with the files (I assume they convert them to another format to ingest them into their computer-assisted translation systems), but we get the files back in the same format that we sent. Since my content is in kramdown Markdown, translating the Markdown source format would be ideal, but translating HTML isn't a deal-breaker either.

However, I should note that just saying Markdown is the source format hardly scratches the surface. If Markdown is your only source format (and you just have a collection of static files as your source), handling translation would be very difficult. You need a more robust tool to handle dynamic content, which is where a static site generator like Jekyll becomes essential.

Notice I used the word "dynamic" in the last sentence. The term "static site generator" is something of a misnomer. In your source content, you aren't working just with static content, because if you were, translation would be extremely difficult. In Discover Meteor, the authors explain that static is really more dynamic than we typically give it credit for. They note:

"A classic Ruby or PHP app can be said to be dynamic because it can react to various parameters on each request (for example, variables passed through the URL). True, static HTML files can't do that. But your static site generator can still take into account parameters during the build process. In other words, static sites are only static after they have been generated." (See "Three More Ways To Make Your Static Sites Smarter.")

The ability to use variables and parameters in your source is essential when setting up translation to multiple languages.
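As a rough illustration of "parameters during the build process," here is a small Python sketch. Jekyll does this with Liquid and configuration files rather than Python, so treat this as an analogy only; the template, the `STRINGS` table, and the `build` function are all hypothetical names invented for the example. The point is that one source template plus a language parameter yields a different static output for each language.

```python
from string import Template

# One shared page template; $title and $intro are build-time variables.
PAGE = Template("<h1>$title</h1>\n<p>$intro</p>")

# Per-language values for those variables (hypothetical sample content).
STRINGS = {
    "en": {"title": "Getting started", "intro": "Install the app."},
    "de": {"title": "Erste Schritte", "intro": "Installieren Sie die App."},
}


def build(lang):
    """Render the page for one language; the result is plain static HTML."""
    return PAGE.substitute(STRINGS[lang])


def build_all():
    """Run the build once per language, as a multi-language site build would."""
    return {lang: build(lang) for lang in STRINGS}


site = build_all()
```

Once `build_all` has run, each entry in `site` is ordinary static HTML. The variables exist only in the source and at build time, which is the sense in which static sites are "only static after they have been generated."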
It’s the ability to use these parameters, variables, and other dynamic techniques during the build process – before the files