My Pandoc Setup for My Technical Book
2024-04-11 #pandoc, #latex

I'm working on a technical book about WebGPU, which I intend to self-publish both in print and as an ebook. I researched various approaches to creating it. Initially, I planned to use LaTeX with a suitable technical book template to generate a PDF. While a PDF is well suited to print, it is not a good ebook format, because its text does not reflow to fit the screen size. Eventually, I settled on Pandoc, as it can generate PDFs and other digital formats from the same Markdown files. If I later want to target another format, such as a website, that should not be difficult, since Markdown converts readily to many formats.

However, as I progressed with this book, I realized that a printed book falls behind digital formats when it comes to technical content. First, many of the resources I want to reference are links and videos. Embedding them is straightforward in digital formats but impossible in a printed book. Second, it is hard to include source code effectively. Printing the full code would take up too many pages, and most of it is boilerplate or initialization that is not crucial to the discussion. Printing only the relevant parts may fragment the overall picture, because code is not organized linearly. To understand an implementation, we often need to jump between multiple functions and files, which is hard to do in a traditional book whose content is predominantly linear.

There is one more challenge: it is difficult to keep the actual sample code and the corresponding code snippets in the book in sync. Fortunately, I found a reasonable solution.

I'm currently using the excellent Eisvogel template for Pandoc. Out of the box, this template supports code blocks with line numbers, relying on the LaTeX listings package. This package offers various customization options that can be configured in the eisvogel.latex file:

\usepackage{listings}
\renewcommand{\lstlistingname}{Code Sample}
\lstset{captionpos=b}

For instance, here I've renamed the caption prefix from the default Listing to Code Sample, and positioned the caption below the code block instead of above.

Furthermore, Pandoc allows you to add more options, such as captions and the starting line number, when defining code blocks in Markdown. For example:

```python {startFrom="1" caption="Tile preprocessor"}
import PIL.Image
...
```

This is a very useful feature, as it lets me specify the exact line numbers as they appear in the source file. However, it also presents a challenge: every time I update the code, both the line numbers and the code content may change. How can I keep the code snippets in the book in sync with the actual source code?

To solve this problem, I've introduced a custom code comment syntax that enables me to specify the code segments I want to reference in the book, along with a label:

//cs_start: code_segment_name
...
//cs_end: code_segment_name

I then implemented a preprocessor that parses all the source code files and builds a database of code segments and their metadata, including line numbers. Each segment is indexed by its file path and its segment label.
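A minimal Python sketch of this scanning step might look like the following. It is an illustration under stated assumptions, not the actual preprocessor: the names `MARKER`, `Segment`, and `extract_segments` are hypothetical, the in-memory dictionary stands in for the database, accepting `#` as a comment prefix is a guess so that Python sources can carry markers, and nested segments are rejected to keep line numbers contiguous.

```python
import re
from dataclasses import dataclass, field

# Marker comments as shown above, e.g. "//cs_start: name".
# Assumption: "#" is also accepted so that Python sources can carry markers.
MARKER = re.compile(r"^\s*(?://|#)\s*cs_(start|end):\s*(\S+)\s*$")


@dataclass
class Segment:
    start_line: int  # 1-based number of the first code line after cs_start
    lines: list = field(default_factory=list)


def extract_segments(path, text):
    """Return {(path, label): Segment} for every marked segment in text."""
    segments = {}
    current = None  # (label, Segment) while inside a segment, else None
    for lineno, line in enumerate(text.splitlines(), start=1):
        match = MARKER.match(line)
        if match is None:
            if current is not None:
                current[1].lines.append(line)
            continue
        kind, label = match.groups()
        if kind == "start":
            if current is not None:
                raise ValueError(f"{path}:{lineno}: nested segment {label!r}")
            current = (label, Segment(start_line=lineno + 1))
        else:
            if current is None or current[0] != label:
                raise ValueError(f"{path}:{lineno}: unmatched cs_end {label!r}")
            segments[(path, label)] = current[1]
            current = None
    if current is not None:
        raise ValueError(f"{path}: segment {current[0]!r} is never closed")
    return segments
```

Calling `extract_segments` once per source file and merging the returned dictionaries gives a lookup table keyed by `(file path, label)`. Because `start_line` is taken from the file itself, it already accounts for the marker lines.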

In my Markdown files, instead of directly copying and pasting code from the source files, I define "dummy" code blocks that contain the necessary information to look up the relevant code segments in the database:

```yaml
file: 5_04_mega_texture/preprocess.py
id: mega_texture_preprocess
title: Tile preprocessor
lang: python
```

The Markdown files are then processed by a preprocessor that replaces these dummy code blocks with the actual content retrieved from the database. This ensures that the code snippets in the book are always in sync with the source code.
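The replacement step could be sketched in Python roughly as follows. Again, this is a hedged illustration and not the real tool: `expand_dummy_blocks` is a hypothetical name, the database is assumed to be a plain dictionary mapping `(file, id)` to a `(start_line, code)` pair, and the dummy block is parsed as simple `key: value` lines instead of with a full YAML parser. YAML blocks that lack `file` and `id` are treated as ordinary listings and left alone.

```python
import re

FENCE = "`" * 3  # built programmatically to keep this listing easy to embed

# A fenced block tagged yaml; group 1 is the block body.
DUMMY_BLOCK = re.compile(
    r"^" + FENCE + r"yaml\n(.*?)^" + FENCE + r"[ \t]*$",
    re.MULTILINE | re.DOTALL,
)


def expand_dummy_blocks(markdown, database):
    """Replace dummy YAML blocks with real code looked up in database.

    database maps (file, id) -> (start_line, code); this shape is an
    assumption made for the sketch.
    """

    def replace(match):
        fields = {}
        for line in match.group(1).splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if "file" not in fields or "id" not in fields:
            return match.group(0)  # an ordinary YAML listing, keep as-is
        start_line, code = database[(fields["file"], fields["id"])]
        attrs = f'startFrom="{start_line}"'
        title = fields.get("title")
        if title:
            attrs += f' caption="{title}"'
        lang = fields.get("lang", "")
        return f"{FENCE}{lang} {{{attrs}}}\n{code}\n{FENCE}"

    return DUMMY_BLOCK.sub(replace, markdown)
```

The output uses the same `startFrom` and `caption` attributes as the hand-written code block shown earlier, so the rest of the Pandoc pipeline does not need to change. A missing database entry raises `KeyError`, which stops the build when a segment is renamed or removed.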

[Figure: Code Block Example as Shown in My Book]

With this approach, I'm able to maintain the precise line numbers for the code blocks, even as the underlying source code evolves over time.
