literate programming, Epsilon Clue

Hacking Things I've Learned

Literate Lists

I’ve written before about literate programming, and how one of its most attractive features is that you can write code with the primary goal of conveying information to a person, and only secondarily of telling a computer what to do. So there’s a bit in my .bashrc that adds directories to $PATH that isn’t as reader-friendly as I’d like:

for dir in \
    /usr/sbin \
    /opt/sbin \
    /usr/local/sbin \
    /some/very/specific/directory \
    ; do
    PATH="$dir:$PATH"
done

I’d like to be add a comment to each directory entry, explaining why I want it in $PATH, but sh syntax won’t let me: there’s just no way to interleave strings and comments this way. So far, I’ve documented these directories in a comment above the for loop, but that’s not exactly what I’d like to do. In fact, I’d like to do something like:

$PATH components

/usr/sbin

/usr/local/bin
for dir in \
    {{path-components}} \
    ; do
    PATH="$dir:$PATH"
done

Or even:

$PATH components

Directory Comments
/usr/sbin sbin directories contain sysadminny stuff, and should go before bin directories.
/usr/local/bin Locally-installed utilities take precedence over vendor-installed ones.
for dir in \
    {{path-components}} \
    ; do
    PATH="$dir:$PATH"
done

Spoiler alert: both are possible with org-mode.

Lists

The key is to use Library of Babel code blocks: these allow you to execute org-mode code blocks and use the results elsewhere. Let’s start by writing the code that we want to be able to write:

#+name: path-list
- /usr/bin
- /opt/bin
- /usr/local/bin
- /sbin
- /opt/sbin
- /usr/local/sbin

#+begin_src bash :noweb no-export :tangle list.sh
  for l in \
      <<org-list-to-sh(l=path-list)>> \
      ; do
      PATH="$l:$PATH"
  done
#+end_src

Note the :noweb argument to the bash code block, and the <<org-list-to-sh()>> call in noweb brackets. This is a function we need to write. It’ll (somehow) take an org list as input and convert it into a string that can be inserted in this fragment of bash code.

This function is a Babel code block that we will evaluate, and which will return a string. We can write it in any supported language we like, such as R or Python, but for the sake of simplicity and portability, let’s stick with Emacs lisp.

Next, we’ll want a test rig to actually write the org-list-to-sh function. Let’s start with:

#+name: org-list-to-sh
#+begin_src emacs-lisp :var l='nil
  l
#+end_src

#+name: test-list
- First
- Second
- Third

#+CALL: org-list-to-sh(l=test-list) :results value raw

The begin_src block at the top defines our function. For now, it simply takes one parameter, l, which defaults to nil, and returns l. Then there’s a list, to provide test data, and finally a #+CALL: line, which contains a call to org-list-to-sh and some header arguments, which we’ll get to in a moment.

If you press C-c C-c on the #+CALL line, Emacs will evaluate the call and write the result to a #+RESULTS block underneath. Go ahead and experiment with the Lisp code and any parameters you might be curious about.

The possible values for the :results header are listed under “Results of Evaluation” in the Org-Mode manual. There are a lot of them, but the one we care the most about is value: we’re going to execute code and take its return value, not its printed output. But this is the default, so it can be omitted.

If you tangle this file with C-c C-v C-t, you’ll see the following in list.sh:

for l in \
    ((/usr/bin) (/opt/bin) (/usr/local/bin) (/sbin) (/opt/sbin) (/usr/local/sbin)) \
    ; do
    PATH="$l:$PATH"
done

It looks as though our org-mode list got turned into a Lisp list. As it turns out, yes, but not really. Let’s change the source of the org-list-to-sh() function to illustrate what’s going on:

#+name: org-list-to-sh
#+begin_src emacs-lisp :var l='nil :results raw
  (format "aaa %s zzz" l)
#+end_src

Now, when we tangle list.sh, it contains

    aaa ((/usr/bin) (/opt/bin) (/usr/local/bin) (/sbin) (/opt/sbin) (/usr/local/sbin)) zzz \

So the return value from org-list-to-sh was turned into a string, and that string was inserted into the tangled file. This is because we chose :results raw in the definition of org-list-to-sh. If you play around with other values, you’ll see why they don’t work: vector wraps the result in extraneous parentheses, scalar adds extraneous quotation marks, and so on.

Really, what we want is a plain string, generated from Lisp code and inserted in our sh code as-is. So we’ll need to change the org-list-to-sh code to return a string, and use :results raw to insert that string unchanged in the tangled file.

We saw above that org-list-to-sh sees its parameter as a list of lists of strings, so let’s concatenate those strings, with space between them:

#+name: org-list-to-sh
#+begin_src emacs-lisp :var l='nil :results raw
  (mapconcat 'identity
	     (mapcar
	      (lambda (elt)
		(car elt)
		)
	      l)
	     " ")
#+end_src

This yields, in list.sh:

for l in \
    /usr/bin /opt/bin /usr/local/bin /sbin /opt/sbin /usr/local/sbin \
    ; do
    PATH="$l:$PATH"
done

which looks pretty nice. It would be nice to break that list of strings across multiple lines, and also quote them (in case there are directories with spaces in them), but I’ll leave that as an exercise for the reader.

Tables

That takes care of converting an org-mode list to a sh string. But earlier I said it would be even better to define the $PATH components in an org-mode table, with directories in the first column and comments in the second. This is easy, with what we’ve already done with strings. Let’s add a test table to our org-mode code, and some code to just return its input:

#+name: echo-input
#+begin_src emacs-lisp :var l='nil :results raw
  l
#+end_src

#+name: test-table
| *Name*   | *Comment*        |
|----------+------------------|
| /bin     | First directory  |
| /sbin    | Second directory |
| /opt/bin | Third directory  |

#+CALL: echo-input(l=test-table) :results value code

#+RESULTS:

Press C-c C-c on the #+CALL line to evaluate it, and you’ll see the results:

#+RESULTS:
#+begin_src emacs-lisp
(("/bin" "First directory")
 ("/sbin" "Second directory")
 ("/opt/bin" "Third directory"))
#+end_src

First of all, note that, just as with lists, the table is converted to a list of lists of strings, where the first string in each list is the name of the directory. So we can just reuse our existing org-list-to-sh code. Secondly, org has helpfully stripped the header line and the horizontal rule underneath it, giving us a clean set of data to work with (this seems a bit fragile, however, so in your own code, be sure to sanitize your inputs). Just convert the list of directories to a table of directories, and you’re done.

Conclusion

We’ve seen how to convert org-mode lists and tables to code that can be inserted into a sh (or other language) source file when it’s tangled. This means that when our code includes data best represented by a list or table, we can, in the spirit of literate programming, use org-mode formatting to present that data to the user as a good-looking list or table, rather than just list it as code.

One final homework assignment: in the list or table that describes the path elements, it would be nice to use org-mode formatting for the directory name itself: =/bin= rather than /bin. Update org-list-to-sh to strip the formatting before converting to sh code.

Andrew Arensburger

Dec, Fri, 2022

emacs Geek Hacking Things I've Learned

A Few More Thoughts on Literate Programming

A while back, I became intrigued by Donald Knuth’s idea of Literate Programming, and decided to give it a shot. That first attempt was basically just me writing down what I knew as quickly as I learned it, and trying to pass it off as a knowledgeable tutorial. More recently, I tried a second project, a web-app that solves Wordle, and thought I’d write it in the Literate style as well.

The first time around, I learned the mechanics. The second time, I was able to learn one or two things about the coding itself.

(For those who don’t remember, in literate programming, you write code intertwined with prose that explains the code, and a post-processor turns the result into a pretty document for humans to read, and ugly code for computers to process.

1) The thing I liked the most, the part where literate programming really shines, is having the code be grouped not by function or by class, but by topic. I could introduce a <div class="message-box"></div> in the main HTML file, and in the next paragraph introduce the CSS that styles it, and the JavaScript code that manipulates it.

2) In the same vein, several times I rearranged the source to make the explanations flow better, not discuss variables or functions until I had explained why they’re there and what they do, without it altering the underlying HTML or JavaScript source. In fact, this led to a stylistic quandary:

3) I defined a few customization variables. You know, the kind that normally go at the top for easy customization:

var MIN_FOO = 30;
var MAX_FOO = 1500;
var LOG_FILE = "/var/log/mylogfile.log";

Of course, the natural tendency was to put them next to the code that they affect, somewhere in the middle of the source file. Should I have put them at the top of my source instead?

4) Even smaller: how do you pass command-line option definitions to getopt()? If you have options -a, -b, and -c, each will normally be defined in its own section. So in principle, the literate thing to do would be to write

getopt("{{option-a}}{{option-b}}{{option-c}}");

and have a section that defines option-a as “a“. As you can see, though, defining single-letter strings isn’t terribly readable, and literate programming is all about readability.

5) Speaking of readability, one thing that can come in really handy is the ability to generate a pretty document for human consumption. Knuth’s original tools generated TeX, of course, and it doesn’t get prettier than that.

I used org-mode, which accepts TeX style math notation, but also allows you to embed images and graphviz graphs. In my case, I needed to calculate the entropy of a variable, so being able to use proper equations, with nicely-formatted sigmas and italicized variables, was very nice. I’ve worked in the past on a number of projects where it would have been useful to embed a diagram with circles and arrows, rather than using words or ASCII art.

6) I was surprised to find that I had practically no comments in the base code (in the JavaScript, HTML, and CSS that were generated from my org-mode source file). I normally comment a lot. It’s not that I was less verbose. In fact, I was more verbose than usual. It’s just that I was putting all of the explanations about what I was trying to do, and why things were the way they are, in the human-docs part of the source, not the parts destined for computer consumption. Which, I guess, was the point.

7) Related to this, I think I had fewer bugs than I would normally have gotten in a project of this size. I don’t know why, but I suspect that it was due to some combination of thinking “out loud” (or at least in prose) before pounding out a chunk of code, and of having related bits of code next to each other, and not scattered across multiple files.

8) I don’t know whether I could tackle a large project in this way. You might say, “Why not? Donald Knuth wrote both TeX and Metafont as literate code, and even published the source in two fat books!” Well, yeah, but he’s Donald Knuth. Also, he was writing before IDEs, or even color-coded code, were available.

I found org-mode to be the most comfortable tool for me to use for this project. But of course that effectively prevents people who don’t use Emacs (even though they obviously should) from contributing.

One drawback of org-mode as a literate programming development environment is that you’re pretty much limited to one source file, which obviously doesn’t scale. There are other tools out there, like noweb, but I found those harder to set up, or they forced me to use (La)TeX when I didn’t want to, or the like.

9) One serious drawback of org-mode is that it makes it nearly impossible to add cross-reference links. If you have a section like

function myFunc() {
    var thing;
    {{calculate thing}}
    return thing;
}

it would be very useful to have {{calculate thing}} be a link that you can click on to go to the definition of that chunk. But this is much harder to do in org-mode than it should be. So is labeling chunks, so that people can chase cross-references even without convenient links. It has a lot of work to be done in that regard.

Andrew Arensburger

May, Tue, 2022

Directory	Comments
`/usr/sbin`	`sbin` directories contain sysadminny stuff, and should go before `bin` directories.
`/usr/local/bin`	Locally-installed utilities take precedence over vendor-installed ones.

Epsilon Clue