LaTeX and open source

I wonder why people don't trumpet LaTeX more in the open source world? It is a textbook example of how the movement can work well.

Consider: Knuth wrote some software called TeX that scratched his itch. It was a well-designed, flexible typesetting system. But because most people neither think nor want to think like typesetters when writing essays, it would have been of limited interest had the story stopped there.

Fortunately, TeX supported a macro language, so Lamport was able to write the set of structural markup macros that became LaTeX. So far, open source doesn't enter the picture: any typesetting system that supported macros could have provided similar functionality. LaTeX could even have been a closed-source typesetting system distributed for free over the Internet, which would have addressed the market penetration issue.

Unfortunately, LaTeX is not perfect. It's pretty flexible, and provides lots of parameters so you can tune a lot of things, but you can't do everything. I see two reasons for this:

0. Some functionality (such as verbatim environments) is pretty broken and/or inflexible.

1. Lamport did not write macro sets for all the uses to which a structural typesetting system could be put (e.g. resumes, chess games, or music).

Here is where the magic of open source comes in. Because LaTeX macros are somewhat readable, other people could read and learn from them, to better understand how LaTeX works and does not work. Fortunately, the niche market for LaTeX included lots of computer geeks, so when people ran into deficiencies of the language, they were willing to write workaround packages for the problems -- often by creating derivative works of the original macro sets. Then they had the freedom to release these packages, so non-geeks like me could apply the workarounds when necessary. Because the packages could be released freely, people could make LaTeX distributions and set up the Comprehensive TeX Archive Network (CTAN), which made it trivial to get a working LaTeX environment with the extra packages.

Finally, people were free to write documentation for the package and release it freely, even though Lamport had written a good book on the subject already. This reduced the barrier to entry: how many of us learned LaTeX by reading the "Not so short" guide? How many of us would have gone out and bought the book just to get started? Maybe lots, but I probably would not have been one of them.

So now you have a base software package that's mostly excellent, and a whole array of easily-available solutions and add-ons. The critical mass of users means that if you have run into a problem, somebody smarter than you probably ran into it too, which means you can just include and use a helper package.

Like so many other open-source programs, it takes some effort to get started with the language -- but not too much, and before long you are able to produce good-looking documents easily. That keeps you using the software so you can learn the harder stuff.

Here is a critical point: without this body of extra packages and compiled knowledge, LaTeX would be unusable. Even if LaTeX does 95% of its job perfectly, as your projects grow larger the probability grows that the extra 5% will kill you. In every large project I write I end up twiddling the typesetting -- but LaTeX is flexible enough to allow me to do that, and when I don't know how to do it manually somebody else has solved my problem for me.

Today I needed to create some unorthodox lists: they needed to be tightly-packed to save valuable resume real-estate. Unfortunately LaTeX likes to put a space (called \topsep) between a paragraph and the bullet-list that follows it. Fortunately for me, the LaTeX distro I use comes with documentation, so I was able to find a solution without needing to go to the Internet. If TeX and LaTeX had been closed-source, I probably would not have been so lucky: the distro would have come with some "official" documentation which would not have contained my answer. If I was lucky I would have found the answer on the Internet or in a book somewhere ("LaTeX Secrets Revealed!"). If I wasn't lucky I would not have figured out how to solve my problem. (And if Lamport or Knuth had been snarky and prevented me from twiddling lengths to protect me from my own bad typesetting, it would have been impossible to solve the problem.)
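For the curious, here is roughly what that kind of length-twiddling can look like. This is only a sketch and not necessarily the solution I found in the distro's documentation: it assumes the standard article class and builds on LaTeX's basic list environment, and the name "tightitemize" is made up for illustration.

```latex
\documentclass{article}

% "tightitemize" is a hypothetical name, not a standard environment.
% The second argument of the list environment runs after LaTeX's default
% list settings, so the lengths set here override those defaults.
\newenvironment{tightitemize}{%
  \begin{list}{\textbullet}{%
    \setlength{\topsep}{0pt}%     space between the paragraph and the list
    \setlength{\partopsep}{0pt}%  extra space added when the list starts a new paragraph
    \setlength{\itemsep}{0pt}%    extra space between items
    \setlength{\parsep}{0pt}%     space between paragraphs inside an item
  }%
}{%
  \end{list}%
}

\begin{document}
Skills:
\begin{tightitemize}
  \item LaTeX
  \item Typesetting
\end{tightitemize}
\end{document}
```

Packages on CTAN (enumitem is one) offer the same knobs with less typing, which is rather the point of this whole entry.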

There's more than just open source going on here: there's a culture of open source in the project, so people are more likely to contribute their knowledge and packages to the general pool. I don't think there is anything that prevents people from setting up a Comprehensive Office Archive Network (COAN) full of helpful documentation, useful VB macros, and extensions for Office products. But as far as I know such a network does not exist because the culture doesn't exist. People run into similar problems, because Office is not perfect. But instead of releasing their solutions so everybody can benefit, they either release dorky shareware or don't release anything at all. (Okay. Some people do release neat Word macros for free, but viruses don't count.) In the Office world there is much less of a sense that people can solve their own problems, and that everybody is better off if they release those solutions to others.

Because this entry is not long or boring enough, here's one more example: pdfTeX. I don't know whether Adobe had invented the PDF format when Knuth was writing TeX, but it certainly was not the first choice for distributing electronic documents. A few years later it became one, so Hàn Thế Thành wrote pdfTeX -- which no doubt borrowed liberally from Knuth's sources. pdfTeX is pretty close to perfect, and it is pretty useful. What chance would there have been of such a good port if the source had not been available?

My final comment has nothing to do with open source: literate programming works. I wish it had caught on as a mainstream programming methodology. To some degree it has: Javadoc and executable Python comments are examples of it, but we could usefully take it further.

So there. Open source good. I don't think that LaTeX's example generalizes well, but it's a good example nonetheless.
