⋔ about ⋔ contact ⋔ donate ⋔ license ⋔ archives ⋔
924 words by attila written on 2015-03-11, last edit: 2015-10-13, tags: markdown, open-source, openbsd, ports, text ⋔ Previous post: Fixed ⋔ Next post: Adventures in Ports: Tor Browser
Even in just doing the simple stuff I've tried so far in OpenBSD ports it is frequently an adventure that leads to good and unexpected places. I believe they call that fun or something.
My first idea was to do an OpenBSD port of Fletcher Penny's multimarkdown, a markdown derivative. It adds footnotes, tables and metadata. The latter is especially important because it allows for much greater flexibility and power when translating from !MultiMarkdown into other formats. When used with the memoir LaTeX environment you can do some pretty sophisticated stuff. For instance, here's my skeletal template for a book written in MultiMarkdown:
Title: Am I There? Author: Portnoy Semantic Date: DD-MMM-YYYY Base Header Level: 2 latex mode: book latex input: mmd-tufte-book-header latex input: mmd-tufte-book-begin-doc latex input: mmd-tufte-book-footer # First Glance # A snark glowered in the window. Ravens crowed and crowed until the sun came out, then went away. All was darkness. The cats clinked and clinked as they patted each other with great affection and sharp claws. ## A Moment ## A moment in time. I guess this is literature... ## Another ## Another sibling moment in time. # Second Heart # "Ipsum lorem" cried the potters. ## The Moon ## Rising, and yet setting. I'm upside-down.
The stuff at the top is MultiMarkdown's way of letting you attach
arbitrary metadata to your files, but in this case it is not
arbitrary. It is input to the LaTeX generation back end in the
multimarkdown tool. By issuing the commands:
$ multimarkdown -t latex -o book.tex book.md $ pdflatex book.tex
I end up generating
this PDF. Of course I
have to arrange to get a few CTAN modules installed and my
set up properly. I'm certainly no
LaTeX wizard, but it was a hell
of a lot nicer to just drop a few text files in the right place than
screw around with an all-singing, all-dancing GUI that makes my
machine foam at the mouth. The end result is pretty reasonable
considering how light the markup is.
Okay, so great, I do this port. In the process of doing it I run into Parsing Expression Grammars, and that's really what this post is about. MultiMarkdown uses a PEG for its parser; it wanted a parser generator called greg but he decided to include it as a submodule in his repository. OpenBSD ports doesn't do git submodules right now (maybe never I hope since they are kind of evil) so I decided the best idea was to do a greg port first, to support the port of MultiMarkdown. This dragged me into the fascinating world of PEGs, totally on a tangent.
Parsing expression grammars are an alternate formalism to context-free grammars first published by Bryan Ford in POPL2004 (pdf). The paper is well worth reading if you are at all interested in parsing, translation and other related subjects: it is short, clear and thorough.
Here's the big idea: all that power in context-free grammars that Chomsky was on about is necessary for parsing human languages but not so much for machine-oriented parsing tasks, like configuration files or programming notations. His proposal in PEGs is to replace the unordered choice operation implicit in CFGs, where the right-hand side of a production is a set (inherently unordered) with an ordered choice primitive, where the RHS is a list and the first match wins. He uses forward-slash for his prioritized choice operator:
... The EBNF rules `A -> a | a b` and `A -> a b | a` are equivalent in a CFG, but the PEG rules `A <- a b / a` and `A <- a / a b` are different. The second alternative in the latter PEG rule will never succeed because the first choice is always taken if the input string to be recognized begins with *a*.
Combined with some regular expression-like features the resulting notation he proposes as a concrete syntax for PEGs is remarkably powerful. It allows one to do away completely with a separate lexical analyzer; instead, a PEG describes both the hierarchical syntax and the lexical syntax of a language in a single spot. The resulting parsers are linear in time, have predictable memory requirements (no leaks!) and are, at least in my experience, much easier to debug than the standard stack of stuff we've become used to.
There's nothing not to like about this. In fact this is the first bit
of computer science that really piqued my interest in a very long
time... not that I'm all that great about keeping up on things, but
PEGs felt like a breath of fresh air. I really want to go back and
revisit some of my old parsers (all written with Yacc, Bison or Lemon
combined with flex). The
greg tool is pretty good, although I
want to look at its output in more detail and do some more testing.
Anyone who has suffered through writing a parser of any complexity using the standard tools should consider checking greg out. It was accepted into ports in January 2015 (my first!) and I hope to do some cool things with it.
Frequently all the fun is on the tangents. In fact after I wrote this I realized I had forgotten to pop back up a level and see what was up with the multimarkdown port. It will become evident after my next post what other tangent distracted me...
Copyright © 1999-2022 by attila <email@example.com>. All Rights Reserved.