Programming Language for Old Timers


by David A. Moon
February 2006 .. September 2008

Comments and criticisms to dave underscore moon atsign alum dot mit dot edu.


Previous page   Table of Contents   Next page


Pattern Syntax

A macro or anything else that parses can perform the parsing directly as a PLOT program that reads tokens from a token-stream. However it is often more convenient to use a pattern to do the parsing. The function parse-pattern parses a pattern specification as a sequence of tokens into a pattern object. The function translate-pattern translates a pattern object into an expression that reads tokens from a token-stream, defines the names of pattern variables as the results of parsing portions of the input, and executes a specified expression in the scope of those definitions. The last argument to translate-pattern is the expression to execute instead if the input from the token-stream fails to match the pattern. These function definitions are in the compiler module.

Note that all parsing is by recursive descent. Not the most sophisticated, but easy to understand and to extend.

In a pattern the following have special meaning, while other non-newline tokens merely match themselves (in other words, names match absolute particles, keywords and literals match themselves.) Newlines in patterns are ignored. Note that a name in a pattern will match a name with the same spelling, regardless of context or module.

{ ... } grouping (to delimit scope of |)
{ ... }* repeat the "..." subpattern zero or more times
{ ... }+ repeat the "..." subpattern one or more times
| separates alternatives, higher binding power than [ ] and { }
[ ... ] the "..." subpattern is optional. This is an abbreviation for { ... | }
[ ... ]* repeat the "..." subpattern zero or more times, but each alternative can appear at most once
[ ... ]+ repeat the "..." subpattern one or more times, but each alternative can appear at most once
& introduces the separator part of a subpattern; higher binding power than |, [ ], and { }. A separator is part of a construct that only appears if the construct does not end at this point. In a repeated subpattern, if the separator does not match then the repeat stops. In a non-repeated subpattern (typically used inside of [ ]), if the separator does not match then it establishes a parsing barrier, thus the remainder of the parent of the subpattern can only match if it is all optional.
^ consistent, indented newline. This was explained in the Newlines section above.
^^ inconsistent, indented newline. This was explained in the Newlines section above.
^= consistent, non-indented newline. This was explained in the Newlines section above. A name, keyword, or literal must be the next pattern token.
^^= inconsistent, non-indented newline. This was explained in the Newlines section above. A name, keyword, or literal must be the next pattern token.
? followed by a name, and optionally the bound particle is and the name of a syntactic type. The syntactic type defaults to expression. Parses that syntactic type and defines the name to the result. The parsing function invoked is the definition of parse-xxx where xxx is the syntactic type name and the context of parse-xxx is the same as the context of xxx.
?: followed by a name, whose syntactic type has the same name. In other words, "?:x" is short for "?x is x".
?= followed by a name, matches a name whose known definition in the scope of the macro call is the same as the value of the pattern name in the scope where the pattern appears. In other words, this matches a bound particle.
?? followed by a name, defines the name to true if the subpattern after the name matches.
=> not valid in a pattern, so the pattern ends here.
\ prevents the next token from being recognized as one of the special ones listed here.

When there are alternatives, the first alternative whose first pattern element (other than ??) matches is the alternative that is chosen. Since there is no backtracking and only one-token lookahead, patterns with alternatives must be ordered so the most "specific" alternative is first.

The separator part of a repeated subpattern, following &, cannot contain the above special tokens (except ^ ^^ ^= ^^= \).

When ? is used inside of a repeat, the name is defined as a sequence of values (one level of nested sequence for each level of nested repeat).

When ? is used inside of an alternative, the name is defined as false if the chosen alternative does not define the name.


Previous page   Table of Contents   Next page