Programming Language for Old Timers

by David A. Moon
February 2006 .. September 2008

Comments and criticisms to dave underscore moon atsign alum dot mit dot edu.


Metasyntax

For lexical syntax, see the Lexical Syntax section. Beyond the lexical level, the syntax of PLOT is defined entirely by recursive descent parse methods written in PLOT itself.

All parsing is LL(1), with one small exception involving newlines. Parsing is sensitive to the definitions known in scope, which is what makes operators and macros extensible.

A parse method takes three arguments: a token-stream, the current indentation, and an error check flag (the error? argument). If the parser does not recognize its input, the result depends on the error check flag: the parser returns false if the flag is false, or signals an error if the flag is true. The flag is passed down unchanged in head-recursive calls that have no alternatives, so the error message is generated at the most informative level. Once the first token of a construct has been consumed, the flag is true, because an LL(1) parser is then committed to that construct.
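The protocol can be sketched in Python, since a runnable PLOT implementation is not assumed here. Everything below is hypothetical: `TokenStream`, `ParseError`, `fail`, and the toy `let <name> = <name>` construct are made up for illustration, and the `error` parameter plays the role of error?. The sketch shows a parse method returning false or signalling an error according to the flag, the flag being passed straight down a head call with no alternatives, and the flag becoming true once the first token has been consumed.

```python
class ParseError(Exception):
    """Signalled when a parse method called with error=True fails."""


class TokenStream:
    """A minimal token stream with one token of lookahead, as LL(1) needs."""

    def __init__(self, tokens):
        self._tokens = list(tokens)
        self._pos = 0

    def peek(self):
        return self._tokens[self._pos] if self._pos < len(self._tokens) else None

    def advance(self):
        tok = self.peek()
        self._pos += 1
        return tok


def fail(expected, tokens, error):
    """Return False, or signal an error, according to the error check flag."""
    if error:
        raise ParseError(f"expected {expected}, found {tokens.peek()!r}")
    return False


def parse_name(tokens, indentation, error):
    tok = tokens.peek()
    if isinstance(tok, str) and tok.isidentifier():
        return tokens.advance()
    return fail("a name", tokens, error)


def parse_let(tokens, indentation, error):
    """Hypothetical construct:  let <name> = <name>"""
    if tokens.peek() != "let":
        return fail("'let'", tokens, error)
    tokens.advance()
    # The first token is consumed, so LL(1) commits us: the flag is now True.
    target = parse_name(tokens, indentation, True)
    if tokens.peek() != "=":
        return fail("'='", tokens, True)
    tokens.advance()
    value = parse_name(tokens, indentation, True)
    return ("let", target, value)


def parse_statement(tokens, indentation, error):
    # A head call with no alternatives: pass the caller's flag straight down,
    # so any error is reported by parse_let, which knows what it expected.
    return parse_let(tokens, indentation, error)
```

Called with the flag false on the single token `x`, `parse_statement` quietly returns `False`; called on `let x y`, it signals an error even with the flag false, because `let` has already been consumed.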

In the program syntax, there are two ways to reach a parse method. The first way is to parse a specific syntactic type. For example, the syntactic type named expression is the basis of programs, so the compiler would call parse-expression to get a piece of a program. By convention, the name of the function that contains a parse method is the name of the syntactic type preceded by "parse-".
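A Python sketch of the naming convention, assuming a hypothetical registry (`PARSERS`, `syntactic_type`, `parse_as`) that is not part of PLOT. Python identifiers cannot contain hyphens, so `parse_` stands in for "parse-", and underscores in the type name map back to hyphens. The parser bodies are placeholders.

```python
from typing import Callable, Dict

# Hypothetical registry from syntactic type name to parse method.
PARSERS: Dict[str, Callable] = {}


def syntactic_type(fn: Callable) -> Callable:
    """Register fn under the syntactic type its name implies.

    "parse_" stands in for the "parse-" prefix, and underscores in the
    suffix are mapped back to hyphens.
    """
    prefix = "parse_"
    if not fn.__name__.startswith(prefix):
        raise ValueError(f"{fn.__name__} does not follow the parse_ convention")
    PARSERS[fn.__name__[len(prefix):].replace("_", "-")] = fn
    return fn


@syntactic_type
def parse_expression(tokens, indentation, error):
    # Placeholder body: accept one integer literal.
    if tokens and isinstance(tokens[0], int):
        return ("literal", tokens.pop(0))
    if error:
        raise SyntaxError("expected an expression")
    return False


@syntactic_type
def parse_type_name(tokens, indentation, error):
    # Placeholder body: accept one capitalized identifier.
    if tokens and isinstance(tokens[0], str) and tokens[0][:1].isupper():
        return ("type", tokens.pop(0))
    if error:
        raise SyntaxError("expected a type name")
    return False


def parse_as(type_name, tokens, indentation=0, error=True):
    """Find the parse method for a syntactic type by name and call it."""
    return PARSERS[type_name](tokens, indentation, error)
```

A compiler wanting a piece of a program would then call `parse_as("expression", tokens)`.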

The second way is through macros. When the expression parse method sees a name that is defined as a macro, it invokes the parse method for that macro. Macros are the basis of all idiosyncratic syntax in PLOT. In this way, the syntax of the language is defined within the language and can be changed by users. Each module could have its own syntax. (If we didn't want to use the standard expression parser at all in a given module, a new mechanism would be needed. But each module can replace all the statements and operators of PLOT simply by defining names.) More importantly, since the language is defined within itself, user-defined language extensions and embedded languages can do anything the base language can do. There are no magic constructs.
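The dispatch can be sketched as a toy Python expression parser. `Macro`, `module_scope`, and the `negate` macro are hypothetical; the point is only that the same tokens parse differently depending on whether a name is defined as a macro in the scope in effect.

```python
class Macro:
    """A macro object: a wrapper around a parse method (tokens, indentation)."""

    def __init__(self, parse_method):
        self.parse_method = parse_method


def parse_expression(tokens, indentation, error, scope):
    """Toy expression parser: integers, names, and macro invocations.

    `scope` maps names to definitions. A name defined as a Macro hands the
    rest of the token list to that macro's parse method.
    """
    if not tokens:
        if error:
            raise SyntaxError("expected an expression")
        return False
    tok = tokens.pop(0)
    if isinstance(tok, int):
        return ("literal", tok)
    definition = scope.get(tok)
    if isinstance(definition, Macro):
        return definition.parse_method(tokens, indentation)
    return ("name", tok)


# A module whose scope defines the hypothetical macro  negate <expression>.
module_scope = {}


def parse_negate(tokens, indentation):
    operand = parse_expression(tokens, indentation, True, module_scope)
    return ("call", "-", operand)


module_scope["negate"] = Macro(parse_negate)
```

In `module_scope`, `negate 5` parses as a call; in an empty scope the same first token is just a name, which is how a module changes its syntax by defining names.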

A macro object is simply a wrapper for a parse method that takes two arguments: a token-stream and the current indentation. The error? argument accepted by other parse methods is always implicitly true for a macro parse method. The parse method parses as many tokens as it likes out of the token-stream and returns a result. If the result is an expression, it is the expansion of the macro. Otherwise the result must be a sequence of tokens, and parse-expression is called with a token-stream constructed from that sequence, zero as the current indentation, and an error check flag of true; the result of that call is the expansion of the macro. If the contents of the token-stream are not a valid expression, or any tokens are left over, an error is signalled.
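The expansion rules can be sketched in Python. `Expr` stands in for an already-parsed expression, the two-rule toy grammar stands in for the real expression parser, and `expand_macro` and the `inc` macro are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expr:
    """Stand-in for an already-parsed expression (a P-expression)."""
    kind: str
    args: tuple


class TokenStream:
    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.pos = 0

    def at_end(self):
        return self.pos >= len(self.tokens)

    def peek(self):
        return None if self.at_end() else self.tokens[self.pos]

    def next(self):
        if self.at_end():
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return self.tokens[self.pos - 1]


def parse_expression(stream, indentation, error):
    """Toy grammar:  <atom>  or  <atom> + <atom>"""
    if stream.at_end():
        if error:
            raise SyntaxError("expected an expression")
        return False
    left = stream.next()
    if stream.peek() == "+":
        stream.next()
        return Expr("+", (left, stream.next()))
    return Expr("atom", (left,))


def expand_macro(parse_method, stream, indentation):
    """Run a macro parse method and apply the expansion rules."""
    result = parse_method(stream, indentation)
    if isinstance(result, Expr):
        return result  # an expression is the expansion as-is
    # Otherwise the result is a token sequence: reparse it with indentation 0
    # and the error flag true, and reject any tokens left over.
    sub = TokenStream(result)
    expansion = parse_expression(sub, 0, True)
    if not sub.at_end():
        raise SyntaxError(f"leftover tokens in expansion: {sub.tokens[sub.pos:]}")
    return expansion


def parse_inc(stream, indentation):
    """Hypothetical macro:  inc <atom>  expands to the tokens  <atom> + 1"""
    return [stream.next(), "+", 1]
```

Note that the indentation passed to the macro (here whatever the caller had) is not used when reparsing the returned tokens, which always start at indentation zero.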

For convenience, rather than writing a parse method directly in raw imperative form, you can write it in a more declarative, pattern-directed form using one of the macros defparser, defsyntax, or defmacro. These PLOT macros translate the patterns into PLOT code that does the parsing. PLOT patterns are pretty powerful, although not powerful enough to write the parser for expressions, which must be written in imperative form, mainly because of operator precedence.
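To show what translating a pattern into parsing code can mean, here is a Python sketch of a tiny pattern compiler. The pattern notation (literal strings plus `Ref` elements naming other syntactic types), `compile_pattern`, and the `range ... to ...` syntax are all assumptions; PLOT's real patterns are richer. The generated parse method follows the error? rules described earlier: it may quietly return `False` until the first element has matched, and is committed afterwards.

```python
from dataclasses import dataclass


class ParseError(Exception):
    """Signalled by a committed parse method that cannot continue."""


@dataclass(frozen=True)
class Ref:
    """Pattern element that names another syntactic type."""
    type_name: str


def compile_pattern(pattern, build, parsers):
    """Translate a declarative pattern into an imperative parse method.

    pattern: literal tokens (str) and Ref elements, matched in order.
    build:   called with the values parsed for the Ref elements.
    parsers: table of parse methods, looked up when parsing (allows recursion).
    """
    def parse(tokens, indentation, error):
        values = []
        for i, element in enumerate(pattern):
            # LL(1): once the first element has matched, we are committed.
            committed = error or i > 0
            if isinstance(element, Ref):
                value = parsers[element.type_name](tokens, indentation, committed)
                if value is False:
                    return False
                values.append(value)
            elif tokens and tokens[0] == element:
                tokens.pop(0)
            elif committed:
                raise ParseError(f"expected {element!r}")
            else:
                return False
        return build(*values)
    return parse


def parse_number(tokens, indentation, error):
    if tokens and isinstance(tokens[0], int):
        return tokens.pop(0)
    if error:
        raise ParseError("expected a number")
    return False


parsers = {"number": parse_number}
# Hypothetical syntax:  range <number> to <number>
parsers["range"] = compile_pattern(
    ["range", Ref("number"), "to", Ref("number")],
    lambda lo, hi: ("range", lo, hi),
    parsers,
)
```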

defparser defines a parse method for a syntactic type, using a pattern to specify what is to be parsed and arbitrary PLOT code to specify what object to return. The code generated by the defparser macro takes care of reading from the token-stream and handling the error? argument. Within the body of a defparser, the names tokens and error? are visibly bound to the arguments, and the name macro-context is visibly bound to a new unique context for hygienic macros.
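A Python sketch of what a defparser-style wrapper might supply to its body, assuming a hypothetical `MacroContext` that makes generated names unique by pairing them with a serial number. Here the body does its own token matching, so only the bindings are modelled: `tokens`, `error` (for error?), and a `macro_context` that is fresh on every call. The `swap` syntax is made up.

```python
import itertools
from dataclasses import dataclass, field

_serials = itertools.count(1)


@dataclass(frozen=True)
class MacroContext:
    """A fresh context; names made in it cannot equal any user-written name."""
    serial: int = field(default_factory=lambda: next(_serials))

    def name(self, base):
        # A (base, serial) pair is never equal to a plain string token.
        return (base, self.serial)


def defparser(body):
    """Sketch of the wrapper a defparser-style macro might generate."""
    def parse_method(tokens, indentation, error):
        macro_context = MacroContext()  # a new unique context per call
        result = body(tokens, error, macro_context)
        if result is False and error:
            raise SyntaxError(f"{body.__name__}: input not recognized")
        return result
    parse_method.__name__ = body.__name__
    return parse_method


@defparser
def parse_swap(tokens, error, macro_context):
    """Hypothetical syntax  swap <name> <name>  using a hygienic temporary."""
    if len(tokens) < 3 or tokens[0] != "swap":
        return False
    _, a, b = tokens[:3]
    del tokens[:3]
    tmp = macro_context.name("tmp")
    return [("let", tmp, a), ("set", a, b), ("set", b, tmp)]
```

Because each call gets its own context, the temporary introduced by `swap` cannot capture a user variable that happens to be called `tmp`.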

defsyntax is just like defparser except that the parser will also accept a single token that is of the data type with the same name as the syntactic type. This represents an already-parsed instance of the syntactic type. The implicit pattern that accepts a single token of that type goes after all the explicit patterns, in case one of them also accepts such tokens. Furthermore, the parse method will accept such an object in place of a token-stream as its argument. The defsyntax macro generates code to take care of all this special processing. The body of a defsyntax must always return an instance of the data type with the same name as the syntactic type, or false. This allows a P-expression node type to be defined as a PLOT class, and anywhere in the syntax that the syntactic type appears, the parser will accept either tokens that conform to the grammar of that syntactic type, or a P-expression that has already been parsed. This is very useful in connection with macros and templates.
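The extra behaviour of defsyntax can be sketched as a Python decorator. `Binding` is a hypothetical node class standing in for a P-expression class, and `defsyntax` and `parse_binding` are illustrative names, not PLOT's implementation. The wrapper accepts an already-parsed node in place of the token stream, tries the explicit pattern first, falls back to the implicit single-token pattern, and checks that the body returns a node or false.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Binding:
    """Stand-in P-expression class named after a 'binding' syntactic type."""
    name: str
    value: object


def defsyntax(node_class):
    """Sketch of the extra behaviour defsyntax adds to a defparser-style parser."""
    def wrap(explicit):
        def parse_method(tokens, indentation, error):
            # An already-parsed node may be passed in place of a token stream.
            if isinstance(tokens, node_class):
                return tokens
            result = explicit(tokens, indentation)
            if result is not False and not isinstance(result, node_class):
                raise TypeError(f"body must return a {node_class.__name__} or False")
            if result is not False:
                return result
            # Implicit pattern, tried after the explicit ones: a single token
            # that is itself an instance of node_class.
            if tokens and isinstance(tokens[0], node_class):
                return tokens.pop(0)
            if error:
                raise SyntaxError(f"expected a {node_class.__name__}")
            return False
        return parse_method
    return wrap


@defsyntax(Binding)
def parse_binding(tokens, indentation):
    """Explicit pattern:  <name> = <value>"""
    if len(tokens) >= 3 and isinstance(tokens[0], str) and tokens[1] == "=":
        name, _, value = tokens[:3]
        del tokens[:3]
        return Binding(name, value)
    return False
```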

defmacro defines a macro. When the name of a macro appears in an expression, idiosyncratic syntax follows. defmacro defines the name of the macro to be a macro object, which contains a parse method similar to those defined by defparser. The result of the parse method must be an expression or a sequence of tokens that can be parsed into an expression. Often this result is produced by a template, introduced by the backquote (`) macro. A template evaluates to the sequence of tokens described by the template.
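Templates can be approximated in Python as lists of literal tokens with placeholders. The `Hole` and `Splice` markers, `fill_template`, and the `unless ... do ...` macro are inventions for this sketch, not PLOT's template syntax; the sketch only shows a macro body returning a token sequence built from a template, which would then be handed to parse-expression.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hole:
    """Template placeholder replaced by one bound token or expression."""
    name: str


@dataclass(frozen=True)
class Splice:
    """Template placeholder whose bound sequence is inserted element by element."""
    name: str


def fill_template(template, bindings):
    """Evaluate a template to a flat list of tokens."""
    out = []
    for item in template:
        if isinstance(item, Hole):
            out.append(bindings[item.name])
        elif isinstance(item, Splice):
            out.extend(bindings[item.name])
        else:
            out.append(item)  # literal token, copied unchanged
    return out


def parse_unless(tokens, indentation):
    """Hypothetical macro:  unless <test> do <body...>"""
    if len(tokens) < 2 or tokens[1] != "do":
        raise SyntaxError("unless: expected <test> do <body>")
    test = tokens[0]
    body = tokens[2:]
    del tokens[:]  # this macro consumes the rest of the stream
    return fill_template(["if", "not", Hole("test"), "then", Splice("body")],
                         {"test": test, "body": body})
```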

There are also operator macros, which are explained in the Operators section. An operator macro can be infix or prefix. An infix operator macro that parses no tokens on the right-hand side is effectively postfix.
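A Python sketch of operator macros in a toy left-to-right expression parser. The tables `PREFIX` and `INFIX` and the three operators are hypothetical, and precedence is deliberately left out. An infix operator macro receives the already-parsed left operand and the remaining tokens; `!` reads nothing from the tokens, so it behaves as a postfix operator.

```python
def parse_primary(tokens):
    tok = tokens.pop(0) if tokens else None
    if isinstance(tok, int):
        return tok
    raise SyntaxError(f"unexpected {tok!r}")


def parse_operand(tokens):
    # A prefix operator macro receives only the tokens to its right.
    if tokens and tokens[0] in PREFIX:
        return PREFIX[tokens.pop(0)](tokens)
    return parse_primary(tokens)


def prefix_minus(tokens):
    return ("negate", parse_operand(tokens))


def infix_plus(left, tokens):
    # An ordinary infix operator macro: parses a right-hand operand.
    return ("+", left, parse_operand(tokens))


def postfix_bang(left, tokens):
    # An infix operator macro that parses no right-hand tokens: effectively postfix.
    return ("factorial", left)


PREFIX = {"-": prefix_minus}
INFIX = {"+": infix_plus, "!": postfix_bang}


def parse_expression(tokens):
    """Apply operators strictly left to right (precedence deliberately omitted)."""
    left = parse_operand(tokens)
    while tokens and tokens[0] in INFIX:
        left = INFIX[tokens.pop(0)](left, tokens)
    return left
```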

Unlike defparser and defsyntax, the various macro-defining forms do not visibly bind the name error?, since it is always implicitly true for a macro parse method.
