Programming Language for Old Timers

by David A. Moon
February 2006 .. September 2008

Comments and criticisms to dave underscore moon atsign alum dot mit dot edu.


Newlines receive two special considerations in the syntax. Their purpose is to allow nesting structure to be indicated by indentation rather than by any kind of explicit bracketing. Apart from these two considerations, parsing generally skips over newline tokens.

1) The first token on a line will not be recognized as an infix operator. Therefore an expression cannot continue past a newline unless the line ends with an infix operator, or the newline is inside a macro invocation rather than directly inside an expression. This allows two expressions to be separated by only a newline without syntactic ambiguity, even when the second expression begins with a token that is both a prefix operator and an infix operator.
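As a rough illustration of rule 1, the following Python sketch groups a flat token list into expressions using only the "line must end with an infix operator to continue" test. The token representation, the INFIX set, and the function name split_expressions are all hypothetical; the sketch ignores macro invocations and is not how the actual parser is organized.

```python
NEWLINE = "\n"
# Hypothetical set of infix operators; the real language defines its own.
INFIX = {"+", "-", "*", "/", "="}

def split_expressions(tokens):
    """Group tokens into expressions using only the first newline rule."""
    expressions, current = [], []
    for tok in tokens:
        if tok == NEWLINE:
            # A newline ends the expression unless the line ended with an
            # infix operator, in which case the expression continues.
            if current and current[-1] not in INFIX:
                expressions.append(current)
                current = []
        else:
            current.append(tok)
    if current:
        expressions.append(current)
    return expressions

# "-" at the start of a line is not treated as infix, so two expressions result.
print(split_expressions(["a", NEWLINE, "-", "b"]))
# "-" at the end of a line lets the expression continue onto the next line.
print(split_expressions(["a", "-", NEWLINE, "b"]))
```

In the first call the minus sign begins a new expression (a prefix use); in the second it joins the two lines into one expression.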

2) The program syntax contains an element called a "line break," indicated by ^ in the syntax pattern language. This indicates a place in the syntax where a newline is expected, but not required. For example, the body of a method consists of a sequence of expressions, with a line break before each expression. Most methods are written with each expression in the body on a separate line. A large expression can continue onto multiple lines, but there is always a newline between the end of one expression and the start of the next. However, a method with only one expression in its body could be written all on one line. Consecutive expressions in a method body cannot share a line; even where that would be syntactically unambiguous, it is difficult to read, so it is not allowed.

The specific syntax rules for line breaks are as follows:

A line break has a scope, which is the enclosing syntactic construct, such as one expression in a method's body. In the syntax pattern language this is indicated by enclosing the line break's scope in braces { }.

A token-stream has a current indentation, which is initially zero at the start of a program. A line break sets the current indentation to the indentation of the next token if that token is a newline, otherwise to the current indentation plus 1. After the scope of the line break the current indentation is restored to its previous value.

A line break establishes a parsing barrier for its scope, starting just before the first newline that is indented less than the current indentation set by the line break. Tokens behind the parsing barrier cannot be seen while parsing is within the scope. The barrier acts similarly to end-of-file.
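The two rules above (how a line break sets the current indentation, and where its parsing barrier falls) can be sketched in Python as follows. The token model is an assumption made for illustration: each token is a pair, either ("tok", text) or ("nl", n), where n is the indentation of the line that follows the newline. The function names are hypothetical.

```python
def line_break_indentation(tokens, pos, current_indent):
    """Indentation set by a line break whose next token is tokens[pos]."""
    kind, value = tokens[pos]
    # A newline supplies its own indentation; otherwise use current + 1.
    return value if kind == "nl" else current_indent + 1

def barrier_position(tokens, pos, indent):
    """Index of the first newline at or after pos indented less than indent.

    Tokens from this index onward are hidden while parsing inside the scope.
    Returns len(tokens) when no such newline exists (the barrier is then
    simply end-of-file).
    """
    for i in range(pos, len(tokens)):
        kind, value = tokens[i]
        if kind == "nl" and value < indent:
            return i
    return len(tokens)

# Hypothetical token stream for:
#   method foo
#       a
#       b
#   bar
tokens = [("tok", "method"), ("tok", "foo"), ("nl", 4), ("tok", "a"),
          ("nl", 4), ("tok", "b"), ("nl", 0), ("tok", "bar")]

indent = line_break_indentation(tokens, 2, 0)   # line break before the body
barrier = barrier_position(tokens, 2, indent)   # newline that precedes "bar"
print(indent, barrier)
```

Restoring the previous indentation after the scope is left to the caller in this sketch, for example by keeping the old value in a local variable.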

A line break is usually contained in a syntactic repeat; in the syntax pattern language the scope-enclosing braces are followed by * or +. On the first repetition, the line break matches any token not behind a parsing barrier. The line break consumes the matching token only if it is a newline. The current indentation is adjusted as described above.

On subsequent repetitions, the line break matches if the next token is a newline and its indentation is the same as the current indentation. It consumes the newline. It fails to match if a parsing barrier has been reached, i.e. the next token is a newline and its indentation is less than the current indentation; this indicates the end of a nested construct. It also fails to match if the next token is not a newline; there will be an error check at the end of the line break's scope as noted below. If the next token is a newline whose indentation is greater than the current indentation, a syntax error occurs, because a continuation line is not possible here.
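The four cases for a subsequent repetition can be summarized in a small Python sketch. It assumes tokens are pairs, ("tok", text) or ("nl", n) with n the indentation of the following line, and it treats a missing token (None) as a barrier, on the grounds that the barrier acts similarly to end-of-file. The function name and return values are hypothetical.

```python
def match_subsequent(next_token, current_indent):
    """Classify how a line break behaves on a repetition after the first."""
    if next_token is None:
        # Assumption: end-of-file behaves like a parsing barrier.
        return "end"
    kind, value = next_token
    if kind != "nl":
        # Not a newline: no match; the error check happens at end of scope.
        return "no-match"
    if value == current_indent:
        # Same indentation: the line break matches and consumes the newline.
        return "match"
    if value < current_indent:
        # Parsing barrier reached: the nested construct ends here.
        return "end"
    # Deeper indentation would be a continuation line, which is not allowed.
    raise SyntaxError("continuation line is not possible here")

print(match_subsequent(("nl", 4), 4))      # match
print(match_subsequent(("nl", 0), 4))      # end
print(match_subsequent(("tok", "x"), 4))   # no-match
```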

These rules imply that indentation greater than the current indentation indicates either a continuation line or the start of a nested construct such as a method body or a class definition body. Such a nested construct ends when indentation returns to its previous value.

If the first repetition consumed a newline at the line break, there must be a newline after the scope of a line break ends. This newline cannot be the start of a continuation line; in other words, this newline's indentation must be less than or equal to the indentation of the lines in the scope of the line break.

If the first repetition did not consume a newline, then the scope started in the middle of a line and can end in the middle of a line. In this case there can be only one repetition.
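The two end-of-scope checks just described might be sketched as below. This is one reading of the rules, not the author's implementation. It assumes tokens are pairs, ("tok", text) or ("nl", n) with n the indentation of the following line, and it accepts end-of-file (None) in place of the required newline. All names are hypothetical.

```python
def check_scope_end(consumed_newline, repetitions, next_token, scope_indent):
    """Raise SyntaxError if a line break's scope ends in a disallowed way.

    consumed_newline: whether the first repetition consumed a newline
    repetitions:      how many repetitions were parsed
    next_token:       the token following the scope, or None at end-of-file
    scope_indent:     indentation of the lines in the line break's scope
    """
    if consumed_newline:
        if next_token is None:
            return  # assumption: end-of-file is acceptable here
        kind, value = next_token
        if kind != "nl":
            raise SyntaxError("expected a newline after the scope")
        if value > scope_indent:
            raise SyntaxError("a continuation line cannot follow the scope")
    elif repetitions != 1:
        # The scope began mid-line, so only a single repetition is allowed.
        raise SyntaxError("only one repetition is allowed on a single line")

# A body on its own lines, followed by a dedent: accepted.
check_scope_end(True, 2, ("nl", 0), 4)
# A one-expression body written on the same line as its header: accepted.
check_scope_end(False, 1, ("tok", ")"), 1)
```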
