Lunar Programming Language

by David A. Moon
January 2017 - January 2018



Parsing

Parsing is entirely procedural. It is driven by compile-time execution of parse functions, which are either written by the user or predefined; the predefined ones could equally have been written by the user.

In addition to low-level, procedural parse functions, parsing can be controlled by macros. A macro is a prefix or infix operator that controls parsing of the source code to its right. That parsing can be written directly in procedural code or can be controlled by a higher-level, more declarative pattern. See Macros for a description of macros. See Patterns for a description of declarative parsing.

The parse function for syntactic construct xyz is named parse_xyz in the same context as xyz. A parse function accepts four parameters:

Name         Type          Purpose
lexer        token_stream  source of tokens
indentation  integer       indentation at start of current construct
scope        scope         current scope
required?    boolean       if first token doesn't match, true => error occurs, false => return false

See Token Streams for information about objects that can be the lexer parameter.
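The four-parameter convention can be illustrated with a small model. The sketch below is written in Python, not Lunar, and everything in it is hypothetical: the Lexer class is a list-backed stand-in for token_stream, ParseError stands in for whatever error Lunar signals, and the `not <name>` construct is invented for the example. It only shows how required? selects between signaling an error and returning false.

```python
class ParseError(Exception):
    """Hypothetical stand-in for the error a parse function signals."""


class Lexer:
    """Hypothetical stand-in for token_stream: a list of tokens plus a cursor."""

    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.pos = 0

    def next(self):
        # Peek at the next token; False at end of input.
        return self.tokens[self.pos] if self.pos < len(self.tokens) else False

    def next_bang(self):
        # Models next!: return the next token and advance.
        tok = self.next()
        if tok is not False:
            self.pos += 1
        return tok


def parse_negation(lexer, indentation, scope, required):
    """Parse the made-up construct `not <name>`.

    indentation and scope are accepted to mirror the convention but are
    not needed by this tiny example.
    """
    if lexer.next() != "not":
        if required:
            raise ParseError(f"expected 'not' but saw {lexer.next()!r}")
        return False  # construct is absent; the lexer is left unchanged
    lexer.next_bang()
    operand = lexer.next_bang()
    return ("not", operand)
```

With required false, a caller can try one construct and fall through to another without disturbing the lexer; with required true, the same function reports the mismatch itself.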

parse_expression accepts two additional, optional parameters:

Name        Default       Type        Purpose
precedence  -1            -1..99      right precedence of operator preceding the expression
modifiers   set![name]()  set![name]  names of modifiers preceding the expression

The compiler provides several utility functions for use by parse functions:

(x name) = (y name) => boolean
Two names are equal if they must refer to the same definition when used in the same scope. In other words, their spellings are the same and their contexts are the same. Because a name is interned in its context, = is the same as eq for names.

same_spelling?(x name, y name) => boolean
same_spelling?(x everything, y everything) => false
True if the two names' spellings are the same, i.e. the names are the same if we ignore hygienic context. False if either argument is not a name. Use this to compare absolute particles.
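The difference between = and same_spelling? can be modeled in a few lines. This is a Python sketch rather than Lunar; the Name class, the intern helper, and the use of a plain string as a hygienic context are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Name:
    spelling: str
    context: str  # hypothetical stand-in for a hygienic context


_interned = {}


def intern(spelling, context):
    """Return the unique Name object for (spelling, context)."""
    key = (spelling, context)
    if key not in _interned:
        _interned[key] = Name(spelling, context)
    return _interned[key]


def same_spelling(x, y):
    """Model of same_spelling?: false unless both arguments are names."""
    if not (isinstance(x, Name) and isinstance(y, Name)):
        return False
    return x.spelling == y.spelling
```

Because intern returns one object per spelling and context, identity and equality coincide for names in this model, while same_spelling deliberately ignores the context.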

punctuation?(x name) => boolean
punctuation?(x newline) => true
punctuation?(x everything) => false
True if x is punctuation.

match?(lexer token_stream, token) => boolean
If next(lexer) matches token then advance lexer and return true, otherwise leave lexer unchanged and return false. Matching uses same_spelling? for names and = for everything else.

match!(lexer token_stream, token)
If next(lexer) matches token then advance lexer and return true, otherwise a wrong_token_error occurs. Matching uses same_spelling? for names and = for everything else.

match(lexer token_stream, token, error? boolean) => boolean
Call match? or match! depending on error?.
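The relationship among match?, match!, and match can be sketched as follows. This is a Python model, not Lunar: Lexer is a hypothetical list-backed stand-in for token_stream, WrongTokenError stands in for wrong_token_error, and tokens are plain strings, so ordinary equality plays the role of both same_spelling? and =.

```python
class WrongTokenError(Exception):
    """Hypothetical stand-in for the error signaled by wrong_token_error."""


class Lexer:
    """Hypothetical stand-in for token_stream over a list of tokens."""

    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.pos = 0

    def next(self):
        # Peek at the next token; False at end of input.
        return self.tokens[self.pos] if self.pos < len(self.tokens) else False

    def advance(self):
        self.pos += 1


def match_q(lexer, token):
    """Model of match?: advance and return True on a match, else False."""
    if lexer.next() == token:
        lexer.advance()
        return True
    return False


def match_bang(lexer, token):
    """Model of match!: like match?, but a mismatch is an error."""
    if match_q(lexer, token):
        return True
    raise WrongTokenError(f"expected {token!r} but saw {lexer.next()!r}")


def match(lexer, token, error):
    """Model of match: dispatch on the error? flag."""
    return match_bang(lexer, token) if error else match_q(lexer, token)
```

The three-argument form lets a parse function pass its own required? parameter straight through as error?.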

match_definition?(lexer token_stream, value, scope) => boolean
If next(lexer) is a name bound to a known_definition in scope whose value equals value then advance lexer and return true, otherwise leave lexer unchanged and return false. This is how you match an optional $name in a pattern.

match_definition!(lexer token_stream, value, scope) => boolean
If next(lexer) is a name bound to a known_definition in scope whose value equals value then advance lexer and return true, otherwise a wrong_token_error occurs. This is how you match a required $name in a pattern.

match_definition(lexer token_stream, value, scope, error? boolean) => boolean
Call match_definition? or match_definition! depending on error?.
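Matching by definition rather than by spelling can be modeled roughly as below. The sketch is Python, not Lunar, and simplifies heavily: scope is assumed to be a dict from spelling to defined value, there is no separate known_definition object, and Lexer and WrongTokenError are hypothetical stand-ins.

```python
class WrongTokenError(Exception):
    """Hypothetical stand-in for the error signaled by wrong_token_error."""


class Lexer:
    """Hypothetical stand-in for token_stream over a list of tokens."""

    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.pos = 0

    def next(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else False

    def advance(self):
        self.pos += 1


def match_definition_q(lexer, value, scope):
    """Model of match_definition?: match a name by what it is bound to."""
    tok = lexer.next()
    if isinstance(tok, str) and tok in scope and scope[tok] == value:
        lexer.advance()
        return True
    return False


def match_definition_bang(lexer, value, scope):
    """Model of match_definition!: a mismatch is an error."""
    if match_definition_q(lexer, value, scope):
        return True
    raise WrongTokenError(f"expected a name bound to {value!r} but saw {lexer.next()!r}")
```

In this model two differently spelled names match the same value if the scope binds both to it, and an unbound name never matches.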

match_newline?(lexer token_stream, indentation integer, indent? boolean) => integer | false
If next(lexer) is a newline and the newline's indentation matches, then advance lexer and return the newline's indentation. Otherwise leave lexer unchanged and return false. If indent? is false, a newline matches if its indentation exactly equals indentation, otherwise a newline matches if its indentation is greater than indentation.

match_newline!(lexer token_stream, indentation integer, indent? boolean)
If next(lexer) is a newline and the newline's indentation matches as in match_newline?, then advance lexer and return the newline's indentation. Otherwise a wrong_token_error occurs.

match_newline(lexer token_stream, indentation integer,
              indent? boolean, error? boolean) => integer | false
Call match_newline? or match_newline! depending on error?.
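The indentation test used by the match_newline family can be sketched as follows. This is a Python model with hypothetical names (Newline, Lexer, newline_matches); it is not Lunar code and covers only the non-erroring variant.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Newline:
    indentation: int


class Lexer:
    """Hypothetical stand-in for token_stream over a list of tokens."""

    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.pos = 0

    def next(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else False

    def advance(self):
        self.pos += 1


def newline_matches(tok, indentation, indent):
    """indent false: exactly equal; indent true: strictly greater."""
    if not isinstance(tok, Newline):
        return False
    if indent:
        return tok.indentation > indentation
    return tok.indentation == indentation


def match_newline_q(lexer, indentation, indent):
    """Model of match_newline?: return the newline's indentation or False."""
    tok = lexer.next()
    if newline_matches(tok, indentation, indent):
        lexer.advance()
        return tok.indentation
    return False
```

One caveat is specific to this Python model: a matched indentation of 0 is falsy in Python, so callers should test the result with `is False` rather than by truthiness.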

match_after_newline?(lexer token_stream, token, indentation integer, indent? boolean) => boolean
If lexer's next two tokens are a newline whose indentation matches as in match_newline? and something that matches token as in match?, then advance lexer past both tokens and return true. Otherwise leave lexer unchanged and return false.

match_after_newline!(lexer token_stream, token, indentation integer, indent? boolean)
If lexer's next two tokens are a newline whose indentation matches as in match_newline? and something that matches token as in match?, then advance lexer past both tokens and return true. Otherwise a wrong_token_error occurs.

match_after_newline(lexer token_stream, token, indentation integer,
                    indent? boolean, error? boolean) => boolean
Call match_after_newline? or match_after_newline! depending on error?.
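The all-or-nothing behavior of match_after_newline? can be modeled like this. The sketch is Python rather than Lunar; Newline, Lexer, and its peek method are hypothetical stand-ins, and tokens are plain strings compared with ordinary equality.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Newline:
    indentation: int


class Lexer:
    """Hypothetical stand-in for token_stream with two-token look ahead."""

    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.pos = 0

    def peek(self, offset=0):
        i = self.pos + offset
        return self.tokens[i] if i < len(self.tokens) else False


def newline_matches(tok, indentation, indent):
    """indent false: exactly equal; indent true: strictly greater."""
    if not isinstance(tok, Newline):
        return False
    if indent:
        return tok.indentation > indentation
    return tok.indentation == indentation


def match_after_newline_q(lexer, token, indentation, indent):
    """Model of match_after_newline?: consume both tokens or neither."""
    if newline_matches(lexer.peek(0), indentation, indent) and lexer.peek(1) == token:
        lexer.pos += 2
        return True
    return False
```

A typical use in this model is recognizing a continuation word, such as a hypothetical `else`, that starts the next line at the same indentation as the construct it belongs to.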

wrong_token_error(lexer token_stream, expected)
Signal a parsing error. The error message indicates that expected was expected but next(lexer) was seen. The expected argument can be a string, a name, or a sequence of names or strings. The lexer argument also supplies the source_location.

wrong_token_after_newline_error(lexer token_stream, expected,
                                indentation integer, indent? boolean)
Signal a parsing error. The error message indicates that expected was expected after a newline described by indentation and indent?, but either the newline was not present or the token after the newline was not the expected one. The expected argument can be a string, a name, or a sequence of names or strings. The lexer argument also supplies the source_location.

Token Streams

The input to the parser comes from a token stream, which is an input stream of tokens. A token stream takes its input from a character stream, skips comments, tracks indentation and source locations, and uses lexical analysis to divide the input into tokens. A token stream recognizes \ at the end of a line and suppresses both the \ and the following newline token.

Indentation is the column position, starting from 0 at the left margin, where a token begins. This assumes a fixed-width font.
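Line continuation and indentation tracking can be illustrated with a toy tokenizer. This is a Python sketch under several assumptions that the text does not confirm: tokens are simply whitespace-separated words, comments are not handled, blank lines produce nothing, no newline token is emitted before the first line, and a newline token carries the indentation of the line that follows it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Newline:
    indentation: int  # assumed: column where the following line's first token begins


def tokenize(text):
    """Split text into words and Newline tokens, honoring \\ at end of line."""
    tokens = []
    continued = False  # True when the previous line ended with a backslash
    for line in text.split("\n"):
        words = line.split()
        if not words:
            continue  # blank lines produce no tokens in this sketch
        if tokens and not continued:
            # Indentation is the count of leading spaces, starting from 0.
            tokens.append(Newline(len(line) - len(line.lstrip(" "))))
        continued = words[-1] == "\\"
        if continued:
            words.pop()  # suppress the backslash itself
        tokens.extend(words)
    return tokens
```

In this model a line ending in \ contributes its words but no newline token, so the parser sees the continued lines as one.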

The source location of a token is the source file and line number where the token appears. The representation of a source file is unspecified; it might be its name as a string.

Token streams also provide a push back buffer, which allows parse functions to look ahead arbitrarily far and allows the expansion of a macro to be fed back into the parser. When reading from the push back buffer, newlines have a specific indentation but other tokens just increase the current indentation by 1.
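The ordering behavior of a push back buffer can be sketched with a double-ended queue. This is a Python model, not Lunar: TokenStream, next_bang, and insert are hypothetical stand-ins for the real stream and its operations, tokens are plain strings, and indentation tracking is left out.

```python
from collections import deque


class TokenStream:
    """Hypothetical stream whose reads go through a push back buffer."""

    def __init__(self, tokens):
        self.source = iter(tokens)
        self.buffer = deque()

    def _fill(self):
        # Pull one token from the underlying source if the buffer is empty.
        if not self.buffer:
            for tok in self.source:
                self.buffer.append(tok)
                break

    def next(self):
        # Peek at the next token; False at end of input.
        self._fill()
        return self.buffer[0] if self.buffer else False

    def next_bang(self):
        # Return the next token and advance.
        self._fill()
        return self.buffer.popleft() if self.buffer else False

    def insert(self, tokens):
        # Push back one token or a sequence; the first one inserted is read first.
        if isinstance(tokens, (list, tuple)):
            self.buffer.extendleft(reversed(tokens))
        else:
            self.buffer.appendleft(tokens)
```

For example, a hypothetical macro that reads `unless` could push back `if` and `not`, after which the parser continues as though the source had contained those tokens.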

A token output by a token stream is more general than a source code token. It can be a name (including punctuation), a literal, a newline, a keyword, or an expression that has already been parsed.

The interface is:

def literal = number | character | string
def expression = name | literal | quotation | compound_expression | block_scope
def token = expression | newline | keyword

defclass token_stream(source in_stream[character],
                      optional: source_file = false,
                                indentation = 0 integer)    in_stream[token]
  ; slots ...

;; The in_stream protocol

require more?(s token_stream) => boolean          ; true if end not reached
require next(s token_stream) => token | false     ; peek at next token, false at end
require next!(s token_stream) => token | false    ; return next token and advance
require close(s token_stream)                     ; release resources

;; Special newline-skipping look ahead
;;
;; If the next token is a newline whose indentation
;; is greater than specified if indent? else exactly as specified,
;; return the token after it,
;; otherwise the same as next.

require next_after_newline(s token_stream, indentation integer, indent? boolean) => token | false

;; Get or set the indentation of the next token

require (s token_stream).indentation => integer   ; indentation of next token
require (s token_stream).indentation := (new_indentation integer)  ; reset indentation

;; Source location support

require (s token_stream).source_location => source_location  ; location of next token

;; The push back buffer
;;
;; Insert one or more tokens after what has already been passed over by next!
;; and before the next token that would have been returned by next or next!.
;; The first token inserted will be the result of next or next!.
;; If a sequence is supplied, it can have any member type but
;; its members must be tokens or source_locations.

require insert!(s token_stream, t token)
require insert!(s token_stream, ts sequence)
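The newline-skipping look ahead declared in the interface above can be modeled as a pure peek. The sketch is Python rather than Lunar; Newline, TokenStream, and peek are hypothetical stand-ins, and the stream is just a list with a cursor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Newline:
    indentation: int


class TokenStream:
    """Hypothetical list-backed stream with indexed look ahead."""

    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.pos = 0

    def peek(self, offset=0):
        i = self.pos + offset
        return self.tokens[i] if i < len(self.tokens) else False


def next_after_newline(stream, indentation, indent):
    """Return the token after a matching newline, otherwise the next token."""
    first = stream.peek(0)
    if isinstance(first, Newline):
        if indent:
            matches = first.indentation > indentation
        else:
            matches = first.indentation == indentation
        if matches:
            return stream.peek(1)
    return first  # same result as next; the stream is never advanced
```

Because nothing is consumed, a parse function can use this to decide whether a construct continues on the following line before committing to match the newline.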

The special token types mentioned above could have been defined by:

constant:
defclass quotation(datum)

constant:
defclass newline(indentation integer)

constant:
defclass keyword(name name)

constant:
defclass source_location(source_file, line_number integer)





Creative Commons License
Lunar by David A. Moon is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Please inform me if you find this useful, or use any of the ideas embedded in it.
Comments and criticisms to dave underscore moon atsign alum dot mit dot edu.