Parsing is entirely procedural, driven by compile-time execution of parse functions, which either are written by the user or are predefined but could have been written by the user.
In addition to low-level, procedural parse functions, parsing can be controlled by macros. A macro is a prefix or infix operator that controls parsing of the source code to its right. That parsing can be written directly in procedural code or can be controlled by a higher-level, more declarative pattern. See Macros for a description of macros. See Patterns for a description of declarative parsing.
The parse function for syntactic construct xyz is named parse_xyz in the same context as xyz. A parse function accepts four parameters:
Name | Type | Purpose |
---|---|---|
lexer | token_stream | source of tokens |
indentation | integer | indentation at start of current construct |
scope | scope | current scope |
required? | boolean | if first token doesn't match, true => error occurs, false => return false |
See Token Streams for information about objects that can be the lexer parameter.
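The role of the required? parameter can be illustrated with a small model. The sketch below is Python rather than the language described here, and it makes several assumptions: the lexer is a plain list of strings, `parse_greeting` and the construct it parses (`hello` followed by one token) are hypothetical, and `WrongTokenError` stands in for the error raised by wrong_token_error. It is meant only to show the "error versus return false" contract, not how any real parse function is written.

```python
class WrongTokenError(Exception):
    """Stand-in for the parsing error signalled when a required token is missing."""


def parse_greeting(lexer, indentation, scope, required):
    """Parse the hypothetical construct `hello <token>` from a list of tokens.

    indentation and scope are accepted only to mirror the four-parameter
    signature; this toy construct does not use them.
    """
    if not lexer or lexer[0] != "hello":
        if required:
            seen = lexer[0] if lexer else "end of input"
            raise WrongTokenError(f"expected 'hello' but saw {seen!r}")
        return False  # optional construct absent: lexer is left unchanged
    lexer.pop(0)  # consume 'hello'
    if not lexer:
        raise WrongTokenError("expected a token after 'hello'")
    return ("greeting", lexer.pop(0))
```

With `required` false the caller can probe for an optional construct and fall through to another parse function; with `required` true the same function reports the error itself.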
parse_expression accepts two additional, optional parameters:
Name | Default | Type | Purpose |
---|---|---|---|
precedence | -1 | -1..99 | right precedence of operator preceding the expression |
modifiers | set![name]() | set![name] | names of modifiers preceding the expression |
The compiler provides several utility functions for use by parse functions:
(x name) = (y name) => boolean
Two names are equal if they must refer to the same definition when used in the same scope. In other words, their spellings are the same and their contexts are the same. Because a name is interned in its context, = is the same as eq for names.

same_spelling?(x name, y name) => boolean
same_spelling?(x everything, y everything) => false
True if the two names' spellings are the same, i.e. the names are the same if we ignore hygienic context. False if either argument is not a name. Use this to compare absolute particles.

punctuation?(x name) => boolean
punctuation?(x newline) => true
punctuation?(x everything) => false
True if x is punctuation.

match?(lexer token_stream, token) => boolean
If next(lexer) matches token then advance lexer and return true, otherwise leave lexer unchanged and return false. Matching uses same_spelling? for names and = for everything else.

match!(lexer token_stream, token)
If next(lexer) matches token then advance lexer and return true, otherwise a wrong_token_error occurs. Matching uses same_spelling? for names and = for everything else.

match(lexer token_stream, token, error? boolean) => boolean
Call match? or match! depending on error?.

match_definition?(lexer token_stream, value, scope) => boolean
If next(lexer) is a name bound to a known_definition in scope whose value equals value then advance lexer and return true, otherwise leave lexer unchanged and return false. This is how you match an optional $name in a pattern.

match_definition!(lexer token_stream, value, scope) => boolean
If next(lexer) is a name bound to a known_definition in scope whose value equals value then advance lexer and return true, otherwise a wrong_token_error occurs. This is how you match a required $name in a pattern.

match_definition(lexer token_stream, value, scope, error? boolean) => boolean
Call match_definition? or match_definition! depending on error?.

match_newline?(lexer token_stream, indentation integer, indent? boolean) => integer | false
If next(lexer) is a newline and the newline's indentation matches, then advance lexer and return the newline's indentation. Otherwise leave lexer unchanged and return false. If indent? is false, a newline matches if its indentation exactly equals indentation, otherwise a newline matches if its indentation is greater than indentation.

match_newline!(lexer token_stream, indentation integer, indent? boolean)
If next(lexer) is a newline and the newline's indentation matches as in match_newline?, then advance lexer and return the newline's indentation. Otherwise a wrong_token_error occurs.

match_newline(lexer token_stream, indentation integer, indent? boolean, error? boolean) => boolean
Call match_newline? or match_newline! depending on error?.

match_after_newline?(lexer token_stream, token, indentation integer, indent? boolean) => boolean
If lexer's next two tokens are a newline whose indentation matches as in match_newline? and something that matches token as in match?, then advance lexer past both tokens and return true. Otherwise leave lexer unchanged and return false.

match_after_newline!(lexer token_stream, token, indentation integer, indent? boolean)
If lexer's next two tokens are a newline whose indentation matches as in match_newline? and something that matches token as in match?, then advance lexer past both tokens and return true. Otherwise a wrong_token_error occurs.

match_after_newline(lexer token_stream, token, indentation integer, indent? boolean, error? boolean) => boolean
Call match_after_newline? or match_after_newline! depending on error?.

wrong_token_error(lexer token_stream, expected)
Signal a parsing error. The error message indicates that expected was expected but next(lexer) was seen. The expected argument can be a string, a name, or a sequence of names or strings. The lexer argument also supplies the source_location.

wrong_token_after_newline_error(lexer token_stream, expected, indentation integer, indent? boolean)
Signal a parsing error. The error message indicates that expected was expected after a newline described by indentation and indent? but the newline was not present or the next token after the newline was not expected. The expected argument can be a string, a name, or a sequence of names or strings. The lexer argument also supplies the source_location.
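The behavior of the matching utilities can be modeled in a few lines. The following is a Python sketch of the semantics described above, not code in the language this document specifies. The `Name`, `Newline`, `Lexer`, and `WrongTokenError` classes are assumptions made for illustration, as are the Python spellings `match_q` for match?, `match_bang` for match!, and so on. A hygienic context is represented here by a plain string.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Name:
    spelling: str
    context: str  # stand-in for a hygienic context (assumption)


@dataclass(frozen=True)
class Newline:
    indentation: int


class WrongTokenError(Exception):
    """Stand-in for the error signalled by wrong_token_error."""


class Lexer:
    """Minimal stand-in for a token_stream: a token list with a cursor."""

    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.pos = 0

    def next(self):
        # Peek at the next token; False at end of input.
        return self.tokens[self.pos] if self.pos < len(self.tokens) else False

    def advance(self):
        self.pos += 1


def same_spelling(x, y):
    # False unless both arguments are names with the same spelling.
    return isinstance(x, Name) and isinstance(y, Name) and x.spelling == y.spelling


def matches(actual, token):
    if actual is False:  # end of input never matches
        return False
    if isinstance(token, Name):
        return same_spelling(actual, token)  # context is ignored for names
    return actual == token


def match_q(lexer, token):
    if matches(lexer.next(), token):
        lexer.advance()
        return True
    return False  # lexer left unchanged


def match_bang(lexer, token):
    if not match_q(lexer, token):
        raise WrongTokenError(f"expected {token!r} but saw {lexer.next()!r}")
    return True


def match_newline_q(lexer, indentation, indent):
    tok = lexer.next()
    if isinstance(tok, Newline):
        if indent:
            ok = tok.indentation > indentation
        else:
            ok = tok.indentation == indentation
        if ok:
            lexer.advance()
            return tok.indentation  # may be 0, so callers compare with `is False`
    return False


def match_after_newline_q(lexer, token, indentation, indent):
    start = lexer.pos
    if match_newline_q(lexer, indentation, indent) is not False and match_q(lexer, token):
        return True
    lexer.pos = start  # undo a consumed newline so the lexer is unchanged
    return False
```

Note that `Name("if", "user")` and `Name("if", "core")` compare unequal under `==`, which models = on names, yet `match_q` accepts one for the other because matching goes through `same_spelling`.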
The input to the parser comes from a token stream, which is an input stream of tokens. A token stream takes its input from a character stream, skips comments, tracks indentation and source locations, and uses lexical analysis to divide the input into tokens. A token stream recognizes \ at the end of a line and suppresses both the \ and the following newline token.
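The line-continuation rule can be sketched with a toy tokenizer. This is a Python model under loud assumptions, not the real lexical analysis: tokens are simply whitespace-separated words, comments are not handled, blank lines are skipped, no newline token is emitted before the first line, and a newline token is assumed to carry the indentation of the line that follows it. The names `Newline` and `tokenize` are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Newline:
    indentation: int  # assumed: column where the following line's first token begins


def tokenize(text):
    """Split text into words and Newline tokens, honoring a trailing backslash."""
    tokens = []
    continued = True  # assumption: no newline token before the very first line
    for line in text.split("\n"):
        stripped = line.strip()
        if not stripped:
            continue  # assumption: blank lines produce no tokens
        if not continued:
            tokens.append(Newline(len(line) - len(line.lstrip(" "))))
        words = stripped.split()
        continued = words[-1] == "\\"
        if continued:
            words.pop()  # suppress the backslash; the next newline is suppressed too
        tokens.extend(words)
    return tokens
```

For the input `a = b \` followed by an indented `+ c` and then `d` at the left margin, this model yields the five words of the first logical line with no newline between them, then `Newline(0)`, then `d`.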
Indentation is the column position, starting from 0 at the left margin, where a token begins. This assumes a fixed-width font.
The source location of a token is the source file and line number where the token appears. The representation of a source file is unspecified; it might be its name as a string.
Token streams also provide a push back buffer which allows parse functions to look ahead arbitrarily far and allows the expansion of a macro to be fed back into the parser. When reading from the push back buffer, newlines have a specific indentation but other tokens just increase the current indentation by 1.
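A push back buffer of this kind can be modeled with a double-ended queue. The sketch below is Python, with `TokenStream`, `next_bang` (for next!), `insert_bang` (for insert!), and the helper `peek_n` all being illustrative names rather than part of the interface. It ignores indentation and source locations and treats any iterable as the underlying source.

```python
from collections import deque


class TokenStream:
    """Toy token stream with a push back buffer in front of the source."""

    def __init__(self, tokens):
        self._source = iter(tokens)
        self._pushed_back = deque()

    def next(self):
        """Peek at the next token without consuming it; False at end."""
        if not self._pushed_back:
            try:
                self._pushed_back.append(next(self._source))
            except StopIteration:
                return False
        return self._pushed_back[0]

    def next_bang(self):
        """Return the next token and advance; False at end."""
        self.next()  # make sure the buffer holds the upcoming token, if any
        return self._pushed_back.popleft() if self._pushed_back else False

    def insert_bang(self, tokens):
        """Insert one token or a sequence so the first inserted token is read next."""
        if isinstance(tokens, (list, tuple)):
            self._pushed_back.extendleft(reversed(tokens))
        else:
            self._pushed_back.appendleft(tokens)


def peek_n(stream, n):
    """Look ahead up to n tokens by reading them and pushing them back."""
    taken = [stream.next_bang() for _ in range(n)]
    taken = [t for t in taken if t is not False]
    stream.insert_bang(taken)
    return taken
```

The same `insert_bang` call serves both uses mentioned above: `peek_n` restores tokens it read for lookahead, and a macro could push its expansion so the parser reads it as if it had appeared in the source.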
A token output by a token stream is more general than a source code token. It can be a name (including punctuation), a literal, a newline, a keyword, or an expression that has already been parsed.
The interface is:

    def literal = number | character | string
    def expression = name | literal | quotation | compound_expression | block_scope
    def token = expression | newline | keyword

    defclass token_stream(source in_stream[character], optional: source_file = false, indentation = 0 integer) in_stream[token]
      ; slots ...

    ;; The in_stream protocol
    require more?(s token_stream) => boolean         ; true if end not reached
    require next(s token_stream) => token | false    ; peek at next token, false at end
    require next!(s token_stream) => token | false   ; return next token and advance
    require close(s token_stream)                    ; release resources

    ;; Special newline-skipping look ahead
    ;;
    ;; If the next token is a newline whose indentation
    ;; is greater than specified if indent? else exactly as specified,
    ;; return the token after it,
    ;; otherwise the same as next.
    require next_after_newline(s token_stream, indentation integer, indent? boolean) => token | false

    ;; Get or set the indentation of the next token
    require (s token_stream).indentation => integer                     ; indentation of next token
    require (s token_stream).indentation := (new_indentation integer)   ; reset indentation

    ;; Source location support
    require (s token_stream).source_location => source_location         ; location of next token

    ;; The push back buffer
    ;;
    ;; Insert one or more tokens after what has already been passed over by next!
    ;; and before the next token that would have been returned by next or next!.
    ;; The first token inserted will be the result of next or next!.
    ;; If a sequence is supplied, it can have any member type but
    ;; its members must be tokens or source_locations.
    require insert!(s token_stream, t token)
    require insert!(s token_stream, ts sequence)
The special token types mentioned above could have been defined by:

    constant: defclass quotation(datum)
    constant: defclass newline(indentation integer)
    constant: defclass keyword(name name)
    constant: defclass source_location(source_file, line_number integer)