Lunar Programming Language

by David A. Moon
January 2017 - January 2018



Patterns

Parsing can be directed by a pattern, which is translated into the imperative code needed to do the parsing. Using a pattern can be much more concise than writing that parsing code by hand.

A pattern consists of pattern constants, pattern variables, and pattern punctuation. Newlines, indentation, and whitespace are insignificant in patterns.

Pattern Constants

A pattern constant matches a single token in the input. There are four kinds of pattern constant:

A string literal "name" matches any name whose spelling is name.

A token pair $name matches any name which has the same definition as the visible definition of name in the scope where the pattern appears. If name is a name_in_context its context is used.

A string literal "name:" matches any keyword whose name's spelling is name.

The names ^, ^^, and ^= match a newline. The difference between these three pattern constants is how they handle indentation:

^ requires indentation greater than the indentation coming into the pattern. Every ^ in a pattern requires exactly the same indentation, which is determined by the first ^ to match.

^^ requires indentation greater than the indentation coming into the pattern but allows a different indentation for each match.

^= requires indentation equal to the indentation coming into the pattern.
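
The three indentation rules can be modelled in a few lines. The following is a minimal Python sketch, not Lunar code and not the compiler's implementation; the class name NewlineIndentation and the representation of indentation as an integer column count are assumptions made for illustration.

```python
class NewlineIndentation:
    """Hypothetical model of the indentation checks for ^, ^^, and ^=."""

    def __init__(self, entry_indentation):
        # Indentation coming into the pattern.
        self.entry = entry_indentation
        # Indentation fixed by the first ^ to match; None until then.
        self.caret = None

    def match(self, constant, line_indentation):
        """Return True if a newline followed by a line indented by
        line_indentation columns satisfies the pattern constant."""
        if constant == "^=":
            return line_indentation == self.entry
        if constant == "^^":
            return line_indentation > self.entry
        if constant == "^":
            if line_indentation <= self.entry:
                return False
            if self.caret is None:
                # The first ^ to match determines the indentation for all others.
                self.caret = line_indentation
                return True
            return line_indentation == self.caret
        raise ValueError("not a newline pattern constant: " + repr(constant))


matcher = NewlineIndentation(entry_indentation=2)
first_caret = matcher.match("^", 4)    # fixes the ^ indentation at 4
second_caret = matcher.match("^", 6)   # deeper, but not the same as the first ^
free_caret = matcher.match("^^", 6)    # ^^ only needs to be deeper than the entry
same_level = matcher.match("^=", 2)    # ^= needs exactly the entry indentation
```

In this model only ^ carries state from one match to the next, which is the difference between ^ and ^^ described above.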

Pattern Variables

A pattern variable matches a syntactic construct in the input and locally defines a constant whose name is the name of the pattern variable and whose value is the result of parsing that syntactic construct. This definition is visible in the body associated with the pattern. (But if a pattern variable appears inside { } the value is a sequence of parse results.)

It is an error for the same pattern variable name to appear more than once in a pattern.

A pattern variable is any alphanumeric name. It consists of the name of the syntactic construct, optionally preceded by any characters ending in _, and optionally followed by any number of digit characters. Thus body, body2, body15, and main_body are all pattern variables that match a body. See Syntactic Constructs for a list of syntactic constructs.
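
The naming rule above can be illustrated with a short Python sketch that recovers the syntactic construct from a pattern variable name. The function name construct_of is hypothetical, and the sketch assumes the name is already known to be a pattern variable; it does not check that the result is a defined construct.

```python
import re


def construct_of(variable_name):
    """Return the syntactic construct named by a pattern variable."""
    # Drop the optional prefix: everything up to and including the last "_".
    base = variable_name.rsplit("_", 1)[-1]
    # Drop the optional run of trailing digit characters.
    return re.sub(r"[0-9]+$", "", base)


constructs = [construct_of(v) for v in ["body", "body2", "body15", "main_body"]]
```

Splitting on the last _ is unambiguous only because construct names themselves cannot contain _.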

Pattern Punctuation

Pattern punctuation defines the structure of a pattern. The following pattern punctuation tokens are available:

$ Same-definition pattern constant
| Alternatives
( ) Grouping
[ ] Optional
{ } Repeat
+ Repeat one or more times
* Repeat zero or more times
& Begin repeat separator

The alternative indicator | separates alternative patterns. If an alternative does not match, matching continues with the next alternative. If no alternatives match, the match fails. The value of any pattern variable inside a non-matched alternative is false. The scope of | is limited by the innermost enclosing brackets of any of the three types, or is the whole pattern if there are no enclosing brackets.

The grouping parentheses ( ) serve only to delimit the scope of |.

If a pattern inside the optional brackets [ ] does not match, matching continues with the pattern after the ] and the value of any pattern variable inside the [ ] is false.
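
The rule that pattern variables in an unmatched alternative or unmatched optional part are false can be sketched as follows. This is a simplified Python model rather than Lunar code: it assumes every pattern element matches exactly one token, and the names match_sequence, match_alternatives, match_optional, constant, is_integer, and is_name are all hypothetical.

```python
def constant(spelling):
    """Predicate for a pattern constant with the given spelling."""
    return lambda token: token == spelling


def is_integer(token):
    return isinstance(token, int)


def is_name(token):
    return isinstance(token, str) and token.isidentifier()


def variables_of(elements):
    return [variable for variable, _ in elements if variable is not None]


def match_sequence(elements, tokens, pos):
    """Match elements in order starting at tokens[pos]. Each element is
    (variable_name_or_None, predicate). Returns (bindings, new_pos) or None."""
    bindings = {}
    for variable, accepts in elements:
        if pos >= len(tokens) or not accepts(tokens[pos]):
            return None
        if variable is not None:
            bindings[variable] = tokens[pos]
        pos += 1
    return bindings, pos


def match_alternatives(alternatives, tokens, pos):
    """Model a | b | ...: try each alternative in order; the first match wins."""
    # Variables of alternatives that do not match keep the value False.
    bindings = {v: False for alt in alternatives for v in variables_of(alt)}
    for alt in alternatives:
        result = match_sequence(alt, tokens, pos)
        if result is not None:
            matched, new_pos = result
            bindings.update(matched)
            return bindings, new_pos
    return None  # no alternative matched, so the match fails


def match_optional(elements, tokens, pos):
    """Model [ ... ]: on failure, continue at the same position with False values."""
    result = match_sequence(elements, tokens, pos)
    if result is None:
        return {v: False for v in variables_of(elements)}, pos
    return result


# integer | name
either = [[("integer", is_integer)], [("name", is_name)]]
alt_result = match_alternatives(either, ["x"], 0)

# [ "else" name ]
else_part = [(None, constant("else")), ("name", is_name)]
absent = match_optional(else_part, ["end"], 0)
present = match_optional(else_part, ["else", "x"], 0)
```

A failed optional part consumes no input in this model, so matching resumes at the token where the [ ] was attempted.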

The repeat brackets { } must be immediately followed by either * or + to indicate the minimum number of repetitions of the pattern inside the brackets. The value of any pattern variable inside the brackets is a sequence with one member for each repetition. Nested repeat brackets create nested sequences.

If the pattern inside repeat brackets { } contains the separator indicator &, the portion of the pattern before & is matched on every repeat and the portion after & is matched on all repeats but the last. Thus the portion after & matches the separator between repetitions. If any pattern variable appears after &, its value will be a sequence one shorter than the number of repetitions, or false if there were zero repetitions.
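
The separator rule can be sketched in Python for a pattern of the shape { item & separator } followed by * or +. This is an illustrative model, not the compiler's algorithm: match_repeat is a hypothetical name, each portion is assumed to match a single token, and the choice to leave a trailing separator unconsumed when no further repetition follows is an assumption the text above does not settle.

```python
def match_repeat(item, separator, tokens, pos, minimum):
    """Model { item & separator }* (minimum=0) or { item & separator }+ (minimum=1).
    item and separator are predicates on one token.
    Returns (items, separators, new_pos), or None if there are too few repetitions."""
    items, separators = [], []
    while pos < len(tokens) and item(tokens[pos]):
        items.append(tokens[pos])
        pos += 1
        # Assumption: a separator is consumed only if another repetition follows it.
        if (pos + 1 < len(tokens)
                and separator(tokens[pos])
                and item(tokens[pos + 1])):
            separators.append(tokens[pos])
            pos += 1
        else:
            break
    if len(items) < minimum:
        return None
    # Variables after & are False when there were zero repetitions.
    return items, (separators if items else False), pos


def is_integer(token):
    return isinstance(token, int)


def is_comma(token):
    return token == ","


three = match_repeat(is_integer, is_comma, [1, ",", 2, ",", 3, ")"], 0, minimum=0)
none_star = match_repeat(is_integer, is_comma, [")"], 0, minimum=0)
none_plus = match_repeat(is_integer, is_comma, [")"], 0, minimum=1)
```

With three repetitions the separator sequence has two members, and with zero repetitions the * form yields an empty item sequence and false for the separator, while the + form fails.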

Syntactic Constructs

The following are the built-in syntactic constructs. For each construct, the compiler module exports a parsing function whose name is parse_ followed by the construct name, for example parse_anyname.

Note that the name of a syntactic construct cannot contain _ because of the special treatment of _ in pattern variable names.

Construct | Explanation | Parsed Result
name | a name that is not punctuation and not defined as an operator, or \ followed by any name or a string | the name
anyname | a name, or \ followed by a name or a string | the name
integer | a literal integer | the integer
string | a literal string | the string
flag | matches nothing, parses as true | true
expression | an expression - note special handling of operator precedence | expression
body | one or more statements; definitions are scoped to the body | expression
destructuring | a name or more complex destructuring | expression
formalparameters | a formal parameter list | formal_parameters
actualparameters | an actual parameter list | list[expression]
methodhead | everything about a method except the body | method_head
methodmodifiers | modifiers inserted before formal parameters | method_modifiers
pattern | a pattern | pattern
end | delimiting newline or close bracket | true

For example, the anyname and name syntactic constructs could have been defined by

def parse_anyname(lexer, indentation, scope, required?)
  if match?(lexer, #\\)
    parse_denatured_name(lexer)
  else if next(lexer) in name
    next!(lexer)
  else if required?
    wrong_token_error(lexer, "a name, or backslash followed by a name or a string")

def parse_name(lexer, indentation, scope, required?)
  if match?(lexer, #\\)
    parse_denatured_name(lexer)
  else if (def token = next(lexer)) in name and
          not (punctuation?(token) or
               known_definition(scope, token) in operator)
    next!(lexer)
  else if required?
    wrong_token_error(lexer, "a name that is not punctuation, an operator, or a macro," +
                             " or backslash followed by a name or a string")

def parse_denatured_name(lexer)
  def token = next(lexer)
  if token in name
    next!(lexer)
  else if token in string
    name(next!(lexer))
  else
    wrong_token_error(lexer, "a name or a string (after a backslash)")





Lunar by David A. Moon is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Please inform me if you find this useful, or use any of the ideas embedded in it.
Comments and criticisms to dave underscore moon atsign alum dot mit dot edu.