Parsing can be directed by a pattern, which is translated into the imperative code needed to do the parsing. A pattern is often much more concise than the equivalent hand-written parsing code.
A pattern consists of pattern constants, pattern variables, and pattern punctuation. Newlines, indentation, and whitespace are insignificant in patterns.
A pattern constant matches a single token in the input. There are four kinds of pattern constant:
A string literal "name" matches any name whose spelling is name.
A token pair $name matches any name which has the same definition as the visible definition of name in the scope where the pattern appears. If name is a name_in_context its context is used.
A string literal "name:" matches any keyword whose name's spelling is name.
The names ^, ^^, and ^= match a newline. The difference between these three pattern constants is how they handle indentation:
^ requires indentation greater than the indentation coming into the pattern. Every ^ in a pattern requires exactly the same indentation, which is determined by the first ^ to match.
^^ requires indentation greater than the indentation coming into the pattern but allows a different indentation for each match.
^= requires indentation equal to the indentation coming into the pattern.
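The three indentation rules can be modeled with a small amount of state. The following Python sketch only illustrates the rules as stated above and is not the pattern compiler's actual code. The class name `IndentationRules` and its method names are hypothetical, and indentation is assumed to be a simple column count.

```python
class IndentationRules:
    """Illustrative model of the ^, ^^ and ^= newline constants (hypothetical names)."""

    def __init__(self, incoming):
        self.incoming = incoming  # indentation coming into the pattern
        self.fixed = None         # indentation chosen by the first ^ to match

    def caret(self, indent):
        """^ : deeper than incoming, and identical for every ^ in the pattern."""
        if self.fixed is None:
            if indent > self.incoming:
                self.fixed = indent  # the first successful ^ determines the indentation
                return True
            return False
        return indent == self.fixed

    def double_caret(self, indent):
        """^^ : deeper than incoming, may differ from match to match."""
        return indent > self.incoming

    def caret_equal(self, indent):
        """^= : exactly the incoming indentation."""
        return indent == self.incoming


rules = IndentationRules(incoming=2)
print(rules.caret(4))         # True: first ^ fixes the indentation at 4
print(rules.caret(6))         # False: later ^ must also be at 4
print(rules.double_caret(6))  # True: ^^ only needs to be deeper than 2
print(rules.caret_equal(2))   # True: ^= needs exactly 2
```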
A pattern variable matches a syntactic construct in the input and locally defines a constant whose name is the name of the pattern variable and whose value is the result of parsing that syntactic construct. This definition is visible in the body associated with the pattern. (But if a pattern variable appears inside { } the value is a sequence of parse results.)
It is an error for the same pattern variable name to appear more than once in a pattern.
A pattern variable is any alphanumeric name. It consists of the name of the syntactic construct, optionally preceded by any characters ending in _, and optionally followed by any number of digit characters. Thus body, body2, body15, and main_body are all pattern variables that match a body. See Syntactic Constructs for a list of syntactic constructs.
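As an illustration of this naming rule, the following Python sketch recovers the construct name from a pattern variable name. The function name `construct_of` is hypothetical, and the sketch assumes that construct names contain neither `_` nor digits, which holds for the built-in constructs.

```python
def construct_of(variable):
    """Return the syntactic construct that a pattern variable name refers to."""
    # Drop an optional prefix: everything up to and including the last "_".
    base = variable.rsplit("_", 1)[-1]
    # Drop an optional suffix: any number of trailing digit characters.
    return base.rstrip("0123456789")


for variable in ["body", "body2", "body15", "main_body"]:
    print(variable, "->", construct_of(variable))  # each one maps to "body"
```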
Pattern punctuation defines the structure of a pattern. The following pattern punctuation tokens are available:
$ | Same-definition pattern constant |
| | Alternatives |
( ) | Grouping |
[ ] | Optional |
{ } | Repeat |
+ | Repeat one or more times |
* | Repeat zero or more times |
& | Begin repeat separator |
The alternative indicator | separates alternative patterns. If an alternative does not match, matching continues with the next alternative. If no alternatives match, the match fails. The value of any pattern variable inside a non-matched alternative is false. The scope of | is limited by the innermost enclosing brackets of any of the three types, or is the whole pattern if there are no enclosing brackets.
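The behavior of alternatives can be sketched in Python as follows. This is a model of the rule described above, not the code the pattern compiler generates. The function `match_alternatives`, the matcher functions, and the representation of tokens as plain Python values are all assumptions made for the example, which models a pattern of the form integer | string.

```python
def match_integer(tokens, pos):
    """Hypothetical matcher: binds the variable 'integer' to an int token."""
    if pos < len(tokens) and isinstance(tokens[pos], int):
        return {"integer": tokens[pos]}, pos + 1
    return None


def match_string(tokens, pos):
    """Hypothetical matcher: binds the variable 'string' to a str token."""
    if pos < len(tokens) and isinstance(tokens[pos], str):
        return {"string": tokens[pos]}, pos + 1
    return None


def match_alternatives(tokens, pos, alternatives):
    """Try each (variable_names, matcher) pair in order.

    Returns (bindings, new_pos) for the first alternative that matches,
    or None if no alternative matches. Variables that belong to a
    non-matched alternative are bound to False.
    """
    all_variables = [name for names, _ in alternatives for name in names]
    for _, matcher in alternatives:
        result = matcher(tokens, pos)
        if result is not None:
            bindings, new_pos = result
            values = {name: False for name in all_variables}
            values.update(bindings)
            return values, new_pos
    return None  # no alternative matched, so the match fails


alternatives = [(["integer"], match_integer), (["string"], match_string)]
print(match_alternatives([42], 0, alternatives))
# ({'integer': 42, 'string': False}, 1)
```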
The grouping parentheses ( ) serve only to delimit the scope of |.
If a pattern inside the optional brackets [ ] does not match, matching continues with the pattern after the ] and the value of any pattern variable inside the [ ] is false.
The repeat brackets { } must be immediately followed by either * or + to indicate the minimum number of repetitions of the pattern inside the brackets. The value of any pattern variable inside the brackets is a sequence with one member for each repetition. Nested repeat brackets create nested sequences.
If the pattern inside repeat brackets { } contains the separator indicator & the portion of the pattern before & is matched on every repeat and the portion after & is matched on all repeats but the last. Thus it matches the separator between repetitions. If any pattern variable appears after &, its value will be a sequence one shorter, or false if there were zero repetitions.
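The relationship between the repeated portion and the separator portion can be sketched in Python. This is an illustrative model only. The names `match_repeat`, `name_token`, and `comma` are hypothetical, tokens are assumed to be plain strings, and the separator is treated as if it contained a pattern variable so that its sequence of values can be shown. The sketch also assumes that a dangling separator with no repetition after it is left unconsumed, which the text above does not specify.

```python
def name_token(tokens, pos):
    """Hypothetical item matcher: an alphabetic token."""
    if pos < len(tokens) and tokens[pos].isalpha():
        return tokens[pos], pos + 1
    return None


def comma(tokens, pos):
    """Hypothetical separator matcher: a comma token."""
    if pos < len(tokens) and tokens[pos] == ",":
        return tokens[pos], pos + 1
    return None


def match_repeat(tokens, match_item, match_separator, minimum=0):
    """Model { item & separator } followed by * (minimum=0) or + (minimum=1).

    Returns (items, separators, new_pos), or None if there are too few
    repetitions. separators is one shorter than items, or False when
    there were zero repetitions.
    """
    items, separators, pos = [], [], 0
    while True:
        start = pos
        if items:
            # A separator is needed only between repetitions.
            sep = match_separator(tokens, pos)
            if sep is None:
                break
            sep_value, start = sep
        item = match_item(tokens, start)
        if item is None:
            break  # assumption: a dangling separator is not consumed
        if items:
            separators.append(sep_value)
        value, pos = item
        items.append(value)
    if len(items) < minimum:
        return None
    return items, (separators if items else False), pos


print(match_repeat(["a", ",", "b", ",", "c"], name_token, comma))
# (['a', 'b', 'c'], [',', ','], 5)
```

With no input tokens and `minimum=0`, the same call returns an empty sequence of items and False for the separator values, which mirrors the zero-repetition case described above.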
The following are the built-in syntactic constructs. For each construct, the compiler module exports a parsing function whose name is parse_ followed by the construct's name; for example, parse_anyname is the parsing function for the anyname construct.
Note that the name of a syntactic construct cannot contain _ because of the special treatment of _ in pattern variable names.
Construct | Explanation | Parsed Result |
---|---|---|
name | a name that is not punctuation and not defined as an operator, or \ followed by any name or a string | the name |
anyname | a name, or \ followed by a name or a string | the name |
integer | a literal integer | the integer |
string | a literal string | the string |
flag | matches nothing, parses as true | true |
expression | an expression - note special handling of operator precedence | expression |
body | one or more statements; definitions are scoped to the body | expression |
destructuring | a name or more complex destructuring | expression |
formalparameters | a formal parameter list | formal_parameters |
actualparameters | an actual parameter list | list[expression] |
methodhead | everything about a method except the body | method_head |
methodmodifiers | modifiers inserted before formal parameters | method_modifiers |
pattern | a pattern | pattern |
end | delimiting newline or close bracket | true |
For example, the anyname and name syntactic constructs could have been defined by:

    def parse_anyname(lexer, indentation, scope, required?)
      if match?(lexer, #\\)
        parse_denatured_name(lexer)
      else if next(lexer) in name
        next!(lexer)
      else if required?
        wrong_token_error(lexer, "a name, or backslash followed by a name or a string")

    def parse_name(lexer, indentation, scope, required?)
      if match?(lexer, #\\)
        parse_denatured_name(lexer)
      else if (def token = next(lexer)) in name and not (punctuation?(token) or known_definition(scope, token) in operator)
        next!(lexer)
      else if required?
        wrong_token_error(lexer, "a name that is not punctuation, an operator, or a macro," +
                                 " or backslash followed by a name or a string")

    def parse_denatured_name(lexer)
      def token = next(lexer)
      if token in name
        next!(lexer)
      else if token in string
        name(next!(lexer))
      else wrong_token_error(lexer, "a name or a string (after a backslash)")