Programming Language for Old Timers


by David A. Moon
February 2006 .. September 2008

Comments and criticisms to dave underscore moon atsign alum dot mit dot edu.




Lexical Syntax

At present the lexical syntax is hard-wired, but in the future I hope to be able to define it in the language itself.

The lexical syntax consists of white space, comments, and tokens.

Literals:

A self-quoting literal datum is a number, a character literal, or a string literal. The lexical syntax of numbers, characters (using '), and strings (using ") is quite conventional. Inside a character or string literal, the effect of \ depends on the next character. If it is a quote or backslash, that character is inserted. If it is a, e, f, n, t, or r, the corresponding control character is inserted. If it is 0, a character code specified in octal is inserted. If it is x, a character code specified in hexadecimal is inserted. If it is u, a Unicode character code specified in hexadecimal is inserted. Otherwise the \ is just an ordinary character.
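As a rough illustration of the backslash rules, here is a minimal Python sketch. It is not the actual implementation. The function names are hypothetical, and the digit counts are assumptions, since the text does not say how many digits follow \0, \x, or \u: here up to three octal digits after \0, up to two hex digits after \x, and up to four after \u. Raising an error when \x or \u has no digits is also an assumption.

```python
# Hypothetical sketch of backslash processing inside a character or string
# literal. Digit limits and the error on missing hex digits are assumptions.

CONTROL = {
    "a": "\a",    # bell
    "e": "\x1b",  # escape
    "f": "\f",    # form feed
    "n": "\n",    # newline
    "t": "\t",    # tab
    "r": "\r",    # carriage return
}
OCTAL_DIGITS = "01234567"
HEX_DIGITS = "0123456789abcdefABCDEF"


def take_digits(text, start, allowed, limit):
    """Collect up to `limit` characters from `allowed`; return (digits, end)."""
    end = start
    while end < len(text) and end - start < limit and text[end] in allowed:
        end += 1
    return text[start:end], end


def decode_escapes(body):
    """Decode the text between the quotes of a character or string literal."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        nxt = body[i + 1] if i + 1 < len(body) else ""
        if ch != "\\" or nxt == "":
            out.append(ch)                  # ordinary character
            i += 1
        elif nxt in "'\"\\":
            out.append(nxt)                 # quoted quote or backslash
            i += 2
        elif nxt in CONTROL:
            out.append(CONTROL[nxt])        # control character
            i += 2
        elif nxt == "0":
            digits, i = take_digits(body, i + 2, OCTAL_DIGITS, 3)
            out.append(chr(int(digits or "0", 8)))
        elif nxt in "xu":
            limit = 2 if nxt == "x" else 4
            digits, end = take_digits(body, i + 2, HEX_DIGITS, limit)
            if not digits:
                raise ValueError("missing hexadecimal digits after \\" + nxt)
            out.append(chr(int(digits, 16)))
            i = end
        else:
            out.append(ch)                  # the backslash itself is ordinary
            i += 1
    return "".join(out)
```

Under these assumptions, `decode_escapes(r"\x41")` yields `A`, and `decode_escapes(r"\q")` leaves the backslash in place.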

The token object for a self-quoting literal datum is the number, character, or string itself.

Delimited Names:

A delimited name is a maximal sequence of one or more of the following characters that is not a number, is not a keyword, and does not start like prefix punctuation.

        letters digits ~ ! @ # $ % ^ & * _ - + = | : < > / ?

The @ character is an ordinary character only when it is the first character in a name. An @ elsewhere in a name indicates a name-in-module, explained later in the modules section.

The ? and # characters cannot be the first character in a name. See prefix punctuation below.

The : character cannot be the last character in a delimited name unless all characters in the name are colons. See keywords below.

The token object for a delimited name is a simple-name, except for a name-in-module which is represented by a name-in-context object.
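The rules for delimited names can be sketched as a small classifier. This is only an illustration in Python, with several assumptions: `classify_delimited` is a hypothetical name, "letters" is taken to mean ASCII letters, and the number test is a stand-in (plain decimal digits), since the number syntax is not spelled out here.

```python
import re
import string

# Characters that may appear in a delimited name (ASCII letters assumed).
NAME_CHARS = set(string.ascii_letters + string.digits + "~!@#$%^&*_-+=|:<>/?")

# Stand-in for the real number syntax, which this section does not define.
NUMBER = re.compile(r"[0-9]+")


def classify_delimited(text):
    """Return the kind of token object for `text`, or None if `text`
    (assumed to be a maximal run of characters) is not a delimited name."""
    if not text or any(c not in NAME_CHARS for c in text):
        return None
    if text[0] in "?#":
        return None                      # starts like prefix punctuation
    if NUMBER.fullmatch(text):
        return None                      # a number, not a name
    if text.endswith(":") and set(text) != {":"}:
        return None                      # a keyword, not a name
    if "@" in text[1:]:
        return "name-in-context"         # name-in-module
    return "simple-name"
```

For example, `classify_delimited("x+1")` and `classify_delimited("@foo")` both give `"simple-name"`, while `classify_delimited("foo@bar")` gives `"name-in-context"` and `classify_delimited("foo:")` gives `None`.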

Punctuation:

Punctuation is a self-delimiting name. There are two kinds of punctuation:

        ( ) [ ] { } \ ` , . .. ...

Standalone punctuation: a self-delimiting token, with no need for delimiters before or after. The standalone punctuation marks are brackets of all kinds, backslash, backquote, comma, period, and multiple periods. Any number of periods in a row is one standalone punctuation token, but I have only shown up to three here.

        ? ?= ?: ?? #

Prefix punctuation: only recognized at the start of a token, thus it requires a delimiter before it, but does not require any delimiter afterwards. ? and ?= are used in patterns and templates. ?: and ?? are used in patterns. # is used to quote constant data that cannot be represented by self-quoting literals.
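To illustrate how prefix punctuation is recognized only at the start of a token, here is a minimal Python sketch. The name `split_prefix` is hypothetical, and trying the longer spellings first (so that ?= wins over ?) is an assumption about how ambiguity is resolved.

```python
# Prefix punctuation, longer spellings first so "?=" is preferred over "?".
PREFIX_PUNCTUATION = ("?=", "?:", "??", "?", "#")


def split_prefix(text):
    """Split a run of characters that begins a token into
    (prefix punctuation or None, remaining characters)."""
    for prefix in PREFIX_PUNCTUATION:
        if text.startswith(prefix):
            return prefix, text[len(prefix):]
    return None, text
```

With this sketch, `split_prefix("?x")` gives `("?", "x")`, while `split_prefix("x?")` gives `(None, "x?")` because the ? is not at the start of the token.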

The token object for punctuation is a simple-name.

Keywords:

A keyword is a name-like token that ends in a colon (:) and contains at least one character that is not a colon. The token object for a keyword is a keyword object whose name slot contains the simple-name whose spelling is the spelling of the keyword minus the trailing colon.
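The keyword rule can be sketched in Python as follows. The class and function names are hypothetical stand-ins for the simple-name and keyword objects described above; only the colon rule itself comes from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimpleName:
    spelling: str


@dataclass(frozen=True)
class Keyword:
    name: SimpleName     # the spelling without the trailing colon


def as_keyword(text):
    """Return a Keyword for a name-like token, or None if it is not one."""
    if text.endswith(":") and any(c != ":" for c in text):
        return Keyword(SimpleName(text[:-1]))
    return None
```

Here `as_keyword("size:")` produces a keyword whose name is spelled `size`, while `as_keyword("::")` returns `None` because every character is a colon.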

Newline:

A newline is a character (or CR-LF character sequence) that advances to a new line, plus any comments, additional newlines, and whitespace preceding the next token. The indentation of a newline is the amount of whitespace between the last newline character and the next token.

The token object for a newline is a newline object whose indentation slot contains the number of spaces preceding the next token. For this purpose a tab counts as enough spaces to reach the next multiple of 8.
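The tab rule amounts to the following computation, shown here as a small Python sketch. The function name is hypothetical, and rejecting characters other than space and tab is an assumption.

```python
def indentation(whitespace, tab_width=8):
    """Count the spaces represented by the whitespace before a token.
    A tab advances to the next multiple of `tab_width`."""
    column = 0
    for ch in whitespace:
        if ch == "\t":
            column += tab_width - column % tab_width
        elif ch == " ":
            column += 1
        else:
            raise ValueError("unexpected character in indentation: %r" % ch)
    return column
```

For instance, two spaces followed by a tab count as 8, and a tab followed by two spaces counts as 10.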

A token stream must not return a newline without a real token after it. Thus there cannot be two newline tokens in a row, nor a newline token at the end of the stream.
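One way to picture this constraint is as a filter over a raw token stream. The Python sketch below is hypothetical: the `Newline` class and the function name are invented for illustration, and real tokens are represented by arbitrary other objects. When several newlines occur in a row, it keeps the last one, on the assumption that its indentation is the one that precedes the next token.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Newline:
    indentation: int


def drop_dangling_newlines(tokens):
    """Yield tokens so that every newline is followed by a real token."""
    pending = None
    for token in tokens:
        if isinstance(token, Newline):
            pending = token          # a later newline replaces an earlier one
        else:
            if pending is not None:
                yield pending
                pending = None
            yield token
    # A newline still pending at the end of the stream is never yielded.
```

Given the raw sequence `"a", Newline(0), Newline(4), "b", Newline(0)`, the filter yields `"a", Newline(4), "b"`.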

Alphabetic case is insignificant in names, keywords, and numbers.

The lexical syntax does not provide any way to "quote" special characters in a name or a keyword. However, the program syntax allows \ to turn a string into a name.

Because unary operators such as - are not prefix punctuation, whitespace or another delimiter is required between a unary operator and its operand, just as infix operators (other than "(", "[", and ".") must be delimited from their operands. Thus -x and x+1 are each a single name, not operator invocations.
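The effect of this rule can be seen with a deliberately simplified splitter, sketched in Python. It is not the real lexer: it ignores literals, comments, numbers, keywords, prefix punctuation, and newline tokens, assumes ASCII letters, and uses a hypothetical function name. It only shows that a maximal run of name characters forms one token while standalone punctuation delimits itself.

```python
import string

NAME_CHARS = set(string.ascii_letters + string.digits + "~!@#$%^&*_-+=|:<>/?")
STANDALONE = set("()[]{}\\`,")   # periods are handled separately as runs


def rough_tokens(text):
    """Split text into maximal name runs and standalone punctuation."""
    tokens = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1                                   # whitespace delimits
        elif ch == ".":
            j = i
            while j < len(text) and text[j] == ".":
                j += 1
            tokens.append(text[i:j])                 # any run of periods
            i = j
        elif ch in STANDALONE:
            tokens.append(ch)                        # self-delimiting
            i += 1
        elif ch in NAME_CHARS:
            j = i
            while j < len(text) and text[j] in NAME_CHARS:
                j += 1
            tokens.append(text[i:j])                 # maximal name run
            i = j
        else:
            raise ValueError("character not handled by this sketch: %r" % ch)
    return tokens
```

So `rough_tokens("x+1")` gives `["x+1"]`, `rough_tokens("x + 1")` gives `["x", "+", "1"]`, and `rough_tokens("f(x)")` gives `["f", "(", "x", ")"]`.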

The parser keeps track of source locations. The source location of a token is the file name and line number where the token appears.

