Programming Language for Old Timers


by David A. Moon
February 2006 .. September 2008

Comments and criticisms to dave underscore moon atsign alum dot mit dot edu.


Previous page   Table of Contents   Next page


Source Code Model

This section describes the abstract syntax tree form of source code, which is called P-Expressions. This is the form that results from parsing and is manipulated by macros. The textual form of source code, or surface syntax, is described later.

Source code is made out of P-expressions. A P-expression, often abbreviated to just expression, is the parsed form of a PLOT construct.

Names in P-expressions have not yet had their binding to a lexically visible definition resolved. This allows fragments of a complete source code model to be manipulated. However, it requires a scope to be supplied in order to fully understand an expression.

The following kinds of expressions exist in PLOT, i.e. are known to the compiler:

literal self-quoting literal datum (number, character, or string)
quotation literal constant
name reference to a definition
invocation application of a function to arguments
special expression something executed in a special way

There is a separate class of special expression for each "special form" intrinsic to the compiler. The special expression classes and more detail on the object-oriented representation of source code are explained at the end of this section.

The textual form of expressions is greatly expanded beyond the text directly corresponding to this semantic model through the use of macros, which are explained later.

A definition is an association of a name and an object. A definition is either fixed, meaning it always associates to the same object throughout its lifetime, or assignable, meaning the association can be changed to a different object through assignment. Most definitions are fixed; it takes a special construct to create an assignable definition. A fixed definition can be constant, which means that the associated object is known at compile time.

A definition is created by executing the expansion of one of several macros whose names by convention start with def, or by binding a parameter to an argument when invoking a method, or by binding a pattern variable to an item that was parsed when executing a pattern. Parameters and pattern variables are always fixed definitions.

To understand definitions it is necessary to understand the concept of blocks. A block is a sequence of one or more expressions that constitutes a scope for definitions. Blocks can be created explicitly using the block macro, and also are implied in certain places: a method is enclosed in a block that includes its parameter list and its body. The iteration statements for, while, and until enclose the entire statement in a block.

Blocks can nest. An expression in an inner block is not a member of the outer block.

A definition is either a local definition or a global definition. A global definition is created by a definition that is outside of any block. All other definitions are local.

The scope of a local definition is the block where it is defined and all nested blocks that do not hide that definition with another definition with the same name. The scope of a global definition is all code outside of blocks that is in the same module or in another module that imports the definition and all nested blocks that do not hide that definition with another definition with the same name. In addition, a global definition can be accessed from outside its scope by using the name-in-module construct (the @ sign) which is explained later.

It is an error to define two definitions of the same name as members of the same block.

The concept of "the same name" is a bit tricky where macros are involved and the full details are explained later.

Note that the scope of a definition includes expressions that appear in its scope before the definition itself. In other words, the scope of a definition is the entire containing block, not just the rest of the containing block. Thus no special construct is needed to define recursive or mutually recursive functions and mutually referencing classes. However this necessitates two additional concepts:

A definition is accessible in each expression after the one that defines it, and also in methods that appear before the defining expression but are not called until afterwards. Sometimes a definition is only conditionally executed; in this case the definition is only accessible if it was actually executed on the path to the expression where it is used. The definition itself is created unconditionally, but the value of the definition is conditional. It is an error to access a definition where it is in scope but not accessible.

A definition is known in each expression after the one that defines it, provided the definition is constant. Operators and macros, and any definitions used by the bodies of macros or by methods that macro bodies call must be known where the operator or macro is referenced. A module referenced by the name-in-module construct must be a known definition. The known definition concept makes compilation possible.

Referencing a definition where it is in scope but not known, when the value of the definition is an operator or a macro and the reference is such that if the definition had been known the reference would have parsed differently, is an error. This error can be detected if the compiler keeps track of all names in each scope that have been unsuccessfully checked for known definitions and gives an error if a known definition to an operator or a macro is added to the scope later.

A similar error can occur if a known definition in an outer block is hidden by a definition that occurs later than the reference. The known definition in the outer block would have been used erroneously by the compiler. This too can be detected at the expense of keeping track of past known definition lookups.

To evaluate a sequence of expressions as a body without enclosing them in a block, use the do statement. Its body expressions become members of the same scope of which the do statement is a member. This is especially useful for macro expansions.

A module is a set of global definitions. A module is an object and can be given a name by definition, normally as a global definition in another module. This is only useful if the name of the module is a fixed definition and thus known. Some of the definitions in a module can be exported, and a module can augment its definitions by importing some or all of the exports of another module, possibly renaming them.

There is a module named PLOT that exports all the names defined in the standard language, including all the operators, macros, and standard functions. There is also a root module which contains definitions of names as other modules. Another important module is compiler, which exports among other things the names of the types and functions associated with the P-expression representation of source code. Many macros use definitions from the compiler module.

The expression that creates a local definition can be executed more than once, if it is inside a method or a loop. Each execution creates a new definition. The previous definition created by the previous execution is no longer in scope, unless any closures that reference it were created and still exist.

A name object in a P-expression is a reference to a definition, but unlike the case in Common Lisp there is no direct connection from the name object to the definition. The connection only exists in the context of a scope, which is either a module or a local scope comprising a block and its nested blocks. Hygienic macros aside, there is only one name object with a given spelling but there can be many definitions of that name in different scopes.

The object-oriented representation of source code is as follows. Note that all expression objects are immutable, at least in their public API.

module: @compiler@

;; The type expression includes all source code expressions
defprotocol expression

;; Self-quoting literal data are expressions
defprotocol literal is expression

add-protocol($number, $literal)
add-protocol($character, $literal)
add-protocol($string, $literal)

;; A name is an expression
add-protocol($name, $expression)

;; A quotation is a constant expression
defclass quotation(datum) is expression
  datum = datum

;; The source-locator protocol remembers where an expression came from,
;; through parsing, macro expansion, and all other forms of code rewriting
defprotocol source-locator
  source-location(exp is source-locator, result: file, line)
  ;; With no arguments, source-location returns the current source location,
  ;; which is dynamically bound by the parser
  source-location(result: file, line)

;; Base class for all expressions that can remember their source location
defclass compound-expression() is source-locator, expression
  source-file = source-location()
  source-line = block
                  def file, line = source-location()
                  line

def source-location(exp is compound-expression)
  values(exp.source-file, exp.source-line)

;; An invocation applies a function to arguments
defclass invocation(function is expression, rest: arguments is expression)
         is compound-expression
  function = function
  argument[length(arguments)] = arguments[i]

;; A spread-invocation is an invocation that contains one or more spread
;; arguments.  The spread-flags is a bit vector with 1 where spreading occurs.
defclass spread-invocation(spread-flags is integer, function is expression,
                           rest: arguments is expression)
         is invocation(function, arguments...)
  spread-flags = spread-flags

;; A conditional executes one of two expressions depending on the result of a test
defclass conditional(test is expression, consequent is expression, alternate is expression)
         is compound-expression
  test = test
  consequent = consequent
  alternate = alternate

;; An assignment changes the value of a definition
defclass assignment(lhs is name, rhs is expression)
         is compound-expression
  lhs = lhs
  rhs = rhs

;; slot-expression is the common superclass of slot-reference and slot-assignment
;; subscript is false if single-value, else evaluates to the numeric subscript
defclass slot-expression abstract: (object is expression, slot-name is simple-name,
                                    optional: subscript is expression or false)
         is compound-expression
  object = object
  slot-name = slot-name
  subscript = subscript

;; A slot-reference reads a slot
defclass slot-reference(object is expression, slot-name is simple-name,
                        optional: subscript is expression or false)
         is slot-expression(object, slot-name, subscript)

;; A slot-assignment writes a slot
defclass slot-assignment(object is expression, slot-name is simple-name,
                         rhs is expression,
                         optional: subscript is expression or false)
         is slot-expression(object, slot-name, subscript)
  rhs = rhs

;; A collation evaluates a sequence of expressions in order and
;; returns the values of the last
defclass collation(rest: body is expression) is expression
  body[length(body)] = body[i]

;; A collation-first is like a collation but the result is the
;; values of the first expression instead of the values of the last expr
defclass collation-first(rest: body is expression) is collation(body...)

;; A scopation is a collation that also defines a scope
defclass scopation(scope, rest: body is expression)
         is collation(body...)
  scope = scope

;; An exitation defines an exit-function whose scope is the evaluation
;; of the body.  Calling the exit-function immediately returns its arguments
;; as the values of the exitation and terminates evaluation of the body.
;; The name will be defined as an exit function in the scope.
defclass exitation(scope is scope, name is name, body is expression)
         is compound-expression
  scope = scope
  name = name
  body = body

;; A sanitation implements the cleanup: keyword in a body
defclass sanitation(body is expression, cleanup is expression)
         is compound-expression
  body = body
  cleanup = cleanup

The following classes represent definitions. They have double duty: as a P-Expression, a definition represents a source code expression that creates a definition, typically a def statement. In the scope and module data structures, a name is mapped to a definition that represents what that name is defined to be.

All definition constructors take a scope as their first argument and add the definition to the scope.

;; The abstract base class for all types of definition
;; When a definition object is created, it adds itself to the scope
;; specified as the first argument to the constructor
defclass definition abstract: is compound-expression

;; A constant-definition has a value known at compile time.
defclass constant-definition (scope, name, value) is definition
  name is name = name
  value is anything = value
  init: add-definition(scope, name, this)

;; A variable-definition does not have a value known at compile time,
;; but once set the value cannot be changed by assignment, unless
;; this is the assignable-definition subclass.
defclass variable-definition (scope, name, type-expression, value-expression)
         is definition
  name is name = name
  type-expression is expression = type-expression
  value-expression is expression = value-expression
  init: add-definition(scope, name, this)

def assignable?(defn is definition) false

;; An assignable-definition is like variable-definition but the value
;; can be changed by assignment.
defclass assignable-definition is variable-definition

def assignable?(defn is assignable-definition) true

;; A multiple-definition defines multiple names to multiple values
;; The number of types must equal the number of names
;; The types are expressions
defclass multiple-definition (scope, names, types, expression) is definition
  name[length(names)] is name = names[i]
  type[length(names)] is expression = types[i]
  expression is expression = expression
  init: map(add-definition(scope, _, this), names)

defclass assignable-multiple-definition is multiple-definition

def assignable?(defn is assignable-multiple-definition) true

;; A method-definition adds a method to a function
;; and defines the function if the name is not already defined
;; in the current scope.  The "method" is actually a functation
;; which will later be evaluated to get the run-time method.
;; This does not actually appear in a scope, it is only used as
;; a P-Expression.
defclass method-definition(scope is scope, name is name, method is functation)
         is definition
  name = name
  method = method
  init: install-method(scope, name, method)

;; A functation evaluates to a function containing one method
;; which is constructed from a name, parameter-list, and body.
;; A function has a collection of methods and also has a collection of functations
;; that represent methods that are being compiled but have not yet been loaded.
;; This lets the compiler know that such a method will exist at run time.
;; When a functation is evaluated, it is removed from the function's functations
;; and its value's method is added to the function's methods.
defclass functation(name is name, scope, parameter-list, body is expression)
         is compound-expression
  name = name
  scope = scope
  parameter-list = parameter-list
  body = body


Previous page   Table of Contents   Next page