Programming Language for Old Timers

by David A. Moon
February 2006 .. September 2008

Comments and criticisms to dave underscore moon atsign alum dot mit dot edu.

Data Model

Classes:

Every piece of data, without exception, is an instance of a class. There are no magic "primitive" data values that work differently from everything else.

Memory management is dynamic and safe. Only the garbage collector can deallocate memory. It is impossible to make a "wild" memory reference at run time. (There are special intrinsics defined in a special module which allow writing unsafe code, used only to write some of the runtime.)

Typing is dynamic. That is, objects have types. The type of an object is simply its class. An object is a member of its class and also is a member of each type from which its class inherits.

Definitions, slots, and method return values have types too, but these are just a restriction on what objects can be the value of a definition, the value of a slot, or the result of a method. These declared types are more general than object types. A declared type can be any kind of type: a class, a protocol, an integer range, or a union of declared types.
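The membership test behind a declared type can be sketched in Python. This is an analogy, not PLOT syntax: the names IntRange, TypeUnion, and is_member are hypothetical, and protocols are left out for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntRange:
    """Hypothetical declared type: the integers from lo to hi inclusive."""
    lo: int
    hi: int


@dataclass(frozen=True)
class TypeUnion:
    """Hypothetical declared type: a union of other declared types."""
    members: tuple


def is_member(value, declared):
    """Return True if value may be stored where `declared` is the declared type."""
    if isinstance(declared, IntRange):
        # bool is a subclass of int in Python; exclude it to keep the analogy clean.
        return (isinstance(value, int) and not isinstance(value, bool)
                and declared.lo <= value <= declared.hi)
    if isinstance(declared, TypeUnion):
        return any(is_member(value, m) for m in declared.members)
    # Otherwise the declared type is a class: membership follows inheritance.
    return isinstance(value, declared)


# A declared type that accepts a small integer or a string.
byte_or_text = TypeUnion((IntRange(0, 255), str))
```

Here is_member(200, byte_or_text) and is_member("hi", byte_or_text) hold, while is_member(256, byte_or_text) does not.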

The false value has its own class (a subclass of boolean). This allows for the very common pattern of a declared type that is either a specific class or false. The true value has its own class too.

A class has exactly one superclass, except for the root class named anything which has no superclass. Multiple inheritance is ruled out primarily because it would create problems in determining whether two classes are disjoint, and secondarily because it would make compilation of slot access more difficult. But if you need multiple inheritance, see protocols below.
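To see why single inheritance makes disjointness easy to decide, consider this Python sketch. It assumes a hypothetical Cls record standing in for class objects and ignores protocols. Two classes can share instances only when one lies on the other's superclass chain.

```python
class Cls:
    """Hypothetical stand-in for a class object with exactly one superclass."""

    def __init__(self, name, superclass=None):
        self.name = name
        self.superclass = superclass  # None only for the root class


def inherits_from(c, ancestor):
    """Walk the single superclass chain from c up to the root."""
    while c is not None:
        if c is ancestor:
            return True
        c = c.superclass
    return False


def disjoint(a, b):
    """True when no object can be a member of both a and b."""
    return not (inherits_from(a, b) or inherits_from(b, a))


# Example hierarchy (class names chosen for illustration only).
anything = Cls("anything")
number = Cls("number", anything)
integer = Cls("integer", number)
string = Cls("string", anything)
```

With multiple inheritance the same question would require knowing whether any class, present or future, inherits from both.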

By convention, the name of a class prefixed by $ is defined to be the class object.

Constructors:

A class has exactly one constructor, unless it is an abstract class, which has no constructor. Calling the constructor is the only way to create an instance of a class. A class's constructor is a method for a function which by convention has the same name as the class (without the $ prefix).

Sometimes the actual constructor is internal and is given a different name, and an ordinary method for a function with the name of the class appears to be the constructor but is implemented by calling the real constructor. This provides increased convenience in using a class that can be constructed different ways. The ordinary method is called a pseudo-constructor.
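The pseudo-constructor pattern can be sketched in Python. This is an analogy only: the class _Point, the function point, and the choice of Cartesian versus polar arguments are all hypothetical.

```python
import math


class _Point:
    """Hypothetical class whose real (internal) constructor takes x and y."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


def point(x=None, y=None, *, radius=None, angle=None):
    """Pseudo-constructor: accepts Cartesian or polar arguments,
    then delegates to the real constructor _Point."""
    if radius is not None and angle is not None:
        return _Point(radius * math.cos(angle), radius * math.sin(angle))
    if x is not None and y is not None:
        return _Point(x, y)
    raise TypeError("point() needs either x and y, or radius and angle")
```

Callers write point(1, 2) or point(radius=2, angle=0) and never see the internal name.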

Defining an abstract class does not define a constructor, but it is often useful to define a method for a function with the same name which will construct an instance of a subclass.
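A Python sketch of the same idea, assuming hypothetical classes Shape, Circle, and Rect: the abstract class cannot be instantiated, but a function with its name picks a concrete subclass.

```python
import math
from abc import ABC, abstractmethod


class Shape(ABC):
    """Abstract class: it has no usable constructor of its own."""

    @abstractmethod
    def area(self):
        ...


class Circle(Shape):
    def __init__(self, radius):
        self.radius = radius

    def area(self):
        return math.pi * self.radius ** 2


class Rect(Shape):
    def __init__(self, width, height):
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height


def shape(*dimensions):
    """Function named after the abstract class; constructs a subclass instance."""
    if len(dimensions) == 1:
        return Circle(*dimensions)
    if len(dimensions) == 2:
        return Rect(*dimensions)
    raise TypeError("shape() takes one or two dimensions")
```

The rule for choosing a subclass (here, the number of arguments) is an assumption made for the example.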

Slots:

A class has one or more slots. A slot can be real or virtual, single-valued or multi-valued, and fixed or assignable. A real slot has storage in the object; a virtual slot has only read and (if assignable) write expressions, with no storage. You access a slot via a special "dot" syntax. Each slot has a declared type; all values of the slot must be members of that type.

Thus the real slots of a class define the structure of instances of that class. The virtual slots are just a way to allow the dot syntax to be used as syntactic sugar.
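Python properties give a rough analogy for virtual slots. In this sketch the class Rectangle and its slot names are hypothetical: width and height play the role of real slots, area is a fixed virtual slot, and aspect is an assignable virtual slot.

```python
class Rectangle:
    def __init__(self, width, height):
        self.width = width    # real slot: stored in the object
        self.height = height  # real slot: stored in the object

    @property
    def area(self):
        """Fixed virtual slot: a read expression only, no storage."""
        return self.width * self.height

    @property
    def aspect(self):
        """Assignable virtual slot: the read expression."""
        return self.width / self.height

    @aspect.setter
    def aspect(self, ratio):
        """Assignable virtual slot: the write expression updates a real slot."""
        self.width = self.height * ratio
```

Both kinds are read with the same dot syntax (r.width, r.area), but only width and height occupy storage in the instance.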

You access an individual value of a multi-valued slot by using a subscript in the "dot" notation. The number of values of a multi-valued real slot in an instance of a class is fixed at the time the instance is created.
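A multi-valued slot whose length is fixed at creation might be modeled in Python as below. The names FixedValues and Polygon are hypothetical, and zero-based subscripts are an assumption of the sketch, not a statement about PLOT.

```python
class FixedValues:
    """Hypothetical storage for a multi-valued slot:
    subscriptable, with a length fixed when the instance is created."""

    def __init__(self, values):
        self._values = list(values)

    def __len__(self):
        return len(self._values)

    def _check(self, i):
        # Reject anything outside 0 .. len-1 (zero-based indexing is assumed).
        if not isinstance(i, int) or not 0 <= i < len(self._values):
            raise IndexError("subscript out of range: %r" % (i,))
        return i

    def __getitem__(self, i):
        return self._values[self._check(i)]

    def __setitem__(self, i, value):
        self._values[self._check(i)] = value


class Polygon:
    def __init__(self, vertices):
        # The number of values is determined here and never changes.
        self.vertices = FixedValues(vertices)
```

Individual values are then read and written as p.vertices[1]; there is deliberately no operation that grows or shrinks the slot.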

You cannot define a slot with the same name as an inherited slot. If you need to override inherited characteristics, use a function with different methods applicable to each class.

There is only one space of slot names no matter what module you are in.

Every object has a fixed, real, single-valued slot named class whose value is the actual class of the object. This is declared in the root class anything. As with any slot, you cannot hide it.

Protocols:

A protocol is a type that consists of nothing but a name and an associated set of method requirements. The semantics of a protocol type come from the methods and method requirements that have parameters of that type.

A method requirement is a requirement that an applicable method always exists for a specified function when applied to arguments of specified types. There could be one general method that works for all members of a specified type, or there could be specific methods for each non-abstract class that inherits from a protocol.
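Checking a method requirement can be sketched in Python, using an assumed registry that records which types have a method for each function. The names methods, define_method, applicable, unmet, Comparable, Version, and Money are all hypothetical.

```python
# Hypothetical registry: function name -> list of types that have a method.
methods = {}


def define_method(function, typ):
    """Record that `function` has a method specialized on `typ`."""
    methods.setdefault(function, []).append(typ)


class Comparable:
    """Stands in for a protocol: a name only, no structure."""


class Version(Comparable):
    pass


class Money(Comparable):
    pass


def applicable(function, cls):
    """A method applies if it is specialized on cls or on a type cls inherits from."""
    return any(issubclass(cls, t) for t in methods.get(function, []))


def unmet(function, protocol, concrete_classes):
    """Concrete classes inheriting from protocol that lack an applicable method."""
    return [c for c in concrete_classes
            if issubclass(c, protocol) and not applicable(function, c)]
```

Defining a method on Version alone leaves Money unmet, while one general method on Comparable satisfies the requirement for both classes.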

A protocol defines behavior of the objects that are members of the protocol, without regard for the structure of the objects. A class defines the structure of the objects that are members of the class, and also can define behavior when there are methods that are applicable to instances of the class.

A class inherits from one other class (except for anything, which inherits from nothing) and from zero or more protocols. A protocol inherits from zero or more other protocols. New protocols can be added to a class at runtime.

An object is a member of a protocol if the object's class inherits from the protocol.
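The inheritance rules for classes and protocols can be sketched in Python with hypothetical Protocol and Cls records. The sketch assumes that a class also inherits the protocols of its superclasses.

```python
class Protocol:
    """Hypothetical protocol record: a name plus zero or more parent protocols."""

    def __init__(self, name, parents=()):
        self.name = name
        self.parents = tuple(parents)


class Cls:
    """Hypothetical class record: one superclass, zero or more protocols."""

    def __init__(self, name, superclass=None, protocols=()):
        self.name = name
        self.superclass = superclass
        self.protocols = list(protocols)  # a list, so protocols can be added later


def protocol_inherits(p, target):
    """True if p is target or inherits from it through any parent."""
    return p is target or any(protocol_inherits(q, target) for q in p.parents)


def class_inherits_protocol(c, target):
    """Check the class and each superclass (assumed to pass protocols down)."""
    while c is not None:
        if any(protocol_inherits(p, target) for p in c.protocols):
            return True
        c = c.superclass
    return False


# Example hierarchy (names chosen for illustration only).
collection = Protocol("collection")
sequence = Protocol("sequence", [collection])
anything = Cls("anything")
array = Cls("array", anything, [sequence])
string = Cls("string", anything)
```

Appending sequence to string.protocols at run time makes every string a member of both sequence and collection.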

When there are multiple applicable methods in a function call, and multiple inheritance of protocols is involved, use the "Gabriel 4A" algorithm to resolve the order of methods. This is a better algorithm than the one that CLOS uses. [---TBD: explain algorithm later]

Anything type:

Every object is a member of the type anything. Every type is a subtype of anything.

Something type:

The type something has no members. It is only useful in connection with method requirements. Something is a subtype of every type. It is actually the degenerate case of a type union.

Null Pointers:

There is no "null pointer". However, "instance of class x or false" is a frequently used type that allows false to be used like the traditional null as an indication that no instance of class x is present. This is simply a type union of x and the false type. Note that false is different from the empty list.
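As a Python analogy, a slot declared as "employee or false" might look like this. The Employee class and manager_name function are hypothetical, and Python's False stands in for the false value.

```python
class Employee:
    def __init__(self, name, manager=False):
        # Declared type of `manager`: Employee or false. Nothing else is a member,
        # including None and the empty list.
        if manager is not False and not isinstance(manager, Employee):
            raise TypeError("manager must be an Employee or False")
        self.name = name
        self.manager = manager


def manager_name(employee):
    """Test for false explicitly before using the value as an Employee."""
    if employee.manager is False:
        return "(none)"
    return employee.manager.name
```

Passing an empty list or None as the manager is rejected, because neither is a member of the union.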

Uninitialized Values:

There is a special "uninitialized" value which is not a member of any type. It is only used as the value of a definition or slot that has not yet been initialized. Reading the "uninitialized" value signals an error. A definition or slot cannot be reset to this value, so we only need to check for this value when reading a slot that is not initialized in the class definition and when reading a definition that could be read before its initial value expression has been executed. This "uninitialized" value takes on some of the roles of the null pointer in some other languages, but cannot be mentioned in PLOT code (except in special unsafe runtime code).

Object Representation:

The fact that every piece of data is an instance of a class implies that pointers do not need to be tagged if all classes are represented the same way. But an implementation might choose to have a special representation for small integers to decrease garbage collection overhead, either using reserved address space or using tags.

Some classes are "unboxable," which means the compiler knows how to represent the value of such a class directly, rather than as an object. This is purely a representation choice; all the object-oriented operations continue to work on unboxable objects. Instances of unboxable classes can be freely copied and thus lose their identity; therefore such classes are immutable.

A real slot whose type is unboxable is stored in unboxed form rather than as a reference to a boxed object. Many intrinsic methods are compiled inline when the arguments are unboxed. I'd like to make unboxability extensible, but it is probably built into the compiler. The garbage collector will need to be told about unboxed slots, registers, and stack items.

The class integer is an abstract class with at least two concrete subclasses. The subclass for 32-bit integers is unboxable; perhaps there is also an unboxable subclass for 64-bit integers. The subclass for arbitrarily large integers is not unboxable. The compiler maps integer ranges to subclasses of integer that can contain them.
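The mapping from an integer range to a subclass can be sketched in Python. The subclass names "int32", "int64", and "bignum" are hypothetical, signed two's-complement bounds are assumed, and the 64-bit case is included only as the possibility mentioned above.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1
INT64_MIN, INT64_MAX = -2**63, 2**63 - 1


def representation_for(lo, hi):
    """Pick the smallest hypothetical integer subclass that contains lo..hi."""
    if lo > hi:
        raise ValueError("empty range")
    if INT32_MIN <= lo and hi <= INT32_MAX:
        return "int32"   # unboxable
    if INT64_MIN <= lo and hi <= INT64_MAX:
        return "int64"   # unboxable, if the implementation provides it
    return "bignum"      # arbitrarily large, not unboxable
```

A slot declared with the range 0 to 255 would then be stored unboxed as a 32-bit integer, while a range reaching 2**63 falls back to the arbitrary-precision subclass.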
