Formal Language as a Medium for Technical Education

                        Ed Lowry  
                        7 Alder Way
                        Bedford Mass 01730
                           781 276-4098
                           eslowry@alum.mit.edu
                           users.rcn.com/eslowry

                                        Revised December 1, 1999
Abstract
Unspecialized formal language can be used to express technical knowledge in a way that is comprehensive, precise, and readable. With traditional notations, it is only practical to express part of the potentially precise information in a precise way. Fixing that problem can relieve learners of the burden of distilling precise information from various informal representations and integrating it. As representation methods approach maximum expressive simplicity, there is a phase change (analogous to freezing) that forces a uniform structure on the units of information that model the subject matter of the language. That uniformity is independent of the subject matter and makes a "universal language supporting technical literacy" technically possible.

Textbooks for technical subject matter express their content using precise mathematical formulas along with other, less formal kinds of representation, including natural language statements, metaphors, diagrams, and examples. It is a thesis of this paper that large amounts of potentially precise information are not expressed precisely but are diffused in less formal representations because of inadequacies in traditional mathematical notations. That creates a large, avoidable burden for the learner: distilling the precise information out of the informal expressions and integrating it mentally without the benefit of effective symbolic tools. The main thesis of this paper is that this burden is large and can be substantially eliminated by using optimum information components in a representation method that applies to any technical subject matter.

Teachers of bricklaying need to understand the shape of bricks and how they fit together. Classroom teaching is largely devoted to showing students how to arrange pieces of information, so educators need a sound answer to the question:

What is a reasonable structure for basic pieces of information and how do they fit together?

Traditional representations of information tend to be built from components that are rarely subject to critical examination. There tends to be an assumption that the best choice of representation is highly dependent on the subject matter and that improvements are unlikely to be large or broadly applicable. The best available engineering analysis [1] of information structures at the level of their most basic components presents a very different view.

A basic finding of the analysis is that pressing the limits of expressive simplicity in formal language for non-trivial subject matter forces uniformity on the structure of the component data objects used to represent the subject matter of the language. The orderliness that appears when needless complexity is removed is analogous to the crystallization that develops in a liquid as heat is removed. The properties of formal language change significantly for the better across this "phase change". For maximum expressive simplicity, all data objects will have a structure called "needles", by analogy with pine needles, which are pointed and organized in tree structures. The more important and simpler part of the analysis concludes that all data objects will have the same well-defined structure. There is nothing foggy about the structure of well-chosen data objects, even though a foggy understanding of them may be widespread.

An offer of $20,000 for justification of alternative data object structures was posted on the above website and fairly widely advertised in February 1999. The response has tended to confirm that the information technology community has been deeply unaware of data object structure as a significant issue. This contrasts sharply with other technologies where basic structures have almost always received meticulous attention. The crystallization appears to have gone unnoticed because there has been little serious effort to press the limits of expressive simplicity. The resulting lack of appropriate expertise has been impeding assessment of the urgency for educators to be able to answer the above question clearly. A basic goal in writing this paper (and of the above offer) is to encourage anyone concerned about technical education to ask that question until they are satisfied that both they and the education community have an adequate answer.

Additional reasons for trying to understand the fine structure of information include:

Educational technology based on improvements in representation can be much less costly than improvements based on hardware, and may be much more effective. Focusing more on representational issues rather than hardware could do much to reduce the "digital divide".

NEEDLE DATA OBJECTS, AN IRREDUCIBILITY OPTIMUM

Needle data objects make it possible to combine many of the advantages of formal language and natural language. The result can be used to improve communication in a wide range of technical subjects. Such language can be enduring because maximum simplicity of expression is being approached. There will be very little further simplification in a statement such as:

82 = count element where some isotope of it is stable

The language would usually be written or read. It would only be spoken in brief phrases.
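As a loose analogy only, the statement `82 = count element where some isotope of it is stable` reads as a count over a collection filtered by an existential condition ("some isotope"). The Python sketch below assumes a tiny hypothetical excerpt of three elements (the `Element`, `Isotope`, and `count_where` names are ours); it is not meant to reproduce the count of 82, only the shape of the query.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Isotope:
    mass_number: int
    stable: bool


@dataclass
class Element:
    name: str
    isotopes: list[Isotope]


def count_where(items, predicate):
    """Count the items for which predicate(item) is true."""
    return sum(1 for item in items if predicate(item))


# A tiny excerpt for illustration, not the full periodic table.
elements = [
    Element("hydrogen", [Isotope(1, True), Isotope(3, False)]),
    Element("carbon", [Isotope(12, True), Isotope(14, False)]),
    Element("technetium", [Isotope(98, False), Isotope(99, False)]),
]

# "count element where some isotope of it is stable"
stable_count = count_where(elements, lambda e: any(i.stable for i in e.isotopes))
print(stable_count)  # 2
```

The "some" in the statement corresponds to `any(...)`; replacing it with `all(...)` would give a different, stricter query.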

Needles allow near maximum simplicity of description for any moderately rich technical subject matter. They interconnect with each other and are organized in hierarchies. Each needle points from its "parent" in the hierarchy to another object, possibly remote in the hierarchy or perhaps itself. A needle also has connections to immediate neighbors in the hierarchy, a possible next "sibling", and a possible first "child".

[Figure: 4 needles representing an age relationship]
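The needle linkage described above (each needle has a parent, an object it points to, a possible next sibling, and a possible first child) can be sketched as a small Python class. This is an illustration under our own assumptions: the class name `Needle`, its `label` field, and the helper methods are hypothetical, and the four needles built at the end form our own example of an age relationship rather than a reconstruction of the figure.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass(eq=False)
class Needle:
    """One node of a needle hierarchy (hypothetical sketch)."""
    label: str
    # Links are left out of repr so printing a needle does not walk the graph.
    parent: Optional[Needle] = field(default=None, repr=False)
    # The pointed-to object: may be remote in the hierarchy, or self.
    # None means no target has been assigned in this sketch.
    target: Optional[Needle] = field(default=None, repr=False)
    next_sibling: Optional[Needle] = field(default=None, repr=False)
    first_child: Optional[Needle] = field(default=None, repr=False)

    def add_child(self, label: str, target: Optional[Needle] = None) -> Needle:
        """Append a child needle at the end of this needle's child chain."""
        child = Needle(label, parent=self, target=target)
        if self.first_child is None:
            self.first_child = child
        else:
            last = self.first_child
            while last.next_sibling is not None:
                last = last.next_sibling
            last.next_sibling = child
        return child

    def children(self) -> Iterator[Needle]:
        """Follow first_child, then next_sibling links."""
        node = self.first_child
        while node is not None:
            yield node
            node = node.next_sibling


# Four needles: a root, two people, and an "older_than" needle whose
# parent is alice and which points across the hierarchy to bob.
root = Needle("people")
alice = root.add_child("alice")
bob = root.add_child("bob")
older = alice.add_child("older_than", target=bob)

print([c.label for c in root.children()])            # ['alice', 'bob']
print(older.parent.label, "->", older.target.label)  # alice -> bob
```

Only two structural links (first child, next sibling) are stored per needle; a list of children is recovered by walking them, which keeps every node the same shape.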

There are a few dozen cases where progressively optimizing a simple engineering design terminates with a sharply defined structural constraint rather than by design choice in a tradeoff. In each case the structural constraint eliminates a class of design deficiencies. For example, constraining wheels to be round eliminates vertical vibration, constraining pillars to be vertical eliminates shear forces, constraining mirrors to be flat eliminates image distortion, constraining the top and bottom of building blocks to be horizontal eliminates sliding forces. Such constraints often have no adverse side-effects which raise significant tradeoff issues. Most of these "irreducibility optima" [1] get broad and enduring acceptance. They tend to have large social and economic value. The needle data object appears to fit the pattern.

In most of these cases, even small deviations from the optimum structure are usually unacceptable when the engineering requirements are demanding. It is all right for a tent pole to slope away from vertical, but a pillar holding up a heavy building deviates little. Similarly, simple information can be safely represented in a variety of ways, but rich information structures are best represented using only needles. The possibility that current representations are unreasonable in the same sense that square wheels or doughnut-shaped bricks are unreasonable adds to the urgency of getting a good answer to the above question.

LANGUAGE GENERALITY

The optimum choice of data primitives can reduce the need for specialization in technical language semantics. Since all functions operate on data structures built from a common primitive object, they can easily be merged into a common language semantics regardless of the subject matter involved. Special-purpose language features are still needed, but they can be expressed much more easily as superficial extensions to the more general-purpose language.

Historically, one limitation on formal language generality has been a dichotomy between:

 
    - STRUCTURALLY EXPRESSIVE languages, which have rich data
      structures but not powerful expression (Ada, PL/1, etc.,
      which use MANY primitive kinds of data object)
  and
    - FUNCTIONALLY EXPRESSIVE languages, which allow many
      operations on large data aggregates in a single
      expression (APL, relational databases, etc., which use
      very FEW kinds of primitive)

Use of just one kind of primitive, the needle, allows for functional expressiveness. Needles also allow rich structural expressiveness. Limitations on either the structural or the functional expressiveness of the data structures reduce simplicity of expression and limit generality.
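A minimal sketch of why a single primitive helps functional expressiveness: if every datum is a node of one kind, one aggregate function works unchanged on any subject matter. Here a hypothetical `Node` (a simplified stand-in for a needle, holding a label, an optional value, and children) represents both a chemistry fragment and an accounting fragment, and the same `select` function queries both. The names, the sign convention for credits, and the data are our own assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Iterator, Optional


@dataclass
class Node:
    label: str
    value: Optional[float] = None
    children: list[Node] = field(default_factory=list)


def walk(node: Node) -> Iterator[Node]:
    """Yield a node and all of its descendants, depth first."""
    yield node
    for child in node.children:
        yield from walk(child)


def select(root: Node, predicate: Callable[[Node], bool]) -> list[Node]:
    """One aggregate operation, usable on any tree of Nodes."""
    return [n for n in walk(root) if predicate(n)]


# Chemistry: components of water, with the tally as the value.
water = Node("water", children=[Node("hydrogen", 2), Node("oxygen", 1)])

# Accounting: entry lines of one sale (debit positive, credit negative here).
sale = Node("sale", children=[Node("cash", 100.0), Node("revenue", -100.0)])

# The same expression form answers questions in both domains.
print([n.label for n in select(water, lambda n: (n.value or 0) > 1)])    # ['hydrogen']
print(sum(n.value for n in select(sale, lambda n: n.value is not None)))  # 0.0
```

Nothing in `walk` or `select` knows about chemistry or accounting; that domain independence is the point being illustrated.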

Unspecialized language reduces barriers to accessing unfamiliar technical knowledge by reducing need for preliminary language learning. Parts of the language could be learned in primary school and then used to support subsequent technical learning.

Needles can provide a natural standard for the data objects in technical language. After the underlying data structure issues are decided, a semantics of simple expressions for referencing and manipulating data substructures follows fairly naturally. Additional standardization could lead to a durable foundation for a universal language supporting technical literacy. Universality of capability, of course, is no assurance of universal acceptance.

Complex problems increasingly transcend disciplinary boundaries and integration of the concepts can be facilitated by use of a common unspecialized language.

SHANNON OPERATORS

A stable set of functions used in programming languages includes:

     arithmetic including comparisons
     boolean operations
     set operations
     matrix operations.

They have broad application and stable definitions. The optimality of needles suggests that an expanded and more integrated set of functions can be defined which can have similar breadth of application and durability.

From formal languages which are at least somewhat unspecialized, it is possible to select groups of functions which operate on simple structures and which do not embody real world knowledge. They can form a language kernel which could be referred to as the "Shannon operators".

Such groups include:

So far, attempts to produce general purpose languages have not succeeded in incorporating all of these in a satisfactory way. The simplicities gained by using needles make it practical to do so. Almost all have been incorporated into KEEP, a predecessor of Shannon.

The stability and integration could make the operators a common tool for technical communication, both with people and machines.

Learning a basic core language for computer usage and system description can probably be made a once-in-a-lifetime effort. Simplicity of expression, simplicity of language, and generality of language can be critical in justifying the introduction of technology. Historically, educators have carefully controlled the complexity of learning environments, but current trends in educational technology endanger that control.

Type concepts are useful for providing diagnostics and abbreviations in the use of such functions. It is proposed that the underlying definitions be made without type sensitivity, on the grounds that such definitions would be more stable.
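One way to read this proposal, sketched in Python under our own assumptions: the core definition (`total`) is type-insensitive and simply adds whatever it is given, while a separate, optional typing layer (`checked_total`, using a hypothetical `Quantity` type that carries a unit name) only adds diagnostics. Changing the diagnostic rules leaves the core definition untouched.

```python
from dataclasses import dataclass
from typing import Iterable


def total(values: Iterable):
    """Core definition: no type checks, just repeated addition."""
    result = 0
    for v in values:
        result = result + v
    return result


@dataclass(frozen=True)
class Quantity:
    amount: float
    unit: str


def checked_total(quantities: list) -> Quantity:
    """Typed layer: diagnose unit mismatches, then defer to the core."""
    units = {q.unit for q in quantities}
    if len(units) > 1:
        raise TypeError(f"cannot add quantities with units {sorted(units)}")
    unit = units.pop() if units else "dimensionless"
    return Quantity(total(q.amount for q in quantities), unit)


print(total([1, 2, 3]))  # 6
print(checked_total([Quantity(2.0, "kg"), Quantity(3.0, "kg")]))
# Quantity(amount=5.0, unit='kg')
```

Calling `checked_total` with a mix of `"kg"` and `"m"` raises a `TypeError` diagnostic, while `total` itself would have added the bare numbers without complaint.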

ROLES FOR THE LANGUAGE

Computer hardware and software can enhance the usefulness of the language but computer assistance is NOT initially a requirement. Such language can serve to assist students in a variety of their basic needs:

The language could assist educators in:

Needles allow a declarative natural language style which can help build on previous learning. The total explicitness provides confidence that mysteries can be resolved. Automated analysis tools can speed the resolution. This can be particularly valuable when teachers are not readily available.

The need for metaphor and other informal expressions arises mostly during early stages of learning, when the student may be more disoriented. Later, for proficiency at a detailed level, there is a greater need for tightly integrated, comprehensive, precise statements whose unambiguous interpretation can be depended on. Metaphors introduce foreign baggage that can obscure the picture later on. At any stage of learning, students (especially mature students) may have questions, not necessarily articulated, for which less than precise answers are not satisfactory.

The optimum data object structure provides assurance that the underlying semantics of the core language can remain stable. Variations in superficial syntax and other extensions may be expected.

The earliest student exposure to the language could take the form of using it to manipulate toy environments on computers. Later it would be used to communicate well-established mathematical and scientific ideas to the student. Learning to read the language is easier than learning to write it, so it can be used to explain ideas before proficiency in writing it is developed. At a later stage, the emphasis would shift to developing and testing models.

The structure of an environment tends to be more clearly expressed in the language than the procedural details of data manipulation within it. Algebraic problems may illustrate these effects. The form of algebraic expressions can be described fairly clearly in the language. The procedures are less clearly represented (at least at present) and are probably useful mainly for supporting clarifications of more intuitive descriptions.

For complex subject matter where coherence of discussion is difficult to maintain, the language can be used to describe the underlying structures of the subject matter. Doing so helps assure that there is a coherent subject of discussion and it can provide a framework on which to interpret informal discussion of the subject.

Most of the results of education are in the student's head, but they also include a body of notebooks and a personal library that the student has learned to access easily and with confidence. Expressing such knowledge in the simplest way will enhance its adaptability to different learning and work environments.

Computer analysis of knowledge prerequisites implied in the declarations could help people get oriented in unfamiliar subject matter. They could then solve problems successfully in technical areas for which their background is limited by selecting only the information which is relevant to their immediate needs. Such analysis could also help orient students when they fall behind. Reducing the need for depth of commitment to specialized learning can encourage a wider variety of intellectual exploration by students.

There are hints that unspecialized formal language can enhance creativity. An effort to translate physics into such language resulted in finding a clear picture [2] of electromagnetic fields hidden in Maxwell's equations. Clarification of the structure of data objects and their effect on simplicity of expression resulted from efforts to describe such language in itself.

REFERENCES

[1] E. S. Lowry, "Toward Perfect Information Microstructures," www.ultranet.com/~eslowry.

[2] E. S. Lowry, Physical Review, p. 616, 1960, and Am. J. of Physics, p. 871, 1963. For a brief description, see "The Electromagnetic Field in Space-time."

[3] E. S. Lowry, Proc. of ED-Media 96, AACE, June 1996, p. 407 (a preliminary version of this paper).

EXAMPLES

The following are descriptions of some introductory content from high school chemistry, accounting, and particle physics. In each case, a substantial amount of precise information that was expressed only informally in the original source material is presented in a precise way. Such descriptions need not, and often cannot, be executed by computer.


                   Elementary CHEMISTRY in SHANNON 
 
[[declare chemistry domain  
     
declare element list 
  has id(hydrogen, helium, lithium, ... ) 
  has atomic_weight in number 
  has atomic_number in tally 
     
declare atom set 
  has element 
declare mass quantities      
declare volume quantities 
declare temperature quantities
declare pressure quantities

declare molecule set
  has compound
  has set atom

declare compound set
  has id(carbon_dioxide, water, molecular_oxygen, ozone, ...)
  holds set component
  has set portion converse
  has molecular_weight in number :=sum for its component take
                        its tally * atomic_weight of its element
declare component sets
  has element key
  has compound converse
  has tally
  has fraction in number
      := its tally * atomic_weight of its element / molecular_weight
                                                    of its compound
declare portion set
  mayhave compound
  has state_of_matter
  has mass
  has molecule_count in tally
  mayhave temperature
  mayhave volume
  mayhave pressure
     
declare state_of_matter set 
  has id(solid, liquid, gas) 
     
declare transformation set 
  has set input in portion 
  has set output in portion 
  maybe decomposition := count(its input)=1 and count(its output)>1 
     
    /* gas law 
certify some number satisfies every portion where gas satisfies
  its pressure * its volume / (its temperature * its molecule_count) = the number
certify decomposition where compound of its input is sulphur_dioxide 
  satisfies mass of its sulphur output = mass of its oxygen output ]]
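The derived attributes `molecular_weight` (of a compound) and `fraction` (of a component) declared above are ordinary arithmetic, so they can be checked outside Shannon. The Python sketch below mirrors them for water and carbon dioxide; the dictionary layout is our own assumption, and the atomic weights are standard values rounded to three decimals.

```python
# Standard atomic weights, rounded to three decimals.
ATOMIC_WEIGHT = {"hydrogen": 1.008, "carbon": 12.011, "oxygen": 15.999}

# Each compound maps element -> tally (atoms of that element per molecule).
COMPOUNDS = {
    "water": {"hydrogen": 2, "oxygen": 1},
    "carbon_dioxide": {"carbon": 1, "oxygen": 2},
}


def molecular_weight(compound: str) -> float:
    """sum for its component take its tally * atomic_weight of its element"""
    return sum(tally * ATOMIC_WEIGHT[element]
               for element, tally in COMPOUNDS[compound].items())


def fraction(compound: str, element: str) -> float:
    """its tally * atomic_weight of its element / molecular_weight of its compound"""
    tally = COMPOUNDS[compound][element]
    return tally * ATOMIC_WEIGHT[element] / molecular_weight(compound)


print(round(molecular_weight("water"), 3))           # 18.015
print(round(molecular_weight("carbon_dioxide"), 3))  # 44.009
print(round(fraction("water", "oxygen"), 3))         # 0.888
```

Because `fraction` divides by the same sum that defines `molecular_weight`, the fractions of all components of a compound add up to 1, a useful consistency check on the declarations.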
 
 
 
                        ACCOUNTING 
 
This summarizes some basic accounting concepts in SHANNON. 
 
[[declare accounting domain 
 
declare business_entity set 
        has name in string key generic 
        holds set ledger in account 
 
declare account sets 
        has name in string key generic 
        mayhave business_entity 
        is_one_of (asset_acct, liability_acct, capital_acct) 
        is_one_of (curr_asset, fixed_asset) if(asset_acct) 
        is_one_of ( control_acct, subsidiary_ledger ) subtype 
        holds set subsidiary_ledger if(control_acct) 
        holds list acct_period 
 
declare quarter list 
        has ordinal key 
        holds list journal in transaction 
        has set acct_period  
 
declare acct_period lists 
        has quarter key 
        has account converse 
        has list entry_line := entry_line of transaction of its quarter  
          where account of the acct_period = account of the entry_line 
        has list debit in entry_line := its entry_line where dr  
        has list credit in entry_line := its entry_line where cr 
        has balance in dollar :=  
                        sum (value of its debit) 
                      - sum (value of its credit) 
 
declare transaction lists 
        has ordinal key 
        has date 
        has quarter  
        has event in string 
        maybe adjusting 
        maybe closing 
        holds set entry_line which 
                (is_one_of ( dr, cr ) subtype 
                has value in number 
                has account) 
]] 
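The `balance` of an `acct_period` above is the sum of its debit entry values minus the sum of its credit entry values, taken over the quarter's entry lines for that account. A Python sketch with hypothetical transactions (the `EntryLine` and `Transaction` classes, the `balance` signature, and the sample figures are assumptions for illustration) computes it the same way.

```python
from dataclasses import dataclass


@dataclass
class EntryLine:
    kind: str      # "dr" or "cr"
    value: float
    account: str


@dataclass
class Transaction:
    ordinal: int
    quarter: int
    event: str
    entry_lines: list


def balance(journal, account: str, quarter: int) -> float:
    """sum(value of its debit) - sum(value of its credit) for one period."""
    lines = [line for t in journal if t.quarter == quarter
             for line in t.entry_lines if line.account == account]
    debits = sum(line.value for line in lines if line.kind == "dr")
    credits = sum(line.value for line in lines if line.kind == "cr")
    return debits - credits


journal = [
    Transaction(1, 1, "owner invests cash",
                [EntryLine("dr", 5000.0, "cash"), EntryLine("cr", 5000.0, "capital")]),
    Transaction(2, 1, "buy equipment",
                [EntryLine("dr", 1200.0, "equipment"), EntryLine("cr", 1200.0, "cash")]),
]

print(balance(journal, "cash", quarter=1))     # 3800.0
print(balance(journal, "capital", quarter=1))  # -5000.0
```

Under the formula as written, credit-normal accounts such as capital come out negative. Since each transaction's debits equal its credits, the balances of all accounts in a period sum to zero.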
 
 
 
 
         PARTICLE PHYSICS in SHANNON 
 
[[ declare particle_physics domain; 
 
declare materiality set 
    has id(matter, anti_matter); 
 
declare color set 
    has id(lepton, red, green, blue); 
 
declare tronity set 
    has id(tron, trino) 
    has set flavor converse; 
 
declare generation set 
    has id(first_generation, second_generation, third_generation) 
    has set flavor converse; 
 
declare flavor set 
    has id( down, up, strange, charm, bottom, top) 
    has generation key := first_generation if down or up else 
                      second_generation if strange or charm else 
                      third_generation 
    has tronity key := tron if down or strange or bottom else     
                                       trino; 
 
declare handedness set 
    has id(left, right); 
 
declare mass quantities; 
 
declare charge values additive 
    has electric in(for integer take it/3) 
    mayhave weak in(for integer take it/2) 
    has r_g in(for integer take it/2) 
    has g_b in(for integer take it/2) 
    has b_r in(for integer take it/2) := -r_g-g_b; 
 
declare particle set 
    has generation 
    has tronity 
    has color 
    has materiality 
    has mass 
    has flavor := flavor(its generation, its tronity) 
    has handedness 
    is_one_of(neutrino, tau, muon, electron, quark)
              :=     quark if not lepton 
                else neutrino if trino 
                else tau if bottom 
                else muon if strange 
                else electron 
    has charge := create charge with ( 
                    electric: (     0 if neutrino 
                               else -1 if lepton 
                               else 2/3 if trino 
                               else -1/3 if tron ) 
                                  *(1 if matter else -1), 
                    weak: 0 if left and anti_matter  
                                  or right and matter 
                        else 1/2 if tron xor anti_matter
                        else -1/2, 
                    r_g: (    0 if lepton or blue 
                          else 1/2 if red else -1/2) 
                             *(1 if matter else -1), 
                    g_b: (    0 if lepton or red 
                          else 1/2 if green else -1/2) 
                             *(1 if matter else -1)     ); 
 
]] 
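The `electric` component of `charge` above can be checked numerically. The Python sketch below transcribes only that clause and the color constraint `b_r := -r_g-g_b`, using exact fractions; the function names and the string encoding of color, tronity, and materiality are our own assumptions, and the `weak` clause is left out.

```python
from fractions import Fraction


def electric_charge(color: str, tronity: str, materiality: str) -> Fraction:
    """Transcription of the `electric` clause of `charge`."""
    if color == "lepton" and tronity == "trino":
        base = Fraction(0)          # neutrino
    elif color == "lepton":
        base = Fraction(-1)         # electron, muon, tau
    elif tronity == "trino":
        base = Fraction(2, 3)       # up, charm, top
    else:
        base = Fraction(-1, 3)      # down, strange, bottom
    # *(1 if matter else -1)
    return base * (1 if materiality == "matter" else -1)


def b_r(r_g: Fraction, g_b: Fraction) -> Fraction:
    """b_r := -r_g - g_b, so the three color components sum to zero."""
    return -r_g - g_b


up = electric_charge("red", "trino", "matter")
down = electric_charge("blue", "tron", "matter")
print(up + up + down)                                    # 1  (a uud combination)
print(electric_charge("lepton", "tron", "anti_matter"))  # 1  (positron)
print(b_r(Fraction(1, 2), Fraction(0)))                  # -1/2 (a red quark)
```

Exact fractions avoid the rounding that floating point would introduce in thirds, so sums such as uud come out as exactly 1.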

Last updated 13 Jan 2000 by Ed Lowry.