A string is a constant sequence of characters. The length is the number of characters.
Strings implement the keyed sequence protocol. The keys are non-negative integers which increase monotonically but are not necessarily consecutive. The sequence and keyed sequence protocols use this key as the position. These string positions are not character indexes.
Strings implement the succession protocol, so a substring can be computed from the positions of the first and last characters.
The internal representation is UTF-8 in a multi-slot of 0..255 named utf8. A string position is actually a key of this keyed sequence of UTF-8 bytes. The primary constructor takes a sequence of bytes as its actual parameter. There are pseudo-constructor methods to construct a string from many object types. Note that unlike other sequence constructors, the string constructors take a single parameter, rather than taking each sequence member as a separate actual parameter.
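The difference between a string position and a character index can be illustrated outside the language. The following is a minimal Python sketch, not part of the language definition: `char_positions` and `substring` are hypothetical helper names, and treating the last position as inclusive is an assumption based on the description of substrings above.

```python
def char_positions(text: str) -> list[int]:
    """Return the UTF-8 byte offset at which each character of text starts."""
    positions = []
    offset = 0
    for ch in text:
        positions.append(offset)
        offset += len(ch.encode("utf-8"))  # 1 to 4 bytes per character
    return positions


def substring(text: str, first_pos: int, last_pos: int) -> str:
    """Substring from the character at first_pos through the one at last_pos.

    Both arguments are byte positions of character starts. Treating
    last_pos as inclusive is an assumption of this sketch.
    """
    data = text.encode("utf-8")
    lead = data[last_pos]
    # Width of the last character, derived from its lead byte.
    width = 1 if lead < 0x80 else 2 if lead < 0xE0 else 3 if lead < 0xF0 else 4
    return data[first_pos:last_pos + width].decode("utf-8")


if __name__ == "__main__":
    s = "a\u00e9\u20acz"  # 'a', e-acute (2 bytes), euro sign (3 bytes), 'z'
    print(char_positions(s))   # [0, 1, 3, 6]
    print(substring(s, 1, 3))  # the two middle characters
```

The positions 0, 1, 3, 6 increase monotonically but are not consecutive, which is the property the keyed sequence protocol relies on.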
To build up a string incrementally, use a stack[character] as a variable "string buffer"; when finished, pass it to string to convert it to a string.
string could have been defined by:
    sealed: defclass string(utf8 sequence[0..255]) \
            constant_succession[character], reversible_sequence[character]
      utf8[utf8.length] = utf8

    ;; Implement sequence protocol
    def iterate(s string) 0         ; initial string position
    def more?(s string, pos 0..max_length) pos < s.utf8.length
    def next(s string, pos 0..max_length-1)
      utf8_to_character(s.utf8, pos)
    def iterate(s string, pos 0..max_length-1)
      pos + utf8_character_length(s.utf8, pos)
    def (s string).length utf8_length(s.utf8)

    ;; Implement the reversible_sequence protocol, same position
    def reverse_iterate(s string) reverse_iterate(s, s.utf8.length)
    def reverse_more?(s string, pos -1..max_length-1) pos >= 0
    def reverse_iterate(s string, pos 0..max_length-1)
      block exit: return
        for next_pos = pos - 1 then next_pos - 1
          ;; a character starts at any byte that is not a continuation byte (0x80..0xBF)
          if next_pos < 0 or s.utf8[next_pos] < 0x80 or s.utf8[next_pos] >= 0xC0 then return(next_pos)

    ;; Implement keyed_sequence protocol
    def keyed_iterate(s string) 0
    def next_key(s string, pos 0..max_length-1) pos
    def next_member(s string, pos 0..max_length-1) next(s, pos)
    def keyed_iterate(s string, pos 0..max_length-1) iterate(s, pos)
    def (s string)[key 0..max_length-1] utf8_to_character(s.utf8, key)
    def (s string)[key, named: default]
      ;; the key must be in range and must be the position of the first byte of a character
      if key in 0..max_length and key < s.end_position and (s.utf8[key] < 0x80 or s.utf8[key] >= 0xC0)
        utf8_to_character(s.utf8, key)
      else default

    ;; Implement succession protocol
    def (s string).end_position s.utf8.length
    def (s string)[r range[integer]] subsuccession(s, r)

    ;; Pseudo-constructors
    def string(x string) x                  ; already a string
    def string(x name) x.spelling
    def string(x character) string(character_to_utf8(x))
    def string(x false) "false"
    def string(x true) "true"
    def string(x float)                     ; decimal floating-point representation
    def string(x integer, named: base = 10 2..36)   ; representation in base 'base'

    ;; Convert a sequence of characters to a string
    def string(x sequence[character])
      def buffer = stack[0..255]()
      for c in x
        append!(buffer, character_to_utf8(c))
      string(buffer)

    ;; Every object can be converted to a string
    require string(x everything) => string

    ;; Default method could show class name and values of selected slots
    def string(x)
      def buffer = stack[character](class(x).name.spelling...)
      push!(buffer, '(')
      def slotcount := 0
      for slotname => slot in class(x).slots while slotcount < 5
        def value = _internal_slot_value(x, slot)
        if value in number | boolean | character | string | name
          if slotcount > 0 then append!(buffer, ", ")
          append!(buffer, string(slotname))
          append!(buffer, ": ")
          append!(buffer, string(value))
          slotcount := slotcount + 1
      push!(buffer, ')')
      string(buffer)
The in operator with a string as the right-hand operand accepts as the left-hand operand either a character, which has the usual in sequence meaning, or a string, which tests whether the left-hand string is a substring of the right-hand string.
The position function is similar.
TBD
    ;; String equality
    def (s1 string) = (s2 string)
      s1.length = s2.length and
        for c1 in s1, c2 in s2 using always
          always c1 = c2

    ;; String comparison
    def (s1 string) < (s2 string)
      block exit: return
        for c1 in s1, c2 in s2
          if c1 < c2 then return(true)
          if c1 > c2 then return(false)
        ;; common prefix is equal, so shorter string is less
        return(s1.length < s2.length)

    ;; TBD > etc. downcase, upcase, alpha_char? digit_char?
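The equality and ordering rules can be checked with a short Python sketch. It is an illustration only, not the language's implementation: the function names are hypothetical, and Python's code point comparison stands in for character comparison.

```python
def string_equal(s1: str, s2: str) -> bool:
    """Equal when the lengths match and every pair of characters matches."""
    return len(s1) == len(s2) and all(c1 == c2 for c1, c2 in zip(s1, s2))


def string_less(s1: str, s2: str) -> bool:
    """Lexicographic order: the first differing character decides."""
    for c1, c2 in zip(s1, s2):
        if c1 < c2:
            return True
        if c1 > c2:
            return False
    # The common prefix is equal, so the shorter string is less.
    return len(s1) < len(s2)


if __name__ == "__main__":
    print(string_less("ab", "abc"))   # True: proper prefix
    print(string_less("abd", "abc"))  # False: 'd' > 'c'
    print(string_equal("abc", "abc")) # True
```

Note that a string is never less than itself: the loop finds no differing character and the lengths are equal.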
The following functions are used in the implementation of strings:
    ;; The number of characters represented by a sequence of UTF-8 bytes
    ;; Every byte except a continuation byte (0x80..0xBF) begins a character
    def utf8_length(utf8 sequence[0..255])
      for byte in utf8 using count
        count byte < 0x80 or byte >= 0xC0

    ;; The number of UTF-8 bytes that constitute the next character
    def utf8_character_length(utf8 keyed_sequence[0..255], pos)
      utf8_character_length(utf8[pos])

    ;; The number of UTF-8 bytes that constitute the character starting with this byte
    def utf8_character_length(byte 0..255)
      if byte < 0x80 then 1
      else if byte < 0xE0 then 2
      else if byte < 0xF0 then 3
      else 4

    ;; The number of UTF-8 bytes that constitute this character
    def utf8_character_length(c character)
      def code = c.code
      if code < 128 then 1
      else if code < 2048 then 2
      else if code < 65536 then 3
      else 4

    ;; A sequence of UTF-8 bytes that encode just one character
    def character_to_utf8(c character)
      character_utf8_sequence(c)

    constant: defclass character_utf8_sequence(char character) sequence[0..255]

    def (s character_utf8_sequence).length utf8_character_length(s.char)
    def iterate(s character_utf8_sequence) 0
    def iterate(s character_utf8_sequence, pos) pos + 1
    def more?(s character_utf8_sequence, pos)
      pos < utf8_character_length(s.char)
    def next(s character_utf8_sequence, pos)
      def code = s.char.code
      if code < 128 then code
      else if code < 2048
        if pos = 0 then 0xC0 + code / 64
        else 0x80 + code & 0x3F
      else if code < 65536
        if pos = 0 then 0xE0 + code / 4096
        else if pos = 1 then 0x80 + (code / 64) & 0x3F
        else 0x80 + code & 0x3F
      else
        if pos = 0 then 0xF0 + code / 262144
        else if pos = 1 then 0x80 + (code / 4096) & 0x3F
        else if pos = 2 then 0x80 + (code / 64) & 0x3F
        else 0x80 + code & 0x3F

    ;; The next character from a sequence of UTF-8 bytes
    def utf8_to_character(utf8 keyed_sequence[0..255], pos)
      def byte = utf8[pos]
      character(if byte < 0x80 then byte
                else if byte < 0xE0 then (byte - 0xC0) * 64 + utf8[pos + 1] - 0x80
                else if byte < 0xF0 then (byte - 0xE0) * 4096 + utf8[pos + 1] * 64 + utf8[pos + 2] - 0x2080
                else (byte - 0xF0) * 262144 + utf8[pos + 1] * 4096 + utf8[pos + 2] * 64 + utf8[pos + 3] - 0x82080)

    ;; Make a string from its UTF-8 representation in zero or more bytes
    ;; This is defined as the constructor of the string class
    ;; def string(utf8 sequence[0..255])
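The encoding and decoding arithmetic used by these functions can be cross-checked against an existing UTF-8 codec. The Python sketch below mirrors that arithmetic; the names `encode_code_point` and `decode_at` are hypothetical, shifts replace the divisions by 64, 4096 and 262144, and no validation of malformed input is attempted.

```python
def encode_code_point(code: int) -> bytes:
    """Encode one code point as 1 to 4 UTF-8 bytes."""
    if code < 0x80:
        return bytes([code])
    if code < 0x800:
        return bytes([0xC0 + (code >> 6), 0x80 + (code & 0x3F)])
    if code < 0x10000:
        return bytes([0xE0 + (code >> 12),
                      0x80 + ((code >> 6) & 0x3F),
                      0x80 + (code & 0x3F)])
    return bytes([0xF0 + (code >> 18),
                  0x80 + ((code >> 12) & 0x3F),
                  0x80 + ((code >> 6) & 0x3F),
                  0x80 + (code & 0x3F)])


def decode_at(data: bytes, pos: int) -> int:
    """Decode the code point whose first byte is at data[pos]."""
    byte = data[pos]
    if byte < 0x80:
        return byte
    if byte < 0xE0:
        return (byte - 0xC0) * 64 + data[pos + 1] - 0x80
    if byte < 0xF0:
        # 0x2080 removes the 0x80 tag from both continuation bytes at once.
        return (byte - 0xE0) * 4096 + data[pos + 1] * 64 + data[pos + 2] - 0x2080
    return ((byte - 0xF0) * 262144 + data[pos + 1] * 4096
            + data[pos + 2] * 64 + data[pos + 3] - 0x82080)


if __name__ == "__main__":
    for cp in (0x41, 0xE9, 0x20AC, 0x1F600):
        encoded = encode_code_point(cp)
        # Compare with Python's built-in codec and round-trip the result.
        print(hex(cp), encoded == chr(cp).encode("utf-8"), decode_at(encoded, 0) == cp)
```

The constants 0x2080 and 0x82080 are the sums of the continuation-byte tags after scaling: 0x80 * 64 + 0x80 and 0x80 * 4096 + 0x80 * 64 + 0x80.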
The $ character in a string literal indicates an interpolation directive when it is not denatured by an immediately preceding \, it is not the last character in the string, and the immediately following character is alphabetic or one of _, ?, !, ¿, ¡, (, [, or {. Otherwise $ just represents itself.
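The rule for recognizing an interpolation directive can be written as a small predicate. This Python sketch is illustrative only: `starts_directive` is a hypothetical name, Python's `str.isalpha` stands in for "alphabetic", and the case of a backslash that is itself escaped is not handled.

```python
# Non-alphabetic characters that may follow $ to begin a directive.
DIRECTIVE_STARTERS = set("_?!\u00bf\u00a1([{")  # includes the inverted ? and !


def starts_directive(literal: str, i: int) -> bool:
    """True if the character at index i is a $ that begins a directive."""
    if literal[i] != "$":
        return False
    if i > 0 and literal[i - 1] == "\\":
        return False  # denatured by an immediately preceding backslash
    if i + 1 >= len(literal):
        return False  # $ is the last character in the string
    following = literal[i + 1]
    return following.isalpha() or following in DIRECTIVE_STARTERS


if __name__ == "__main__":
    print(starts_directive("hi $name", 3))    # True
    print(starts_directive("cost: $5", 6))    # False: digit follows
    print(starts_directive("hi \\$name", 4))  # False: preceded by backslash
```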
String interpolation is an expression whose result is a string. Characters in the string literal that are not part of an interpolation directive are carried directly into the result.
The interpolation directives for string interpolation are as follows:
$name converts the value of name to a string and inserts it into the result. A character or string inserts itself. A number inserts its decimal value preceded by a minus sign if negative. A name inserts its spelling. A boolean inserts "true" or "false". A sequence inserts its members separated by commas. Anything else inserts the result of string applied to it.
$(expression) evaluates expression and inserts the result in that same way.
$(expression, parameters) evaluates expression and inserts the result with formatting controlled by the actual parameters. See Interpolation Parameters.
$[expression] and $[expression, parameters] are the same as with parentheses, except if the result of expression is false nothing is inserted.
${substring1 & substring2} is an iterating interpolation. If substring1 contains at least one interpolation directive whose value is a sequence, this iterates as many times as the longest such sequence. On each iteration, it inserts substring1 but for each interpolation directive whose value is a sequence, it uses the next member of the sequence, or nothing if the sequence is exhausted. On each iteration but the last, it inserts substring2 after substring1, with the same treatment of any sequences being interpolated.
Otherwise this just processes substring1 in the ordinary way and ignores substring2.
The & substring2 part is optional and can be omitted. If this part is present, spaces that directly precede and/or follow the & will be ignored.
${ cannot be nested inside ${...}.
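The behavior of an iterating interpolation can be modeled in a few lines of Python. This is a sketch under stated assumptions, not the language's implementation: a substring is modeled as a list of literal text pieces and `Slot` objects (a hypothetical wrapper for a directive's value), only lists and tuples count as sequences, and members are converted with Python's `str`.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Slot:
    """Hypothetical marker for the value of one interpolation directive."""
    value: Any


def _is_sequence(value: Any) -> bool:
    # Assumption: strings insert themselves and are not iterated over.
    return isinstance(value, (list, tuple))


def _render(pieces, index):
    out = []
    for piece in pieces:
        if isinstance(piece, str):
            out.append(piece)                 # literal text
        elif index is not None and _is_sequence(piece.value):
            if index < len(piece.value):      # exhausted sequences insert nothing
                out.append(str(piece.value[index]))
        else:
            out.append(str(piece.value))
    return "".join(out)


def iterate_interpolation(substring1, substring2=()):
    lengths = [len(p.value) for p in substring1
               if isinstance(p, Slot) and _is_sequence(p.value)]
    if not lengths:
        # No sequence in substring1: process it once and ignore substring2.
        return _render(substring1, None)
    count = max(lengths)                      # as long as the longest sequence
    out = []
    for i in range(count):
        out.append(_render(substring1, i))
        if i < count - 1:                     # every iteration but the last
            out.append(_render(substring2, i))
    return "".join(out)


if __name__ == "__main__":
    names = Slot(["a", "b", "c"])
    values = Slot([1, 2])
    print(iterate_interpolation([names, "=", values], [", "]))  # a=1, b=2, c=
```

The third iteration shows the "nothing if the sequence is exhausted" rule: the shorter sequence contributes an empty string while the longer one still supplies a member.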
See Templates for a similar feature for program code.
The formal parameter list that accepts the actual parameters in an interpolation directive is:
named: base = 10 2..36, separator = ", " sequence[character]
base is the radix for conversion of integers to strings.
separator is the string that separates members of a list.
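The effect of these two parameters on the conversion rules can be sketched in Python. The function names are hypothetical, lowercase digits for bases above 10 are an assumption (the text does not specify the digit characters), and Python's `str` stands in for the fallback conversion.

```python
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"  # assumption: lowercase digits


def integer_to_string(n: int, base: int = 10) -> str:
    """Representation of n in the given radix, with a minus sign if negative."""
    if not 2 <= base <= 36:
        raise ValueError("base must be in 2..36")
    if n == 0:
        return "0"
    digits = []
    magnitude = abs(n)
    while magnitude:
        magnitude, remainder = divmod(magnitude, base)
        digits.append(DIGITS[remainder])
    if n < 0:
        digits.append("-")
    return "".join(reversed(digits))


def interpolate_value(value, base: int = 10, separator: str = ", ") -> str:
    """Convert a value to the text an interpolation directive would insert."""
    if isinstance(value, bool):      # test before int: bool is an int in Python
        return "true" if value else "false"
    if isinstance(value, int):
        return integer_to_string(value, base)
    if isinstance(value, str):
        return value                 # a string inserts itself
    if isinstance(value, (list, tuple)):
        return separator.join(interpolate_value(v, base, separator) for v in value)
    return str(value)                # stand-in for the generic conversion


if __name__ == "__main__":
    print(interpolate_value(255, base=16))              # ff
    print(interpolate_value([1, 2, 3]))                 # 1, 2, 3
    print(interpolate_value([10, 11], base=2, separator=" | "))  # 1010 | 1011
```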
TODO: Add more parameters such as min and max width and alignment
TODO: Examples