This post started as an email to @eikeon. Our conversations on language design are moving forward at a lumbering pace, and I want to make sure that some of them are set down, for two reasons: so we don’t forget where we are, and so later we can look back and see how far we’ve come.
Term
We have talked about terms as a centerpiece of our language. One reason for this is to craft a language that is oriented around humans, who generally seem to have an easy time with the concept of a term. But it is not particularly problematic to think of a term as an object or a function or even a closure if you’re wired that way, e.g. you are a being made partly of wires.
Alphabet
The term has a label, which is just a contiguous sequence of characters. We haven’t talked a lot about which alphabet we’ll use, but I’m guessing it will either be something Unicode-ish, or we will leave it open to the lexer. If we do the latter, the alphabet will be a set of letters from which terms may be spelled (e.g. {a, b, c, d, e, f}).
Lexicon
For any given program, there is a corresponding list of entries, which together form the lexicon for the program. The entries are terms, which must be spelled with whatever alphabet is used. No terms may be used in the program that are not in the lexicon. Each term in the lexicon has a corresponding list of definitions, zero or more per term.
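As a rough picture of those two rules, here is a minimal sketch in Python (a stand-in, not our language). It assumes the lexicon is a plain dict from term label to a list of definitions; the function names check_lexicon and check_program are made up for illustration.

```python
def check_lexicon(alphabet, lexicon):
    """Return the lexicon terms that cannot be spelled with `alphabet`."""
    return sorted(term for term in lexicon if not set(term) <= alphabet)


def check_program(terms, lexicon):
    """Return the program terms that are missing from the lexicon."""
    return [term for term in terms if term not in lexicon]


# Assumed example data: the alphabet {a, b, c, d, e, f} and a tiny lexicon.
alphabet = set("abcdef")
lexicon = {"cab": [], "bed": [], "fez": []}

misspelled = check_lexicon(alphabet, lexicon)      # "fez" uses a letter outside the alphabet
missing = check_program(["cab", "deaf"], lexicon)  # "deaf" is not in the lexicon
```

An empty list from either check means the rule holds.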
Entries
An entry is either empty (the term may only be replaced with itself) or contains two parts: its shape and its definition. The shape of an entry can be thought of as its part of speech, or as a named graph with the term at the center. The definition, when it exists, consists of one or more sentences.
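One way to picture an entry is as a small record. The sketch below is Python, and the field names (label, shape, sentences) are my own assumptions; it also assumes a sentence can be written as a tuple of term labels.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Entry:
    label: str                                  # the term's spelling
    shape: Optional[str] = None                 # part of speech, e.g. "noun"
    sentences: Tuple[Tuple[str, ...], ...] = () # the definition, if any

    def is_empty(self):
        """An empty entry has no shape and no definition."""
        return self.shape is None and not self.sentences


# An empty entry: the term may only be replaced with itself.
bare = Entry("bed")

# An entry with a shape and a one-sentence definition.
defined = Entry("cab", shape="noun", sentences=(("bed", "fed"),))
```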
Part of Speech
The part of speech, or named graph, is part of a grammar. The grammar of a language describes which parts of speech may be employed in sentences. A part of speech “connects” to other parts of speech (or named graphs). This can be thought of as the local structure on an abstract syntax tree for those more inclined to think that way, e.g. robots.
Sentence
A sentence is a collection of terms that are evaluated concurrently. This is not to say that word order in a sentence is unimportant, as the first term encountered is the first one for which evaluation begins, and subsequent terms in the sentence may rely upon this fact.
Paragraphs, etc.
Sentences may be collected together into a group called a paragraph. A paragraph, simply put, is a group of sentences over which side effects may be relied upon to persist. To the extent that paragraphs are collected together into larger collections (a section, perhaps, or a chapter) the larger collection is a group of paragraphs over which side effects of the paragraph may be relied upon to persist.
Side Effects
That phrase, “side effects,” is also at the center of this language. Terms do not return a value like a function or a method; instead, they transform into something else, and register this transformation with the other terms. In this sense, each term can be thought of as a simple callback if you’re inclined that way, e.g. you are Donald Knuth or someone smarter than me, at least.
Concurrency
When we say the word “concurrent” it is as a counterpoint to the word “sequential.” As such, the statement “A, then B, then C,” is different from the statement “All of A, B, C.” In the former, perhaps B depends upon A, and C upon B. In the latter, each depends upon the other (or none depend upon any, if you prefer to think about it that way, e.g. you are way out there).
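For a loose analogy in an existing language, Python's asyncio shows the same contrast. This is only an illustration of "A, then B, then C" versus "All of A, B, C" where none depend upon any; the evaluate function and the log are assumptions of the sketch, not part of our design.

```python
import asyncio


async def evaluate(term, log):
    log.append(f"begin {term}")
    await asyncio.sleep(0)  # yield so other terms can make progress
    log.append(f"end {term}")


async def sequential(terms):
    """A, then B, then C: each term finishes before the next begins."""
    log = []
    for term in terms:
        await evaluate(term, log)
    return log


async def concurrent(terms):
    """All of A, B, C: every term begins before any of them finishes."""
    log = []
    await asyncio.gather(*(evaluate(term, log) for term in terms))
    return log


seq_log = asyncio.run(sequential(["A", "B", "C"]))
con_log = asyncio.run(concurrent(["A", "B", "C"]))
```

In the concurrent case the terms still begin in the order they were written, which matches the point above that word order matters even when evaluation is concurrent.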
Fragments
Last, but not least, a sentence fragment may not be parseable. Think of book titles or recipe ingredients: these are intentionally fragments. When a fragment is not meant to be parsed, this is a way of registering that it will be used subsequently, and it is not considered part of a paragraph. That is, no side effects persist, but the fragment itself is kept around for use later in the program. A fragment has an implicit part of speech (shape, but not a named graph, since it has no name).
Lexing
Lexing is the process of looking sequentially through a sentence and chopping it into terms. This process requires a lexicon. Note: lexing, lexicon. The lexer consumes letters until it finds a term. It then passes this term to a parser instance and branches. One branch starts on a new term for that first parser instance, and the other continues to consume letters until it can be certain there is no alternative lexing. If it finds an additional possibility, it passes the term to an alternative parser instance. As such, lexing is deterministic, but it also may yield more than one parser instance.
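Here is a minimal sketch of that branching in Python. It makes two assumptions the description above does not settle: the sentence is an unbroken string of letters with no separators, and each "parser instance" is represented simply by the list of terms it would receive. The name lex is made up for illustration.

```python
def lex(text, lexicon):
    """Return every way of chopping `text` into terms from `lexicon`.

    Each inner list stands for the terms handed to one parser instance.
    """
    if not text:
        return [[]]  # nothing left to consume: one complete lexing
    results = []
    for end in range(1, len(text) + 1):
        candidate = text[:end]
        if candidate in lexicon:
            # Branch: accept this term and start on a new one, while the
            # loop keeps consuming letters in search of a longer term.
            for rest in lex(text[end:], lexicon):
                results.append([candidate] + rest)
    return results


lexicon = {"a", "ab", "b", "c"}
lexings = lex("abc", lexicon)
# lexings == [["a", "b", "c"], ["ab", "c"]]
```

The same input and lexicon always produce the same set of lexings, which is the sense in which the process is deterministic even though it yields two parser instances here.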
Parsing
Each parser instance is given a set of terms by the lexer. As they are encountered, each term is looked up in the lexicon, where zero or more definitions are present. If there are zero definitions present, the term is replaced with itself. If there is one definition present, the term is replaced with that definition. If there are two or more definitions present, the parser chooses which to use by examining the part of speech and seeing which will work in the context of the current sentence. If this cannot be determined, the parser branches again, and both are tried.
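The lookup rules above can be sketched in Python as follows. The Definition record, the candidates function, and the idea of passing the context in as a set of allowed shapes are all assumptions made for the sketch. A result with more than one item means the parser branches; an empty result (no definition fits) is treated here as a dead end, which is a case we have not actually decided.

```python
from collections import namedtuple

# Hypothetical record: a part of speech plus the sentences defining the term.
Definition = namedtuple("Definition", ["shape", "sentences"])


def candidates(term, lexicon, allowed_shapes):
    """Return the replacements a parser instance should try for `term`."""
    definitions = lexicon[term]  # KeyError: the term is not in the lexicon
    if not definitions:
        return [term]            # zero definitions: replaced with itself
    if len(definitions) == 1:
        return list(definitions)  # one definition: use it
    # Two or more: keep those whose part of speech works in this context.
    return [d for d in definitions if d.shape in allowed_shapes]


face_noun = Definition("noun", (("bed",),))
face_verb = Definition("verb", (("fed", "bed"),))
lexicon = {"bed": [], "fed": [], "face": [face_noun, face_verb]}

unchanged = candidates("bed", lexicon, {"noun"})
chosen = candidates("face", lexicon, {"verb"})
branches = candidates("face", lexicon, {"noun", "verb"})
```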
Compiling
The process of compiling is essentially going through all the possibilities and following them to the bottom until every single term is replaceable only by itself. Once this has been completed, the entirety of the program is represented by a graph of terms, where the shape of the graph locally is determined by a part of speech.
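To make "following them to the bottom" concrete, here is a simplified Python sketch. It assumes each term has at most one definition, written as a single list of terms, and that no term is defined in terms of itself (otherwise the recursion would never finish). Branching over multiple possibilities and parts of speech are left out. The name compile_terms is made up for illustration.

```python
def compile_terms(sentence, lexicon):
    """Expand terms until only self-replacing terms remain.

    Returns the bottom-level terms in order, plus the graph as a list of
    (term, replacement term) edges.
    """
    leaves = []
    edges = []

    def expand(term):
        definition = lexicon[term]
        if not definition:
            leaves.append(term)  # replaceable only by itself
            return
        for child in definition:
            edges.append((term, child))
            expand(child)

    for term in sentence:
        expand(term)
    return leaves, edges


lexicon = {"abc": ["a", "bc"], "bc": ["b", "c"], "a": [], "b": [], "c": []}
leaves, edges = compile_terms(["abc"], lexicon)
# leaves == ["a", "b", "c"]
# edges == [("abc", "a"), ("abc", "bc"), ("bc", "b"), ("bc", "c")]
```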