[ I know this isn't very clear. I'm sorry. I'll try to write up something clearer later on, if nobody else does it first. Some of this stuff I'm still working out, and none of it should be considered final. -- Not Dmitriy Myshkin. ]
A Flare element corresponds to an XML element. For example, in:
Example 1:
<human>, <arms>, <legs>, <name>, <employer>, <codename>, <head>, <eye>, and so on are Flare elements.
The location of <arms> is the element <human>; <human> contains <arms>.
The content of <arms> is the number 2. The content of <name> is the string "Bob". The content of <employer> is a reference to a certain Flare element (an object of type Company, as it so happens).
The content of <arms> is known to be the number 2, and not the string 2, because <arms> has metadata indicating that it contains numerical content. The interpreter knows to assign <arms> this metadata because the metadata for the Flare element <human> indicates that it is expecting a subelement of type <arms> and that this subelement will have numerical content.
In classic object-oriented terms, the Flare object (element) <human> is known to be of class (metadata) Person, and class Person declares (optional static typing) that Person objects may contain an instance member (subelement) named "arms", which will contain numerical content.
The basic Flare element types are determined by the content each type is expecting.
"Text" and "data" (not listed) are both XML manipulation types. "Text" and "data" are basically like "str", except that substrings within the text or data can have annotation. "Text" behaves pretty much like generalized XML and is the type you'd use to process, say, HTML. "Data" allows the annotation of noncontiguous elements and is what you'd use to, say, perform visual recognition on a field of pixels. More about this later. Text and data may turn out to be subtypes of "string" rather than separate types - it seems quite likely, actually.
"numeric" indicates a number, which may be internally represented as an integer or a floating-point. Externally, <num>10923</num> is an integer, <num>983.0928</num> is a floating-point. The content of <num>6</num> is, again, the non-element contained text "2", which is read by the interpreter as the (integer) number 2.
"reference" is a reference to a Flare element. Internally, this may just be a C++ FlareElementPointer - the elements don't need names unless being displayed to the programmer or stored on disk. Conceptually, though, each element always has a unique identifier, even if most references are "virtual" references that never involve a textual name and simply make use of a direct pointer. If the Flare interpreter needs to create a name on the fly, it must be a name that is not being used by any other Flare element the interpreter knows about - names must be unique. The content of <employer>Company$0x07012933</employer> is "Company$0x07012933", which is interpreted as a Flare reference to the Flare element with that name. References are usually transparent, and in transparent contexts, will appear to have whatever type their referents have.
The "structure" type is the most archetypal object type - Flare structures most strongly resemble objects in C++, Java, Python, Perl, and so on. Elements <human> and <head>, in Example 1, are both structures. Structures do not need to contain non-element text as their content, and even if there are no subelements, the structure still exists - would still "succeed" if tested for truth value. In the above example, both <human> and <head> contain subelements, while <left_ear/> contains no subelements, but all evaluate true and all are structures.
The "list" type is closest to the array type, and indicates that all the subelements (not just subelements declared as multiple) form an ordered set. <ears>, in the above, is a list type; human.head.ears would return the <left_ear/> element. human.head would return an error; there is no inherent ordering of the subelements of human.head. However, subelement <eye> of head is declared multiple, so human.head.eye would return the second eye element, and human.head.eye would be an expression with list type.
The "variable" type is transparent - it has whatever type is possessed by its content. A variable may only contain one contentful element at any given time; anything else will cause an error. The archetypal variable type is the one seen in `var foo = 4`, class "var" will accept subelements of type <num>, <str>, <ref>, <text>, and <data>, plus any planar subelements. The type of a <var> containing a <ref> is the type of the referent, which is how variables can appear to contain structures and lists. <var></var> has variable type and content "void", just as <num></num> has numeric type and content "void". The type "variable" may also be used to indicate an expression with definite but currently unknown type, and so on. I'm not sure whether a variable needs to be able to locally contain object data - it doesn't currently seem like it.
I'm currently trying to figure out whether there needs to be an "expression" type and/or "method" type. There probably does.
You can subclass any type, although subclassing some types may well be considered "Flare abuse", and virtually all subclassings should normally be of type struct. (Technically this is bad phrasing; you can subclass the class <num>, not the type "numeric".)
Flare metadata is complex, but in FlareSpeak, most of the complexity should usually be invisible unless needed. For example, a class with no declared parent is assumed to have type "structure".
Why is Flare complex? The simple answer is so that Flare can do complex things. The complex answer is that a lot of what looks like complication results from separating out concepts that are usually conflated. "Parenting" and "metadata", for example, or "type", "class", and "metadata".
"Type" helps determine how expressions are parsed, how Flare content - in XML terms, the plain text rather than the XML elements - is handled, and so on. "Type" also determines, to a large degree, how expressions behave and even how they are parsed.
"Classes" (and "parenting") are the things that are (usually) defined by a programmer who sits down and writes a class; who works out what the behaviors are, writes instance methods, and so on. Classes correspond to the mental categories in which we place objects.
"Metadata" is class, plus any extra information that may be specified for an instance member.
import Anatomy
invariant: arms >= 0
Suppose we have <human><arms>2</arms></human>.
The type of <arms> is numeric.
The metadata of <arms> is the metadata of Anatomy.Appendage, plus the invariant "arms >= 0". (Actually, this invariant should have been defined on Anatomy.Appendage, but never mind.)
The Metadata object for <arms>, human.arms^, will have, as parent, the Metadata object for class Anatomy.Appendage, and will add to this parent one extra element describing the invariant. If not for the invariant, human.arms^ would directly yield the Anatomy.Appendage class.
We might say that <arms> has the class Anatomy.Appendage and the metadata for human.arms - and the type numeric, of course.
Parenting is a way for Flare elements to accumulate content in a series of successive overlays, to implement the semantics of "default" and "override" values, and to avoid the duplication of shared content.
Parenting should first be considered shorn of caching and other issues - as if parenting were recomputed on each occasion - in order to more easily define the semantics. If object foo has parent A, and object A has parents B and C, and object B has parents D and E, and object E has parent F which has parent G which has parent H, then the inheritance structure will appear as follows:
foo
  A
    B
      D
      E
        F
          G
            H
    C
Suppose that we now evaluate the expression `foo.bar` and no element "bar" is found directly on object "foo". The interpreter (in our non-caching example) goes on to execute a depth-first search of the ancestry; it checks element A for "bar", then element B, then element D, then element E, then F, then G, then H, then C.
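The uncached search can be sketched in a few lines of Python. This is only a toy model under my own assumptions: the Element class, the lookup function and the "from C" value are hypothetical, and a real Flare element would carry far more than a dictionary of properties.

```python
class Element:
    """Toy stand-in for a Flare element: local properties plus ordered parents."""

    def __init__(self, name, parents=(), **props):
        self.name = name
        self.parents = list(parents)
        self.props = dict(props)


def lookup(element, prop, visited=None):
    """Depth-first search of the ancestry.

    Returns (owner, value), or None if the property is not found anywhere.
    If a list is passed as `visited`, the names of checked elements are appended.
    """
    if visited is not None:
        visited.append(element.name)
    if prop in element.props:
        return element, element.props[prop]
    for parent in element.parents:
        found = lookup(parent, prop, visited)
        if found is not None:
            return found
    return None


# The ancestry described above; "bar" lives only on C (an arbitrary choice).
H = Element("H")
G = Element("G", [H])
F = Element("F", [G])
E = Element("E", [F])
D = Element("D")
C = Element("C", bar="from C")
B = Element("B", [D, E])
A = Element("A", [B, C])
foo = Element("foo", [A])

order = []
owner, value = lookup(foo, "bar", order)
print(order)  # ['foo', 'A', 'B', 'D', 'E', 'F', 'G', 'H', 'C']
```

The search visits every other ancestor before reaching C, which is exactly the cost that caching is meant to remove.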
The following major issues have just been raised:
There are likely to be two major uses of parenting; first, cases such as class hierarchies, which are likely to change very rarely and be very heavily used; second, use of parenting for Situations and application logic, both cases where the number of updates is likely to be significant relative to the number of uses. The first case should be cached; the second case should be computed on the fly. The second case can thus be dismissed for the moment, unless more heavily optimized versions of the interpreter attempt to cache this as well. Regardless, it is a requirement that the cached case should always be equivalent to the uncached case in terms of program behavior, even when parallelization is taken into account.
Caching should be structural; the key information is not the value itself, but a direct pointer to the value. Suppose, in the above example, that we compute `foo.someMethod`, and it turns out that the property "someMethod" actually resides on ancestor B. In the above diagram, foo belongs to AClass, and AClass defines the class parent A, meaning that all objects whose metadata is AClass will have default parent A. A inherits from B, which is the default parent for BClass. Let's suppose that "someMethod" is a method defined in BClass. `foo.someMethod()` will thus invoke the method contained in `foo.someMethod`, which turns out to be located on `B.someMethod`, where B is an indirect ancestor of foo. (We'll talk about binding later, but `self` still evaluates to "foo" even though the method is located on B.)
Again, a lot of issues just went flying by, but the one we're worried about at the moment is caching. There are two obvious options for caching the `foo.someMethod()` access; we can cache on "foo" or we can cache on the class parent "A". I'd vote for automatically caching everything on A, since it means that if you look in A's extended dictionary, and don't find anything, you're guaranteed not to find it on B, C, D, et cetera. This means that any modification of B, C, D, et cetera, must be propagated to all descendant class parents, but such modifications should be rare, and the requirement is tolerable. Thus, almost any access, in the ordinary course of operation in a Flare interpreter, should be resolved by at most two dictionary checks.
As said above, caching should be structural; the key information is not the value itself, but the pointer to the Flare element that contains the value. If instance member "arms" in class Primate has the default value of 2, meaning that the class parent contains the element <arms>2</arms>, then the class parent for Human, which inherits from Primate, should cache the fact that instance member "arms" is stored on Primate - should cache a reference to the element <arms> - but should not cache the number 2. Thus, modifying Primate::arms to have a default value of 3 will not require an update to Human or to any instances of either class. Writing `bob.arms` will first check Bob's dictionary, and fail; it will then check the Human class parent's extended dictionary, and find a reference to the Flare element <arms> stored on Primate, and return the content of <arms>, which is 2 (later, 3). Changes of content thus do not affect parent caching, since what is being cached is not the content, but the location.
(When I say "dictionary", by the by, I'm using a term from Python, but I don't mean that there's a dictionary Flare element somewhere - the dictionary is a hidden attribute of a Flare element and is maintained by the interpreter, probably (in a C++ interpreter) as a C++ object somewhere.)
An ordinary object has one dictionary, a local one. A cached parent may have a local dictionary containing references to local elements, plus an extended dictionary containing references to both local and nonlocal elements. A parent, to be cached, must have exclusively cached parents. Thus, when a parenting search strikes a cached parent, it can halt and need not traverse any further sublinks of that node.
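Here is a minimal Python sketch of structural caching with local and extended dictionaries. All the names (Slot, ClassParent, Instance, rebuild_cache) are hypothetical, and the way the extended dictionary is merged from the parents is a simplification I'm assuming for the sketch; it ignores change propagation and locking entirely.

```python
class Slot:
    """A subelement: a location that holds content."""

    def __init__(self, content):
        self.content = content


class ClassParent:
    def __init__(self, name, parents=()):
        self.name = name
        self.parents = list(parents)
        self.local = {}      # property name -> Slot stored on this element
        self.extended = {}   # property name -> Slot, local or inherited

    def define(self, prop, content):
        self.local[prop] = Slot(content)

    def rebuild_cache(self):
        """Recompute the extended dictionary; parents must already be cached."""
        self.extended = {}
        for parent in reversed(self.parents):   # earlier parents win
            self.extended.update(parent.extended)
        self.extended.update(self.local)        # local elements win over all


class Instance:
    def __init__(self, class_parent):
        self.class_parent = class_parent
        self.local = {}

    def get(self, prop):
        if prop in self.local:                             # first dictionary check
            return self.local[prop].content
        return self.class_parent.extended[prop].content    # second check


primate = ClassParent("Primate")
primate.define("arms", 2)
primate.rebuild_cache()

human = ClassParent("Human", [primate])
human.rebuild_cache()

bob = Instance(human)
before = bob.get("arms")            # found via Human's extended dictionary
primate.local["arms"].content = 3   # change the content; no cache is touched
after = bob.get("arms")
```

Because Human caches the Slot rather than the number, `before` is 2 and `after` is 3 without any update to Human or to bob.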
Modifying the class hierarchy is a very dangerous operation and may be optionally prohibited by limited implementations of an interpreter - i.e., implementations that cannot safely support the change propagation action. In parallelized, automatically synchronized versions of Flare, modifying a parent should probably require acquiring a write lock on every extended dictionary that would be affected by the change - before carrying out the action. This may easily turn out to involve bringing all other Flare threads to a halt, which is probably a good idea when managing metadata in any case. If the Causality idiom of the Flare libraries and the Flare interpreter involves achieving a write lock on all effects marked as dependent on an action before carrying out the action, then this would almost certainly involve halting almost all other Flare threads. (Not just halting anywhere, but halting at a safe point rather than mid-action.) (I probably need to write more about the integration of Causality with parallelism, and the integration of Flare-level Causality with interpreter-level Causality.)
Suppose that we have:
<bob id="bob">
bobclone wants to override bob's nose but accumulate bob's eyes. This is probably a behavior that should be default for parents - accumulate list members unless otherwise specified; override singular members.
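The default rule (accumulate list members, override singular members) might be sketched like this in Python. The resolve function, the plain-dict representation and the member values are all hypothetical; putting the parent's list entries first is an arbitrary choice on my part.

```python
def resolve(child, parent):
    """Build the child's effective members from its own and its parent's.

    List-valued members accumulate (parent entries first, an arbitrary choice);
    singular members on the child override the parent's.
    """
    view = dict(parent)
    for name, value in child.items():
        if isinstance(value, list) and isinstance(view.get(name), list):
            view[name] = view[name] + value   # new list; parent is not mutated
        else:
            view[name] = value
    return view


# Hypothetical contents for the two elements.
bob = {"nose": "pointy", "eyes": ["left", "right"]}
bobclone = {"nose": "hooked", "eyes": ["third"]}

view = resolve(bobclone, bob)
print(view)  # {'nose': 'hooked', 'eyes': ['left', 'right', 'third']}
```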
But there are also some more exotic behaviors that parents might want to specify:
<bob id="bob">
Suppose Bob now wants to say:
"Even though bob.head (and bobclone.head) are both elements of singular type, rather than bobclone.head overriding bob.head, the parenting relation falls through - bobclone.head accumulates content from bob.head."
Well, one option, of course, would be to say: "Forget it. We're not putting that much complexity in the interpreter. Do it yourself."
On the other hand, it doesn't seem like too much of a burden on the interpreter. If a Flare element is a parent, then the Flare subelements can specify - in their metadata - how that parenting should affect them. The possibilities are:
<bob id="bob">
<bobclone parents="bob" id="bobclone">
str& theNose = bobclone.&nose
bobclone.nose = "hooked"
The reference "theNose" refers to bobclone.nose, even though no actual <nose> element exists on bobclone. How is this handled?
There are two apparent ways to handle this. The first way is to have the "theNose" reference be indirect - that, rather than theNose pointing directly to a Flare element, theNose should specify that following this reference means going to "bobclone" and looking for property "nose". This costs some extra processing power, but not much. A ten-property chain of indirection will cost ten dictionary lookups per usage, but this may be an acceptable cost. At the least, this kind of indirection needs to be used for cases where the lookup is being overridden by an interception.
The second alternative is to create a placeholder Flare element, one visible only to the interpreter, but one that can still be pointed to directly by a Flare reference. If serialized or written to disk, the reference would automatically generate an indirect reference by moving up the invisible placeholder element until a real element was encountered, then writing an indirect handle starting at that real element. However, within internal interpreter usage, looking at the placeholder element would be a single action not requiring a dictionary lookup. I'm trying to figure out whether using this alternate method could ever possibly change where a reference would end up after a complex action - I don't think it could.
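The first alternative, the indirect reference, is easy to sketch in Python. Node and IndirectRef are hypothetical names, and "pointy" is a made-up value for bob's nose; the placeholder-element alternative is not modeled here.

```python
class Node:
    """Toy element with local members and optional parents."""

    def __init__(self, parents=()):
        self.parents = list(parents)
        self.members = {}

    def find(self, name):
        """Look locally, then depth-first through the parents."""
        if name in self.members:
            return self.members[name]
        for parent in self.parents:
            found = parent.find(name)
            if found is not None:
                return found
        return None


class IndirectRef:
    """Remembers (root, property name) and re-resolves on every use."""

    def __init__(self, root, name):
        self.root = root
        self.name = name

    def get(self):
        return self.root.find(self.name)      # one dictionary lookup per hop

    def set(self, value):
        self.root.members[self.name] = value  # lands on the virtual location


bob = Node()
bob.members["nose"] = "pointy"
bobclone = Node([bob])

the_nose = IndirectRef(bobclone, "nose")
first = the_nose.get()     # no local nose yet, so bob's is found
the_nose.set("hooked")     # creates the nose on bobclone, not on bob
second = the_nose.get()
```

The reference never changes, but what it resolves to does: first bob's nose, then bobclone's own, which is the behavior the indirect scheme is supposed to give.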
A consequence of this is that object IDs must reserve the character "." for use in indirect references. In fact, object IDs will probably wind up reserving quite a lot of characters - implicit in the idea of a "parents" attribute, for example, is the idea of (comma-separated?) object IDs.
If bobclone has a member "head", which is defined to automatically have a parenting relation to bob.head, and the member "head" has the member "face", which is defined to automatically have a parenting relation to bob.head.face, and the programmer writes bobclone.head.face.eyes = 2, then this assignment will automatically create <head>, <face>, and <eyes> if these elements did not previously exist. Similarly, bobclone.&head.&face.&eyes will return a reference whose actual target may change if bobclone.head, bobclone.head.face, or bobclone.head.face.eyes are created.
In some sense, of course, this is Flare abuse and might well trigger a program invariant. But it is still defined to be possible under the parenting semantics, and, if the atomic interactions are well-defined, should not involve additional complexity to implement.
Unlike most object-oriented languages, an instance method in Flare does not have a "self" or "this" argument - even an implicit one. Instead, when a method is invoked, the interpreter keeps track of where that method was invoked from, and the FlareCode element <self/> is an expression which retrieves this location. This appears in FlareSpeak as "if self > 3" (for a method of an object that can have numeric type), or implicitly as ".eyes = 2" (meaning, self.eyes = 2).
The value of <self/> can be temporarily changed by the with expression:
with self%quake
    .ammo = 3
    .weapon = quake_module.weapons.rocketlauncher
is equivalent to:
self%quake.ammo = 3
self%quake.weapon = quake_module.weapons.rocketlauncher
Even if there's an automatic parenting chain, as in bobclone.head.face.eyes, calling bobclone.head.face.eyes.isBinocular() should still appear to be located on the "virtual" element bobclone.head.face.eyes, and not the actual element where the method was found. In terms of actual location, "isBinocular" would be a subelement of the class parent for Body.Eyes, and would have been found by looking at the <eyes> element in bob.head.face.
If you're calling a method, or carrying out an assignment, you need to keep track of the "lvalue", the virtual element location, and not just the actual location. This doesn't mean that every chain of "foo.bar.baz" accesses must behave this way. Only unbroken chains of automatic parenting need to track the virtual and actual locations independently. If "foo.bar" is a reference to another object "bob", with a direct subelement "susie", then foo.bar.susie.isBinocular() will be located at <susie>. Even if "isBinocular" is contained in the class parent, <self> will still yield the <susie> element within that method invocation.
Using <self/>, rather than an implicit or explicit first argument, allows methods to behave intuitively even when overridden. For example, in Python, plain assignment cannot override a class instance method with a local instance method - if "eyes" is a Python object and class Eyes defines the function "isBinocular(self)", then assigning another function of the same signature to eyes.isBinocular does not produce a working method. If you write "eyes.isBinocular = Eyes.isBinocular", replacing the function with a local copy of itself, calling "eyes.isBinocular()" will cause an error due to an incorrect number of arguments. Unless the function is located on the class and called on the object, Python doesn't know to automatically bind the invoking object to the argument "self". Python's semantics permit you to write "foo = eyes.isBinocular ; foo()", but not to write "eyes.isBinocular = Eyes.isBinocular ; eyes.isBinocular()". Flare, by keeping track of both virtual and actual locations for references through automatic parenting chains, permits you to do both operations - (a) override class methods with local copies, and (b) create references to instance methods that will bind to the object they were taken from. All of this happens without the need to treat methods any differently from ordinary properties.
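The Python behavior described above can be checked directly. This sketch assumes Python 3; the Eyes class and its count attribute are made up for the demonstration, and the explicit rebinding at the end uses the standard library's types.MethodType, which is Python's own workaround rather than anything Flare-related.

```python
import types


class Eyes:
    def __init__(self, count):
        self.count = count

    def isBinocular(self):
        return self.count == 2


eyes = Eyes(2)

# (b) A reference to an instance method stays bound to its object.
foo = eyes.isBinocular
bound_result = foo()

# (a) A plain function stored on the instance is NOT bound when called.
eyes.isBinocular = Eyes.isBinocular
try:
    eyes.isBinocular()
    local_copy_failed = False
except TypeError:            # missing the "self" argument
    local_copy_failed = True

# Python needs the binding spelled out by hand.
eyes.isBinocular = types.MethodType(Eyes.isBinocular, eyes)
rebound_result = eyes.isBinocular()
```

In Flare, as described here, no such explicit binding step would be needed, since <self/> comes from the invocation location rather than from an argument.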
But how to handle static methods and static properties, then? A static method, when invoked, should see <self> referring to the class parent, and not itself. A static property, when assigned to on a local object, should change the value on the class parent, rather than overriding the parent value with a new local value. If the property "bar" has metadata marking it as static, then the expression `foo = FooClass() ; location(foo.&bar)` will return the class parent for FooClass, rather than object foo. In other words, even in a reference context or an assignment context, writing "foo.bar" will break the parenting chain as surely as following a reference. foo.bar = 3 will change the value on the class parent instead of creating a locally different copy, and foo.bar() (if bar is a method) will have <self/> equal to the class parent.
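A toy Python model of the static case, under my own assumptions: the static marking is just a set of names kept on the class parent, and location, assign and read are hypothetical helpers standing in for what the interpreter would do.

```python
class ClassParent:
    def __init__(self, static_names=()):
        self.members = {}
        self.static_names = set(static_names)   # metadata: which names are static


class Obj:
    def __init__(self, class_parent):
        self.class_parent = class_parent
        self.members = {}


def location(obj, name):
    """Where a reference or assignment to obj.name lands."""
    if name in obj.class_parent.static_names:
        return obj.class_parent      # static: the parenting chain is broken
    return obj                       # ordinary: the local (virtual) location


def assign(obj, name, value):
    location(obj, name).members[name] = value


def read(obj, name):
    if name in obj.members:
        return obj.members[name]
    return obj.class_parent.members[name]


foo_class = ClassParent(static_names=["bar"])
foo_class.members.update(bar=1, baz=1)
foo = Obj(foo_class)
other = Obj(foo_class)

assign(foo, "bar", 3)   # static: changes the value on the class parent
assign(foo, "baz", 7)   # ordinary: creates a local override on foo only
```

After these two assignments, every object of the class sees bar as 3, while only foo sees baz as 7.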
The "static" case is one of the cases that subelement metadata can declare for handling parenting.
As we've just seen, metadata determines how parenting works.
Parenting also determines a lot of how metadata works.
Parenting is used to implement inheritance between classes. Let's say that class BClass inherits from class AClass, and BObject is an object of class BClass:
Metadata
Class parent for AClass------->Class parent for BClass
In this case, -- means "is metadata of", and --> means "is parent of". "Metadata" is the class of class "Metadata". "Metadata" is the class of AClass and BClass. AClass is the parent of BClass. AClass is the class of the "class parent" for AClass, and BClass is the class of the class parent for BClass. The class parent of BClass is the parent of BObject, and the class of BObject is BClass.
A possible rule: If X is an ancestor of Y, then X's class must be equal to, or an ancestor of, Y's class.
BObject is known to have the parent "Class parent for BClass", even though this information is not explicitly noted, because the metadata for BObject defines a default class parent.
All X in your parentage must have a class which appears in your direct ancestry. Their (cached) class parents therefore appear in your (cached) class parents. Thus, the interpreter only ever needs to check one cached class parent on any lookup; other locally defined X are probably not cached, are checked at runtime, and have their ancestry traced out only to the extent that no class parents are encountered. This is an interpreter optimization which should not change the semantics.
To define a mixin parent - an attachment - that can be freely used with most classes, have it inherit from the basic class for "struct", which is the default parent class when no other is declared.
Do not confuse "parent", "class", "parent class" (AClass is the parent class of BClass), and "class parent" (an automatically defined object that goes along with a class).
One of the most important kinds of metadata is the metadata that defines subelements. I'm not quite sure how this should work, in terms of which subelements of the Metadata object actually provide that data.
1. Metadata's default subelement type should be Metadata, or reference-to-Metadata, so that writing Human.arms gets you the same Metadata as writing human.arms^, i.e., human^.arms == human.arms^. This would be elegant, and parenting would provide a ready-made idiom for overriding Human.arms in derived classes; a class invariant (Flare language rule) would state that, if Human.arms == Body.Appendage, then classes derived from Human that override Human.arms must override it with a direct descendant (in terms of the class hierarchy) of Body.Appendage. I'm not sure how similar semantics would be implemented for method signatures - it might also have to be done by language invariants, there being no really good way I can see offhand to have the whole thing fall automatically out of something elegant.
2. An alternative to having human^.arms equal human.arms^ would be to let subelements of type <property> and <method> have list type and accumulate across parents, so that if AClass is the parent of BClass, then BClass inherits all subelements of type <property> along with any properties it may have defined locally. To get the metadata for "arms" given the metadata for "human", you would actually have to call Human.getElement("arms").
The first method provides a very elegant idiom for examining metadata. Unfortunately, it also seems to involve an infinite recursion problem if you try to let metadata contain anything except subelement metadata. Suppose we want to define, in the Metadata class, an element called "profile_usage" which counts the maximum number of objects that have so far been created as members of this class.
Does this mean that the element "profile_usage" is forever reserved, and prohibited to all other classes? profile_usage can no longer be an element of class Metadata; it is now an element of type "num". So if any other class tries to define a subelement called profile_usage, it will be impossible to store. Human.profile_usage already has a meaning. Furthermore, since Metadata itself has class Metadata, how could the Metadata class define a subelement for profile_usage? The metadata for class Metadata says that profile_usage is a number, not a subelement of Metadata!
There are two ways out of this infinite recursion. One is to have cumulative lists of <property> (and/or <method>) elements, and to forego the human^.arms idiom. The other way is to try and distinguish between elements like "profile_usage" and "arms" in some other way. Trying to make "profile_usage" the special case, through naming conventions or whatever, always fails, so we pull out a trick from Creating Friendly AI (yes, really) and make subelement declarations the special case. The actual storage of human.arms^, instead of looking like:
<Human>
<invariant>... self > 0 ...</invariant>
will look like:
<Human>
<invariant>... self > 0 ...</invariant>
How would one access Human.arms, then? Perhaps the Metadata class will define a "find unfound property" interception which will translate Human.arms into Human.("m-arms"). Or there might be other ways, such as the foo.*baz "intelligent binding" idiom I'm still thinking about. Maybe Human.m-arms should even be legal Flare, although in this case, a better solution would be to make the naming convention m_arms instead.
The distinct advantage of this system over the one that uses <property> elements is that it provides a more natural idiom for overriding subelements with new versions - it can take advantage of the language idiom of parenting, rather than needing to rely entirely on language-defined logic. As a disadvantage, it introduces a bit of a special case into the access idiom and linkup, thus "breaking the chain" and making it harder for Flare code that understands Flare to see the link, but no more of a special case than would be introduced by using <property> elements, since you can still have a Human.getSubelement("arms") that finds Human.("m-" + arg).
If anyone can think of a more elegant way of doing all this, please let us know.
My current working assumption is that local objects have a "parents" attribute - not a subelement, but an attribute - containing a list of parents, and that this is represented in XML as an XML attribute containing a comma-separated list of IDs (unless there's a known XML idiom for lists of IDs that'd work better). Metadata also specifies the default parent for members of that class, known as the "class parent". If there's any data that needs to be stored on a parent, such as an instance method (or, for that matter, a static method), then the class parent is automatically created by the interpreter and a reference to it is stored in the metadata - perhaps in the "default_parent" member.
Changing the "default_parent" property on the metadata may cause huge amounts of simultaneous breakage, and need be allowed only by the most extreme of Flare implementations.
Probably not - or, if so, it will be limited to dynamic, uncached parenting and will be a Flare feature declared as dangerous and advanced. Objects might be able to define their own delegation semantics for unfound properties, and have that fit in with the way the Flare interpreter tracks virtual and actual locations for references, but if so, it will be "delegation" - not "parenting".
See "PEP 253: Subtyping Built-in Types".
Suppose that A is the parent of B and C, and both B and C are parents of D. C overrides one of A's methods, e.g. "foo()". If D inherits from "B and C", will a depth-first search hit "foo()" on A, through B, before it hits "foo()" on C? If so, this is an unexpected behavior and C will probably break because it's relying on the new method being implemented.
PEP 253 explains this issue in much greater depth. The resolution, quoted from there, is as follows:
The new lookup rule constructs a list of all classes in an
inheritance diagram in the order in which they will be searched.
This construction is done at class definition time to save time.
To explain the new lookup rule, let's first consider what such a
list would look like for the classic lookup rule. Note that in
the presence of diamonds the classic lookup visits some classes
multiple times. For example, in the ABCD diamond diagram above,
the classic lookup rule visits the classes in this order:
D, B, A, C, A
Note how A occurs twice in the list. The second occurrence is
redundant, since anything that could be found there would already
have been found when searching the first occurrence.
We use this observation to explain our new lookup
rule. Using the
classic lookup rule, construct the list of classes that would be
searched, including duplicates. Now for each class that occurs in
the list multiple times, remove all occurrences except for the
last. The resulting list contains each ancestor class exactly
once (including the most derived class, D in the example).
In Flare, the rule can be used easily enough for cached parents such as class parents. Other, dynamic uses of parenting are less likely to run into this kind of situation, and can probably get by with depth-first checking.
This also resolves the possible issue with cumulative list elements appearing twice in cases of multiple inheritance.
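The quoted rule is short enough to sketch in Python. The function names and the dictionary-of-parents representation are my own choices for illustration; the class names follow the ABCD diamond from the quote.

```python
def classic_order(cls, parents):
    """Depth-first, left-to-right search order, duplicates included."""
    order = [cls]
    for parent in parents.get(cls, ()):
        order.extend(classic_order(parent, parents))
    return order


def keep_last(order):
    """For each class that occurs more than once, keep only the last occurrence."""
    seen = set()
    result = []
    for cls in reversed(order):
        if cls not in seen:
            seen.add(cls)
            result.append(cls)
    result.reverse()
    return result


# A is the parent of B and C; D inherits from B and C.
parents = {"D": ["B", "C"], "B": ["A"], "C": ["A"], "A": []}

classic = classic_order("D", parents)
new = keep_last(classic)
print(classic)  # ['D', 'B', 'A', 'C', 'A']
print(new)      # ['D', 'B', 'C', 'A']
```

With the new order, C is searched before A, so C's override of "foo()" is found first, which is the behavior C was relying on.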
Hm, it looks like I went through practically everything needed to describe "subelements" already, in the above section. A lot of it was pretty much mentioned in passing and should probably be summarized in one place, though.
A subtype is, conceptually, a type plus additional behaviors; the distinguishing characteristic of subtypes is that they affect how the interpreter represents the type content. For example, "integer" is a subtype of "numeric", but one which the interpreter may wish to represent internally as an integer rather than as a floating-point number. "bignum" is a subtype of numeric. "complex" and "rational" may best be represented as application or library subclasses with numeric type, but it would be possible to represent them as Flare numeric subtypes, and have them handled directly by the interpreter, if necessary.
Subtypes of "reference" include hard references (the default) and soft references. A hard reference is one which prevents an object from being deleted while the hard reference is active, or causes an error if an object with stack or dynamic scope is deleted while the hard reference is active. A soft reference is automatically set to null when its target is deleted, and does not prevent garbage collection of the referenced object.
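Python's weakref module gives a rough analogy for the soft reference. This is only an analogy: an ordinary Python reference plays the part of the hard reference (it keeps the object alive, but there is no counterpart here to the error raised for stack or dynamic scope), and the Target class is a placeholder.

```python
import gc
import weakref


class Target:
    """Placeholder object to point at."""


hard = Target()
soft = weakref.ref(hard)      # does not keep the target alive

alive_before = soft() is hard

del hard                      # drop the only hard reference
gc.collect()                  # CPython frees at once; other runtimes may need this

alive_after = soft() is not None
```

Once the last hard reference is gone, calling the soft reference yields None, which matches the "automatically set to null" behavior described above.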
Subtypes of "list" may include a hint to the interpreter as to whether the list structure should be represented as an array, a linked list, a two-way linked list, and so on. (The list elements will still be Flare elements, but the different structures have different properties for growth, shrinkage, access, deletion, movement, and so on.) Initial versions of the interpreter should almost certainly use arrays universally, but there is the potential for later optimization. The access idioms for the different list subtypes should remain precisely identical, and the program behavior should remain precisely the same - except in terms of speed and resource usage - if the interpreter opts to use a different representation than the one recommended. Other subtypes of lists may include lists with prespecified maximum length, or prespecified hard length, which are optimization notes that should also be treated as enforceable invariants (regardless of whether the optimization itself is used).
Proposed subtypes of "string" include the "text" type and the "data" type.
Usually, Flare strings are strictly content - no annotations may be made on them. Usually, Flare elements are strictly contained in other Flare elements. The XML text type permits annotations within strings, thus enabling Flare programs to process arbitrary XML, or XML-like structures such as mundane HTML. Within a text subtype, it is possible to write, for example:
text theText = "The Flare language"
=> The <a href="flare.html">Flare</a> language
=> <i>The <a href="flare.html">Flare</a> language</i>
=> <i>The <a href="flare.html">Flare</a> language.</i>
=> <i>The <a href="flare.html">Flare</a> language</i>
=> <i>The <a href="flare.html">Flare</a> language</i>.
for$ item in text.getAnnotations()
=> <a href="flare.html">
text.getAnnotations()@.extra = 5
=> <i><extra>5</extra>The <a href="flare.html"><extra>5</extra>Flare</a> language</i>.
And so on.
There are two kinds of annotations you can make to text - insertion-point annotations, which do not contain any actual text content, and text annotations that actually enclose text. As always, in Flare, annotations can have annotations. How to distinguish between an annotation on an annotation, and an insertion-point annotation? In other words, how do you know whether:
foo<a><b></b>bar</a>
means that there's an annotation <a> on "bar", with subelement <b>, or whether it means that there's an annotation <a> on "bar" and an insertion-point annotation <b> between "foo" and "bar"? In this case, it's easy to resolve the ambiguity for most cases; a <b></b> element containing no text is a subannotation if and only if it is immediately to the right of the opening tag of another annotation, in which case it is a subelement of that annotation. If we wanted to have an independent insertion-point annotation, we would have written:
foo<b></b><a>bar</a>
Thus, foo<a><b>bar</b></a>baz is text where "bar" has two independent annotations, <a> and <b>. foo<a><b></b></a>bar is text with an insertion-point annotation <a> between "foo" and "bar", and a subelement <b> of <a>. foo<a></a><b></b>bar is text with two independent insertion-point annotations, <a> and <b>, between "foo" and "bar".
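The rule above can be sketched in Python as a small classifier. This is an illustration, not a Flare parser: `classify_empty_elements` is a hypothetical helper, it assumes well-formed markup with no self-closing tags, and it keys results by tag name, so it only suits small examples where each tag appears once. It applies the rule strictly: an element enclosing no text is a subannotation only when the token just before its opening tag is another opening tag.

```python
import re

# A tag (opening or closing, attributes ignored) or a run of plain text.
TOKEN = re.compile(r"<(/?)(\w+)[^>]*>|([^<]+)")


def classify_empty_elements(markup):
    """Return {tag_name: role} for each element that encloses no text."""
    tokens = []
    for match in TOKEN.finditer(markup):
        closing, name, text = match.groups()
        if text is not None:
            tokens.append(("text", text))
        else:
            tokens.append(("close" if closing else "open", name))

    roles = {}
    stack = []  # entries: [name, index of opening token, contains_text]
    for index, (kind, value) in enumerate(tokens):
        if kind == "text":
            for entry in stack:
                entry[2] = True   # every enclosing element now holds text
        elif kind == "open":
            stack.append([value, index, False])
        else:
            name, opened_at, contains_text = stack.pop()
            if contains_text:
                continue          # a text annotation: nothing to resolve
            previous = tokens[opened_at - 1] if opened_at > 0 else None
            if previous is not None and previous[0] == "open":
                roles[name] = "subannotation of <%s>" % previous[1]
            else:
                roles[name] = "insertion point"
    return roles


case_sub = classify_empty_elements("foo<a><b></b>bar</a>")
case_point = classify_empty_elements("foo<b></b><a>bar</a>")
case_nested = classify_empty_elements("foo<a><b></b></a>bar")
case_pair = classify_empty_elements("foo<a></a><b></b>bar")
```

With the four strings discussed above, `<b>` comes out as a subannotation of `<a>` in the first and third cases and as an independent insertion point in the second and fourth.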
There are some remaining ambiguities. First, given the Flare element <text><a></a>foobar</text>, is <a> an insertion-point annotation at the beginning of the text, or is it a subelement of the Flare element <text>? Second, given the text <a><ref>URL$0x02651273</ref>bar</a>, is "URL$0x02651273" part of the text, or is it the content of the Flare reference <ref>? The latter ambiguity, in particular, seems resolvable only by reference to metadata; when text annotations behave like Flare elements, they should either have no content (in which case they may span text), or should be insertion-point annotations or subelements of other annotations.
Suppose, though, that one wishes to create a text class capable of universally representing any well-formed XML, not just the subset of well-formed XML usually used in Flare? On the face of it, this seems easy enough; declare a class TextXML with no subelements - none, including planes - and make the default type of any subelement TextXML. However, the semantics of the Flare language would still appear to distinguish between Flare subelements in the content of a <text> element, and annotations on the <text> element itself. I can't think of a good way offhand to handle this, although I can easily think of some lousy ways, including having an invisible-to-the-programmer placeholder character in any <text> element, say '#', which would separate Flare subelements from text annotations, regardless of whether the text annotations or the Flare annotations actually existed - even a completely empty <text> element would contain '#' and the '#' would always be ignored. This solution is extremely ugly and might easily break Flare's interaction with other XML apps; on the other hand, it might work. The prospect of not being able to annotate an element, however, is even scarier - "Oh no! A special case!"
General representation of all well-formed XML will also require some language support beyond the usual Flare elements, such as the ability to note on a case-by-case basis whether a subitem "foo" is the subelement <foo></foo>, the subelement <foo/>, the attribute foo="", or the attribute foo=''. It is conceivable that some Flare implementations will even offer the ability to represent PIs (1) and the other bells and whistles defined in XML, but this is more of a special case.
Within the text XML type, unlike the data type, text is contained within annotations, and the annotations are supposed to conform to the normal XML semantics. If you move text out from inside an annotation, it doesn't take the annotations with it, and acquires whatever annotations are in effect at its new location; if you slice "foo" out of the middle of some underlined text, and put it in the middle of some italicized text, it loses the underlining and gains the italics. Because text is contained within annotations, the text subtype follows the XML convention that there is no good way to represent <a>Fl<b>ar</a>e</b> - or, in general, <a>***<b>***</a>***</b>.
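Python's `xml.etree.ElementTree` happens to behave the same way, so it makes a convenient stand-in for the text type's semantics. The markup and variable names below are invented for the example. Text sliced out of an underlined run is just a bare string, and it picks up italics only because of where it is pasted.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<text><u>a foo b</u> and <i>some words</i></text>")
underlined = root.find("u")
italic = root.find("i")

# Slice "foo " out of the underlined run. The result is a bare string:
# the <u> annotation stays where it was.
start = underlined.text.index("foo ")
piece = underlined.text[start:start + 4]
underlined.text = underlined.text[:start] + underlined.text[start + 4:]

# Paste it into the italic run, after "some ".
italic.text = italic.text[:5] + piece + italic.text[5:]

result = ET.tostring(root, encoding="unicode")
# result == "<text><u>a b</u> and <i>some foo words</i></text>"
```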
The data subtype involves a different idiom for annotating strings. The purpose of the data subtype is not to represent general XML, and thus there is much less reason to worry about attributes, PIs, and so on. (In fact, the true secret purpose of the data subtype is to help represent "moxels" in sensory modalities and perform multilevel feature extraction on them.) Within the data subtype, annotations can be applied to arbitrary sets of nonconnected characters, or arbitrary sets of other annotations, regardless of whether the annotated content is contiguous. An AI application engaging in Flare-interpreted visual recognition - yes, there are conceivably good reasons for doing this in Flare instead of C or assembly - would thus be able to apply annotations to arbitrary pixels.
Thus, optimizations to think about for data subsubtypes would be permitting representation of data as a two-dimensional array, or representing data as an N-dimensional array containing only a few nonempty elements, and so on. It might also be good to allow for a data subtype representing an array of 32-bit or 64-bit integers, rather than a data subtype representing an array of Unicode characters. Such data subtypes should still be annotatable.
If you move around characters (or integers) within a data subtype, any annotations they have follow them. There are no "contiguous" annotations for the data type, as there are with the text XML type - if annotations are contiguous, it is by coincidence. Each "pixel" in the data has an independent list of annotations; if pixels (characters) are moved, they take their annotations with them.
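One way to picture this, as a Python sketch with made-up names (`Cell`, `annotate`): each cell owns its own set of annotation labels, so an annotation can cover a noncontiguous set of cells and travels with them when they move. Annotations on annotations are left out to keep the sketch short.

```python
from dataclasses import dataclass, field


@dataclass
class Cell:
    """One character (or integer, or pixel) plus its own annotations."""

    value: str
    annotations: set = field(default_factory=set)


def annotate(cells, indices, label):
    """Attach a label to an arbitrary, possibly noncontiguous, set of cells."""
    for index in indices:
        cells[index].annotations.add(label)


data = [Cell(ch) for ch in "flare"]
annotate(data, [0, 2, 4], "edge")   # marks 'f', 'a', 'e'

# Move the last cell to the front; its annotations come along.
data.insert(0, data.pop())

moved_values = "".join(cell.value for cell in data)
edge_values = [cell.value for cell in data if "edge" in cell.annotations]
```

After the move the cells read "eflar", and the "edge" label is still attached to 'e', 'f', and 'a', now at positions 0, 1, and 3.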
The data type is perhaps the only type in Flare that will require hard thinking to represent as XML. It is acceptable if data types in Flare are mostly opaque to other XML-processing applications, or if data types use idioms that are unique to the data type - for example, representing a comma-separated list of integers as the elements of the array, or writing all the annotations with unique local IDs (that is, unique to a given data element only) and then using those (short) local IDs as the semicolon-separated list of annotations on each character or integer... and so on.
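To make the local-ID idiom concrete, here is one possible encoding, sketched in Python. Nothing about it is settled: the function name `serialize`, the three-part result, and the exact separators are assumptions chosen to match the description above (commas between elements, semicolons between the annotation IDs on one element).

```python
def serialize(values, annotations):
    """Encode integers and their per-element annotation labels.

    values:      list of integers
    annotations: list of sets of label names, one set per value
    Returns (local_ids, value_field, mark_field).
    """
    # Assign each distinct label a short ID, local to this data element.
    local_ids = {}
    for labels in annotations:
        for label in sorted(labels):
            local_ids.setdefault(label, len(local_ids) + 1)

    value_field = ",".join(str(value) for value in values)
    mark_field = ",".join(
        ";".join(str(i) for i in sorted(local_ids[label] for label in labels))
        for labels in annotations
    )
    return local_ids, value_field, mark_field


ids, value_field, mark_field = serialize(
    [7, 0, 9],
    [{"edge"}, set(), {"edge", "corner"}],
)
```

Here "edge" gets local ID 1 and "corner" gets 2, the values serialize as "7,0,9", and the annotation field as "1,,1;2" (the middle element has no annotations).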
The proposed "data" subtype is inelegant. It is the result of needing to write introspective code that can annotate and meta-annotate noncontiguous regions of data fields. Flare's underlying XML-oriented representation is not very well suited to this, but a nonintrospective, nonannotative language would be even less suited. A possible solution would be to declare the data type to be a standard Adapted Flare extension, meaning that interpreters do not, by default, need to support it. In fact, on consideration, it appears almost certain that the "data" subtype should be an Adapted Flare extension - albeit one requested by the Singularity Institute and hopefully implemented by our supporters in the open-source project.
Within the interpreter's internal representation of strings, strings are Unicode and can easily contain '&' or '<' characters. When written to disk, serialized over the Internet, and so on, '&' becomes '&amp;' and Unicode becomes UTF-8. This is pretty much standard.
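The round trip can be sketched with Python's standard library, which is only a stand-in for whatever a Flare interpreter would actually do. The sample string is invented; `escape` and `unescape` come from `xml.sax.saxutils`.

```python
from xml.sax.saxutils import escape, unescape

internal = "AT&T <é>"   # in-memory Unicode string with markup characters

# Writing out: escape markup characters, then encode as UTF-8.
on_disk = escape(internal).encode("utf-8")

# Reading back: decode UTF-8, then undo the escaping.
restored = unescape(on_disk.decode("utf-8"))
```

Here `on_disk` holds the bytes for `AT&amp;T &lt;é&gt;` with 'é' stored as the two-byte UTF-8 sequence, and `restored` equals the original string.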