.
[ES4Last] Last public draft of ECMAScript 4 proposal, 30 May 2008.
Conventions
===========
Text in a subsection following "Rationale:" is non-normative;
that is, it does not impose any conformance requirements on
Jacaranda implementations or programs. The same programming
language would be specified if this text were removed. Text
surrounded by [# #] is a non-normative editing note.
Definitions of technical terms used in this specification should
be taken from the first of the following documents that provides
a definition for the term:
1. This document.
2. [ES5Draft].
3. The Unicode Standard, version 5.1.0.
Defining occurrences of terms in this document are shown like
this: *term*.
A *string* is a sequence of zero or more UTF-16 code units
(not necessarily well-formed).
In this specification, strings are often presented in
single quotes; this has the same meaning, including use of
\u escape sequences, as the corresponding string literal in ESF.
Terms applied to strings such as *starts with*, *ends with*,
etc. are defined in terms of this code unit sequence,
regardless of whether it is a well-formed UTF-16 encoding.
For example, '\uD800__' is considered to end in '__'.
Code fragments that appear "inline" in the text are surrounded by
double angle quotation marks: « ». In general, these fragments
should be interpreted as ESF code; they may or may not be valid
in the Jacaranda sublanguage. In code fragments, \ stands for itself.
Grammar productions that are referenced "inline" in the text are
given in single angle brackets: < >.
\u escapes are used in grammar productions to mean the corresponding
unescaped code unit. An optional occurrence of a production is
shown with a trailing '?'. Alternatives of a production are
separated by '|'. There is no implied priority between alternatives.
Productions that do not correspond to an [ES5Draft] production
are preceded by '+'.
There are three grammars used in this specification:
- In the lexical grammar, productions are named with an initial
uppercase letter, and :: is used to separate a nonterminal
from its definition. The lexical grammar is an S-attribution
of the ESF lexical grammar. The input is a string, and the
output is a sequence of tokens, each of which is a string
that may have attributes.
([ES5Draft] specifies some characteristics of the ECMAScript
lexical grammar only informally. It is possible to use the
attribute grammar formalism to specify the required tokenisation
very precisely. We do not do this in the current draft because
it would add apparent complexity to the Jacaranda specification
that would only be concerned with clarifying ESF, not with
defining how Jacaranda is a sublanguage of ESF.)
In lexical rules, a string delimited by single quotes denotes
the corresponding sequence of input code units.
- In the syntactic grammar, productions are named with an
initial lowercase letter, and : is used to separate a nonterminal
from its definition. The syntactic grammar is technically
an LR-attribution of the ESF syntactic grammar (it would
be an S-attribution without the Options attribute).
Its input is the token sequence obtained from the lexical
grammar, and its principal output is an 'errors' attribute
giving the set of errors (if any), other than parse errors,
that would prevent the source text from being a valid
Jacaranda program.
- In the string grammar, ::: is used to separate a nonterminal
from its definition. The input to the string grammar is a
sequence of 16-bit code units. It is a regular grammar that
computes no attributes and is only used to recognise
classes of strings.
Lexical and syntactic grammar productions that do not require
an explicit attribution rule but are directly used by a production
that does, are shown with a reference to the [ES5Draft] section number.
The symbol ε in a production means the empty input sequence.
The syntactic grammar can be parsed unambiguously, independently
of attribute computation. That is, while attribute values affect
whether errors result from a given parse, it is never necessary to
compute attributes in order to choose between different parse trees.
(In a small number of cases, as part of processing a production's
attribution rules, the node must be reparsed against a more specific
production.)
It is possible to implement a parser for ESF that is LR(2), i.e.
uses LR parsing with two-token lookahead. The grammar given in
this specification, like that in [ES5Draft], is not LR(2) because
it is not left-factored.
The metalanguage used to define attribution rules has values of the
following types:
- true and false
- arbitrary-precision decimal numbers
- the undefined attribute value
- finite sets of other attribute values
- finite sequences of other attribute values
- finite mappings from attribute names to attribute values.
A parse tree node is represented as a finite mapping of its attributes.
All nodes have a 'text' attribute holding the corresponding source
text as a string.
A convention is used to present attribution rules more concisely, by
making computation of "boilerplate" values for attributes implicit:
- If a production has exactly one child node, or a child
node annotated with a trailing ^, then its boilerplate
attribute values are given by the attributes of that child.
- Otherwise,
- for attributes whose name is a plural noun (which are set-valued),
the boilerplate value is computed by taking the union of the
corresponding attribute values on all child nodes;
- for attributes with names starting 'is' or 'has' (which are
boolean-valued), the boilerplate value is false;
- for other attributes, the boilerplate value is the undefined
value.
Boilerplate values are only used for attributes for which no
explicit definition is given, or when 'U=' is used as described below.
This convention is to be used for _all_ syntactic and lexical grammar
productions, including those that are used by ESF but not explicitly
mentioned in this document.
Notation and auxiliary functions used in attribution rules:
// comment a comment that is non-normative and has no effect
N.foo the value of attribute 'foo' of node N
N is P true iff N.text matches production
(grammatically, by reparsing with attribute
rules ignored)
N isnot P not(N is P)
N ispresent not(N is '')
not(b) true if b is false; otherwise false
{ x <- A | p(x) } the subset of values x taken from A for which p(x)
UNION S the union of all sets that are members of S, or
undefined if any of the members are undefined
x ∈ A true if x is in set A; otherwise false
x ∉ A not(x ∈ A)
A U B UNION {A, B}
A \ B the set difference { x <- A | x ∉ B }
S ++ T the concatenation of sequence S followed by T
A U= B define A to be the union of its boilerplate value
and B
S startswith T true if S is not undefined and sequence S
is a prefix of T; otherwise false
X == Y true if X and Y are identical attribute values;
otherwise false
cond ? X : Y if cond is true then X, otherwise Y
max(X, Y) X > Y ? X : Y
min(X, Y) X < Y ? X : Y
match N as P reparse N as production
, possibly binding
additional variables in the attribution rule.
Evaluate to true if N can be parsed as
,
else evaluate to false and bind variables
declared in
to undefined.
match N as P ... like "match N as P", but match an initial token
subsequence of N.
// the following definition uses Haskell-style pattern matching on sequences:
let errorIfMultipleDeclaration([]) = {},
errorIfMultipleDeclaration([x] ++ xs) =
{ err(id, "multiple declaration")
| id <- x.looseNames ∩ UNION { y.looseNames | y <- xs } }
U errorIfMultipleDeclaration(xs);
let errorIfMultipleDeclarationOf(id, x)
= errorIf(id ∈ x.looseNames, id, "multiple declaration");
let errorIfLineTerminatorBetween(t1, t2)
= error(t2, "LineTerminator not allowed here"),
if there is a LineTerminator between t1 and t2
= {}, otherwise
let errorIf(cond, arg, msg) = cond ? error(arg, msg) : {};
let error(arg, msg) = { err(arg, msg) };
let err(arg, msg) =
representation of an error with auxiliary information 'arg'
(an identifier or node) and message 'msg'
let NumberToString(x) =
the result of applying ToString as defined in [ES5Draft] section 9.8.1
to "the number value for x" as defined in [ES5Draft] section 8.5.
Any conformance requirements specified in the normative text are
intended to be redundant with, i.e. specify precisely the same
restrictions as, the Jacaranda lexical and syntactic grammars. If this
is not the case then it indicates an ambiguity in the specification
that needs to be clarified.
The descriptions of Jacaranda restrictions relative to ESF are given
in groups, each with an uppercase name.
Attribution rules given in one group may also impose constraints
needed to implement the rules in other groups, in which case
cross-references are given in most cases.
Rationale:
Analogous to type declarations, some degree of redundancy in a
specification is useful in order to provide a better chance of
detecting specification errors. This is the reason for providing
informal descriptions that are redundant with the attribute grammar.
Conformance
===========
The terms *SHALL*, *SHALL NOT* or *SHALL ONLY* are used to
describe constraints on a conforming implementation of Jacaranda.
The terms *MUST*, *MUST NOT* or *MUST ONLY* are used to
describe static constraints on a module definition (see below).
This indirectly specifies that a conforming implementation of
Jacaranda SHALL reject module definitions that fail to satisfy
such a constraint.
The terms *SHOULD*, *SHOULD NOT* or *SHOULD ONLY* are used to
describe a recommendation for Jacaranda code to follow, in order
to mimimise the risk of potential incompatibilities with future
versions of this specification or implementations of ECMAScript.
It has no formal consequence for conformance.
For any constraint that is described as implementation-defined,
a conforming implementation SHALL make a consistent choice
and SHALL document which choice is made.
An *ESF interpreter* is an instance of an ESF implementation
conforming to the ECMA-262 5th edition draft amended as described
in the section "[ES5Draft] -> ESF".
A *context* is an ESF execution environment that has copies
of the standard built-in objects independent of other contexts
in the same ESF interpreter. Contexts may share other objects.
(In typical web browser implementations of JavaScript/ECMAScript,
there is usually one context for each frame or iframe.)
A *visible side effect* is an effect that changes the state of
one or more objects in an ESF interpreter in a way that
is observable to other code running in that interpreter by overt
means (that is, without making use of timing of events where this
timing is nondeterministic). Allocation of memory is not by itself
considered to be a visible side effect.
Jacaranda programs are organised into *modules*. Typically, these
are independently developed units of source code.
A *module definition* is a string, representing Jacaranda
source code intended to define a module. This takes the form
of a call to a function called $module. The last argument
passed to $module is a function expression defining the
*module function*, which can contain arbitrary Jacaranda code.
An *internal module representation* is a representation of a
module as a collection of values in a given ESF interpreter.
An instance of a Jacaranda module is called a *caplet*. The creator
of a caplet must provide an *environment* and a *powersource*.
The environment specifies *imports* that will be in scope throughout
the caplet's module code. The powersource is an object that will
only be initially available to the caplet's *powerbox*, which is
the function returned by its module function. The caplet will have
no overt authority other than that granted via its environment and
its powersource, and the ability to perform pure computation.
(Jacaranda does not attempt to prevent a caplet from denying
computational service to other caplets.)
If a caplet is created by the Jacaranda implementation or by a
container, then any of its imports that are callable objects SHALL
be simple functions.
A *module name* is a string, representing the name of a module.
A module definition specifies a list of *option strings*, which
can specify variations in the behaviour of the Jacaranda interpreter
and/or verifier for that module. If an option string starts with
the character '!', it indicates an *important option*, which must
be recognised by the Jacaranda interpreter in order for the module
to be accepted, and by the verifier in order for the module
definition to pass verification. Other options are ignored by the
interpreter and verifier if they are not recognised.
A *Jacaranda interpreter* is an ESF interpreter with the
following additional characteristics:
- an interpreter has a per-context state consisting of a mapping
from module names, to internal module representations of
modules that have been accepted in that context.
- it is possible to create a context with an initially
empty mapping.
- it is possible to submit a module definition to a given
context, with the following effect:
- if the module definition violates any of the
constraints given in sections "Module Definitions" and
"Static Constraints" below, a *static rejection error*
SHALL be reported.
- if the module definition specifies an important option
that is not supported by the interpreter, then an
*unsupported important option error* SHALL be reported.
- if the module definition does not give a 'name'
property that is different from any module name
already in the mapping, a *module name clash error*
SHALL be reported.
- the implementation may, for any or no reason, report
an *implementation failure* when an attempt is made to
submit a module. Such a failure SHALL be distinguishable
from other errors.
- otherwise, a mapping from the module's name to its
internal module representation SHALL be added to the
context's state.
In any of these cases, there SHALL be no other visible
side effects of submitting the module definition.
- it is possible to create a caplet by *instantiating* a
named module, providing an environment and a powersource
as discussed above.
- the *actual import environment* of the caplet is obtained
by reading the public properties of the provided environment
object at the time of instantiation, augmented with the values
specified in the "Module Run-time Environment"
section, the latter taking priority over the former.
- if the caplet's module function has any formal parameter
whose name is not the name of a public propery of the
caplet's actual import environment, then an
*unavailable import error* SHALL be reported.
- the implementation may, for any or no reason, report
an *implementation failure* when an attempt is made to
instantiate a caplet. Such a failure SHALL be distinguishable
from other errors.
- the caplet's module function is called, with the properties
of the actual import environment providing the correspondingly
named parameters and free variables of the module function.
- if the module function throws an exception, this exception
SHALL be reported to the instantiator of the caplet.
- if the return value of the module function is callable, then
that object SHALL be called with the given powersource object
as its only parameter. (Note that the return value may be a
function that has captured variables from its surrounding
scope; consistent with lexical scoping, these refer to the
actual import environment of the caplet.)
- the function given by the 'powerbox' property may interact
with the environment of the Jacaranda implementation in
order to set up *event handlers* which cause functions
created by the module to regain control at some later
point in time. The specifications of such event-handling
features are implementation-defined.
In cases where the above description says that an error
SHALL occur as a result of submitting or instantiating a
module, other visible side effects SHALL NOT occur.
At any time, and for any or no reason, a Jacaranda implementation
may *lock* a given context, in which case it SHALL NOT visibly run
any further code in that context.
Other aspects of the interaction between a Jacaranda interpreter
and its environment are implementation-defined. (One possibility is
that a *container* is written in ESF code to control this interaction.)
Rationale:
The idea of using different conformance terms to indicate constraints
on the implementation vs constraints on programs is borrowed from W3C
specifications.
The term "lock" is named after the LOCK utility in the
Incompatible Timesharing System:
.
The allowance for implementation failures makes Jacaranda particularly
easy to implement:
#include
int main(int argc, char **argv) {
puts("Implementation failure.\n");
return 1;
}
More seriously, there are good reasons to allow nondeterministic
failures, given that a practical Jacaranda implementation has no
control over what memory or execution resources are available, and
machines with unbounded resources do not exist. The requirement for
these failures to be distinguishable from other errors is actually
quite strong.
[ES5Draft] -> ESF
=================
Start with [ES5Draft].
Change the productions for a variable declaration statement to allow
«const» to occur in place of «var». Also allow «const» before catch
variables and function parameter formals. («const» is a
in ES3/5, so it already cannot occur as an identifier.) The detailed changes
to these productions are given in the DECLARATIONS, TRY_CATCH, and FUNCTIONS
groups, but are mentioned here because they extend the language rather than
restricting it.
Section 2 Conformance
- Replace
A conforming implementation of this International standard shall
interpret characters in conformance with the Unicode Standard,
Version 3.0 or later, and ISO/IEC 10646-1 with either UCS-2 or UTF-16
as the adopted encoding form, implementation level 3. If the adopted
ISO/IEC 10646-1 subset is not otherwise specified, it is presumed to
be the BMP subset, collection 300. If the adopted encoding form is
not otherwise specified, it presumed to be the UTF-16 encoding form.
with
A conforming implementation of ESF shall interpret characters in
conformance with the Unicode standard, version 5.1.0 or later (see
section 6 for further detail).
- The following paragraph does not apply to ESF:
A conforming implementation of ECMAScript is permitted to provide
additional types, values, objects, properties, and functions beyond
those described in this specification. In particular, a conforming
implementation of ECMAScript is permitted to provide properties not
described in this specification, and values for those properties,
for objects that are described in this specification.
Rationale:
Permission to provide arbitrary implementation-defined extensions
is not appropriate for a security-focussed language.
In the case of ESF, some kinds of extension can make it impossible
to define a secure sublanguage by static restriction. For more
detail see the constraints on extensions given in the changes to
sections 8.6.2, 15, and 16 below.
Section 3 References
- Replace the reference to Unicode version 3.0 with references to
Unicode version 5.1.0.
Rationale:
The conformance requirements for Unicode have been significantly
tightened since version 3.0. In particular, there were security-
relevant changes at around versions 3.2 and 4.0. Note that this does
not require support for characters added in later versions.
Section 6 Source Text
- Replace the section before "Throughout the rest of this document" with:
ESF source text is represented as a sequence of code units in the UTF-16
transformation format of Unicode, version 5.1.0 or later. The text should
be in Unicode Normalization Form C as described in Unicode Standard Annex #15.
An ESF implementation SHALL NOT perform any normalisation of source text
itself (but may warn if the text is not in Unicode Normalization Form C).
This does not preclude an ESF source text from being stored or transmitted
in other character encodings, but the process by which these encodings are
converted to UTF-16 is outside the scope of this document.
- delete NOTE 1, and change "NOTE 2" to "NOTE".
Section 7.5 Tokens
- In ESF, «/*const*/» and «/*fallthru*/» are also tokens, but are not included
in the production.
Section 8.6.2 Internal Properties and Methods
- Replace
Host objects may implement these internal methods with any
implementation-dependent behaviour, or it may be that a host object
implements only some internal methods and not others.
with
A *Jacaranda-accessible property* is a property that has an accessible name
as defined in group PROPERTY_NAMES.
A value is *built-in-reachable* iff it is reachable in the
sense of garbage collection terminology, given that only the
properties listed in ESF_BUILTINS of the global object of any
context, or the Jacaranda-accessible properties of any other
reachable object can be obtained.
ESF_BUILTINS = {
'undefined', 'Infinity', 'NaN',
'Array', 'Boolean', 'Decimal', 'Math', 'Number', 'RegExp', 'String',
'Error', 'EvalError', 'RangeError', 'ReferenceError', 'SyntaxError',
'TypeError', 'URIError'
'decodeURI', 'decodeURIComponent', 'encodeURI', 'encodeURIComponent',
'isFinite', 'parseFloat', 'parseInt'
}
An ESF implementation SHALL NOT add Jacaranda-accessible properties
other than those specified in [ES5Draft] to an object that is at any
time built-in-reachable, except as described in Section 15 below.
A host object may have observably different semantics to any
possible native object only if it is either not built-in-reachable,
or if those semantics can affect only properties that are not
Jacaranda-accessible.
Section 8.7 The Reference Type
- Replace the last sentence of section 8.7 with:
The abstract operations GetValue and PutValue are used in this specification to
operate on references. The definition of GetValue is dependent on an 'Attach'
flag associated with each function or program.
An ESF implementation is not required to support the change to the semantics of
GetValue (relative to ES5) described here. If it does not, then the Attach
flag SHALL be false for all functions and programs.
If it does support this change, then the directive "use attach"; in the
directive prologue of a function or program SHALL set the Attach flag to
true for that program or function (including all nested functions), in the
same manner that "use strict;" enables strict mode.
All uses of the GetValue operation in [ES5Draft], except for the use in section
11.2.3 (Function Calls), should be considered to pass as a second argument, the
value of the Attach flag for the function or program being evaluated.
A function object created by step 6 of the GetValue operation below is called an
*attached function object*.
- Replace section 8.7.1 with:
GetValue (/V/, /Attach/)
1. If Type(/V/) is not Reference, return /V/.
2. Let /base/ be the result of calling GetBase(/V/).
3. If IsUnresolvableReference(/V/), throw a ReferenceError exception.
4. If IsPropertyReference(/V/), then
a. If HasPrimitiveBase(/V/) is false, then let /get/ be the [[Get]] internal
method of /base/, otherwise let /get/ be the special [[Get]] internal method
defined below.
b. Let /R/ be the result of calling the /get/ internal method using /base/ as
its *this* value, and passing GetReferencedName(/V/) for the argument.
c. If /Attach/ is false or if /R/ is not a callable object that could refer to
its «this» value, then return /R/.
d. Let /F/ be a new function object as specified in 13.2, such that the new
function throws a ReferenceError exception if its «this» is bound to any
other value than /base/, but otherwise acts identically to the function
given by /R/.
e. Return /F/.
7. Return F.
5. Else, /base/ must be an environment record.
a. Return the result of calling the GetBindingValue (see 10.2.1) concrete
method of /base/ passing GetReferencedName(/V/) and IsStrictReference(/V/)
as arguments.
For the purpose of step 4.c, a native function object "could refer to its «this»
value" if and only if its function body, excluding nested function bodies, contains
the keyword «this». Whether a built-in function or callable host object "could
refer to its «this» value" SHALL be determined conservatively.
The following [[Get]] method is used by GetValue when /V/ is a property reference
with a primitive base value. It is called using /base/ as its *this* value and
with property /P/ as its argument. The following steps are taken:
1. Let /O/ be ToObject(/base/).
2. Let /desc/ be the result of calling the [[GetProperty]] internal method of /O/
with property name /P/.
3. If /desc/ is *undefined*, return *undefined*.
4. If IsDataDescriptor(/desc/) is *true*, return /desc/.[[Value]].
5. Otherwise, IsAccessorDescriptor(/desc/) must be true so, let /getter/ be
/desc/.[[Get]].
6. If /getter/ is *undefined*, return *undefined*.
7. Return the result of calling the [[Call]] internal method of /getter/ providing
/base/ as the *this* value and providing no arguments.
NOTE The object that may be created in step 1 is not accessible outside of the
above method. An implementation might choose to avoid the actual creation of the
object. The only situation where such an actual property access that uses this
internal method can have visible effect is when it invokes an accessor function.
Rationale:
The motivation for changing the semantics of GetValue is explained in the rationale
for group EXPRESSIONS. (Briefly, it allows removing restrictions on how the result
of a property access can be used.)
The "use attach"; syntax is consistent with that of "use strict";.
It is possible for the Jacaranda runtime to detect whether "use attach";
is supported by attempting to call a function that would be attached,
for example:
function isUseAttachSupported() {
"use attach";
var test = {method: function() { return this; }};
var m = test.method; // should attach to test
try {
m();
return false;
} catch (e) {
return e instanceof ReferenceError;
}
}
When GetValue creates a new attached function object, it does not
automatically cache this function object in the original property. That would
not be difficult to specify, but it would not be a transparent optimization,
because:
a) the base object B (Result(2)) may be used as a prototype for some other
object, O. In that case, a lookup of the property in O could find the
cached function that is attached to B, when it should find an unattached
function that will be later attached to O.
b) suppose that an unattached function is found in some prototype P of B, and
later the property is mutated in some object in B's prototype chain, up to
and including P. Meanwhile an attached version of the original function has
been cached in B. For the caching to be transparent, subsequent lookups
should find the mutated property value, not the cached value. There is
no obvious way to ensure that the cache is invalidated in this case.
c) code that is not using the attachment semantics should not be able to see
cached values.
These problems are not insoluble, but in combination they make the caching
optimization too complicated to be worthwhile.
Versions of Jacaranda before 0.4 used automatic binding (a function automatically
uses the "correct" this) rather than attachment (a function that would use the
"wrong" this instead throws an exception). Attachment may be more politically
acceptable because it ensures that the restricted language is a fail-stop subset
of ECMAScript 5.
Section 11.2.3 Function Calls
- In step 3 of the algorithm, replace
3. Call GetValue(Result(1)).
with
3. Call GetValue(Result(1), false).
Rationale:
Without this change the semantics would be the same, but the algorithm would
unnecessarily specify the generation of an attached function for every call to
a function that refers to «this».
- Add after the NOTE:
An ESF implementation SHALL NOT return a value of type Reference as a
result of calling a function that is built-in-reachable.
Section 15 Native ECMAScript Objects
- Replace
Unless otherwise specified in the description of a particular function,
if a function or constructor described in this section is given more
arguments than the function is specified to allow, the behaviour of the
function or constructor is undefined. In particular, an implementation
is permitted (but not required) to throw a TypeError exception in this
case.
NOTE
Implementations that add additional capabilities to the set of built-in
functions are encouraged to do so by adding new functions rather than
adding new parameters to existing functions.
with
Unless otherwise specified in the description of a particular function,
if a function or constructor described in this section is given more
arguments than the function is specified to allow, then the function
SHALL either ignore the extra arguments, or throw a TypeError exception.
It is implementation-defined (per function) which of these alternatives
is chosen.
Rationale:
A object-capability language must be much more conservative in any additions
to libraries that are built-in or implicitly imported, since these libraries
should provide no significant authority (other than pure computation). Any
vendor-specific extensions should be made in libraries that are not
implicitly imported.
- All references to "undefined behaviour" in the specification shall be
interpreted as "implementation-dependent behaviour".
Rationale:
A specification of a safe language cannot allow undefined behaviour in
any situation. Since this is a change in the specification of run-time
behaviour, it must be changed in ESF rather than Jacaranda.
Section 16 Errors
- Replace
* An implementation may provide additional types, values, objects, properties,
and functions beyond those described in this specification. This may cause
constructs (such as looking up a variable in the global scope) to have
implementation-defined behaviour instead of throwing an error (such as
ReferenceError).
with
* An implementation may provide additional values in the global scope
beyond those described in this specification, subject to the constraints
described in sections 8.6.2 and 15. This may cause looking up a variable
in the global scope that is not described in this specification to have
implementation-defined behaviour instead of throwing an error (such as
ReferenceError).
Rationale:
The Jacaranda run-time will only rely on properties of the global scope
specified in ESF_BUILTINS. Jacaranda code will not be able to access other
properties added to the global scope.
Note that this section also includes the following quite scary provision:
* An implementation may extend program and regular expression syntax. To
permit this, all operations (such as calling eval, using a regular expression
literal, or using the Function or RegExp constructor) that are allowed to
throw SyntaxError are permitted to exhibit implementation-defined behaviour
instead of throwing SyntaxError when they encounter an implementation-defined
extension to the program or regular expression syntax.
However, we don't need to delete this provision, because:
- we are relying on the verifier to catch any code that is syntactically
invalid in ESF;
- Jacaranda code cannot call eval or use the Function constructor;
- RegExp can be tamed to verify that no regexp strings and flags use extensions.
(In practice it must be tamed anyway, because it is broken in SpiderMonkey,
as discussed in the rationale for group EXPRESSIONS.)
This text must not be interpreted to allow the behaviour of ESF extensions
to ES5 (const and "use attach";) to be implementation-defined.
Annex B Compatibility
- The additional syntax specified in section B.1 SHALL NOT be part of the
ESF grammar.
- An ESF program SHOULD NOT rely on the presence of any of the additional
properties of String specified in sections B.2.1 to B.2.6.
Lexical Grammar
===============
The lexical grammar of Jacaranda is that of ESF with the additional
constraints listed below.
FILTER_CDATA:
A Jacaranda source text MUST NOT contain any of the sequences of unescaped
code units «», or any initial subsequence of these at
the end of the text.
A Jacaranda source text MUST NOT contain the code unit '&' unless
immediately followed by a code unit from .
(If followed by '&', the same restriction applies to the next code
unit.)
+ SourceText ::
ε
| '<' AfterLT
| ']' After1RB
| '&' AfterAmp
| (CodeUnit but not '<' or ']' or '&') SourceText
+ c=AfterLT ::
ε
{ errors U= error(c, "unexpected end of text") }
| '<' AfterLT
| ']' After1RB
| '&' AfterAmp
| ('!' | '/' | '?') SourceText
{ errors U= error(c, "may be misinterpreted in HTML or XML") }
| (CodeUnit but not '<' or ']' or '&' or '!' or '/' or '?') SourceText
+ c=After1RB ::
ε
{ errors U= error(c, "unexpected end of text") }
| '<' AfterLT
| ']' After2RB
| '&' AfterAmp
| (CodeUnit but not '<' or ']' or '&') SourceText
+ c=After2RB ::
ε
{ errors U= error(c, "unexpected end of text") }
| '<' AfterLT
| ']' After2RB
| '&' AfterAmp
| '>' SourceText
{ errors U= error(c, "may be misinterpreted as end of CDATA") }
| (CodeUnit but not '<' or ']' or '&' or '>') SourceText
+ c=AfterAmp ::
ε
{ errors U= error(c, "unexpected end of text") }
| '&' AfterAmp
| AmpFollower SourceText
| ('<' AfterLT | ']' After1RB | (CodeUnit but not '<' or ']' or '&' or AmpFollower) SourceText)
{ error U= error(c, "may be misinterpreted as a character entity in HTML or XML") }
+ AmpFollower :: one of
'=' '(' '+' '-' '!' '~' '\u0027' '"' '/' '\u005C' ' ' '\u0009' '\u000D' '\u000A'
DecimalDigit
Rationale:
This allows verified Jacaranda source to be embedded (as a Unicode
string) in an XML CDATA section or an HTML
Jacaranda source may not be embedded in an HTML or XHTML _attribute_
value, even if its content is defined as CDATA, since processing of
attribute values does not preserve tab, carriage return or line feed
characters.
LEX_CODE_UNITS:
Source code units, line terminators, and whitespace MUST ONLY be used
as follows:
SourceCharacter ::
c=[\u0000-\uFFFF]
{ errors U= errorIf(c isnot ValidChar, c, "invalid code unit") }
+ ValidChar :: one of
'\u0009' '\u000A' '\u000D' // exclude ASCII controls except TAB, LF, CR.
[\u0020-\u007E] // \u007F-\u009F controls
[\u00A0-\u00AC] // \u00AD :Cf:
[\u00AE-\u05FF] // \u0600-\u0603 :Cf:
[\u0604-\u06DC] // \u06DD :Cf:
[\u06DE-\u070E] // \u070F :Cf:
[\u0710-\u17B3] // \u17B4-\u17B5 :Cf:
[\u17B6-\u200A] // \u200B-\u200F :Cf:; \u200C-\u200F !Safari !IE6
[\u2010-\u2027] // \u2028-\u2029 !Cajita; \u202A-\u202E :Cf: !Safari !IE6
[\u202F-\u205F] // \u2060-\u2064 :Cf:; \u2065-\u2069 !Cajita;
// \u206A-\u206F :Cf: !Safari !IE6
[\u2070-\uD7FF] // \uD800-\uDFFF UTF-16 surrogates
[\uE000-\uFDCF] // \uFDD0-\uFDEF noncharacters
[\uFDF0-\uFEFE] // \uFEFF (BOM) :Cf: !Safari !IE6
[\uFF00-\uFFEF] // \uFFF0-\uFFF8 !Safari !Firefox;
// \uFFF9-\uFFFB :Cf:
// \uFFFC (object replacement character)
// \uFFFD (replacement character)
// \uFFFE-\uFFFF noncharacters
+ PrintableChar ::
[\u0020-\u007E]
+ BOM ::
'\uFEFF'
c=WhiteSpace ::
'\u0009'
| '\u000B'
{ errors U= error(c, "VT character not allowed as whitespace") }
| '\u000C'
{ errors U= error(c, "FF character not allowed as whitespace") }
| '\u0020'
| '\u00A0'
{ errors U= error(c, "NBSP character not allowed as whitespace") }
| USP
{ errors U= error(c, "non-ASCII character not allowed as whitespace") }
LineTerminator ::
'\u000A'
| '\u000D'
| c='\u2028'
{ errors U= error(c, "LS not allowed as line terminator") }
| c='\u2029'
{ errors U= error(c, "PS not allowed as line terminator") }
LineTerminatorSequence :: // only used in LineContinuation
'\u000A'
| '\u000D' [lookahead ∉ '\u000A']
| '\u2028'
| '\u2029'
| '\u000D\u000A'
lc=LineContinuation ::
'\' LineTerminatorSequence
{ errors U= error(c, "line continuation not allowed") }
Note that in [ES5Draft] sections 7.2 and 7.3, both the tables and the
grammar must be changed.
This rule does not restrict the set of code units that may be
represented by \u escapes.
This rule does not affect the set of whitespace code units
used in the specification of ToNumber ([ES5Draft]
section 9.3.1).
A BOM (Byte Order Mark) code unit is treated as a token for the purpose
of lexing Jacaranda. Note that BOM is not included in the ValidChar
production, but is nevertheless valid where allowed by the syntactic
grammar.
Rationale:
Start with \u0000-\uFFFF, and
- remove code units that are not in XML 1.1's or that are in
XML 1.1's (see below)
- remove code units corresponding to BMP characters that are in
category :Cf: (Format Control) in Unicode 5.1.x:
(this is a superset of category :Cf: in all previous Unicode versions)
- remove code units corresponding to BMP characters that are in
category :Zs: (Whitespace) in Unicode 5.1.1:
other than \u0009 and \u0020.
- remove code units that are disallowed by Cajita according to
.
- remove \uFFFC and \uFFFD.
and from XML 1.1 are used just because that
is a slightly less arbitrary choice than us deciding on our own what
control characters should be restricted.
Disallowing :Cf: characters is in principle the wrong thing for proper
internationalisation support. However, we eventually concluded that
it's the only way to guarantee consistent and secure behaviour across
browsers, which in this case trumps internationalisation.
(The right thing would be to allow :Cf: characters other than \uFEFF
only in string literals and comments, and to allow \uFEFF
[byte order mark] only between tokens. Relaxing the rules later is
easier than tightening them.)
Whitespace is restricted to US-ASCII tab and space characters. There
is no reason to internationalise whitespace used to separate tokens
in source code. (Other space code units are still allowed in strings.)
Also, the fact that vertical tab and form feed were treated as
whitespace rather than as line separators was potentially confusing;
Jacaranda disallows them.
BOM tokens (and therefore BOM code units) are only allowed immediately
before a module definition -- see group MODULE_DEFINITION.
LEX_COMMENT_FIRST_CHAR:
If the first code unit in a after «/*» is not
«*» or «\u0020» or «\u000A» or «\u000D», or if the first code unit
in a after «//» is not «/» or «\u0020», then
all code units in the comment MUST be printable US-ASCII (that is,
[\u0020-\u007E]).
The necessary grammar change is given in group LEX_COMMENTS.
Rationale:
This prevents an attack relevent to adversarial code review, where
a comment that looks like «/*const*/» does not actually enforce
constness, and similarly for other comments used by JSLint, for
example.
It also prevents the first code unit in a comment from being a
non-US-ASCII code unit that is stripped on parsing, which might
plausibly result in the LEX_COMMENT_NO_ATSIGN_EXTENSIONS group
being bypassed.
LEX_COMMENT_NO_ATSIGN_EXTENSIONS:
The first code unit in any comment after «/*» or «//» MUST NOT be «@».
The necessary grammar change is given in group LEX_COMMENTS.
Rationale:
This may trigger non-ESF-compliant extensions (for example
or
).
ADsafe rejects «@cc» anywhere in a comment. This is sufficient to
prevent enabling of JScript conditional compilation, but not the
Venkman debugger extensions.
Caja and Cajita do not need to reject «@» in comments, because the
output of cajoling does not include any comments. See
.
LEX_COMMENTS:
«/*const*/» and «/*fallthru*/» are treated as tokens. (Other comments
are not tokens.) They MUST ONLY appear where specified in the Jacaranda
grammar. Note the lack of spaces before and after «const» or «fallthru».
Also note that only these exact spellings are recognised; for example,
«/*\u0063onst*/» is not recognised as a «/*const*/» token, even though
«\u0063» would represent «c» within a string literal.
A single-line comment at the end of the text MUST be terminated by a
LineTerminatorSequence.
The lexical grammar changes resulting from this and groups
LEX_COMMENT_FIRST_CHAR and LEX_COMMENT_NO_ATSIGN_EXTENSIONS are:
+ Printable ::
ε
| PrintableChar Printable
+ MultiLineNotAsteriskOrSpaceOrCROrLFOrAtChar ::
SourceCharacter but not '*' or '\u0020' or '\u000A' or '\u000D' or '@'
MultiLineCommentChars :: see [ES5Draft] section 7.4
PostAsteriskCommentChars :: see [ES5Draft] section 7.4
MultiLineNotAsteriskChar :: see [ES5Draft] section 7.4
MultiLineNotForwardSlashOrAsteriskChar :: see [ES5Draft] section 7.4
mlc=MultiLineComment ::
'/**/'
| '/**' PostAsteriskCommentChars? '*/'
| '/*' ('\u0020' | '\u000A' | '\u000D') MultiLineCommentChars? '*/'
| '/*@' MultiLineCommentChars? '*/'
{ errors U= error(mlc, "multi-line comment starts with '@'") }
| '/*' first=MultiLineNotAsteriskOrSpaceOrCROrLFOrAtChar rest=MultiLineCommentChars? '*/'
{ errors U= errorIf(first isnot Printable or rest isnot Printable, mlc,
"multi-line comment does not start with '*' or space or " ++
"newline and is not printable ASCII") }
+ SingleLineNotForwardSlashOrSpaceOrAtChar ::"
SourceCharacter but not LineTerminator or '/' or '\u0020' or '@'
SingleLineCommentChars :: see [ES5Draft] section 7.4
SingleLineCommentChar :: see [ES5Draft] section 7.4
+ SingleLineCommentEnd ::
LineTerminator
| eot=EndOfText
{ errors U= error(eot, "single-line comment at end of text must end with a " ++
"line terminator") }
slc=SingleLineComment ::
'//' SingleLineCommentEnd
| '///' SingleLineCommentChars? SingleLineCommentEnd
| '//\u0020' SingleLineCommentChars? SingleLineCommentEnd
| '//@' SingleLineCommentChars? SingleLineCommentEnd
{ errors U= error(slc, "single-line comment starts with '@'") }
| '//' first=SingleLineNotForwardSlashOrSpaceOrAtChar rest=SingleLineCommentChars SingleLineCommentEnd
{ errors U= errorIf(first isnot Printable or rest isnot Printable, slc,
"single-line comment does not start with '/' or space " ++
"and is not printable ASCII") }
Rationale:
/*const*/ and /*fallthru*/ must be ES3 or ES5 comments in order
for Jacaranda to remain a sublanguage of ESF. (They are lexed as
multiline comments, but should then be retained as tokens, while
other comment tokens are thrown away.)
See group ASSIGNABLE_EXPRESSIONS for a description of the use of /*const*/
(it is essentially the same as 'const' as implemented in some
current browsers and proposed for ES-Harmony, but with static
rejection instead of silent failure).
A SingleLineComment must be terminated by a LineTerminatorSequence, to avoid
the possibility that any end-of-CDATA marker or end-tag added when
embedding verified source might be treated as part of a comment:
(It should not be; we're just being paranoid.)
LEX_NUMERIC_LITERALS:
A MUST NOT have more then 20 significant digits.
As specified in [ES5Draft] section 7.8.3, the source code unit
immediately following a MUST NOT be an
or .
The following attribute definitions are used to compute the value of
a numeric literal, which is needed for other rules. They correspond
exactly to the procedure in [ES5Draft] section 7.8.3, expressed in
the same attribute grammar notion as the rest of this specification.
nl=NumericLiteral ::
| dl=DecimalLiteralNoExp (AfterNumeric but not ExponentIndicator or DecimalDigit)?
{ MV = dl.MV;
SV = NumberToString(MV) }
| dl=DecimalLiteralNoExp ExponentIndicator ep=SignedInteger (AfterNumeric but not DecimalDigit)?
{ MV = dl.MV * 10^(ep.MV);
SV = NumberToString(MV) }
| '0' HexIndicator hi=HexInteger (AfterNumeric but not HexDigit)?
{ MV = hi.MV;
SV = NumberToString(MV) }
+ AfterNumeric ::
dd=DecimalDigit
{ errors U= error(dd, "disallowed DecimalDigit character after numeric literal") }
| is=IdentifierStart
{ errors U= error(is, "disallowed IdentifierStart character after numeric literal") }
+ dl=DecimalLiteralNoExp ::
dil=DecimalIntegerLiteral '.'?
{ MV = dil.MV;
errors U= errorIf(dil.significant > 20, dl,
"> 20 significant digits in decimal literal") }
| dil=DecimalIntegerLiteral '.' dd=DecimalDigits
{ if (match (dil ++ dd) as digits=DecimalDigits) {
MV = dil.MV + dd.MV*10^-(dd.n);
errors U= errorIf(digits.significant > 20, dl,
"> 20 significant digits in decimal literal")
} else {
errors U= error(dl, "cannot happen")
} }
| '.' dd=DecimalDigits
{ MV = dd.MV*10^-(dd.n);
errors U= errorIf(dd.significant > 20, dl,
"> 20 significant digits in decimal literal") }
DecimalIntegerLiteral ::
d=DecimalDigit
{ MV = d.MV;
n = 1 }
| d=NonZeroDigit dd=DecimalDigits
{ MV = d*10^(dd.n) + dd.MV;
n = dd.n + 1 }
DecimalDigits ::
'0'
{ MV = 0;
significant = 0;
n = 1 }
| d=NonZeroDigit
{ MV = d.MV;
significant = 1;
n = 1 }
| '0' rest=DecimalDigits
{ MV = rest.MV;
significant = rest.significant;
n = rest.n + 1 }
| d=NonZeroDigit rest=DecimalDigits
{ MV = d.MV*10^(rest.n) + rest.MV;
significant = rest.significant + 1;
n = rest.n + 1 }
DecimalDigit ::
'0' { MV = 0 }
| NonZeroDigit
NonZeroDigit ::
'1' { MV = 1 }
| '2' { MV = 2 }
| '3' { MV = 3 }
| '4' { MV = 4 }
| '5' { MV = 5 }
| '6' { MV = 6 }
| '7' { MV = 7 }
| '8' { MV = 8 }
| '9' { MV = 9 }
ExponentIndicator :: one of
'e' 'E'
SignedInteger ::
DecimalDigits
| '+' dd=DecimalDigits
{ MV = dd.MV }
| '-' dd=DecimalDigits
{ MV = -(dd.MV) }
+ HexIndicator :: one of
'x' 'X'
+ HexInteger ::
hd=HexDigit
{ MV = hd.MV }
| hi=HexInteger hd=HexDigit
{ MV = hi.MV*16 + hd.MV }
HexDigit : see group LEX_STRING_LITERALS
Rationale:
[ES5Draft] section 7.8.3 says that:
Once the exact MV for a numeric literal has been determined, it is then
rounded to a value of the Number type. If the MV is 0, then the rounded
value is +0; otherwise, the rounded value must be /the/ number value for
the MV (in the sense defined in 8.5), unless the literal is a
and the literal has more than 20 significant digits, in
which case the number value may be either the number value for the MV
of a literal produced by replacing each significant digit after the 20th
with a 0 digit or the number value for the MV of a literal produced by
replacing each significant digit after the 20th with a 0 digit and then
incrementing the literal at the 20th significant digit position.
Therefore decimal literals with more than 20 significant digits cannot
be used to obtain any additional precision.
(Note that 20 significant digits are more than sufficient to round-trip any
IEEE double value.)
This restriction is also necessary in order to enforce that properties in
object literals are unique. The issue here is that if the conversion to an
IEEE double is nondeterministic, then the verifier may fail to detect names
that should be treated as colliding in group OBJECT_LITERALS.
LEX_STRING_LITERALS:
A MUST ONLY use one of the escape characters
«b», «t», «n», «f», «r», double-quote, single-quote, backslash, or «/».
A string token MUST NOT contain a .
The following attribution rules are used to compute the
value of a string literal, which is needed for other rules. They
correspond exactly to the procedure in [ES5Draft] section 7.8.4,
expressed in the same attribute grammar notion as the rest of
this specification.
The grammar has been altered to not require lookahead in the
production.
StringLiteral ::
'"' ds=DoubleStringCharacters '"'
{ SV = ds.SV }
| '\u0027' ss=SingleStringCharacters '\u0027'
{ SV = ss.SV }
DoubleStringCharacters ::
ε
{ SV = [] }
| c=DoubleStringCharacter rest=DoubleStringCharacters
{ SV = [c.CV] ++ rest.SV }
| '\u005C' '0' rest=NIDDoubleStringCharacters // no initial decimal
{ SV = ['\u0000'] ++ rest.SV }
+ NIDDoubleStringCharacters ::
ε
{ SV = [] }
| c=NIDDoubleStringCharacter rest=DoubleStringCharacters
{ SV = [c.CV] ++ rest.SV }
| '\u005C' '0' rest=NIDDoubleStringCharacters
{ SV = ['\u0000'] ++ rest.SV }
DoubleStringCharacter ::
c=SourceCharacter but not '"' or '\u005C' or LineTerminator
{ CV = c }
| '\u005C' e=EscapeSequence
{ CV = e.CV }
| LineContinuation
+ NIDDoubleStringCharacter ::
c=SourceCharacter but not '"' or '\u005C' or LineTerminator or DecimalDigit
{ CV = c }
| '\u005C' e=EscapeSequence
{ CV = e.CV }
| LineContinuation
SingleStringCharacters ::
ε
{ SV = [] }
| c=SingleStringCharacter rest=SingleStringCharacters
{ SV = [c.CV] ++ rest.SV }
| '\u005C' '0' rest=NIDSingleStringCharacters
{ SV = ['\u0000'] ++ rest.SV }
+ NIDSingleStringCharacters ::
ε
{ SV = [] }
| c=NIDSingleStringCharacter rest=SingleStringCharacters
{ SV = [c.CV] ++ rest.SV }
| '\u005C' '0' rest=NIDSingleStringCharacters
{ SV = ['\u0000'] ++ rest.SV }
SingleStringCharacter ::
c=SourceCharacter but not '\u0027' or '\u005C' or LineTerminator
{ CV = c }
| '\u005C' e=EscapeSequence
{ CV = e.CV }
| LineContinuation
+ NIDSingleStringCharacter ::
c=SourceCharacter but not '\u0027' or '\u005C' or LineTerminator or DecimalDigit
{ CV = c }
| '\u005C' e=EscapeSequence
{ CV = e.CV }
| LineContinuation
LineContinuation ::
'\u005C' e=LineTerminatorSequence
{ errors U= error(e, "line continuation not allowed") }
EscapeSequence ::
CharacterEscapeSequence
| HexEscapeSequence
| UnicodeEscapeSequence
CharacterEscapeSequence ::
SingleEscapeCharacter
| NonEscapeCharacter
se=SingleEscapeCharacter ::
'b' { CV = '\u0008' }
| 't' { CV = '\u0009' }
| 'n' { CV = '\u000A' }
| 'f' { CV = '\u000C' }
| 'r' { CV = '\u000D' }
| '"' { CV = '\u0022' }
| '\u0027' { CV = '\u0027' } // '
| '\u005C' { CV = '\u005C' } // \
| 'v' { CV = '\uerrors U= error(se, "\\v escape") }
ne=NonEscapeCharacter ::
'/' { CV = '\u002F' } // explicitly allowed by JSON
| SourceCharacter but not '/'
{ errors U= error(ne, "unrecognised escape") }
HexEscapeSequence =
'x' a=HexDigit b=HexDigit
{ CV = [a.MV*16 + b.MV] }
UnicodeEscapeSequence ::
'u' a=HexDigit b=HexDigit c=HexDigit d=HexDigit
{ CV = [a.MV*4096 + b.MV*256 + c.MV*16 + d.MV] }
HexDigit ::
'0' { MV = 0 }
| '1' { MV = 1 }
| '2' { MV = 2 }
| '3' { MV = 3 }
| '4' { MV = 4 }
| '5' { MV = 5 }
| '6' { MV = 6 }
| '7' { MV = 7 }
| '8' { MV = 8 }
| '9' { MV = 9 }
| 'A' { MV = 10 }
| 'B' { MV = 11 }
| 'C' { MV = 12 }
| 'D' { MV = 13 }
| 'E' { MV = 14 }
| 'F' { MV = 15 }
| 'a' { MV = 10 }
| 'b' { MV = 11 }
| 'c' { MV = 12 }
| 'd' { MV = 13 }
| 'e' { MV = 14 }
| 'f' { MV = 15 }
Rationale:
JScript does not implement «\v» (it treats it as «v»). This could allow
lexical confusion attacks. Unlike Caja, we cannot rewrite it to a different
escape.
There is no reason to allow «\» followed by an escape character that
does not have a defined meaning. However, we must allow «\/» in order
to remain a superset of JSON. (Unfortunately, JSON does not allow
«\'», otherwise JSON and Jacaranda would support identical syntax for
double-quoted strings. We considered disallowing «\'» in double-quoted
strings, but that would create an inconsistency between single- and
double-quoting.)
Line continuations cannot be allowed, even if 'ES5' ∈ Options,
because an ES3 parser may lex a module definition incorrectly if they
are used.
LEX_NO_REGEXP_LITERALS:
Regular expression literals MUST NOT be used.
Rationale:
ES3 requires that each regexp literal in a program share a single
RegExp object. However RegExp objects are mutable, which makes this
sharing error-prone, and provides implicit communication channels
that should not be present in an object-capability language.
In ES5, regexp literals create a new RegExp object each time they
are evaluated, but the fact that RegExp objects are mutable is still
undesirable, and ConstRegExp SHOULD be used in preference. To get
an idea of how often the mutability of RegExp objects trips up
developers, see the huge number of duplicate bugs for
(many of which
concern mutability in general, not the issue that is fixed in ES5).
Removing mutable RegExp objects entirely would unfortunately
break too much code.
Supporting regexp literals, however, is not feasible without
complicating lexing enormously and incurring unacceptable security
risks. A JavaScript lexer must be implemented using feedback from
the parser, because whether a «/» character (not in a comment) is
lexed as part of a or a
depends on whether a is syntactically allowed at
that position.
In Jacaranda, it would have been possible to distinguish such positions
by using a finite state machine in the lexer, and this approach was
seriously considered for Jacaranda draft 0.4. However, it was abandoned
when it became clear (as a consequence of a bug in JScript.Net), that
this would make Jacaranda highly vulnerable to a complete break in the
case of any noncompliance in a JavaScript implementation's lexer.
In more detail, the bug in the JScript.Net lexer is that «/=» is
incorrectly tokenized as a division-assignment even in contexts
where a regexp literal can occur:
An exploit for this bug could have been of the form:
try {/=} catch (e) {toast();} try{1/+1;} catch (e2) {}
except that JScript.Net is a fully static compiler, so syntax errors
cause failure of compilation rather than a run-time exception.
Although this specific bug in JScript.Net would not have been exploitable,
the risk of similar unknown bugs in other implementations, given the
complexity of distinguishing and
correctly, was considered to be too great.
The ANTLR-based implementation of Jacaranda does lex regular expression
literals (using feedback from the parser to the lexer), but it does so
only after signalling a verification error, so that no security failure
can occur even if the verifier's lexing would differ from that of an
implementation under which the code might have been run.
LEX_SEMICOLON_INSERTION:
Section 7.9.1 of [ES5Draft] describes the rules by which semicolons are
implicitly inserted into ES5 programs under certain conditions that would
otherwise be a syntax error. In Jacaranda, those conditions are made more
restrictive, _as though_ the following *grammatical semicolon insertion*
rule were in effect:
A *nonsubstatement* is a statement that is not a substatement of a
, , , ,
, or ; and is not the last statement of
a case or default clause of a .
When, as the program is parsed from left to right, a «}» token is
encountered where the «;» token that would terminate a nonsubstatement
is expected, then parsing occurs as-if by inserting a «;» token
before the «}».
The effect of this rule is actually specified by modifications to the
Jacaranda grammar, which remains context-free.
In addition,
* A postfix «++» or «--» operator MUST be on the same line as the last
token of its operand;
* If a continue, break, return or throw statement takes an operand
(label or expression), then the keyword MUST be on the same line
as the first token of its operand.
Rationale:
First read .
It is obvious that we do not want "syntactic semicolon insertion" [SSI below] --
it is too difficult to specify and too difficult for programmers to understand
its consequences. But is the above wiki page correct in asserting that
"grammatical semicolon insertion" [GSI below] is "relatively harmless"?
The wiki page oversimplifies how GSI could be specified or implemented:
it is not sufficient to simply make semicolons optional in statement
productions, since they are not optional when followed by another statement.
Nevertheless, it is feasible to specify this in a context-free grammar.
GSI is almost equivalent to treating ';' as a statement _separator_.
The assertion that GSI in this sense is harmless is probably not entirely
correct. In
The Impact of Language Design on the Production of Reliable Software
an experiment was performed comparing two languages -- TOPPS, which used
semicolons as separators, and a redesigned language TOPPSII, which used
semicolons as terminators (as well as having many other changes). The
results were interpreted as follows:
In using the semicolon as a separator, rather than as a statement terminator,
TOPPS was following a long and honorable tradition (ALGOL 60, Pascal, BLISS,
etc.). However, the TOPPSII form (similar to that of PL/I) led to an order
of magnitude reduction in the number of semicolon errors. Of course, most
semicolon errors are rather trivial (i.e., they generally do not persist
more than one run). However, a small modification to the language would have
eliminated errors that occurred on more than a quarter of all compilations.
It is also interesting to note that over 14% of the second-half TOPPS errors
are still semicolon errors (compared to 1% for TOPPSII).
On the other hand, this study considered programmers who had previously only
used PL/I (which as stated, uses semicolons as terminators), and had only very
limited programming experience. It is not clear that its conclusions are relevant
to JavaScript -- not least because TOPPS prohibited trailing semicolons in a
block rather than making them optional. We consider the benefit of compatibility
with a slightly larger set of JavaScript programs to outweigh the potential for
additional errors in this case.
Jacaranda does not make semicolons optional when they terminate a substatement,
for example the semicolons after both «return» statements in this example are
required:
(function (foo) { if (foo) return 1; else return 2; })();
This is to simplify the grammar, and because omitting the semicolon would
obscure the fact that the substatement is a separate statement to its parent.
Due to a bug in most JavaScript implementations other than Opera, the
behaviour of programs such as the following is not conformant to ES3 or ES5
on those implementations:
(function () { a: for (;;) { for (;;) { break/*
*/a } return 1 } return 2 })();
ES3 requires the comment that is split over lines to be treated as a
line terminator, so that «break [LineTerminator] a;» will not be
syntactically valid, and a semicolon will be inserted after «break».
The buggy implementations do not perform this insertion, instead
interpreting the statement as «break a;» and returning 2.
Jacaranda is not vulnerable to this issue, because of the rules requiring
postfix «++» or «--» to be on the same line as the last token of its operand,
and the first token of the operand (if present) of a «continue», «break»,
«return» or «throw» statement to be on the same line as the keyword. The
example above, for instance, is not valid because «a» is not on the same
line as «break».
Module Definitions
==================
MODULE_DEFINITION:
A module definition MUST be lexable according to the Jacaranda lexical
grammar; then, the resulting token stream MUST be parseable as the
production below.
The *directive prologue* of a is the longest
initial sequence of in its body that consist of
whose expression is a .
The passed as an argument to $module MUST NOT be
named, and its directive prologue MUST contain the directives
"use strict" and "use jacaranda". If 'attach' ∈ Options then
the directive prologue MUST also contain the directive "use attach".
[# Design issue: it would be nice for the runtime to be able to
detect the "use attach" directive, so that we wouldn't need the
'attach' option. #]
Every freely-used identifier of a module (as defined by the
SCOPE_OF_IDENTIFIERS group) MUST occur in IMPLICIT_IMPORTS.
A module MUST NOT have any freely-assigned names.
[# FIXME: directive prologues not implemented correctly in grammar. #]
[# FIXME: need to be closer to implementation here. #]
program :
BOM* programElements?
+ programElements
module BOM* programElements?
| [lookahead ... {'$module'}] sourceElements
sl=statementList
{ if (match sl as mdef=moduleDefinition) {
errors = mdef.errors
} else {
if (match "function() {" ++ sl.text ++ "}" as f=functionExpression) {
errors = error(sl, "not a module definition")
U f.errors
U errorIf('this' ∈ f.freelyUsedNames, sl, "'this' used freely at top-level") }
} else {
errors = error(sl, "not a module definition, and parsing as a function body did not work (please report this)")
}
} }
+ module :
'$module' '(' name=StringLiteral ',' ol=optionsList f=function ')' ';'
{ Options = ol.options;
directives = {};
errors U= errorIf('use strict' ∉ f.directives, f,
"module function has no \"use strict\" directive")
U errorIf('use jacaranda' ∉ f.directives, f,
"module function has no \"use jacaranda\" directive")
U errorIf('attach' ∈ Options and 'use attach' ∉ f.directives, f,
"attach option specified but module function has no \"use attach\" directive")
U errorIf(f.looseNames != {}, f, "module function must not be named")
U { err(f, "freely-used identifier "++quote(id)++
" (not in implicit imports) at module level")
| id <- f.freelyUsedNames \ IMPLICIT_IMPORTS }
U { err(f, "freely-assigned identifier "++quote(id)++" at module level")
| id <- f.freelyAssignedNames } }
+ optionsList :
ε
| opt=StringLiteral ',' list=optionsList
{ let strip('!' ++ S) = S,
strip(_) = _;
options = strip(opt.SV) U list.options;
errors U= errorIf(opt.SV startswith '!' and opt.SV != '!ES5' and opt.SV != '!attach',
opt, "important option is not recognised") }
+ BOM :: see group LEX_CODE_UNITS
IMPLICIT_IMPORTS =
ESF_BUILTINS U
functionExpression : see group FUNCTIONS
Rationale:
The format of a module definition has been redesigned for Jacaranda 0.4x.
The imports of a module are now specified as parameters to the module
function. (In Jacaranda 0.3, a 'with' statement was used to inject bindings
into the scope of the module.)
Because ES5 does not support keyword parameters, the Jacaranda runtime uses
a hack involving parsing the result of 'toString()' on the module function
to find its parameter names; it then maps the corresponding properties of
the imports object to the function arguments in the correct order. There
will be a runtime exception if this results in access to a protected property
of the imports object.
This approach simplifies the specification by avoiding the need to compute
the imports from the module object literal; in fact, there is no module literal
any more (eliminating all of the 'sef' grammar productions from Jacaranda 0.3),
nor any concept of a "module object". It also makes module definitions prettier
and less obscure, and allows the import names to be unquoted.
To maintain some of the flexibility of the approach that used a module literal,
it is possible to pass an arbitrary sequence of "options", specified as string
literals, to $module.
Because occurrences of explicit imports are now bound occurrences, the
remaining freely-used names of a module function must all be implicit
imports. Since the set of implicit imports is fixed for a given version
of Jacaranda, it is possible to run the module code in a lexical scope that
provides bindings for all of these imports.
Requiring the imports to be declared also has software engineering advantages,
and could possibly detect some errors due to misspelled words. In these
respects it is similar to a /*globals*/ declaration in JSLint.
Note that future versions of Jacaranda may add new implicit imports. In that
case, a module that was verified by the later version's verifier should not
be run under the previous version's runtime, since access to the added imports
will not be blocked. (The approach used in Jacaranda 0.3 also had this
deficiency.)
There is no longer any concept of optional imports. However, if a module
has a parameter named "imports", it will be mapped to the imports object,
enabling the use of $get(imports, 'foo') and $hasPublic(imports, 'foo').
Imports can be directly used, but not directly assigned to. In the case of
explicit imports, this restriction is now an automatic consequence of
declaring these imports as parameters, which are const in Jacaranda. This
ensures that there is no implicit per-module or per-caplet mutable state.
(Explicit per-caplet mutable state can be declared by using «var» in
some outer function.) The main rationale for this restriction is to
allow confinement at the granularity of groups of objects, rather
than only between module instances (see the message quoted at
).
Note that the module function expression must be anonymous, since
a named function expression might have the side effect of defining
the name in the surrounding scope. (Since Jacaranda 0.41, it is no
longer required that all function expressions are anonymous.)
One way to submit a module to a Jacaranda interpreter is to simply
execute the module definition as ESF code -- for example in a