The Terminal Declaration Part

Tokens, or terminal symbols, are defined in the first part of each grammar definition file. Terminals are defined using a regular expression meta-language. Each terminal symbol can also be given an individual name and an individual code segment that is executed when the generated lexical analyzer recognizes the token (e.g., to strip the leading and trailing quotation marks from the token's attribute when a string is recognized).

The general syntax to define tokens is:

                  regular-expression   label          code
                         ·
                         ·
                         ·
                  ;

where label and code are optional.

The regular expression is written as described in Regular Expressions and is given as a single- or double-quoted string.

A token's label is a single-word identifier, e.g. FLOAT or INTEGER_NUMBER; labels made up of several separate words are not allowed. If no label is specified, JS/CC uses the regular-expression definition itself as the label, with the escape characters dropped, so the regular expression '\+' results in the label '+', as in the above example. If '+' itself is specified as a regular expression, a parse error occurs, because the plus character is the operator for positive closure in regular expressions.
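The default-label rule can be pictured with a small JavaScript sketch. The helper name defaultLabel is hypothetical and this is not JS/CC's actual implementation; it only assumes that "dropping escape characters" means removing each backslash and keeping the character that follows it.

```javascript
// Hypothetical helper (an assumption for illustration, not part of JS/CC):
// derive a default label from a regular-expression definition by removing
// each escaping backslash and keeping the escaped character.
function defaultLabel(regex) {
  return regex.replace(/\\(.)/g, "$1");
}

// The JavaScript literal "\\+" is the two-character text \+ .
console.log(defaultLabel("\\+"));  // prints +
console.log(defaultLabel("if"));   // prints if
```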

A semantic code action is defined by enclosing the desired JavaScript code segment between the symbols [* and *]. If more than one code segment is specified in a row, all segments are combined into one segment that is attached to the terminal symbol. To access the matched pattern, the offset where the pattern starts, or the source string in these code segments, use the wild cards %match, %offset, and %source. These wild cards are later replaced by the corresponding variable names in the resulting lexical analyzer.
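The wild card substitution can be sketched in plain JavaScript. This is only an illustration under stated assumptions: the variable names tok.match, tok.offset, and tok.source and the helpers substituteWildcards and runAction are hypothetical, and the names JS/CC really emits may differ. The action body is the kind of code one might attach to a string token to strip its quotation marks.

```javascript
// Illustrative token definition, following the syntax shown above:
//   '"[^"]*"'   STRING   [* %match = %match.slice(1, -1); *]

// Hypothetical variable names that stand in for the wild cards.
const WILDCARDS = {
  "%match": "tok.match",
  "%offset": "tok.offset",
  "%source": "tok.source",
};

// Replace each wild card in an action body with its variable name.
function substituteWildcards(action) {
  return action.replace(/%(match|offset|source)/g, (w) => WILDCARDS[w]);
}

// Run the substituted code against a sample token object (hypothetical).
function runAction(code, tok) {
  new Function("tok", code)(tok);
  return tok;
}

const action = "%match = %match.slice(1, -1);";
const compiled = substituteWildcards(action);
const result = runAction(compiled, {
  match: '"hello"',
  offset: 4,
  source: 'x = "hello";',
});

console.log(compiled);      // prints tok.match = tok.match.slice(1, -1);
console.log(result.match);  // prints hello
```

Writing the action against wild cards rather than concrete variable names keeps the grammar file independent of how the generated lexical analyzer names its internals.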

Because JS/CC also allows precedence and associativity information to be attached to tokens or token groups, each block of token definitions is closed by a semicolon (;). For that reason, the semicolon also follows the last token definition in the above example, even though no precedence information is used there.