Tokens, or terminal symbols, are defined in the first part of each grammar definition file. Terminals are defined using a regular-expression meta-language. Each terminal symbol can also be given an individual name and an individual code segment that is executed when the generated lexical analyzer recognizes the token (e.g., to strip the leading and trailing quotation marks from the token's attribute when a string is recognized).
The general syntax to define tokens is:
regular-expression label code · · · ;
where label and code are optional.
The regular expression is written as a single- or double-quoted string, using the syntax described in Regular Expressions.
A token's label is defined as a single-word identifier, e.g.
allowed are separated words. If no label is specified, JS/CC uses the regular-expression definition itself as the label, but with the escape characters left out, so the regular expression '\+' results in the label '+', as in the above example. If '+' itself is specified as a regular expression, a parse error occurs, because the plus character is the symbol for a positive closure in regular expressions.
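The default-label rule can be pictured with a small JavaScript sketch. It is only an illustration of the behavior described above, not JS/CC's actual implementation, and the function name defaultLabel is hypothetical.

```javascript
// Illustrative sketch only; not taken from the JS/CC sources.
// Derives a default label from a regular-expression definition by
// dropping each escaping backslash and keeping the escaped character.
function defaultLabel(regexSource) {
  return regexSource.replace(/\\(.)/g, "$1");
}

// The definition '\+' (written "\\+" as a JavaScript string literal).
console.log(defaultLabel("\\+")); // prints "+"
```

An unescaped character is left untouched by this sketch, so a definition without escapes would simply be used as its own label.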
Code segments are enclosed between the [* and *] symbols. If more than one code segment is specified in a row, all segments are combined into one segment that is attached to the terminal symbol. To access the matched pattern, the offset where the pattern starts, or the source string from within these code segments, the wild cards %match, %offset and %source should be used. These wild cards are later replaced by the corresponding variable names in the resulting lexical analyzer.
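The merging and substitution steps can be sketched in JavaScript as follows. This is a rough model under stated assumptions: the variable names info.att, info.offset and info.src, and the helper buildAction, are hypothetical placeholders and are not necessarily the names JS/CC generates.

```javascript
// Hypothetical variable names standing in for whatever the generated
// lexical analyzer really uses; chosen for illustration only.
const WILDCARDS = {
  "%match": "info.att",
  "%offset": "info.offset",
  "%source": "info.src",
};

// Joins consecutive code segments into one and replaces each wild card
// with its (assumed) variable name.
function buildAction(segments) {
  const merged = segments.join("\n");
  return merged.replace(/%(match|offset|source)/g, (w) => WILDCARDS[w]);
}

// Two segments in a row: strip the quotation marks, then append the offset.
const action = buildAction([
  "%match = %match.slice(1, -1);",
  "%match = %match + '@' + %offset;",
]);

// Run the resulting code against a sample token record.
const info = { att: '"hello"', offset: 4, src: 'x = "hello"' };
new Function("info", action)(info);
console.log(info.att); // prints "hello@4"
```

The first segment corresponds to the quotation-mark use case mentioned at the start of this section; the second only shows that both segments end up in the same combined action.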
Because JS/CC also allows passing precedence and associativity information to tokens or token groups, each block of token definitions is closed by a semicolon (;). For that reason, a semicolon follows the last token definition in the above example as well, even though no precedence information is used there.