The symbols and operators to be used within JS/CC's own regular-expression language are summarized in the following table. They form a minimal implementation of a regular-expression engine.
|Character|A single character specifies exactly that character. If the character is a regular-expression operator, it must be escaped to be matched literally.|
|\ASCII-code|One character, defined via its ASCII code.|
|\Character|Escaped character. Must be used when a character of the meta-language itself should be matched.|
|.|Any character (character class matching all available characters).|
|[...]|Character class. If the class begins with a circumflex (^), it is negated and matches every character not listed in it.|
|\||Or-operator. Allows alternative expressions to be specified at the same level.|
|*|Kleene-closure operator (none or many), placed after a character, character class, or sub-expression.|
|+|Positive-closure operator (one or many), placed after a character, character class, or sub-expression.|
|?|Optional-closure operator (one or none), placed after a character, character class, or sub-expression.|
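The operators in the table behave like their counterparts in most regular-expression dialects. The following minimal sketch uses JavaScript's built-in RegExp as a stand-in to show their effect; it is not the engine that JS/CC generates, and the helper name fullMatch is hypothetical.

```javascript
// JavaScript's RegExp is used here only as a stand-in for illustration;
// it is not the lexer that JS/CC builds from a grammar.
// fullMatch is a hypothetical helper that anchors a pattern to the whole input.
function fullMatch(pattern, input) {
  return new RegExp("^(?:" + pattern + ")$").test(input);
}

const samples = [
  ["a.c", "abc"],   // . matches any single character
  ["[0-9]", "7"],   // character class
  ["ab|cd", "cd"],  // or-operator: either alternative
  ["a*", ""],       // Kleene closure: none or many
  ["a+", "aaa"],    // positive closure: one or many
  ["ab?", "a"],     // optional closure: one or none
  ["\\*", "*"],     // escaped meta-character, matched literally
];

for (const [pattern, input] of samples) {
  console.log(pattern, JSON.stringify(input), fullMatch(pattern, input));
}
```

Every sample pair above matches; replacing the input of the positive-closure sample with an empty string makes it fail, which is the difference between the Kleene and positive closures.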
To allow case-insensitive keywords within grammar definitions, a terminal symbol can be defined using either a single-quoted ('…') or a double-quoted ("…") string. A single-quoted string matches the terminal case-sensitively, while a double-quoted string matches it regardless of case. For example, the terminal symbol "PRINT" matches Print, print, PrINT, and PRINT, while the definition 'PRINT' matches only PRINT itself.
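One way to picture the difference between the two quoting styles is to expand each letter of a double-quoted keyword into a two-letter character class. The sketch below does this with JavaScript's RegExp; the function names are hypothetical, the keyword is treated as plain ASCII text rather than a full regular expression, and how JS/CC implements this internally is not stated here.

```javascript
// Escapes characters that are operators in JavaScript regular expressions.
function escapeLiteral(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical model of a single-quoted keyword: matched exactly as written.
function singleQuoted(keyword) {
  return new RegExp("^" + escapeLiteral(keyword) + "$");
}

// Hypothetical model of a double-quoted keyword: every letter becomes a
// character class holding its upper- and lower-case form.
function doubleQuoted(keyword) {
  const body = Array.from(keyword, (ch) => {
    const lower = ch.toLowerCase();
    const upper = ch.toUpperCase();
    return lower === upper ? escapeLiteral(ch) : "[" + upper + lower + "]";
  }).join("");
  return new RegExp("^" + body + "$");
}

console.log(doubleQuoted("PRINT").source);
for (const input of ["Print", "print", "PrINT", "PRINT"]) {
  console.log(input, doubleQuoted("PRINT").test(input), singleQuoted("PRINT").test(input));
}
```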
From these regular-expression definitions, JS/CC constructs a deterministic finite automaton which acts as the lexer in the resulting parser.
If the terminal definition part contains ambiguous regular expressions (several expressions matching the same string), the expression defined first takes match precedence over those defined later. It is therefore recommended to define the most specialized tokens first and the most general tokens last.
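The effect of definition order can be sketched with a small matcher. The terminal names and patterns below are hypothetical, and the sketch assumes that the longest match is preferred and that definition order only breaks ties between matches of equal length; the text above confirms only the tie-breaking part.

```javascript
// Hypothetical terminal list; the array order stands for the definition order.
const keywordFirst = [
  { name: "IF", pattern: /^if/ },
  { name: "IDENT", pattern: /^[a-z]+/ },
];

// Returns the longest match at the start of the input. When several terminals
// match the same text, the one defined first is kept.
function matchTerminal(terminals, input) {
  let best = null;
  for (const terminal of terminals) {
    const found = terminal.pattern.exec(input);
    // Only a strictly longer match replaces the current best, so on a tie
    // the earlier-defined terminal wins.
    if (found && (best === null || found[0].length > best.text.length)) {
      best = { name: terminal.name, text: found[0] };
    }
  }
  return best;
}

console.log(matchTerminal(keywordFirst, "if"));   // name: "IF"
console.log(matchTerminal(keywordFirst, "iffy")); // name: "IDENT"

// With the general terminal defined first, the keyword can never win a tie.
const identFirst = [keywordFirst[1], keywordFirst[0]];
console.log(matchTerminal(identFirst, "if"));     // name: "IDENT"
```

This is why the specialized keyword terminal belongs before the general identifier terminal.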
Tokens can be grouped by precedence level and associativity. This feature allows writing faster and even smaller grammars, because grammar conflicts are resolved by weighting terminal symbols.
A group without a group specifier assigns no associativity and a precedence level of zero to all terminal symbols in the group (as in the first example).
Otherwise, if a group begins with the symbol < for left-associativity, > for right-associativity, or ^ for non-associativity, all terminal symbols within the group receive the corresponding associativity and precedence level. The precedence level is incremented each time a new group of one of these three types is opened, so groups defined at the bottom of the token definition part take the highest precedence.
The precedence and associativity information is used to resolve conflicts in ambiguous grammars by modifying the parse table's natural content. How this works in practice is described in The Grammar Definition Part, in the section dealing with grammar conflicts and their handling.
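The numbering rule for precedence levels can be sketched as follows. The in-memory group representation, the terminal names, and the function name are hypothetical and only model the rule described above; they do not reflect JS/CC's internal data structures.

```javascript
// Hypothetical representation of a token definition part: one entry per group.
// specifier is the symbol the group begins with, or null if it has none.
const groups = [
  { specifier: null, terminals: ["(", ")", "INT"] },
  { specifier: "<", terminals: ["+", "-"] },
  { specifier: "<", terminals: ["*", "/"] },
  { specifier: "^", terminals: ["=="] },
];

const ASSOCIATIVITY = {
  "<": "left",
  ">": "right", // third specifier, assumed here to mark right-associativity
  "^": "nonassoc",
};

// Assigns level 0 and no associativity to groups without a specifier, and
// increments the level each time a group with a specifier is opened.
function assignPrecedence(groupList) {
  const table = new Map();
  let level = 0;
  for (const group of groupList) {
    const hasSpecifier = group.specifier !== null;
    if (hasSpecifier) level += 1;
    for (const terminal of group.terminals) {
      table.set(terminal, {
        level: hasSpecifier ? level : 0,
        associativity: hasSpecifier ? ASSOCIATIVITY[group.specifier] : null,
      });
    }
  }
  return table;
}

const precedence = assignPrecedence(groups);
for (const [terminal, info] of precedence) {
  console.log(terminal, info.level, info.associativity);
}
```

In this sketch the multiplicative operators end up one level above the additive ones simply because their group is defined later.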
A special type of terminal symbol is introduced by the exclamation mark (!): the whitespace token.
Only a regular expression is permitted in this definition; a label or code part is prohibited. Whitespace tokens specify terminals that should always be ignored, e.g., blanks, tabs, or comments.
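The role of a whitespace definition can be sketched with a small hand-written tokenizer that discards anything matching the whitespace pattern before trying the regular terminals. The patterns, the # comment style, and all names below are hypothetical choices for illustration; the lexer generated by JS/CC is a finite automaton and is not structured like this.

```javascript
// Hypothetical whitespace pattern: blanks, tabs, and comments running from
// "#" to the end of the line. It has no label and no attached code.
const whitespace = /^(?:[ \t]+|#[^\n]*)/;

// Hypothetical regular terminals.
const tokenRules = [
  { name: "INT", pattern: /^[0-9]+/ },
  { name: "PLUS", pattern: /^\+/ },
];

function tokenize(input) {
  const tokens = [];
  let rest = input;
  while (rest.length > 0) {
    // Whitespace is consumed but never reported to the parser.
    const skipped = whitespace.exec(rest);
    if (skipped) {
      rest = rest.slice(skipped[0].length);
      continue;
    }
    const rule = tokenRules.find((candidate) => candidate.pattern.test(rest));
    if (!rule) throw new Error("Unexpected input: " + rest);
    const text = rule.pattern.exec(rest)[0];
    tokens.push({ name: rule.name, text });
    rest = rest.slice(text.length);
  }
  return tokens;
}

console.log(tokenize("12 + 7 # sum"));
```

The resulting token list contains only the INT and PLUS tokens; the blanks and the trailing comment leave no trace.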