Grammars

JetPAG needs grammars to generate parsers and lexical analyzers. A grammar defines the rules and properties that JetPAG analyzes, and from them JetPAG generates source code for a program that parses input according to that grammar.

A standard grammar file looks like this:

grammar grammar-name:

parser parser-name:
	...

scanner scanner-name:
	...

Every JetPAG grammar must be given a name, which is used mainly for the namespace of the generated code. Both the scanner and the parser specifications are optional, and their order doesn't matter. If a parser is specified, it must use the available set of token types (from the specified scanner, from an external file, or from both).

Because scanners generated by JetPAG can operate on input streams of any type, character literals are treated as words (of integer type int). JetPAG operates on literals (both characters and token IDs) in the range from zero to the maximum word value minus one (INT_MAX - 1). Literals outside this range are used only internally and can't be represented, except for end-of-stream (EndOfStream). For example, this inversion:

~'a'-'z'

matches the set {0..96, 123..(2^31-2)} on 32-bit machines. JetPAG generates sets in one of three forms: interval sets, bitsets and compact switch-case sets. It may also generate an inverted form if that is more efficient, or generate no set at all and instead emit code that calls one of several fast API functions that are optimal for a limited number of sets. This gives up to 7 possible ways of generating a set; the appropriate one is chosen by analyzing properties of the set such as its size and the order of its members. For the set above, JetPAG won't generate a set and instead generates the code:

matchRi('a', 'z');

This code uses the API function matchRi(int, int) which is optimal for inverted wide ranges.
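
To make the range arithmetic above concrete, here is a minimal C++ sketch of what an inverted-range test amounts to. The names inInvertedRange and kMaxLiteral are hypothetical and chosen only for illustration; they are not part of the JetPAG API, and this is not a claim about how matchRi is implemented internally.

```cpp
#include <climits>

// Largest representable literal according to the text: INT_MAX - 1.
// (kMaxLiteral is a hypothetical name, not a JetPAG identifier.)
constexpr int kMaxLiteral = INT_MAX - 1;

// Hypothetical helper: true when c lies in the complement of [lo, hi]
// within the representable literal range [0, kMaxLiteral].
constexpr bool inInvertedRange(int c, int lo, int hi) {
    return c >= 0 && c <= kMaxLiteral && (c < lo || c > hi);
}
```

With ASCII values, inInvertedRange(c, 'a', 'z') accepts 0..96 and 123..(INT_MAX - 1), and rejects 'a' through 'z' as well as any value outside the representable range.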

One special literal is the one that comes at the end of an input stream. Recognizers generated by JetPAG may work on an arbitrary number of stream managers, so a single way of ending an input stream isn't enough. An important note regarding lexical analyzers: when an end of stream is encountered in nextToken(), the lexical analyzer automatically moves to the stream manager below in the stack. To match the end of the current stream and move to the stream manager below in the stack, if there is one, use the keyword EndOfStream. To match the end of all the streams in the stack, use the keyword EndOfAllStreams.
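
The stack behaviour described above can be pictured with a small C++ sketch. It is only an illustration under stated assumptions: StringStream, nextChar and kEndOfAllStreams are hypothetical names, a string with a read position stands in for a real stream manager, and the sentinel value is arbitrary. None of this is taken from the code JetPAG generates.

```cpp
#include <cstddef>
#include <stack>
#include <string>

// Hypothetical stand-in for a stream manager: a string plus a read position.
struct StringStream {
    std::string text;
    std::size_t pos;
};

// Hypothetical sentinel; the real end-of-stream values are internal to JetPAG.
const int kEndOfAllStreams = -1;

// Returns the next character, dropping to the stream below whenever the
// stream on top of the stack is exhausted.
int nextChar(std::stack<StringStream>& streams) {
    while (!streams.empty()) {
        StringStream& top = streams.top();
        if (top.pos < top.text.size())
            return static_cast<unsigned char>(top.text[top.pos++]);
        streams.pop();  // end of the current stream: continue with the one below
    }
    return kEndOfAllStreams;  // every stream in the stack is exhausted
}
```

If "cd" is pushed first and "ab" second, successive calls yield 'a', 'b', 'c', 'd' and then the sentinel, which mirrors the difference between reaching the end of one stream and reaching the end of all of them.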

Generated modules are all enclosed in a common namespace which takes the grammar's name. JetPAG can nest this namespace inside further namespaces through the multi-value namespaces option. For example:

grammar MyGrammar< namespaces=Outer Inner >:
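
As a rough picture of the result, the generated C++ might be laid out as in the sketch below. The nesting order (Outer outermost, then Inner, then the grammar's own namespace) is an assumption based on the description above, and ExampleScanner is a placeholder name rather than anything JetPAG emits.

```cpp
// Assumed layout: the listed namespaces wrap the grammar's namespace,
// outermost first. ExampleScanner is a placeholder for a generated module.
namespace Outer {
namespace Inner {
namespace MyGrammar {

class ExampleScanner {};

}  // namespace MyGrammar
}  // namespace Inner
}  // namespace Outer
```

Under that assumption, client code would refer to a generated class as Outer::Inner::MyGrammar::ExampleScanner.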

When generating files, JetPAG uses the default naming scheme, in which each unit's files take its name. This can be overridden with the output_file option, available for the grammar, the scanner specification and the parser specification. This option takes a string value which is suffixed with either .hpp or .cpp when generating a certain module, depending on the output file. For example:

grammar G:

scanner Lexer< output_file="G_Lexer" >:
...