Recognizers

A recognizer is an interpreter for a specific type of input stream. The underlying semantics of all recognizers in JetPAG are very similar: they all operate on some form of input stream through the same basic actions (consume and seek), they all use non-capturing lookahead, and so on. All recognizers in JetPAG inherit from the template base type jetpag::recognizer, which defines common semantics such as error reporting and stream lookahead.
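To make the idea of lookahead that does not consume input concrete, here is a minimal C++ sketch. The class LookaheadBuffer and the function demo are hypothetical names invented for this illustration; they are not part of the JetPAG runtime and only mimic the consume and lookahead actions described above.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <istream>
#include <sstream>

// Hypothetical helper, not part of JetPAG: buffers characters so that
// peeking ahead never consumes them.
class LookaheadBuffer {
public:
    explicit LookaheadBuffer(std::istream &in) : in_(in) {}

    // Return the k-th upcoming character (k >= 1) without consuming it,
    // or -1 if the stream ends first.
    int lookahead(std::size_t k) {
        assert(k >= 1);
        while (buffer_.size() < k) {
            int c = in_.get();
            if (c == std::istream::traits_type::eof()) return -1;
            buffer_.push_back(static_cast<char>(c));
        }
        return static_cast<unsigned char>(buffer_[k - 1]);
    }

    // Remove the next character from the input and return it, or -1 at end.
    int consume() {
        int c = lookahead(1);
        if (c != -1) buffer_.pop_front();
        return c;
    }

private:
    std::istream &in_;
    std::deque<char> buffer_;  // characters read but not yet consumed
};

// Usage sketch: peek two characters ahead, then consume the first one.
inline bool demo() {
    std::istringstream in("ab");
    LookaheadBuffer buf(in);
    return buf.lookahead(2) == 'b' && buf.consume() == 'a';
}
```

Keeping the peeked characters in a buffer is what lets a recognizer inspect upcoming input and still consume it later in the original order.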

For scanners, input streams are physical program inputs such as files, network streams and the standard input. Scanners exist to separate splitting the input from interpreting it, and to filter out unwanted input such as whitespace and comments. A scanner passes on only the significant sections of the input, split into tokens, acting as an abstraction layer between parsers and the real program input.

Recognizers do not interact with streams directly; they do so indirectly through stream managers. Stream managers take care of managing streams and the buffers used for lookahead. A single recognizer may operate with several independent stream managers, where different stream managers do not necessarily interact with different streams. This enables, for example, a scanner to provide a single token stream based on more than one program input, which is useful when a language allows files to include each other, as in C/C++, PHP and so on.
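The include scenario can be sketched in plain C++ as a stack of input streams. This is only an illustration of the idea under simplifying assumptions: IncludeStack and demoInclude are hypothetical names, and JetPAG's actual stream manager interface is not shown here.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical sketch, not JetPAG's stream manager API: a stack of input
// streams. Pushing a stream models entering an included file; when it is
// exhausted, reading resumes from the stream below it.
class IncludeStack {
public:
    void push(std::istream &in) { streams_.push_back(&in); }

    // Read the next character from the innermost open stream.
    // Returns false once every stream is exhausted.
    bool next(char &out) {
        while (!streams_.empty()) {
            if (streams_.back()->get(out)) return true;
            streams_.pop_back();  // this stream hit end of input
        }
        return false;
    }

private:
    std::vector<std::istream *> streams_;  // non-owning pointers
};

// Usage sketch: "B" is included after the first character of the outer input.
inline std::string demoInclude() {
    std::istringstream outer("AC"), inner("B");
    IncludeStack stack;
    stack.push(outer);
    std::string result;
    char c;
    if (stack.next(c)) result += c;      // 'A' from the outer stream
    stack.push(inner);                   // enter the included input
    while (stack.next(c)) result += c;   // 'B', then back to 'C'
    return result;
}
```

The consumer of the characters sees one continuous sequence and never needs to know that it came from two inputs.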

By default, recognizers generated by JetPAG inherit only from the essential base types (jetpag::basic_scanner or jetpag::basic_parser). JetPAG allows adding more base types via the inherits keyword, which follows the name of the recognizer and is followed by a comma-separated list of the base types. For example, this defines a scanner S that derives from MyBase:

scanner S inherits {% public MyBase %}:

This is useful if a parser needs to use non-public methods of other types. To complement this feature, JetPAG offers customized constructors and destructors. Default constructors for recognizers in JetPAG take only one argument, which is the stream of the recognizer. So for a scanner S and a parser P the default generated constructors would be:

S::S(std::istream &);
P::P(Scanner &);

Such constructors have only a base initializer, which is not practical if member initializers or additional base initializers are needed. Destructors are not generated for recognizers either, which is equally impractical if a recognizer uses resources that have to be freed. To cover these needs, custom constructors and destructors may be defined with the directives #ctor and #dtor respectively. A custom constructor definition is much like a normal C++ constructor's:

#ctor (int x)
: _member1(x)
{
  $$ // Embed source in constructor's body
}

The above grammar defines a constructor that takes an additional integer argument and has an additional member initializer. Arguments, initializers and embedded source are all optional. Note that they do not replace the original ones but rather add to them, so the generated constructor would look like:

S::S(std::istream &, int);
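As a rough picture of how the pieces combine, the following C++ sketch shows a class with the same constructor shape. It is an assumption about the general layout, not actual JetPAG output: the basic_scanner type here is a local stand-in rather than the real jetpag::basic_scanner, and the namespace ctor_sketch and the accessor member1 exist only for this illustration.

```cpp
#include <cassert>
#include <istream>
#include <sstream>

namespace ctor_sketch {

// Stand-in for the real base type; JetPAG's own headers are not used here.
class basic_scanner {
public:
    explicit basic_scanner(std::istream &in) : in_(in) {}
    virtual ~basic_scanner() = default;

protected:
    std::istream &in_;
};

class S : public basic_scanner {
public:
    // Stream argument first, then the argument added by #ctor.
    S(std::istream &in, int x)
        : basic_scanner(in),  // base initializer that is always present
          _member1(x)         // member initializer added by #ctor
    {
        // Source embedded in the #ctor body would be placed here.
    }

    int member1() const { return _member1; }  // accessor for illustration only

private:
    int _member1;
};

}  // namespace ctor_sketch
```

Under these assumptions, constructing the scanner with `std::istringstream in(""); ctor_sketch::S s(in, 7);` initializes both the base and _member1.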

Multiple constructors may be defined; if no constructors are defined, the default one is generated. Destructor directives differ in that only one may be defined per recognizer, as follows:

#dtor ( )
{
  $$ // Embed source in destructor's body
}

A virtual destructor will be generated containing the actions.
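The practical effect of a virtual destructor can be sketched in plain C++. The classes below are stand-ins written for this illustration (S is not real JetPAG output, and Extended and destroyThroughBase are hypothetical); they show that destroying an object through a pointer to S still runs every destructor in the hierarchy.

```cpp
#include <cassert>
#include <memory>

namespace dtor_sketch {

// Stand-in for a generated scanner; not produced by JetPAG.
class S {
public:
    explicit S(int &log) : log_(log) {}
    virtual ~S() {
        log_ += 1;  // source embedded via #dtor would run here
    }

protected:
    int &log_;
};

// Hypothetical user type derived from the generated scanner.
class Extended : public S {
public:
    explicit Extended(int &log) : S(log) {}
    ~Extended() override { log_ += 10; }
};

// Destroy an Extended object through a pointer to S and return the log.
inline int destroyThroughBase() {
    int log = 0;
    std::unique_ptr<S> p(new Extended(log));
    p.reset();  // runs ~Extended first, then ~S
    return log;
}

}  // namespace dtor_sketch
```

Because the destructor of S is virtual, destroyThroughBase returns 11: both destructors ran, so resources held at either level are released.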

In fact, JetPAG allows embedding source code in nearly any region of the output files, and provides many directives for this purpose. Directives for embedding source code come before the grammar rules. All of them are optional, and they must appear in this order:

#head_top_over*    Embedded into the top of the header file
#head_top_below*   Embedded after the #includes in the header file
#type_over         Embedded before the recognizer is declared
#type_sub          Embedded as a top member of the recognizer type
#ctor              Constructors
#dtor              Destructor
#type_below        Embedded after the recognizer type is declared
#head_bottom*      Embedded at the bottom of the header file
#impl_top_over*    Embedded into the top of the implementation file
#impl_top_below*   Embedded after the #includes in the implementation file
#impl_ns           Embedded as the first member in the namespace in the implementation file
#impl_bottom*      Embedded at the bottom of the implementation file

Note that directives marked with * embed source outside the namespace of the generated recognizer.

The whole generated program has one essential parameter: the type of characters in the stream used for program input. By default the character type is char, and these types are used:

basic_scanner<char>
basic_charStreamManager<char>
basic_parser<char>
basic_tokenStreamManager<char>
basic_token<char>
std::basic_istream<char>

Only compatible types may be used; this is enforced by C++'s type safety mechanisms. If you want to use an input stream with a character type other than char, such as wchar_t, you can set it with the char_type grammar option:

grammar MyGrammar < char_type="wchar_t" > :
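On the C++ side, parameterizing on the character type follows the same pattern as the standard stream templates. The sketch below uses only the standard library; countWords is a hypothetical helper invented for this illustration and is not a JetPAG function. It shows one template working with both char and wchar_t streams.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <type_traits>

// std::istream is the char specialization of std::basic_istream.
static_assert(std::is_same<std::istream, std::basic_istream<char>>::value,
              "std::istream is std::basic_istream<char>");

// Hypothetical helper, templated on the character type: counts the
// whitespace-separated words in a stream of that character type.
template <typename CharT>
std::size_t countWords(std::basic_istream<CharT> &in) {
    std::basic_string<CharT> word;
    std::size_t n = 0;
    while (in >> word) ++n;
    return n;
}
```

Calling countWords on a std::istringstream deduces CharT as char, while a std::wistringstream deduces wchar_t; mixing the two, for example reading a std::wstring from a narrow stream, fails to compile, which is the kind of type safety the text refers to.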