Getting Started

In this tutorial, we'll go over the basic items in a JetPAG grammar file. We'll start with a simple parser that parses mathematical expressions, and discuss the basic elements of a JetPAG grammar. Later we'll go a little deeper and discover other JetPAG features that allow additional control over the generated recognizers.

Note: JetPAG and the framework are both written in C++, so you're expected to have a C++ compiler for using JetPAG and the generated recognizers.

In order for the generated recognizers to compile, the src/jpfiles directory from the archive you extracted must sit alongside them. The easiest way to arrange this is to create a new directory, let's say work, move the compiled JetPAG executable into it, and copy the src/jpfiles directory there as well. From inside that new directory you run jetpag and compile the generated files normally. Throughout this tutorial we assume you're working from inside this directory.

To start, open your favorite text editor and create a new file, let's say calc.jg. Paste this text into the file and save it.

calc.jg
grammar CalcG:

parser CalcParser:

expression:
 additive
 ;

additive:
 factor
 ( '+' factor
 | '-' factor
  )*
 ;

factor:
   INT
 | '(' additive ')'
 ;

scanner CalcLexer:

INT:   '0'-'9'+;
LP:    '(';
RP:    ')';
PLUS:  '+';
MINUS: '-';
ENDC:  '.';
skip WS: !' '+;

Analysis: The specification above defines a parser for one-line mathematical expressions with addition and subtraction. Each grammar rule starts with its name, followed by a colon, followed by the rule's syntax, and is closed with a semicolon (much more can be specified for a rule; some of it is discussed here).

In scanner grammars, character literals ('(', ')', '+' ...) mean matching the specified single characters. Character ranges ( '0'-'9' ) mean matching any character between, and including, the boundaries. The plus in ( '0'-'9'+ ) means matching the range '0'-'9' one or more times. The modifier skip in ( skip WS ) means that this is an internally skipped rule: it won't directly contribute to the token stream fed by the lexical analyzer. The rule WS skips spaces in the input, and when it's done the lexical analyzer proceeds with the next token. The exclamation mark (!) is the skip operator; it tells JetPAG that the text matched by the operand (the space ' ' in this example) should not be recorded and won't appear in the constructed token's textual value.

In parser grammars, the name of a scanner rule ( ENDC, PLUS, MINUS ... ) means matching a token in the token stream with the same TTID as that of the specified rule. Note that simple scanner rules like punctuation marks and keywords can be referenced in parser grammars directly by specifying their textual values. The vertical bar ( | ) defines an alternative block with several alternatives; it is a low-precedence operator that means matching one of the alternatives it separates. The star in the rule additive means matching the enclosed alternative block zero or more times.

We need JetPAG to generate a working parser kit from the specification above; run this command:

$ ./jetpag calc.jg

After running the command, JetPAG generates several C++ source files:

CalcG_ttypes.hpp  Data for token types
CalcG_ttypes.cpp  Complements CalcG_ttypes.hpp
CalcLexer.hpp     Lexical analyzer declaration
CalcLexer.cpp     Lexical analyzer definition
CalcParser.hpp    Parser declaration
CalcParser.cpp    Parser definition

You might be wondering what's going on. Open the file CalcParser.cpp, and find the definition of the rule additive. It should look like this:

CalcParser.cpp
void
CalcParser::
match__additive()
{
	// Grammar
	match__factor();
	for (; ; ) // (..)*
	{
		switch (ncla())
		{
		case TType::PLUS:
		{
			advance(); // '+'
			match__factor();
		}
		break;
		case TType::MINUS:
		{
			advance(); // '-'
			match__factor();
		}
		break;
		default:
		{
			goto __loop_end_0;
		}
		}
		continue;
		__loop_end_0: break;
	}
}// CalcParser::additive

The function explains itself. It first matches a factor; then, as long as a PLUS or a MINUS token follows, it parses an addition or a subtraction.

That's it! JetPAG has finished its part of the job. Now it's time to see how to embed the generated recognizers in a program. We'll start with a simple C++ program file. Open a new text file with your favorite text editor and copy/paste this C++ source:

CalcTest.cpp
#include "CalcLexer.hpp"
#include "CalcParser.hpp"
#include <fstream>
#include <iostream>
#include <exception>


int main(int argc, char *argv[])
{
	CalcG::CalcLexer L(std::cin);
	CalcG::CalcParser C(L);
	try
	{
		C.match__expression();
		std::cout << "Well done!" << std::endl;
	}
	catch (std::exception & ex)
	{
		std::cout << ex.what() << std::endl;
	}
}

As you can see, the lexical analyzer ( L ) and the parser ( C ) are separate objects, and each has its own input stream: a character stream from the standard input for the lexical analyzer, and a token stream from the lexical analyzer for the parser. This also makes it possible to combine any lexical analyzer with any parser; nothing goes wrong as long as all Token Type IDs required by the parser are properly handled in the lexical analyzer. The character stream fed to the lexer is the standard input, for simplicity.

JetPAG's exceptions are derived from std::exception for making handling of errors easier as shown above.

Note: In real-world programs you should use binary streams via either the noskipws manipulator or the ios::binary file open flag. Input streams do not know which whitespace must be preserved and would eventually skip all of it. This is unsafe when you define rules such as strings, where spaces between the quotation marks are literal.

Save the file, let's say as CalcTest.cpp. Important: before you proceed to the compilation step, make sure that you've already copied the directory src/jpfiles to the current directory. A command like this compiles the program:

$ g++ -o CalcTest CalcTest.cpp CalcG_ttypes.cpp CalcLexer.cpp CalcParser.cpp

Or you might use this shorter form:

$ g++ -o CalcTest *.cpp

Prior to version 0.5.3, compiling programs that use the JetPAG library required adding another compilation unit, jpfiles/util/impl/format.cpp. Since version 0.5.3 this is no longer necessary.

Now we're gonna test the program against several inputs via command-line:

$ echo "1 + 3" | ./CalcTest
Well done!
$ echo "15 - (25 - 18)" | ./CalcTest
Well done!
$ echo "7 + (6 -" | ./CalcTest
Error 1.9: Unexpected end of stream while expecting ')'

If a syntax error is encountered, an instance of jetpag::recognitionError is thrown, holding a comprehensive message about what has gone wrong.

Now let's make this calculator a real calculator. We're gonna add some actions so that the calculator evaluates the parsed expressions. Open the previously saved file calc.jg and change the parser specification as follows.

calc.jg
#head_top_over
{$
#include 
$}

int r
expression:
 r@additive
 ;

int r
additive:
 r@factor
 ( '+'  t@factor $$ r += t;
 | '-'  t@factor $$ r -= t;
  )*
 ;

int r
factor:
   T@INT
   $$ std::stringstream ss(T->text);
   $$ ss >> r;
 | '(' r@additive ')'
 ;

The #head_top_over directive defines code to be embedded at the top of the header file, before the internal #includes (use #head_top_below to embed code after the internal #includes). Here we used it to include the module jetpag::util::texti. The package jetpag::util is a stand-alone, lightweight utility package included in JetPAG's kit, written specially to help with small text-parsing tasks.

We've added return values to our rules so that each rule returns the value of the expression it has just parsed. A type-name is defined either by a normal name for simple types (int, string, ...) or by {% free-text %} for more complicated type names (const char*, vector<int>, map::iterator, ...). We've defined return values for rules by preceding the name of each rule with a type-name followed by the name of the returned variable. We could initialize the return value to 0 at the beginning of the rule's execution by adding = 0 (initializers in a JetPAG grammar may be integers, strings, character literals, string literals, or free embedded source, either enclosed within {$ and $} or following $$ up to the end of the current line).

To save return values of rules during evaluation, we use symbolic names. A symbolic name is specified by preceding the rule reference (or a literal, like a token or a character range) with a name followed by an at sign (@). Symbols are variables that store the values of symbolized elements, or the return values in the case of rule references. Note that JetPAG is smart enough here: you do not have to store the return value of a rule in a separate variable and then assign it to the return variable, because JetPAG knows on its own whether a symbol is already defined.

To perform additions, subtractions or any other free-form actions, we embed source. Embedded source is specified either by two dollar signs $$ followed by free text and terminated by a line break, or by a block enclosed within {$ and $}. When generating recognizers, JetPAG copies the actions to the proper places in the source. To parse numbers from a textual representation in C++ we used std::stringstream. If performance is a real issue we could use the small and fast function from the jetpag::util::texti module:

template <typename _intT, typename _charT>
_charT* jetpag::util::texti::str_to_int10(_intT &, _charT *);

which parses integers from any c-string and saves the result in the first argument (which is passed by reference), then returns a pointer to the character following the last digit in the integer string (it is an enhancement to C's atoi). A macro defines str_to_int as this function. Another function is str_to_int16 which parses integers in hexadecimal base (a macro defines it as str_to_hint).

Launch JetPAG again as described earlier in this tutorial. You might want to see how the new additive function looks:

CalcParser.cpp
int
CalcParser::
additive()
{
	// Return value
	int r;
	// Symbols
	int t;
	// Grammar
	r = factor();
	for (; ; )
	{
		switch (ncla())
		{
		case TTID::PLUS:
		{
			advance(); // '+'
			t = factor();
			r += t;
		}
		break;
		case TTID::MINUS:
		{
			advance(); // '-'
			t = factor();
			r -= t;
		}
		break;
		default:
			goto __loop_end;
		}
		continue;
		__loop_end: break;
	}
	// Return
	return r;
}

// ...

Note that the rule now returns a value: the return variable is declared at the beginning and returned at the end. Open the source file CalcTest.cpp, and change the function main as follows:

CalcTest.cpp
int main(int argc, char *argv[])
{
	CalcG::CalcLexer L(std::cin);
	CalcG::CalcParser C(L);
	try
	{
		int r = C.match__expression();
		std::cout << "Evaluated value is " << r << std::endl;
	}
	catch (std::exception & ex)
	{
		std::cout << ex.what() << std::endl;
	}
	}
}

Compile the file CalcTest.cpp, and run it from the command line. We're gonna test it again now:

$ echo "1 + 3" | ./CalcTest
Evaluated value is 4
$ echo "15 - (25 - 18)" | ./CalcTest
Evaluated value is 8
$ echo "1 + 2 - 3 + 4 - 5" | ./CalcTest
Evaluated value is -1

That's it! Now you can start using JetPAG. There are many more features to talk about; you can find them in the documentation section.