Rules

The backbone of a grammar specification is its set of rules. Rules define how the recognizer responds to particular input, and together the rules of a specification form the semantics of the generated recognizer.

In general, a grammar rule definition has roughly the following form, where ? marks an optional part:

modifiers?
return-value?
rule-name
special-attributes?
argument-list?
symbol-defs?
header-actions?
 :
 sub-grammar
 ;
handlers?

Modifiers set the type of the rule. JetPAG provides the modifiers hidden, inline, skip and abstract. Rules marked abstract do not define sub-grammars. The skip and abstract modifiers are allowed only in scanner grammars. Each type of rule has its own restrictions on which of the customizations below it accepts.

Return values allow a rule to return a value after execution. The return value is declared at the top of the generated method and returned at its end (never use return statements!). A return value consists of a type name, an identifier and, optionally, an initialization expression. Note that generated code initializes return values with constructor calls rather than assignments: the embedded source of the initialization expression is copied enclosed in ( and ), which makes it possible to pass several arguments to a constructor. This example defines a rule that returns an integer which defaults to zero:

int r = 0
myrule:
 A
 ( B $$ r = 1;
  )?
 ;

The generated code would look like:

int myrule()
{
	int r(0);
	match(A);
	if (B == ncla())
	{
		match(); // B
		r = 1;
	}
	return r;
}
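Constructor-style initialization matters most for types whose constructors take more than one argument. The sketch below is plain, hand-written C++, not JetPAG output: the method name dashes and the grammar declaration quoted in its comment are hypothetical, and it assumes the JetPAG front end accepts a comma inside the initialization expression.

```cpp
#include <string>

// Hypothetical: a return value declared in a grammar as
//     std::string r = 3, '-'
// would, because the initializer is wrapped in ( and ), be declared in the
// generated method as std::string r(3, '-'): a two-argument constructor call.
std::string dashes()
{
    std::string r(3, '-');  // direct initialization: r holds "---"
    // ... matching code and actions that may modify r would go here ...
    return r;               // emitted by the generator, never written in actions
}
```

With assignment syntax the same declaration, std::string r = 3, '-';, is not valid C++, which is the kind of case the parenthesized form avoids.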

Special attributes are constructs that are specific to each type of rule.

Argument lists give the generated method for a rule a custom parameter list. In this example the hidden rule accepts two integral arguments, a and b (b defaults to 0), parses a single-digit integer and returns the sum of all three:

hidden
int r
myrule ( int a, char b = 0 )
 :
 T@DGT
 $$ r = a + b + T->text[0] - '0';
 ;
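For orientation, here is a hand-written C++ approximation of the method such a rule might produce. It is only a sketch: the Token struct, the matchDigit helper and the extra next parameter (used in place of a real token stream) are hypothetical, and JetPAG's actual generated code will differ.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical token type; only the 'text' member used by the action is modeled.
struct Token { std::string text; };

// Hypothetical stand-in for match(DGT): accepts a one-digit token and returns it.
const Token* matchDigit(const Token& next)
{
    if (next.text.size() != 1 || next.text[0] < '0' || next.text[0] > '9')
        throw std::runtime_error("expected DGT");
    return &next;
}

// Sketch of the generated method: the grammar's argument list becomes the C++
// parameter list (after the hypothetical 'next'), including b's default value.
int myrule(const Token& next, int a, char b = 0)
{
    int r;                               // return value, declared first
    const Token* T = matchDigit(next);   // T@DGT binds the matched token to T
    r = a + b + T->text[0] - '0';        // the embedded action
    return r;
}
```

Calling myrule(Token{"7"}, 10) relies on the default for b and yields 17.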

Symbol definitions override the default behavior of generating a variable for each symbolic name. A symbol definition consisting only of a name prevents a variable from being generated for that name. A symbol definition consisting of a name followed by an embedded source also prevents the variable from being generated, and specifies the text that replaces the symbol in the assignment. The latter form is mostly used to assign values to non-local variables, as in this example:

myrule { var {$ SomeNameSpace::PublicVar $}, var2 {$ SomeObj->SomeMmb $} }
 :
 var @ TOK
 var2 @ TOK2
 ;

The generated code would look like:

SomeNameSpace::PublicVar = match(TOK);
SomeObj->SomeMmb = match(TOK2);
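The effect can be reproduced in plain C++. In the sketch below, tokens are modeled as plain ints and match simply returns its argument; both, along with the SomeClass type, the someInstance object and the TOK/TOK2 parameters, are hypothetical stand-ins chosen so the code compiles on its own. The point is that no local var or var2 is declared: the replacement text becomes the assignment target.

```cpp
// Hypothetical token representation: tokens are plain ints here.
using TokenType = int;

// Hypothetical stand-in for the generated match(): returns the matched token.
TokenType match(TokenType expected) { return expected; }

namespace SomeNameSpace { TokenType PublicVar = 0; }  // non-local target for var

struct SomeClass { TokenType SomeMmb = 0; };
SomeClass someInstance;
SomeClass* SomeObj = &someInstance;                  // non-local target for var2

// Sketch of the rule body with both symbol definitions applied.
void myrule(TokenType TOK, TokenType TOK2)
{
    SomeNameSpace::PublicVar = match(TOK);   // replaces "TokenType var = ..."
    SomeObj->SomeMmb = match(TOK2);          // replaces "TokenType var2 = ..."
}
```

After the rule runs, the matched tokens are still reachable through SomeNameSpace::PublicVar and SomeObj->SomeMmb, whereas default local variables would go out of scope when the method returns.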

Header actions are actions executed before anything else in the rule. They are embedded sources copied verbatim to the beginning of the generated rule's body, before the variable and return value declarations.
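Because of this ordering, a return value's initializer can rely on anything the header action has already set up. The hand-written sketch below records the execution order of a rule that has both a header action and a return value; the trace vector and the initValue helper are hypothetical and exist only to make the order observable.

```cpp
#include <string>
#include <vector>

std::vector<std::string> trace;  // records the order in which things run

// Hypothetical initializer used for the return value.
int initValue()
{
    trace.push_back("return value initialized");
    return 0;
}

// Sketch of a rule with a header action and a return value "int r = initValue()".
int myrule()
{
    trace.push_back("header action");  // header action: copied first, verbatim
    int r(initValue());                // return value declaration follows it
    trace.push_back("rule body");      // matching code and actions would go here
    return r;
}
```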

Handlers provide a way to override the default error handling and, if necessary, to prevent memory leaks. For example, this rule defines a handler that prints a customized error message and then rethrows the error:

INT:
 '0'-'9'+
 ;
handles {% jetpag::recognitionError &err %}
{$
std::cout << "Syntax ERROR!" << std::endl;
throw;
$}
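Since the handler's header reads like a C++ exception declaration, a reasonable mental model is that each handler becomes a catch clause around the rule body. The sketch below follows that model by hand; it is an assumption, not JetPAG's documented output, and sketch::recognitionError, the string-based input and the pos cursor are hypothetical stand-ins for the real exception type and input stream.

```cpp
#include <cstddef>
#include <iostream>
#include <stdexcept>
#include <string>

namespace sketch {
// Hypothetical stand-in for jetpag::recognitionError.
struct recognitionError : std::runtime_error {
    using std::runtime_error::runtime_error;
};
}

// Hypothetical scanner method for INT: '0'-'9'+, reading s starting at pos.
std::string INT(const std::string& s, std::size_t& pos)
{
    try {
        std::size_t start = pos;
        while (pos < s.size() && s[pos] >= '0' && s[pos] <= '9')
            ++pos;
        if (pos == start)
            throw sketch::recognitionError("expected a digit");
        return s.substr(start, pos - start);
    } catch (sketch::recognitionError& err) {
        // Handler body copied from the grammar.
        std::cout << "Syntax ERROR!" << std::endl;
        throw;  // rethrow so the caller still sees the original error
    }
}
```

Because the handler ends with throw;, it only adds a message; the caller still receives the original exception and can recover or abort as before.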

JetPAG grammars strictly enforce scope uniqueness for rules. Each group of rules belongs to a scope in which no two rules may have the same name. The most basic scope is the namespace scope: the grammar, the scanner and the parser must each have a name distinct from the others. Rule scopes are divided into two groups: the public scope contains normal rules and abstract scanner rules, and the hidden scope contains hidden rules, inline rules and skipped scanner rules. Names in the hidden scope need to be unique only within the enclosing specification (grammar or scanner).

If a small piece of grammar is used several times in a file, and factoring it into a hidden rule that is referenced each time would hurt performance or make the grammar ambiguous, inline rules are the solution. An inline rule defines a grammar fragment that is cloned wherever the rule is referenced. Inline rules define nothing but sub-grammars; none of the customizations above apply to them. However, inline rules may declare template parameters: placeholders for grammar fragments that are supplied each time the rule is inlined. This compound example uses an inline rule for the contents of string and character literals:

Char:	"'" CannChar{{"'"}} "'"
	;

String:	'"' CannChar{{'"'}}+ '"'
	;

inline CannChar {{ QUOTE }}:
 	'\\' ('a'|'b'|'f'|'n'|'r'|'t'|'v'|"'"|'"'|'\\')
|	~('\\'|QUOTE)
	;
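A template parameter behaves much like a C++ non-type template parameter: each reference produces a copy of the fragment with QUOTE replaced. The hand-written C++ analogy below mirrors the three rules as recognizers over a std::string; the function names and the string-based interface are hypothetical, and JetPAG generates its own scanner code rather than C++ templates.

```cpp
#include <cstddef>
#include <string>

// Analogy for CannChar{{QUOTE}}: one copy of the fragment per quote character.
// On success, advances pos past one (possibly escaped) character.
template <char QUOTE>
bool cannChar(const std::string& s, std::size_t& pos)
{
    if (pos >= s.size()) return false;
    if (s[pos] == '\\') {                                // '\\' ('a'|'b'|...|'\\')
        if (pos + 1 >= s.size()) return false;
        const std::string escapes = "abfnrtv'\"\\";
        if (escapes.find(s[pos + 1]) == std::string::npos) return false;
        pos += 2;
        return true;
    }
    if (s[pos] == QUOTE) return false;                   // ~('\\'|QUOTE)
    ++pos;
    return true;
}

// Char: "'" CannChar{{"'"}} "'"
bool isCharLiteral(const std::string& s)
{
    std::size_t pos = 0;
    if (s.empty() || s[pos] != '\'') return false;
    ++pos;
    if (!cannChar<'\''>(s, pos)) return false;
    return pos + 1 == s.size() && s[pos] == '\'';
}

// String: '"' CannChar{{'"'}}+ '"'
bool isStringLiteral(const std::string& s)
{
    std::size_t pos = 0;
    if (s.empty() || s[pos] != '"') return false;
    ++pos;
    if (!cannChar<'"'>(s, pos)) return false;            // at least one character
    while (cannChar<'"'>(s, pos)) {}                     // then any number more
    return pos + 1 == s.size() && s[pos] == '"';
}
```

The two instantiations differ only in QUOTE: an unescaped ' is rejected inside a character literal but accepted inside a string literal, and vice versa for ".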