The Synthesizer Generator:
Specifying an Editing Environment in EBNF


WorkThis is a preliminary description of a tool that has not yet been released.

Table of Contents

Introduction

Extended BNF (EBNF) is a well-known notation for specifying the syntax of languages. The EBNF Editor is a tool for creating language-based systems from EBNF specifications.

The EBNF Editor generates a language specification in SSL, the meta-language of the Synthesizer Generator. If the editing environment generated from the generated SSL specification is adequate, you are done. Otherwise, you may choose to refine the SSL directly. SSL provide fexibility at the expense of succinctness. In contrast, EBNF provides a succinct notation for specifying a language-based editing environment with default behavior.

An Example

Here is an EBNF specification for a typical toy programming language:

      program => <statement>+; statement => id := exp | WHILE exp DO <statement>+; OD | IF exp THEN <statement>+; ELSE <statement>+; FI exp => exp + exp | exp - exp | exp * exp | exp / exp | ( exp ) | { [a-zA-Z][a-zA-Z0-9_]* } | { [0-9]+ } id => { [a-zA-Z][a-zA-Z0-9_]* }

This specification consists of four rules that can be paraphrased as follows:

  • A program consists of one or more statement's separated by semicolons.

  • A statement has one of three alternative forms: assignment statement, WHILE-statement, or IF-statement.

  • An exp has one of seven forms: sum, difference, product, quotiant, parenthesized expression, identifier, or number.

  • An id is begins with an upper or lower case letter, and is followed by zero or more letters, digits, or underscores.

The EBNF Specification Language

The example illustrates the following general rules for EBNF specifications:

  • Identifers consisting of only lower-case letters are nonterminal symbols. (In general, nonterminal symbols can also contain dashes and underscores, e.g., hi-ho and if_then_else are valid nonterminals. Dashes and underscores are indistinguishable. Thus, hi-ho and hi_ho are the same nonterminal.)

  • Each rule consists of a left and a right hand side separated by an arrow (=>). The left hand side is a nonterminal. The right hand side is a list of one or more alternatives separated by bars (|).

  • Each alternative consists of one or more pieces. There are five kinds of pieces:

    • Nonterminals, e.g., id and exp in id := exp, the alternative for assignment statements.

    • Keywords, e.g., WHILE and IF. A keyword is designated by an identifier consisting of only upper-case letters.

    • Tokens, e.g., := and +. A token is a sequence of one or more characters. Because certain characters and character combinations are reserved as EBNF meta-symbols, an escape mechanism using double quotes (") is provided. Thus, for example, the token consisting of the two characters => is designated by "=>".

    • Lexemes, e.g., { [a-zA-Z][a-zA-Z0-9_]* }. A lexeme consists of a regular expression enclosed in curly braces. The regular expression must begin and end with at least one blank, and may constain no nested unescaped blanks.

    • Embedded rules, e.g., <statement>+;. Much of the power of EBNF comes from the embedded rules. There are three sorts of embedded rules:

      • Iterations, e.g., <statement>+;. There are four kinds of iterations:

        • One or more separated items, e.g., <statement>+; designates one or more statement separated by semicolons.

        • Zero or more separated items, e.g., <exp>*, designates zero or more exp separated by commas.

        • One or more unseparated items, e.g., <name>+ designates one or more name.

        • Zero or more unseparated items, e.g., <elsif-part>* designates zero or more elsif-part.

        The examples so far may have misled you to believe that angle brackets are part of the designation of a nonterminal, but they are not. An arbitrary list of one or more alternatives separated by bars may appear between angle brackets. Thus, for example, each of the following is a valid iteration:

        • <RED | WHITE | BLUE>+,

        • <declaration | statement>*

        • <label :>*

        • <ELSEIF <stmt>+;>*

        Any token or keyword can be the separator of an iteration. For example, <foo>+BAR specifies a list of one or more exp separated by occurrences of the keyword BAR.

      • Optionals, e.g., [declaration-part]. As with iterations, a list of one or more alternatives may appear between the square brackets. For example, [LONG | SHORT] designates an optional occurrence of either the keyword LONG or the keyword SHORT.

      • Nested alternatives, e.g., <+|-|*|/>. Exactly one of the listed alternatives must occur.

Additional details about EBNF notation are described later.

The Specified Language

The language defined by an EBNF specification consists of a sequence of text fragments separated by whitespace. By default, whitespace consists of spaces, newlines, and tabs. By default, the defined language is case-insensitive. Thus, regardless of the case of characters in the EBNF specification, either case is acceptable in the language.

The Generated Environment

The Synthesizer Generator has distinct meta-linguistic mechanisms for specifying different aspects of a language-based system:

  • abstract syntax, i.e., the structure used to represent hierarchical objects,

  • concrete syntax, i.e., scanning and parsing rules that convert strings of text into abstract syntax trees,

  • unparsing rules, i.e., display rules that convert abstract syntax trees into formatted text,

  • templates, i.e., rules for inserting predefined subtrees,

  • placeholders, i.e., labelled slots that characterize the form of phrase required in the given field.

  • transforms, i.e., rules for performing predefined source-to-source translations,

  • attribute equations, i.e., rules defining derived information about abstract syntax trees.
These distinct mechanisms provide fexibility at the expense of succinctness. In contrast, EBNF provides a single succinct notation for specifying a language. As a result, in each category, a default choice is made. As a result, the concrete syntax may result in LALR(1) parsing conflicts where a hand taylored grammar would not, the unparsing rules may result in a less pleasing layout than hand written would produce, templates have default names, etc. There is no support at the moment for attribute equations.


Free Trial | Products | Customers | Support | News | Jobs | About Us         © 2007-2008, GrammaTech, Inc. All rights reserved.