These notational conventions are used for presenting syntax:
[pattern] | optional |
{pattern} | zero or more repetitions |
(pattern) | grouping |
pat1 | pat2 | choice |
pat<pat'> | difference---elements generated by pat |
except those generated by pat' | |
fibonacci | terminal syntax in typewriter font |
BNF-like syntax is used throughout, with productions having the form:
nonterm | -> | alt1 | alt2 | ... | altn |
There are some families of nonterminals indexed by precedence levels (written as a superscript). Similarly, the nonterminals op, varop, and conop may have a double index: a letter l, r, or n for left-, right- or nonassociativity and a precedence level. A precedence-level variable i ranges from 0 to 9; an associativity variable a varies over {l, r, n}. Thus, for example
aexp | -> | ( expi+1 qop(a,i) ) |
In both the lexical and the context-free syntax, there are some ambiguities that are to be resolved by making grammatical phrases as long as possible, proceeding from left to right (in shift-reduce parsing, resolving shift/reduce conflicts by shifting). In the lexical syntax, this is the "maximal munch" rule. In the context-free syntax, this means that conditionals, let-expressions, and lambda abstractions extend to the right as far as possible.
program | -> | {lexeme | whitespace } |
lexeme | -> | varid | conid | varsym | consym | literal | special | reservedop | reservedid |
literal | -> | integer | float | char | string |
special | -> | ( | ) | , | ; | [ | ] | `| { | } |
whitespace | -> | whitestuff {whitestuff} |
whitestuff | -> | whitechar | comment | ncomment |
whitechar | -> | newline | return | linefeed | vertab | formfeed |
| | space | tab | uniWhite | |
newline | -> | a newline (system dependent) |
return | -> | a carriage return |
linefeed | -> | a line feed |
vertab | -> | a vertical tab |
formfeed | -> | a form feed |
space | -> | a space |
tab | -> | a horizontal tab |
uniWhite | -> | any UNIcode character defined as whitespace |
comment | -> | dashes {any}newline |
dashes | -> | -- {-} |
opencom | -> | {- |
closecom | -> | -} |
ncomment | -> | opencom ANYseq {ncomment ANYseq}closecom |
ANYseq | -> | {ANY}<{ANY}( opencom | closecom ) {ANY}> |
ANY | -> | any | newline | vertab | formfeed |
any | -> | graphic | space | tab |
graphic | -> | small | large | symbol | digit | special | : | " | ' |
small | -> | ascSmall | uniSmall | _ |
ascSmall | -> | a | b | ... | z |
uniSmall | -> | any Unicode lowercase letter |
large | -> | ascLarge | uniLarge |
ascLarge | -> | A | B | ... | Z |
uniLarge | -> | any uppercase or titlecase Unicode letter |
symbol | -> | ascSymbol | uniSymbol |
ascSymbol | -> | ! | # | $ | % | & | * | + | . | / | < | = | > | ? | @ |
| | \ | ^ | | | - | ~ | |
uniSymbol | -> | any Unicode symbol or punctuation |
digit | -> | ascDigit | uniDigit |
ascDigit | -> | 0 | 1 | ... | 9 |
uniDigit | -> | any Unicode numeric |
octit | -> | 0 | 1 | ... | 7 |
hexit | -> | digit | A | ... | F | a | ... | f |
varid | -> | (small {small | large | digit | ' })<reservedid> | |
conid | -> | large {small | large | digit | ' } | |
reservedid | -> | case | class | data | default | deriving | do | else | |
| | if | import | in | infix | infixl | infixr | instance | ||
| | let | module | newtype | of | then | type | where | _ | ||
specialid | -> | as | qualified | hiding | |
varsym | -> | ( symbol {symbol | :})<reservedop> | |
consym | -> | (: {symbol | :})<reservedop> | |
reservedop | -> | .. | : | :: | = | \ | | | <- | -> | @ | ~ | => | |
specialop | -> | - | ! | |
varid | (variables) | ||
conid | (constructors) | ||
tyvar | -> | varid | (type variables) |
tycon | -> | conid | (type constructors) |
tycls | -> | conid | (type classes) |
modid | -> | conid | (modules) |
qvarid | -> | [ modid . ] varid | |
qconid | -> | [ modid . ] conid | |
qtycon | -> | [ modid . ] tycon | |
qtycls | -> | [ modid . ] tycls | |
qvarsym | -> | [ modid . ] varsym | |
qconsym | -> | [ modid . ] consym | |
decimal | -> | digit{digit} | |
octal | -> | octit{octit} | |
hexadecimal | -> | hexit{hexit} | |
integer | -> | decimal | |
| | 0o octal | 0O octal | ||
| | 0x hexadecimal | 0X hexadecimal | ||
float | -> | decimal . decimal[(e | E)[- | +]decimal] | |
char | -> | ' (graphic<' | \> | space | escape<\&>) ' | |
string | -> | " {graphic<" | \> | space | escape | gap}" | |
escape | -> | \ ( charesc | ascii | decimal | o octal | x hexadecimal ) | |
charesc | -> | a | b | f | n | r | t | v | \ | " | ' | & | |
ascii | -> | ^cntrl | NUL | SOH | STX | ETX | EOT | ENQ | ACK | |
| | BEL | BS | HT | LF | VT | FF | CR | SO | SI | DLE | ||
| | DC1 | DC2 | DC3 | DC4 | NAK | SYN | ETB | CAN | ||
| | EM | SUB | ESC | FS | GS | RS | US | SP | DEL | ||
cntrl | -> | ascLarge | @ | [ | \ | ] | ^ | _ | |
gap | -> | \ whitechar {whitechar}\ |
Section 2.7 gives an informal discussion of the layout rule. This section defines it more precisely.
The meaning of a Haskell program may depend on its layout. The effect of layout on its meaning can be completely described by adding braces and semicolons in places determined by the layout. The meaning of this augmented program is now layout insensitive.
The effect of layout is specified in this section by describing how to add braces and semicolons to a laid-out program. The specification takes the form of a function L that performs the translation. The input to L is:
The "indentation" of a lexeme is the column number indicating the start of that lexeme; the indentation of a line is the indentation of its leftmost lexeme. To determine the column number, assume a fixed-width font with this tab convention: tab stops are 8 characters apart, and a tab character causes the insertion of enough spaces to align the current position with the next tab stop. For the purposes of the layout rule, Unicode characters in a source program are considered to be of the same, fixed, width as an ASCII character. The first column is designated column 1, not 0. The application
L tokens [0]
delivers a layout-insensitive translation of tokens, where tokens is the result of lexically analysing a module and adding column-number indicators to it as described above. The definition of L is as follows, where we use ":" as a stream construction operator, and "" for the empty stream.
L (t:ts) (m:ms) | = | } : (L (t:ts) ms) | if parse-error(t) (Note 1) |
L (<n>:ts) (m:ms) | = | ; : (L ts (m:ms)) | if m = n |
= | } : (L (<n>:ts) ms) | if n < m | |
= | L ts (m:ms) | otherwise | |
L (}:ts) (0:ms) | = | } : (L ts ms) | (Note 2) |
L ({n}:ts) (m:ms) | = | { : (L ts (n:m:ms)) | if n > m, (Note 3) |
L ({:ts) ms | = | { : (L ts (0:ms)) | (Note 4) |
L (t:ts) ms | = | t : (L ts ms) | |
L [0] | = | ||
L (m:ms) | = | } : L ms | if m /=0 (Note 5) |
If none of the rules given above matches, then the algorithm fails. It can fail for instance when the end of the input is reached, and a non-layout context is active, since the close brace is missing. Some error conditions are not detected by the algorithm, although they could be: for example let }.
Note 1 implements the feature that layout processing can be stopped
prematurely by a parse error. For example
let x = e; y = x in e'
is valid, because it translates to
let { x = e; y = x } in e'
The close brace is inserted due to the parse error rule above.
Another place where the rule comes into play is at the top level
of a module:
module M where
f x = x
This translates to
module M where {
f x = x
}
The close brace is inserted because otherwise the end of file would
cause a parse error.
module | -> | module modid [exports] where body | |
| | body | ||
body | -> | { impdecls ; topdecls } | |
| | { impdecls } | ||
| | { topdecls } | ||
impdecls | -> | impdecl1 ; ... ; impdecln | (n>=1) |
exports | -> | ( export1 , ... , exportn [ , ] ) | (n>=0) |
export | -> | qvar | |
| | qtycon [(..) | ( qcname1 , ... , qcnamen )] | (n>=0) | |
| | qtycls [(..) | ( qvar1 , ... , qvarn )] | (n>=0) | |
| | module modid | ||
qcname | -> | qvar | qcon |
impdecl | -> | import [qualified] modid [as modid] [impspec] | |
| | (empty declaration) | ||
impspec | -> | ( import1 , ... , importn [ , ] ) | (n>=0) |
| | hiding ( import1 , ... , importn [ , ] ) | (n>=0) | |
import | -> | var | |
| | tycon [ (..) | ( cname1 , ... , cnamen )] | (n>=1) | |
| | tycls [(..) | ( var1 , ... , varn )] | (n>=0) | |
cname | -> | var | con |
topdecls | -> | topdecl1 ; ... ; topdecln | (n>=0) |
topdecl | -> | type simpletype = type | |
| | data [context =>] simpletype = constrs [deriving] | ||
| | newtype [context =>] simpletype = newconstr [deriving] | ||
| | class [scontext =>] tycls tyvar [where cdecls] | ||
| | instance [scontext =>] qtycls inst [where idecls] | ||
| | default (type1 , ... , typen) | (n>=0) | |
| | decl |
decls | -> | { decl1 ; ... ; decln } | (n>=0) |
decl | -> | gendecl | |
| | (funlhs | pat0) rhs | ||
cdecls | -> | { cdecl1 ; ... ; cdecln } | (n>=0) |
cdecl | -> | gendecl | |
| | (funlhs | var) rhs | ||
idecls | -> | { idecl1 ; ... ; idecln } | (n>=0) |
idecl | -> | (funlhs | qfunlhs | var | qvar) rhs | |
| | (empty) | ||
gendecl | -> | vars :: [context =>] type | (type signature) |
| | fixity [digit] ops | (fixity declaration) | |
| | (empty declaration) | ||
ops | -> | op1 , ... , opn | (n>=1) |
vars | -> | var1 , ..., varn | (n>=1) |
fixity | -> | infixl | infixr | infix |
type | -> | btype [-> type] | (function type) |
btype | -> | [btype] atype | (type application) |
atype | -> | gtycon | |
| | tyvar | ||
| | ( type1 , ... , typek ) | (tuple type, k>=2) | |
| | [ type ] | (list type) | |
| | ( type ) | (parenthesized constructor) | |
gtycon | -> | qtycon | |
| | () | (unit type) | |
| | [] | (list constructor) | |
| | (->) | (function constructor) | |
| | (,{,}) | (tupling constructors) | |
context | -> | class | |
| | ( class1 , ... , classn ) | (n>=0) | |
class | -> | qtycls tyvar | |
| | qtycls ( tyvar atype1 ... atypen ) | (n>=1) | |
scontext | -> | simpleclass | |
| | ( simpleclass1 , ... , simpleclassn ) | (n>=0) | |
simpleclass | -> | qtycls tyvar |
simpletype | -> | tycon tyvar1 ... tyvark | (k>=0) |
constrs | -> | constr1 | ... | constrn | (n>=1) |
constr | -> | con [!] atype1 ... [!] atypek | (arity con = k, k>=0) |
| | (btype | ! atype) conop (btype | ! atype) | (infix conop) | |
| | con { fielddecl1 , ... , fielddecln } | (n>=0) | |
newconstr | -> | con atype | |
| | con { var :: type } | ||
fielddecl | -> | vars :: (type | ! atype) | |
deriving | -> | deriving (dclass | (dclass1, ... , dclassn)) | (n>=0) |
dclass | -> | qtycls |
inst | -> | gtycon | |
| | ( gtycon tyvar1 ... tyvark ) | (k>=0, tyvars distinct) | |
| | ( tyvar1 , ... , tyvark ) | (k>=2, tyvars distinct) | |
| | [ tyvar ] | ||
| | ( tyvar1 -> tyvar2 ) | tyvar1 and tyvar2 distinct |
funlhs | -> | var apat {apat } |
| | pati+1 varop(a,i) pati+1 | |
| | lpati varop(l,i) pati+1 | |
| | pati+1 varop(r,i) rpati | |
| | ( funlhs ) apat {apat } | |
qfunlhs | -> | qvar apat {apat } |
| | pati+1 qvarop(a,i) pati+1 | |
| | lpati qvarop(l,i) pati+1 | |
| | pati+1 qvarop(r,i) rpati | |
| | ( qfunlhs ) apat {apat } | |
rhs | -> | = exp [where decls] |
| | gdrhs [where decls] | |
gdrhs | -> | gd = exp [gdrhs] |
gd | -> | | exp0 |
exp | -> | exp0 :: [context =>] type | (expression type signature) |
| | exp0 | ||
expi | -> | expi+1 [qop(n,i) expi+1] | |
| | lexpi | ||
| | rexpi | ||
lexpi | -> | (lexpi | expi+1) qop(l,i) expi+1 | |
lexp6 | -> | - exp7 | |
rexpi | -> | expi+1 qop(r,i) (rexpi | expi+1) | |
exp10 | -> | \ apat1 ... apatn -> exp | (lambda abstraction, n>=1) |
| | let decls in exp | (let expression) | |
| | if exp then exp else exp | (conditional) | |
| | case exp of { alts } | (case expression) | |
| | do { stmts } | (do expression) | |
| | fexp | ||
fexp | -> | [fexp] aexp | (function application) |
aexp | -> | qvar | (variable) |
| | gcon | (general constructor) | |
| | literal | ||
| | ( exp ) | (parenthesized expression) | |
| | ( exp1 , ... , expk ) | (tuple, k>=2) | |
| | [ exp1 , ... , expk ] | (list, k>=1) | |
| | [ exp1 [, exp2] .. [exp3] ] | (arithmetic sequence) | |
| | [ exp | qual1 , ... , qualn ] | (list comprehension, n>=0) | |
| | ( expi+1 qop(a,i) ) | (left section) | |
| | ( qop(a,i) expi+1 ) | (right section) | |
| | qcon { fbind1 , ... , fbindn } | (labeled construction, n>=0) | |
| | aexp{qcon} { fbind1 , ... , fbindn } | (labeled update, n >= 1) |
qual | -> | pat <- exp | (generator) |
| | let decls | (local declaration) | |
| | exp | (guard) | |
| | (empty qualifier) | ||
alts | -> | alt1 ; ... ; altn | (n>=0) |
alt | -> | pat -> exp [where decls] | |
| | pat gdpat [where decls] | ||
| | (empty alternative) | ||
gdpat | -> | gd -> exp [ gdpat ] | |
stmts | -> | stmt1 ; ... ; stmtn | (n>=0) |
stmt | -> | exp | |
| | pat <- exp | ||
| | let decls | ||
| | (empty statment) | ||
fbind | -> | qvar = exp | |
pat | -> | var + integer | (successor pattern) |
| | pat0 | ||
pati | -> | pati+1 [qconop(n,i) pati+1] | |
| | lpati | ||
| | rpati | ||
lpati | -> | (lpati | pati+1) qconop(l,i) pati+1 | |
lpat6 | -> | - (integer | float) | (negative literal) |
rpati | -> | pati+1 qconop(r,i) (rpati | pati+1) | |
pat10-> | apat | ||
| | gcon apat1 ... apatk | (arity gcon = k, k>=1) |
apat | -> | var [@ apat] | (as pattern) |
| | gcon | (arity gcon = 0) | |
| | qcon { fpat1 , ... , fpatk } | (labeled pattern, k>=0) | |
| | literal | ||
| | _ | (wildcard) | |
| | ( pat ) | (parenthesized pattern) | |
| | ( pat1 , ... , patk ) | (tuple pattern, k>=2) | |
| | [ pat1 , ... , patk ] | (list pattern, k>=1) | |
| | ~ apat | (irrefutable pattern) | |
fpat | -> | qvar = pat |
gcon | -> | () | |
| | [] | ||
| | (,{,}) | ||
| | qcon | ||
var | -> | varid | ( varsym ) | (variable) |
qvar | -> | qvarid | ( qvarsym ) | (qualified variable) |
con | -> | conid | ( consym ) | (constructor) |
qcon | -> | qconid | ( gconsym ) | (qualified constructor) |
varop | -> | varsym | `varid ` | (variable operator) |
qvarop | -> | qvarsym | `qvarid ` | (qualified variable operator) |
conop | -> | consym | `conid ` | (constructor operator) |
qconop | -> | gconsym | `qconid ` | (qualified constructor operator) |
op | -> | varop | conop | (operator) |
qop | -> | qvarop | qconop | (qualified operator) |
gconsym | -> | : | qconsym |