Joachim Breitner: Showcasing Applicative
My plan for this week s lecture of the CIS 194 Haskell course at the University of Pennsylvania is to dwell a bit on the concept of Imports
In case you want to follow along, start with these imports:
Functor
, Applicative
and Monad
, and to highlight the value of the Applicative
abstraction.
I quite like the example that I came up with, so I want to share it here. In the interest of long-term archival and stand-alone presentation, I include all the material in this post.1
Imports
In case you want to follow along, start with these imports:
import Data.Char
import Data.Maybe
import Data.List
import System.Environment
import System.IO
import System.Exit
The parser
The starting point for this exercise is a fairly standard parser-combinator monad, which happens to be the result of the student s homework from last week:
newtype Parser a = P (String -> Maybe (a, String))
runParser :: Parser t -> String -> Maybe (t, String)
runParser (P p) = p
parse :: Parser a -> String -> Maybe a
parse p input = case runParser p input of
Just (result, "") -> Just result
_ -> Nothing -- handles both no result and leftover input
noParserP :: Parser a
noParserP = P (\_ -> Nothing)
pureParserP :: a -> Parser a
pureParserP x = P (\input -> Just (x,input))
instance Functor Parser where
fmap f p = P $ \input -> do
(x, rest) <- runParser p input
return (f x, rest)
instance Applicative Parser where
pure = pureParserP
p1 <*> p2 = P $ \input -> do
(f, rest1) <- runParser p1 input
(x, rest2) <- runParser p2 rest1
return (f x, rest2)
instance Monad Parser where
return = pure
p1 >>= k = P $ \input -> do
(x, rest1) <- runParser p1 input
runParser (k x) rest1
anyCharP :: Parser Char
anyCharP = P $ \input -> case input of
(c:rest) -> Just (c, rest)
[] -> Nothing
charP :: Char -> Parser ()
charP c = do
c' <- anyCharP
if c == c' then return ()
else noParserP
anyCharButP :: Char -> Parser Char
anyCharButP c = do
c' <- anyCharP
if c /= c' then return c'
else noParserP
letterOrDigitP :: Parser Char
letterOrDigitP = do
c <- anyCharP
if isAlphaNum c then return c else noParserP
orElseP :: Parser a -> Parser a -> Parser a
orElseP p1 p2 = P $ \input -> case runParser p1 input of
Just r -> Just r
Nothing -> runParser p2 input
manyP :: Parser a -> Parser [a]
manyP p = (pure (:) <*> p <*> manyP p) orElseP pure []
many1P :: Parser a -> Parser [a]
many1P p = pure (:) <*> p <*> manyP p
sepByP :: Parser a -> Parser () -> Parser [a]
sepByP p1 p2 = (pure (:) <*> p1 <*> (manyP (p2 *> p1))) orElseP pure []
A parser using this library for, for example, CSV files could take this form:
parseCSVP :: Parser [[String]]
parseCSVP = manyP parseLine
where
parseLine = parseCell sepByP charP ',' <* charP '\n'
parseCell = do
charP '"'
content <- manyP (anyCharButP '"')
charP '"'
return content
We want EBNF
Often when we write a parser for a file format, we might also want to have a formal specification of the format. A common form for such a specification is EBNF. This might look as follows, for a CSV file:
cell = '"', not-quote , '"';
line = (cell, ',', cell ''), newline;
csv = line ;
It is straightforward to create a Haskell data type to represent an EBNF syntax description. Here is a simple EBNF library (data type and pretty-printer) for your convenience:
data RHS
= Terminal String
NonTerminal String
Choice RHS RHS
Sequence RHS RHS
Optional RHS
Repetition RHS
deriving (Show, Eq)
ppRHS :: RHS -> String
ppRHS = go 0
where
go _ (Terminal s) = surround "'" "'" $ concatMap quote s
go _ (NonTerminal s) = s
go a (Choice x1 x2) = p a 1 $ go 1 x1 ++ " " ++ go 1 x2
go a (Sequence x1 x2) = p a 2 $ go 2 x1 ++ ", " ++ go 2 x2
go _ (Optional x) = surround "[" "]" $ go 0 x
go _ (Repetition x) = surround " " " " $ go 0 x
surround c1 c2 x = c1 ++ x ++ c2
p a n a > n = surround "(" ")"
otherwise = id
quote '\'' = "\\'"
quote '\\' = "\\\\"
quote c = [c]
type Production = (String, RHS)
type BNF = [Production]
ppBNF :: BNF -> String
ppBNF = unlines . map (\(i,rhs) -> i ++ " = " ++ ppRHS rhs ++ ";")
Code to produce EBNF
We had a good time writing combinators that create complex parsers from primitive pieces. Let us do the same for EBNF grammars. We could simply work on the RHS
type directly, but we can do something more nifty: We create a data type that keeps track, via a phantom type parameter, of what Haskell type the given EBNF syntax is the specification:
newtype Grammar a = G RHS
ppGrammar :: Grammar a -> String
ppGrammar (G rhs) = ppRHS rhs
So a value of type Grammar t
is a description of the textual representation of the Haskell type t
.
Here is one simple example:
anyCharG :: Grammar Char
anyCharG = G (NonTerminal "char")
Here is another one. This one does not describe any interesting Haskell type, but is useful when spelling out the special characters in the syntax described by the grammar:
charG :: Char -> Grammar ()
charG c = G (Terminal [c])
A combinator that creates new grammar from two existing grammars:
orElseG :: Grammar a -> Grammar a -> Grammar a
orElseG (G rhs1) (G rhs2) = G (Choice rhs1 rhs2)
We want the convenience of our well-known type classes in order to combine these values some more:
instance Functor Grammar where
fmap _ (G rhs) = G rhs
instance Applicative Grammar where
pure x = G (Terminal "")
(G rhs1) <*> (G rhs2) = G (Sequence rhs1 rhs2)
Note how the Functor
instance does not actually use the function. How should it? There are no values inside a Grammar
!
We cannot define a Monad
instance for Grammar
: We would start with (G rhs1) >>= k =
, but there is simply no way of getting a value of type a
that we can feed to k
. So we will do without a Monad
instance. This is interesting, and we will come back to that later.
Like with the parser, we can now begin to build on the primitive example to build more complicated combinators:
manyG :: Grammar a -> Grammar [a]
manyG p = (pure (:) <*> p <*> manyG p) orElseG pure []
many1G :: Grammar a -> Grammar [a]
many1G p = pure (:) <*> p <*> manyG p
sepByG :: Grammar a -> Grammar () -> Grammar [a]
sepByG p1 p2 = ((:) <$> p1 <*> (manyG (p2 *> p1))) orElseG pure []
Let us run a small example:
dottedWordsG :: Grammar [String]
dottedWordsG = many1G (manyG anyCharG <* charG '.')
*Main> putStrLn $ ppGrammar dottedWordsG
'', ('', char, ('', char, ('', char, ('', char, ('', char, ('',
Oh my, that is not good. Looks like the recursion in manyG
does not work well, so we need to avoid that. But anyways we want to be explicit in the EBNF grammars about where something can be repeated, so let us just make many
a primitive:
manyG :: Grammar a -> Grammar [a]
manyG (G rhs) = G (Repetition rhs)
With this definition, we already get a simple grammar for dottedWordsG
:
*Main> putStrLn $ ppGrammar dottedWordsG
'', char , '.', char , '.'
This already looks like a proper EBNF grammar. One thing that is not nice about it is that there is an empty string (''
) in a sequence ( ,
). We do not want that.
Why is it there in the first place? Because our Applicative
instance is not lawful! Remember that pure id <*> g == g
should hold. One way to achieve that is to improve the Applicative
instance to optimize this case away:
instance Applicative Grammar where
pure x = G (Terminal "")
G (Terminal "") <*> G rhs2 = G rhs2
G rhs1 <*> G (Terminal "") = G rhs1
(G rhs1) <*> (G rhs2) = G (Sequence rhs1 rhs2)
Now we get what we want:
*Main> putStrLn $ ppGrammar dottedWordsG
char , '.', char , '.'
Remember our parser for CSV files above? Let me repeat it here, this time using only Applicative
combinators, i.e. avoiding (>>=)
, (>>)
, return
and do
-notation:
parseCSVP :: Parser [[String]]
parseCSVP = manyP parseLine
where
parseLine = parseCell sepByP charG ',' <* charP '\n'
parseCell = charP '"' *> manyP (anyCharButP '"') <* charP '"'
And now we try to rewrite the code to produce Grammar
instead of Parser
. This is straightforward with the exception of anyCharButP
. The parser code for that inherently monadic, and we just do not have a monad instance. So we work around the issue by making that a primitive grammar, i.e. introducing a non-terminal in the EBNF without a production rule pretty much like we did for anyCharG
:
primitiveG :: String -> Grammar a
primitiveG s = G (NonTerminal s)
parseCSVG :: Grammar [[String]]
parseCSVG = manyG parseLine
where
parseLine = parseCell sepByG charG ',' <* charG '\n'
parseCell = charG '"' *> manyG (primitiveG "not-quote") <* charG '"'
Of course the names parse
are not quite right any more, but let us just leave that for now.
Here is the result:
*Main> putStrLn $ ppGrammar parseCSVG
('"', not-quote , '"', ',', '"', not-quote , '"' ''), '
'
The line break is weird. We do not really want newlines in the grammar. So let us make that primitive as well, and replace charG '\n'
with newlineG
:
newlineG :: Grammar ()
newlineG = primitiveG "newline"
Now we get
*Main> putStrLn $ ppGrammar parseCSVG
('"', not-quote , '"', ',', '"', not-quote , '"' ''), newline
which is nice and correct, but still not quite the easily readable EBNF that we saw further up.
Code to produce EBNF, with productions
We currently let our grammars produce only the right-hand side of one EBNF production, but really, we want to produce a RHS that may refer to other productions. So let us change the type accordingly:
newtype Grammar a = G (BNF, RHS)
runGrammer :: String -> Grammar a -> BNF
runGrammer main (G (prods, rhs)) = prods ++ [(main, rhs)]
ppGrammar :: String -> Grammar a -> String
ppGrammar main g = ppBNF $ runGrammer main g
Now we have to adjust all our primitive combinators (but not the derived ones!):
charG :: Char -> Grammar ()
charG c = G ([], Terminal [c])
anyCharG :: Grammar Char
anyCharG = G ([], NonTerminal "char")
manyG :: Grammar a -> Grammar [a]
manyG (G (prods, rhs)) = G (prods, Repetition rhs)
mergeProds :: [Production] -> [Production] -> [Production]
mergeProds prods1 prods2 = nub $ prods1 ++ prods2
orElseG :: Grammar a -> Grammar a -> Grammar a
orElseG (G (prods1, rhs1)) (G (prods2, rhs2))
= G (mergeProds prods1 prods2, Choice rhs1 rhs2)
instance Functor Grammar where
fmap _ (G bnf) = G bnf
instance Applicative Grammar where
pure x = G ([], Terminal "")
G (prods1, Terminal "") <*> G (prods2, rhs2)
= G (mergeProds prods1 prods2, rhs2)
G (prods1, rhs1) <*> G (prods2, Terminal "")
= G (mergeProds prods1 prods2, rhs1)
G (prods1, rhs1) <*> G (prods2, rhs2)
= G (mergeProds prods1 prods2, Sequence rhs1 rhs2)
primitiveG :: String -> Grammar a
primitiveG s = G (NonTerminal s)
The use of nub
when combining productions removes duplicates that might be used in different parts of the grammar. Not efficient, but good enough for now.
Did we gain anything? Not yet:
*Main> putStr $ ppGrammar "csv" (parseCSVG)
csv = ('"', not-quote , '"', ',', '"', not-quote , '"' ''), newline ;
But we can now introduce a function that lets us tell the system where to give names to a piece of grammar:
nonTerminal :: String -> Grammar a -> Grammar a
nonTerminal name (G (prods, rhs))
= G (prods ++ [(name, rhs)], NonTerminal name)
Ample use of this in parseCSVG
yields the desired result:
parseCSVG :: Grammar [[String]]
parseCSVG = manyG parseLine
where
parseLine = nonTerminal "line" $
parseCell sepByG charG ',' <* newline
parseCell = nonTerminal "cell" $
charG '"' *> manyG (primitiveG "not-quote") <* charG '"
*Main> putStr $ ppGrammar "csv" (parseCSVG)
cell = '"', not-quote , '"';
line = (cell, ',', cell ''), newline;
csv = line ;
This is great!
Unifying parsing and grammar-generating
Note how simliar parseCSVG
and parseCSVP
are! Would it not be great if we could implement that functionality only once, and get both a parser and a grammar description out of it? This way, the two would never be out of sync!
And surely this must be possible. The tool to reach for is of course to define a type class that abstracts over the parts where Parser
and Grammer
differ. So we have to identify all functions that are primitive in one of the two worlds, and turn them into type class methods. This includes char
and orElse
. It includes many
, too: Although manyP
is not primitive, manyG
is. It also includes nonTerminal
, which does not exist in the world of parsers (yet), but we need it for the grammars.
The primitiveG
function is tricky. We use it in grammars when the code that we might use while parsing is not expressible as a grammar. So the solution is to let it take two arguments: A String
, when used as a descriptive non-terminal in a grammar, and a Parser a
, used in the parsing code.
Finally, the type classes that we except, Applicative
(and thus Functor
), are added as constraints on our type class:
class Applicative f => Descr f where
char :: Char -> f ()
many :: f a -> f [a]
orElse :: f a -> f a -> f a
primitive :: String -> Parser a -> f a
nonTerminal :: String -> f a -> f a
The instances are easily written:
instance Descr Parser where
char = charP
many = manyP
orElse = orElseP
primitive _ p = p
nonTerminal _ p = p
instance Descr Grammar where
char = charG
many = manyG
orElse = orElseG
primitive s _ = primitiveG s
nonTerminal s g = nonTerminal s g
And we can now take the derived definitions, of which so far we had two copies, and define them once and for all:
many1 :: Descr f => f a -> f [a]
many1 p = pure (:) <*> p <*> many p
anyChar :: Descr f => f Char
anyChar = primitive "char" anyCharP
dottedWords :: Descr f => f [String]
dottedWords = many1 (many anyChar <* char '.')
sepBy :: Descr f => f a -> f () -> f [a]
sepBy p1 p2 = ((:) <$> p1 <*> (many (p2 *> p1))) orElse pure []
newline :: Descr f => f ()
newline = primitive "newline" (charP '\n')
And thus we now have our CSV parser/grammar generator:
parseCSV :: Descr f => f [[String]]
parseCSV = many parseLine
where
parseLine = nonTerminal "line" $
parseCell sepBy char ',' <* newline
parseCell = nonTerminal "cell" $
char '"' *> many (primitive "not-quote" (anyCharButP '"')) <* char '"'
We can now use this definition both to parse and to generate grammars:
*Main> putStr $ ppGrammar2 "csv" (parseCSV)
cell = '"', not-quote , '"';
line = (cell, ',', cell ''), newline;
csv = line ;
*Main> parse parseCSV "\"ab\",\"cd\"\n\"\",\"de\"\n\n"
Just [["ab","cd"],["","de"],[]]
The INI file parser and grammar
As a final exercise, let us transform the INI file parser into a combined thing. Here is the parser (another artifact of last week s homework) again using applicative style2:
parseINIP :: Parser INIFile
parseINIP = many1P parseSection
where
parseSection =
(,) <$ charP '['
<*> parseIdent
<* charP ']'
<* charP '\n'
<*> (catMaybes <$> manyP parseLine)
parseIdent = many1P letterOrDigitP
parseLine = parseDecl orElseP parseComment orElseP parseEmpty
parseDecl = Just <$> (
(,) <*> parseIdent
<* manyP (charP ' ')
<* charP '='
<* manyP (charP ' ')
<*> many1P (anyCharButP '\n')
<* charP '\n')
parseComment =
Nothing <$ charP '#'
<* many1P (anyCharButP '\n')
<* charP '\n'
parseEmpty = Nothing <$ charP '\n'
Transforming that to a generic description is quite straightforward. We use primitive
again to wrap letterOrDigitP
:
descrINI :: Descr f => f INIFile
descrINI = many1 parseSection
where
parseSection =
(,) <* char '['
<*> parseIdent
<* char ']'
<* newline
<*> (catMaybes <$> many parseLine)
parseIdent = many1 (primitive "alphanum" letterOrDigitP)
parseLine = parseDecl orElse parseComment orElse parseEmpty
parseDecl = Just <$> (
(,) <*> parseIdent
<* many (char ' ')
<* char '='
<* many (char ' ')
<*> many1 (primitive "non-newline" (anyCharButP '\n'))
<* newline)
parseComment =
Nothing <$ char '#'
<* many1 (primitive "non-newline" (anyCharButP '\n'))
<* newline
parseEmpty = Nothing <$ newline
This yields this not very helpful grammar (abbreviated here):
*Main> putStr $ ppGrammar2 "ini" descrINI
ini = '[', alphanum, alphanum , ']', newline, alphanum, alphanum , ' '
But with a few uses of nonTerminal
, we get something really nice:
descrINI :: Descr f => f INIFile
descrINI = many1 parseSection
where
parseSection = nonTerminal "section" $
(,) <$ char '['
<*> parseIdent
<* char ']'
<* newline
<*> (catMaybes <$> many parseLine)
parseIdent = nonTerminal "identifier" $
many1 (primitive "alphanum" letterOrDigitP)
parseLine = nonTerminal "line" $
parseDecl orElse parseComment orElse parseEmpty
parseDecl = nonTerminal "declaration" $ Just <$> (
(,) <$> parseIdent
<* spaces
<* char '='
<* spaces
<*> remainder)
parseComment = nonTerminal "comment" $
Nothing <$ char '#' <* remainder
remainder = nonTerminal "line-remainder" $
many1 (primitive "non-newline" (anyCharButP '\n')) <* newline
parseEmpty = Nothing <$ newline
spaces = nonTerminal "spaces" $ many (char ' ')
*Main> putStr $ ppGrammar "ini" descrINI
identifier = alphanum, alphanum ;
spaces = ' ' ;
line-remainder = non-newline, non-newline , newline;
declaration = identifier, spaces, '=', spaces, line-remainder;
comment = '#', line-remainder;
line = declaration comment newline;
section = '[', identifier, ']', newline, line ;
ini = section, section ;
Recursion (variant 1)
What if we want to write a parser/grammar-generator that is able to generate the following grammar, which describes terms that are additions and multiplications of natural numbers:
const = digit, digit ;
spaces = ' ' newline ;
atom = const '(', spaces, expr, spaces, ')', spaces;
mult = atom, spaces, '*', spaces, atom , spaces;
plus = mult, spaces, '+', spaces, mult , spaces;
expr = plus;
The production of expr
is recursive (via plus
, mult
, atom
). We have seen above that simply defining a Grammar a
recursively does not go well.
One solution is to add a new combinator for explicit recursion, which replaces nonTerminal
in the method:
class Applicative f => Descr f where
recNonTerminal :: String -> (f a -> f a) -> f a
instance Descr Parser where
recNonTerminal _ p = let r = p r in r
instance Descr Grammar where
recNonTerminal = recNonTerminalG
recNonTerminalG :: String -> (Grammar a -> Grammar a) -> Grammar a
recNonTerminalG name f =
let G (prods, rhs) = f (G ([], NonTerminal name))
in G (prods ++ [(name, rhs)], NonTerminal name)
nonTerminal :: Descr f => String -> f a -> f a
nonTerminal name p = recNonTerminal name (const p)
runGrammer :: String -> Grammar a -> BNF
runGrammer main (G (prods, NonTerminal nt)) main == nt = prods
runGrammer main (G (prods, rhs)) = prods ++ [(main, rhs)]
The change in runGrammer
avoids adding a pointless expr = expr
production to the output.
This lets us define a parser/grammar-generator for the arithmetic expressions given above:
data Expr = Plus Expr Expr Mult Expr Expr Const Integer
deriving Show
mkPlus :: Expr -> [Expr] -> Expr
mkPlus = foldl Plus
mkMult :: Expr -> [Expr] -> Expr
mkMult = foldl Mult
parseExpr :: Descr f => f Expr
parseExpr = recNonTerminal "expr" $ \ exp ->
ePlus exp
ePlus :: Descr f => f Expr -> f Expr
ePlus exp = nonTerminal "plus" $
mkPlus <$> eMult exp
<*> many (spaces *> char '+' *> spaces *> eMult exp)
<* spaces
eMult :: Descr f => f Expr -> f Expr
eMult exp = nonTerminal "mult" $
mkPlus <$> eAtom exp
<*> many (spaces *> char '*' *> spaces *> eAtom exp)
<* spaces
eAtom :: Descr f => f Expr -> f Expr
eAtom exp = nonTerminal "atom" $
aConst orElse eParens exp
aConst :: Descr f => f Expr
aConst = nonTerminal "const" $ Const . read <$> many1 digit
eParens :: Descr f => f a -> f a
eParens inner =
id <$ char '('
<* spaces
<*> inner
<* spaces
<* char ')'
<* spaces
And indeed, this works:
*Main> putStr $ ppGrammar "expr" parseExpr
const = digit, digit ;
spaces = ' ' newline ;
atom = const '(', spaces, expr, spaces, ')', spaces;
mult = atom, spaces, '*', spaces, atom , spaces;
plus = mult, spaces, '+', spaces, mult , spaces;
expr = plus;
Recursion (variant 2)
Interestingly, there is another solution to this problem, which avoids introducing recNonTerminal
and explicitly passing around the recursive call (i.e. the exp
in the example). To implement that we have to adjust our Grammar
type as follows:
newtype Grammar a = G ([String] -> (BNF, RHS))
The idea is that the list of strings is those non-terminals that we are currently defining. So in nonTerminal
, we check if the non-terminal to be introduced is currently in the process of being defined, and then simply ignore the body. This way, the recursion is stopped automatically:
nonTerminalG :: String -> (Grammar a) -> Grammar a
nonTerminalG name (G g) = G $ \seen ->
if name elem seen
then ([], NonTerminal name)
else let (prods, rhs) = g (name : seen)
in (prods ++ [(name, rhs)], NonTerminal name)
After adjusting the other primitives of Grammar
(including the Functor
and Applicative
instances, wich now again have nonTerminal
) to type-check again, we observe that this parser/grammar generator for expressions, with genuine recursion, works now:
parseExp :: Descr f => f Expr
parseExp = nonTerminal "expr" $
ePlus
ePlus :: Descr f => f Expr
ePlus = nonTerminal "plus" $
mkPlus <$> eMult
<*> many (spaces *> char '+' *> spaces *> eMult)
<* spaces
eMult :: Descr f => f Expr
eMult = nonTerminal "mult" $
mkPlus <$> eAtom
<*> many (spaces *> char '*' *> spaces *> eAtom)
<* spaces
eAtom :: Descr f => f Expr
eAtom = nonTerminal "atom" $
aConst orElse eParens parseExp
Note that the recursion is only going to work if there is at least one call to nonTerminal
somewhere around the recursive calls. We still cannot implement many
as naively as above.
Homework
If you want to play more with this: The homework is to define a parser/grammar-generator for EBNF itself, as specified in this variant:
identifier = letter, letter digit '-' ;
spaces = ' ' newline ;
quoted-char = non-quote-or-backslash '\\', '\\' '\\', '\'';
terminal = '\'', quoted-char , '\'', spaces;
non-terminal = identifier, spaces;
option = '[', spaces, rhs, spaces, ']', spaces;
repetition = ' ', spaces, rhs, spaces, ' ', spaces;
group = '(', spaces, rhs, spaces, ')', spaces;
atom = terminal non-terminal option repetition group;
sequence = atom, spaces, ',', spaces, atom , spaces;
choice = sequence, spaces, ' ', spaces, sequence , spaces;
rhs = choice;
production = identifier, spaces, '=', spaces, rhs, ';', spaces;
bnf = production, production ;
This grammar is set up so that the precedence of ,
and
is correctly implemented: a , b c
will parse as (a, b) c
.
In this syntax for BNF, terminal characters are quoted, i.e. inside ' '
, a '
is replaced by \'
and a \
is replaced by \\
this is done by the function quote
in ppRHS
.
If you do this, you should able to round-trip with the pretty-printer, i.e. parse back what it wrote:
*Main> let bnf1 = runGrammer "expr" parseExpr
*Main> let bnf2 = runGrammer "expr" parseBNF
*Main> let f = Data.Maybe.fromJust . parse parseBNF. ppBNF
*Main> f bnf1 == bnf1
True
*Main> f bnf2 == bnf2
True
The last line is quite meta: We are using parseBNF
as a parser on the pretty-printed grammar produced from interpreting parseBNF
as a grammar.
Conclusion
We have again seen an example of the excellent support for abstraction in Haskell: Being able to define so very different things such as a parser and a grammar description with the same code is great. Type classes helped us here.
Note that it was crucial that our combined parser/grammars are only able to use the methods of Applicative
, and not Monad
. Applicative
is less powerful, so by giving less power to the user of our Descr
interface, the other side, i.e. the implementation, can be more powerful.
The reason why Applicative
is ok, but Monad
is not, is that in Applicative
, the results do not affect the shape of the computation, whereas in Monad
, the whole point of the bind operator (>>=)
is that the result of the computation is used to decide the next computation. And while this is perfectly fine for a parser, it just makes no sense for a grammar generator, where there simply are no values around!
We have also seen that a phantom type, namely the parameter of Grammar
, can be useful, as it lets the type system make sure we do not write nonsense. For example, the type of orElseG
ensures that both grammars that are combined here indeed describe something of the same type.
- It seems to be the week of applicative-appraising blog posts: Brent has posted a nice piece about enumerations using
Applicative
yesterday.
- I like how in this alignment of
<*>
and <*
the >
point out where the arguments are that are being passed to the function on the left.
import Data.Char
import Data.Maybe
import Data.List
import System.Environment
import System.IO
import System.Exit
newtype Parser a = P (String -> Maybe (a, String))
runParser :: Parser t -> String -> Maybe (t, String)
runParser (P p) = p
parse :: Parser a -> String -> Maybe a
parse p input = case runParser p input of
Just (result, "") -> Just result
_ -> Nothing -- handles both no result and leftover input
noParserP :: Parser a
noParserP = P (\_ -> Nothing)
pureParserP :: a -> Parser a
pureParserP x = P (\input -> Just (x,input))
instance Functor Parser where
fmap f p = P $ \input -> do
(x, rest) <- runParser p input
return (f x, rest)
instance Applicative Parser where
pure = pureParserP
p1 <*> p2 = P $ \input -> do
(f, rest1) <- runParser p1 input
(x, rest2) <- runParser p2 rest1
return (f x, rest2)
instance Monad Parser where
return = pure
p1 >>= k = P $ \input -> do
(x, rest1) <- runParser p1 input
runParser (k x) rest1
anyCharP :: Parser Char
anyCharP = P $ \input -> case input of
(c:rest) -> Just (c, rest)
[] -> Nothing
charP :: Char -> Parser ()
charP c = do
c' <- anyCharP
if c == c' then return ()
else noParserP
anyCharButP :: Char -> Parser Char
anyCharButP c = do
c' <- anyCharP
if c /= c' then return c'
else noParserP
letterOrDigitP :: Parser Char
letterOrDigitP = do
c <- anyCharP
if isAlphaNum c then return c else noParserP
orElseP :: Parser a -> Parser a -> Parser a
orElseP p1 p2 = P $ \input -> case runParser p1 input of
Just r -> Just r
Nothing -> runParser p2 input
manyP :: Parser a -> Parser [a]
manyP p = (pure (:) <*> p <*> manyP p) orElseP pure []
many1P :: Parser a -> Parser [a]
many1P p = pure (:) <*> p <*> manyP p
sepByP :: Parser a -> Parser () -> Parser [a]
sepByP p1 p2 = (pure (:) <*> p1 <*> (manyP (p2 *> p1))) orElseP pure []
A parser using this library for, for example, CSV files could take this form:
parseCSVP :: Parser [[String]]
parseCSVP = manyP parseLine
where
parseLine = parseCell sepByP charP ',' <* charP '\n'
parseCell = do
charP '"'
content <- manyP (anyCharButP '"')
charP '"'
return content
We want EBNF
Often when we write a parser for a file format, we might also want to have a formal specification of the format. A common form for such a specification is EBNF. This might look as follows, for a CSV file:
cell = '"', not-quote , '"';
line = (cell, ',', cell ''), newline;
csv = line ;
It is straightforward to create a Haskell data type to represent an EBNF syntax description. Here is a simple EBNF library (data type and pretty-printer) for your convenience:
data RHS
= Terminal String
NonTerminal String
Choice RHS RHS
Sequence RHS RHS
Optional RHS
Repetition RHS
deriving (Show, Eq)
ppRHS :: RHS -> String
ppRHS = go 0
where
go _ (Terminal s) = surround "'" "'" $ concatMap quote s
go _ (NonTerminal s) = s
go a (Choice x1 x2) = p a 1 $ go 1 x1 ++ " " ++ go 1 x2
go a (Sequence x1 x2) = p a 2 $ go 2 x1 ++ ", " ++ go 2 x2
go _ (Optional x) = surround "[" "]" $ go 0 x
go _ (Repetition x) = surround " " " " $ go 0 x
surround c1 c2 x = c1 ++ x ++ c2
p a n a > n = surround "(" ")"
otherwise = id
quote '\'' = "\\'"
quote '\\' = "\\\\"
quote c = [c]
type Production = (String, RHS)
type BNF = [Production]
ppBNF :: BNF -> String
ppBNF = unlines . map (\(i,rhs) -> i ++ " = " ++ ppRHS rhs ++ ";")
Code to produce EBNF
We had a good time writing combinators that create complex parsers from primitive pieces. Let us do the same for EBNF grammars. We could simply work on the RHS
type directly, but we can do something more nifty: We create a data type that keeps track, via a phantom type parameter, of what Haskell type the given EBNF syntax is the specification:
newtype Grammar a = G RHS
ppGrammar :: Grammar a -> String
ppGrammar (G rhs) = ppRHS rhs
So a value of type Grammar t
is a description of the textual representation of the Haskell type t
.
Here is one simple example:
anyCharG :: Grammar Char
anyCharG = G (NonTerminal "char")
Here is another one. This one does not describe any interesting Haskell type, but is useful when spelling out the special characters in the syntax described by the grammar:
charG :: Char -> Grammar ()
charG c = G (Terminal [c])
A combinator that creates new grammar from two existing grammars:
orElseG :: Grammar a -> Grammar a -> Grammar a
orElseG (G rhs1) (G rhs2) = G (Choice rhs1 rhs2)
We want the convenience of our well-known type classes in order to combine these values some more:
instance Functor Grammar where
fmap _ (G rhs) = G rhs
instance Applicative Grammar where
pure x = G (Terminal "")
(G rhs1) <*> (G rhs2) = G (Sequence rhs1 rhs2)
Note how the Functor
instance does not actually use the function. How should it? There are no values inside a Grammar
!
We cannot define a Monad
instance for Grammar
: We would start with (G rhs1) >>= k =
, but there is simply no way of getting a value of type a
that we can feed to k
. So we will do without a Monad
instance. This is interesting, and we will come back to that later.
Like with the parser, we can now begin to build on the primitive example to build more complicated combinators:
manyG :: Grammar a -> Grammar [a]
manyG p = (pure (:) <*> p <*> manyG p) orElseG pure []
many1G :: Grammar a -> Grammar [a]
many1G p = pure (:) <*> p <*> manyG p
sepByG :: Grammar a -> Grammar () -> Grammar [a]
sepByG p1 p2 = ((:) <$> p1 <*> (manyG (p2 *> p1))) orElseG pure []
Let us run a small example:
dottedWordsG :: Grammar [String]
dottedWordsG = many1G (manyG anyCharG <* charG '.')
*Main> putStrLn $ ppGrammar dottedWordsG
'', ('', char, ('', char, ('', char, ('', char, ('', char, ('',
Oh my, that is not good. Looks like the recursion in manyG
does not work well, so we need to avoid that. But anyways we want to be explicit in the EBNF grammars about where something can be repeated, so let us just make many
a primitive:
manyG :: Grammar a -> Grammar [a]
manyG (G rhs) = G (Repetition rhs)
With this definition, we already get a simple grammar for dottedWordsG
:
*Main> putStrLn $ ppGrammar dottedWordsG
'', char , '.', char , '.'
This already looks like a proper EBNF grammar. One thing that is not nice about it is that there is an empty string (''
) in a sequence ( ,
). We do not want that.
Why is it there in the first place? Because our Applicative
instance is not lawful! Remember that pure id <*> g == g
should hold. One way to achieve that is to improve the Applicative
instance to optimize this case away:
instance Applicative Grammar where
pure x = G (Terminal "")
G (Terminal "") <*> G rhs2 = G rhs2
G rhs1 <*> G (Terminal "") = G rhs1
(G rhs1) <*> (G rhs2) = G (Sequence rhs1 rhs2)
Now we get what we want:
*Main> putStrLn $ ppGrammar dottedWordsG
char , '.', char , '.'
Remember our parser for CSV files above? Let me repeat it here, this time using only Applicative
combinators, i.e. avoiding (>>=)
, (>>)
, return
and do
-notation:
parseCSVP :: Parser [[String]]
parseCSVP = manyP parseLine
where
parseLine = parseCell sepByP charG ',' <* charP '\n'
parseCell = charP '"' *> manyP (anyCharButP '"') <* charP '"'
And now we try to rewrite the code to produce Grammar
instead of Parser
. This is straightforward with the exception of anyCharButP
. The parser code for that inherently monadic, and we just do not have a monad instance. So we work around the issue by making that a primitive grammar, i.e. introducing a non-terminal in the EBNF without a production rule pretty much like we did for anyCharG
:
primitiveG :: String -> Grammar a
primitiveG s = G (NonTerminal s)
parseCSVG :: Grammar [[String]]
parseCSVG = manyG parseLine
where
parseLine = parseCell sepByG charG ',' <* charG '\n'
parseCell = charG '"' *> manyG (primitiveG "not-quote") <* charG '"'
Of course the names parse
are not quite right any more, but let us just leave that for now.
Here is the result:
*Main> putStrLn $ ppGrammar parseCSVG
('"', not-quote , '"', ',', '"', not-quote , '"' ''), '
'
The line break is weird. We do not really want newlines in the grammar. So let us make that primitive as well, and replace charG '\n'
with newlineG
:
newlineG :: Grammar ()
newlineG = primitiveG "newline"
Now we get
*Main> putStrLn $ ppGrammar parseCSVG
('"', not-quote , '"', ',', '"', not-quote , '"' ''), newline
which is nice and correct, but still not quite the easily readable EBNF that we saw further up.
Code to produce EBNF, with productions
We currently let our grammars produce only the right-hand side of one EBNF production, but really, we want to produce a RHS that may refer to other productions. So let us change the type accordingly:
newtype Grammar a = G (BNF, RHS)
runGrammer :: String -> Grammar a -> BNF
runGrammer main (G (prods, rhs)) = prods ++ [(main, rhs)]
ppGrammar :: String -> Grammar a -> String
ppGrammar main g = ppBNF $ runGrammer main g
Now we have to adjust all our primitive combinators (but not the derived ones!):
charG :: Char -> Grammar ()
charG c = G ([], Terminal [c])
anyCharG :: Grammar Char
anyCharG = G ([], NonTerminal "char")
manyG :: Grammar a -> Grammar [a]
manyG (G (prods, rhs)) = G (prods, Repetition rhs)
mergeProds :: [Production] -> [Production] -> [Production]
mergeProds prods1 prods2 = nub $ prods1 ++ prods2
orElseG :: Grammar a -> Grammar a -> Grammar a
orElseG (G (prods1, rhs1)) (G (prods2, rhs2))
= G (mergeProds prods1 prods2, Choice rhs1 rhs2)
instance Functor Grammar where
fmap _ (G bnf) = G bnf
instance Applicative Grammar where
pure x = G ([], Terminal "")
G (prods1, Terminal "") <*> G (prods2, rhs2)
= G (mergeProds prods1 prods2, rhs2)
G (prods1, rhs1) <*> G (prods2, Terminal "")
= G (mergeProds prods1 prods2, rhs1)
G (prods1, rhs1) <*> G (prods2, rhs2)
= G (mergeProds prods1 prods2, Sequence rhs1 rhs2)
primitiveG :: String -> Grammar a
primitiveG s = G (NonTerminal s)
The use of nub
when combining productions removes duplicates that might be used in different parts of the grammar. Not efficient, but good enough for now.
Did we gain anything? Not yet:
*Main> putStr $ ppGrammar "csv" (parseCSVG)
csv = ('"', not-quote , '"', ',', '"', not-quote , '"' ''), newline ;
But we can now introduce a function that lets us tell the system where to give names to a piece of grammar:
nonTerminal :: String -> Grammar a -> Grammar a
nonTerminal name (G (prods, rhs))
= G (prods ++ [(name, rhs)], NonTerminal name)
Ample use of this in parseCSVG
yields the desired result:
parseCSVG :: Grammar [[String]]
parseCSVG = manyG parseLine
where
parseLine = nonTerminal "line" $
parseCell sepByG charG ',' <* newline
parseCell = nonTerminal "cell" $
charG '"' *> manyG (primitiveG "not-quote") <* charG '"
*Main> putStr $ ppGrammar "csv" (parseCSVG)
cell = '"', not-quote , '"';
line = (cell, ',', cell ''), newline;
csv = line ;
This is great!
Unifying parsing and grammar-generating
Note how simliar parseCSVG
and parseCSVP
are! Would it not be great if we could implement that functionality only once, and get both a parser and a grammar description out of it? This way, the two would never be out of sync!
And surely this must be possible. The tool to reach for is of course to define a type class that abstracts over the parts where Parser
and Grammer
differ. So we have to identify all functions that are primitive in one of the two worlds, and turn them into type class methods. This includes char
and orElse
. It includes many
, too: Although manyP
is not primitive, manyG
is. It also includes nonTerminal
, which does not exist in the world of parsers (yet), but we need it for the grammars.
The primitiveG
function is tricky. We use it in grammars when the code that we might use while parsing is not expressible as a grammar. So the solution is to let it take two arguments: A String
, when used as a descriptive non-terminal in a grammar, and a Parser a
, used in the parsing code.
Finally, the type classes that we except, Applicative
(and thus Functor
), are added as constraints on our type class:
class Applicative f => Descr f where
char :: Char -> f ()
many :: f a -> f [a]
orElse :: f a -> f a -> f a
primitive :: String -> Parser a -> f a
nonTerminal :: String -> f a -> f a
The instances are easily written:
instance Descr Parser where
char = charP
many = manyP
orElse = orElseP
primitive _ p = p
nonTerminal _ p = p
instance Descr Grammar where
char = charG
many = manyG
orElse = orElseG
primitive s _ = primitiveG s
nonTerminal s g = nonTerminal s g
And we can now take the derived definitions, of which so far we had two copies, and define them once and for all:
many1 :: Descr f => f a -> f [a]
many1 p = pure (:) <*> p <*> many p
anyChar :: Descr f => f Char
anyChar = primitive "char" anyCharP
dottedWords :: Descr f => f [String]
dottedWords = many1 (many anyChar <* char '.')
sepBy :: Descr f => f a -> f () -> f [a]
sepBy p1 p2 = ((:) <$> p1 <*> (many (p2 *> p1))) orElse pure []
newline :: Descr f => f ()
newline = primitive "newline" (charP '\n')
And thus we now have our CSV parser/grammar generator:
parseCSV :: Descr f => f [[String]]
parseCSV = many parseLine
where
parseLine = nonTerminal "line" $
parseCell sepBy char ',' <* newline
parseCell = nonTerminal "cell" $
char '"' *> many (primitive "not-quote" (anyCharButP '"')) <* char '"'
We can now use this definition both to parse and to generate grammars:
*Main> putStr $ ppGrammar2 "csv" (parseCSV)
cell = '"', not-quote , '"';
line = (cell, ',', cell ''), newline;
csv = line ;
*Main> parse parseCSV "\"ab\",\"cd\"\n\"\",\"de\"\n\n"
Just [["ab","cd"],["","de"],[]]
The INI file parser and grammar
As a final exercise, let us transform the INI file parser into a combined thing. Here is the parser (another artifact of last week s homework) again using applicative style2:
parseINIP :: Parser INIFile
parseINIP = many1P parseSection
where
parseSection =
(,) <$ charP '['
<*> parseIdent
<* charP ']'
<* charP '\n'
<*> (catMaybes <$> manyP parseLine)
parseIdent = many1P letterOrDigitP
parseLine = parseDecl orElseP parseComment orElseP parseEmpty
parseDecl = Just <$> (
(,) <*> parseIdent
<* manyP (charP ' ')
<* charP '='
<* manyP (charP ' ')
<*> many1P (anyCharButP '\n')
<* charP '\n')
parseComment =
Nothing <$ charP '#'
<* many1P (anyCharButP '\n')
<* charP '\n'
parseEmpty = Nothing <$ charP '\n'
Transforming that to a generic description is quite straightforward. We use primitive
again to wrap letterOrDigitP
:
descrINI :: Descr f => f INIFile
descrINI = many1 parseSection
where
parseSection =
(,) <* char '['
<*> parseIdent
<* char ']'
<* newline
<*> (catMaybes <$> many parseLine)
parseIdent = many1 (primitive "alphanum" letterOrDigitP)
parseLine = parseDecl orElse parseComment orElse parseEmpty
parseDecl = Just <$> (
(,) <*> parseIdent
<* many (char ' ')
<* char '='
<* many (char ' ')
<*> many1 (primitive "non-newline" (anyCharButP '\n'))
<* newline)
parseComment =
Nothing <$ char '#'
<* many1 (primitive "non-newline" (anyCharButP '\n'))
<* newline
parseEmpty = Nothing <$ newline
This yields this not very helpful grammar (abbreviated here):
*Main> putStr $ ppGrammar2 "ini" descrINI
ini = '[', alphanum, alphanum , ']', newline, alphanum, alphanum , ' '
But with a few uses of nonTerminal
, we get something really nice:
descrINI :: Descr f => f INIFile
descrINI = many1 parseSection
where
parseSection = nonTerminal "section" $
(,) <$ char '['
<*> parseIdent
<* char ']'
<* newline
<*> (catMaybes <$> many parseLine)
parseIdent = nonTerminal "identifier" $
many1 (primitive "alphanum" letterOrDigitP)
parseLine = nonTerminal "line" $
parseDecl orElse parseComment orElse parseEmpty
parseDecl = nonTerminal "declaration" $ Just <$> (
(,) <$> parseIdent
<* spaces
<* char '='
<* spaces
<*> remainder)
parseComment = nonTerminal "comment" $
Nothing <$ char '#' <* remainder
remainder = nonTerminal "line-remainder" $
many1 (primitive "non-newline" (anyCharButP '\n')) <* newline
parseEmpty = Nothing <$ newline
spaces = nonTerminal "spaces" $ many (char ' ')
*Main> putStr $ ppGrammar "ini" descrINI
identifier = alphanum, alphanum ;
spaces = ' ' ;
line-remainder = non-newline, non-newline , newline;
declaration = identifier, spaces, '=', spaces, line-remainder;
comment = '#', line-remainder;
line = declaration comment newline;
section = '[', identifier, ']', newline, line ;
ini = section, section ;
Recursion (variant 1)
What if we want to write a parser/grammar-generator that is able to generate the following grammar, which describes terms that are additions and multiplications of natural numbers:
const = digit, digit ;
spaces = ' ' newline ;
atom = const '(', spaces, expr, spaces, ')', spaces;
mult = atom, spaces, '*', spaces, atom , spaces;
plus = mult, spaces, '+', spaces, mult , spaces;
expr = plus;
The production of expr
is recursive (via plus
, mult
, atom
). We have seen above that simply defining a Grammar a
recursively does not go well.
One solution is to add a new combinator for explicit recursion, which replaces nonTerminal
in the method:
class Applicative f => Descr f where
recNonTerminal :: String -> (f a -> f a) -> f a
instance Descr Parser where
recNonTerminal _ p = let r = p r in r
instance Descr Grammar where
recNonTerminal = recNonTerminalG
recNonTerminalG :: String -> (Grammar a -> Grammar a) -> Grammar a
recNonTerminalG name f =
let G (prods, rhs) = f (G ([], NonTerminal name))
in G (prods ++ [(name, rhs)], NonTerminal name)
nonTerminal :: Descr f => String -> f a -> f a
nonTerminal name p = recNonTerminal name (const p)
runGrammer :: String -> Grammar a -> BNF
runGrammer main (G (prods, NonTerminal nt)) main == nt = prods
runGrammer main (G (prods, rhs)) = prods ++ [(main, rhs)]
The change in runGrammer
avoids adding a pointless expr = expr
production to the output.
This lets us define a parser/grammar-generator for the arithmetic expressions given above:
data Expr = Plus Expr Expr Mult Expr Expr Const Integer
deriving Show
mkPlus :: Expr -> [Expr] -> Expr
mkPlus = foldl Plus
mkMult :: Expr -> [Expr] -> Expr
mkMult = foldl Mult
parseExpr :: Descr f => f Expr
parseExpr = recNonTerminal "expr" $ \ exp ->
ePlus exp
ePlus :: Descr f => f Expr -> f Expr
ePlus exp = nonTerminal "plus" $
mkPlus <$> eMult exp
<*> many (spaces *> char '+' *> spaces *> eMult exp)
<* spaces
eMult :: Descr f => f Expr -> f Expr
eMult exp = nonTerminal "mult" $
mkPlus <$> eAtom exp
<*> many (spaces *> char '*' *> spaces *> eAtom exp)
<* spaces
eAtom :: Descr f => f Expr -> f Expr
eAtom exp = nonTerminal "atom" $
aConst orElse eParens exp
aConst :: Descr f => f Expr
aConst = nonTerminal "const" $ Const . read <$> many1 digit
eParens :: Descr f => f a -> f a
eParens inner =
id <$ char '('
<* spaces
<*> inner
<* spaces
<* char ')'
<* spaces
And indeed, this works:
*Main> putStr $ ppGrammar "expr" parseExpr
const = digit, digit ;
spaces = ' ' newline ;
atom = const '(', spaces, expr, spaces, ')', spaces;
mult = atom, spaces, '*', spaces, atom , spaces;
plus = mult, spaces, '+', spaces, mult , spaces;
expr = plus;
Recursion (variant 2)
Interestingly, there is another solution to this problem, which avoids introducing recNonTerminal
and explicitly passing around the recursive call (i.e. the exp
in the example). To implement that we have to adjust our Grammar
type as follows:
newtype Grammar a = G ([String] -> (BNF, RHS))
The idea is that the list of strings is those non-terminals that we are currently defining. So in nonTerminal
, we check if the non-terminal to be introduced is currently in the process of being defined, and then simply ignore the body. This way, the recursion is stopped automatically:
nonTerminalG :: String -> (Grammar a) -> Grammar a
nonTerminalG name (G g) = G $ \seen ->
if name elem seen
then ([], NonTerminal name)
else let (prods, rhs) = g (name : seen)
in (prods ++ [(name, rhs)], NonTerminal name)
After adjusting the other primitives of Grammar
(including the Functor
and Applicative
instances, wich now again have nonTerminal
) to type-check again, we observe that this parser/grammar generator for expressions, with genuine recursion, works now:
parseExp :: Descr f => f Expr
parseExp = nonTerminal "expr" $
ePlus
ePlus :: Descr f => f Expr
ePlus = nonTerminal "plus" $
mkPlus <$> eMult
<*> many (spaces *> char '+' *> spaces *> eMult)
<* spaces
eMult :: Descr f => f Expr
eMult = nonTerminal "mult" $
mkPlus <$> eAtom
<*> many (spaces *> char '*' *> spaces *> eAtom)
<* spaces
eAtom :: Descr f => f Expr
eAtom = nonTerminal "atom" $
aConst orElse eParens parseExp
Note that the recursion is only going to work if there is at least one call to nonTerminal
somewhere around the recursive calls. We still cannot implement many
as naively as above.
Homework
If you want to play more with this: The homework is to define a parser/grammar-generator for EBNF itself, as specified in this variant:
identifier = letter, letter digit '-' ;
spaces = ' ' newline ;
quoted-char = non-quote-or-backslash '\\', '\\' '\\', '\'';
terminal = '\'', quoted-char , '\'', spaces;
non-terminal = identifier, spaces;
option = '[', spaces, rhs, spaces, ']', spaces;
repetition = ' ', spaces, rhs, spaces, ' ', spaces;
group = '(', spaces, rhs, spaces, ')', spaces;
atom = terminal non-terminal option repetition group;
sequence = atom, spaces, ',', spaces, atom , spaces;
choice = sequence, spaces, ' ', spaces, sequence , spaces;
rhs = choice;
production = identifier, spaces, '=', spaces, rhs, ';', spaces;
bnf = production, production ;
This grammar is set up so that the precedence of ,
and
is correctly implemented: a , b c
will parse as (a, b) c
.
In this syntax for BNF, terminal characters are quoted, i.e. inside ' '
, a '
is replaced by \'
and a \
is replaced by \\
this is done by the function quote
in ppRHS
.
If you do this, you should able to round-trip with the pretty-printer, i.e. parse back what it wrote:
*Main> let bnf1 = runGrammer "expr" parseExpr
*Main> let bnf2 = runGrammer "expr" parseBNF
*Main> let f = Data.Maybe.fromJust . parse parseBNF. ppBNF
*Main> f bnf1 == bnf1
True
*Main> f bnf2 == bnf2
True
The last line is quite meta: We are using parseBNF
as a parser on the pretty-printed grammar produced from interpreting parseBNF
as a grammar.
Conclusion
We have again seen an example of the excellent support for abstraction in Haskell: Being able to define so very different things such as a parser and a grammar description with the same code is great. Type classes helped us here.
Note that it was crucial that our combined parser/grammars are only able to use the methods of Applicative
, and not Monad
. Applicative
is less powerful, so by giving less power to the user of our Descr
interface, the other side, i.e. the implementation, can be more powerful.
The reason why Applicative
is ok, but Monad
is not, is that in Applicative
, the results do not affect the shape of the computation, whereas in Monad
, the whole point of the bind operator (>>=)
is that the result of the computation is used to decide the next computation. And while this is perfectly fine for a parser, it just makes no sense for a grammar generator, where there simply are no values around!
We have also seen that a phantom type, namely the parameter of Grammar
, can be useful, as it lets the type system make sure we do not write nonsense. For example, the type of orElseG
ensures that both grammars that are combined here indeed describe something of the same type.
- It seems to be the week of applicative-appraising blog posts: Brent has posted a nice piece about enumerations using
Applicative
yesterday.
- I like how in this alignment of
<*>
and <*
the >
point out where the arguments are that are being passed to the function on the left.
cell = '"', not-quote , '"';
line = (cell, ',', cell ''), newline;
csv = line ;
data RHS
= Terminal String
NonTerminal String
Choice RHS RHS
Sequence RHS RHS
Optional RHS
Repetition RHS
deriving (Show, Eq)
ppRHS :: RHS -> String
ppRHS = go 0
where
go _ (Terminal s) = surround "'" "'" $ concatMap quote s
go _ (NonTerminal s) = s
go a (Choice x1 x2) = p a 1 $ go 1 x1 ++ " " ++ go 1 x2
go a (Sequence x1 x2) = p a 2 $ go 2 x1 ++ ", " ++ go 2 x2
go _ (Optional x) = surround "[" "]" $ go 0 x
go _ (Repetition x) = surround " " " " $ go 0 x
surround c1 c2 x = c1 ++ x ++ c2
p a n a > n = surround "(" ")"
otherwise = id
quote '\'' = "\\'"
quote '\\' = "\\\\"
quote c = [c]
type Production = (String, RHS)
type BNF = [Production]
ppBNF :: BNF -> String
ppBNF = unlines . map (\(i,rhs) -> i ++ " = " ++ ppRHS rhs ++ ";")
RHS
type directly, but we can do something more nifty: We create a data type that keeps track, via a phantom type parameter, of what Haskell type the given EBNF syntax is the specification:
newtype Grammar a = G RHS
ppGrammar :: Grammar a -> String
ppGrammar (G rhs) = ppRHS rhs
So a value of type Grammar t
is a description of the textual representation of the Haskell type t
.
Here is one simple example:
anyCharG :: Grammar Char
anyCharG = G (NonTerminal "char")
Here is another one. This one does not describe any interesting Haskell type, but is useful when spelling out the special characters in the syntax described by the grammar:
charG :: Char -> Grammar ()
charG c = G (Terminal [c])
A combinator that creates new grammar from two existing grammars:
orElseG :: Grammar a -> Grammar a -> Grammar a
orElseG (G rhs1) (G rhs2) = G (Choice rhs1 rhs2)
We want the convenience of our well-known type classes in order to combine these values some more:
instance Functor Grammar where
fmap _ (G rhs) = G rhs
instance Applicative Grammar where
pure x = G (Terminal "")
(G rhs1) <*> (G rhs2) = G (Sequence rhs1 rhs2)
Note how the Functor
instance does not actually use the function. How should it? There are no values inside a Grammar
!
We cannot define a Monad
instance for Grammar
: We would start with (G rhs1) >>= k =
, but there is simply no way of getting a value of type a
that we can feed to k
. So we will do without a Monad
instance. This is interesting, and we will come back to that later.
Like with the parser, we can now begin to build on the primitive example to build more complicated combinators:
manyG :: Grammar a -> Grammar [a]
manyG p = (pure (:) <*> p <*> manyG p) orElseG pure []
many1G :: Grammar a -> Grammar [a]
many1G p = pure (:) <*> p <*> manyG p
sepByG :: Grammar a -> Grammar () -> Grammar [a]
sepByG p1 p2 = ((:) <$> p1 <*> (manyG (p2 *> p1))) orElseG pure []
Let us run a small example:
dottedWordsG :: Grammar [String]
dottedWordsG = many1G (manyG anyCharG <* charG '.')
*Main> putStrLn $ ppGrammar dottedWordsG
'', ('', char, ('', char, ('', char, ('', char, ('', char, ('',
Oh my, that is not good. Looks like the recursion in manyG
does not work well, so we need to avoid that. But anyways we want to be explicit in the EBNF grammars about where something can be repeated, so let us just make many
a primitive:
manyG :: Grammar a -> Grammar [a]
manyG (G rhs) = G (Repetition rhs)
With this definition, we already get a simple grammar for dottedWordsG
:
*Main> putStrLn $ ppGrammar dottedWordsG
'', char , '.', char , '.'
This already looks like a proper EBNF grammar. One thing that is not nice about it is that there is an empty string (''
) in a sequence ( ,
). We do not want that.
Why is it there in the first place? Because our Applicative
instance is not lawful! Remember that pure id <*> g == g
should hold. One way to achieve that is to improve the Applicative
instance to optimize this case away:
instance Applicative Grammar where
pure x = G (Terminal "")
G (Terminal "") <*> G rhs2 = G rhs2
G rhs1 <*> G (Terminal "") = G rhs1
(G rhs1) <*> (G rhs2) = G (Sequence rhs1 rhs2)
Now we get what we want:
*Main> putStrLn $ ppGrammar dottedWordsG
char , '.', char , '.'
Remember our parser for CSV files above? Let me repeat it here, this time using only Applicative
combinators, i.e. avoiding (>>=)
, (>>)
, return
and do
-notation:
parseCSVP :: Parser [[String]]
parseCSVP = manyP parseLine
where
parseLine = parseCell sepByP charG ',' <* charP '\n'
parseCell = charP '"' *> manyP (anyCharButP '"') <* charP '"'
And now we try to rewrite the code to produce Grammar
instead of Parser
. This is straightforward with the exception of anyCharButP
. The parser code for that inherently monadic, and we just do not have a monad instance. So we work around the issue by making that a primitive grammar, i.e. introducing a non-terminal in the EBNF without a production rule pretty much like we did for anyCharG
:
primitiveG :: String -> Grammar a
primitiveG s = G (NonTerminal s)
parseCSVG :: Grammar [[String]]
parseCSVG = manyG parseLine
where
parseLine = parseCell sepByG charG ',' <* charG '\n'
parseCell = charG '"' *> manyG (primitiveG "not-quote") <* charG '"'
Of course the names parse
are not quite right any more, but let us just leave that for now.
Here is the result:
*Main> putStrLn $ ppGrammar parseCSVG
('"', not-quote , '"', ',', '"', not-quote , '"' ''), '
'
The line break is weird. We do not really want newlines in the grammar. So let us make that primitive as well, and replace charG '\n'
with newlineG
:
newlineG :: Grammar ()
newlineG = primitiveG "newline"
Now we get
*Main> putStrLn $ ppGrammar parseCSVG
('"', not-quote , '"', ',', '"', not-quote , '"' ''), newline
which is nice and correct, but still not quite the easily readable EBNF that we saw further up.
Code to produce EBNF, with productions
We currently let our grammars produce only the right-hand side of one EBNF production, but really, we want to produce a RHS that may refer to other productions. So let us change the type accordingly:
newtype Grammar a = G (BNF, RHS)
runGrammer :: String -> Grammar a -> BNF
runGrammer main (G (prods, rhs)) = prods ++ [(main, rhs)]
ppGrammar :: String -> Grammar a -> String
ppGrammar main g = ppBNF $ runGrammer main g
Now we have to adjust all our primitive combinators (but not the derived ones!):
charG :: Char -> Grammar ()
charG c = G ([], Terminal [c])
anyCharG :: Grammar Char
anyCharG = G ([], NonTerminal "char")
manyG :: Grammar a -> Grammar [a]
manyG (G (prods, rhs)) = G (prods, Repetition rhs)
mergeProds :: [Production] -> [Production] -> [Production]
mergeProds prods1 prods2 = nub $ prods1 ++ prods2
orElseG :: Grammar a -> Grammar a -> Grammar a
orElseG (G (prods1, rhs1)) (G (prods2, rhs2))
= G (mergeProds prods1 prods2, Choice rhs1 rhs2)
instance Functor Grammar where
fmap _ (G bnf) = G bnf
instance Applicative Grammar where
pure x = G ([], Terminal "")
G (prods1, Terminal "") <*> G (prods2, rhs2)
= G (mergeProds prods1 prods2, rhs2)
G (prods1, rhs1) <*> G (prods2, Terminal "")
= G (mergeProds prods1 prods2, rhs1)
G (prods1, rhs1) <*> G (prods2, rhs2)
= G (mergeProds prods1 prods2, Sequence rhs1 rhs2)
primitiveG :: String -> Grammar a
primitiveG s = G (NonTerminal s)
The use of nub
when combining productions removes duplicates that might be used in different parts of the grammar. Not efficient, but good enough for now.
Did we gain anything? Not yet:
*Main> putStr $ ppGrammar "csv" (parseCSVG)
csv = ('"', not-quote , '"', ',', '"', not-quote , '"' ''), newline ;
But we can now introduce a function that lets us tell the system where to give names to a piece of grammar:
nonTerminal :: String -> Grammar a -> Grammar a
nonTerminal name (G (prods, rhs))
= G (prods ++ [(name, rhs)], NonTerminal name)
Ample use of this in parseCSVG
yields the desired result:
parseCSVG :: Grammar [[String]]
parseCSVG = manyG parseLine
where
parseLine = nonTerminal "line" $
parseCell sepByG charG ',' <* newline
parseCell = nonTerminal "cell" $
charG '"' *> manyG (primitiveG "not-quote") <* charG '"
*Main> putStr $ ppGrammar "csv" (parseCSVG)
cell = '"', not-quote , '"';
line = (cell, ',', cell ''), newline;
csv = line ;
This is great!
Unifying parsing and grammar-generating
Note how simliar parseCSVG
and parseCSVP
are! Would it not be great if we could implement that functionality only once, and get both a parser and a grammar description out of it? This way, the two would never be out of sync!
And surely this must be possible. The tool to reach for is of course to define a type class that abstracts over the parts where Parser
and Grammer
differ. So we have to identify all functions that are primitive in one of the two worlds, and turn them into type class methods. This includes char
and orElse
. It includes many
, too: Although manyP
is not primitive, manyG
is. It also includes nonTerminal
, which does not exist in the world of parsers (yet), but we need it for the grammars.
The primitiveG
function is tricky. We use it in grammars when the code that we might use while parsing is not expressible as a grammar. So the solution is to let it take two arguments: A String
, when used as a descriptive non-terminal in a grammar, and a Parser a
, used in the parsing code.
Finally, the type classes that we except, Applicative
(and thus Functor
), are added as constraints on our type class:
class Applicative f => Descr f where
char :: Char -> f ()
many :: f a -> f [a]
orElse :: f a -> f a -> f a
primitive :: String -> Parser a -> f a
nonTerminal :: String -> f a -> f a
The instances are easily written:
instance Descr Parser where
char = charP
many = manyP
orElse = orElseP
primitive _ p = p
nonTerminal _ p = p
instance Descr Grammar where
char = charG
many = manyG
orElse = orElseG
primitive s _ = primitiveG s
nonTerminal s g = nonTerminal s g
And we can now take the derived definitions, of which so far we had two copies, and define them once and for all:
many1 :: Descr f => f a -> f [a]
many1 p = pure (:) <*> p <*> many p
anyChar :: Descr f => f Char
anyChar = primitive "char" anyCharP
dottedWords :: Descr f => f [String]
dottedWords = many1 (many anyChar <* char '.')
sepBy :: Descr f => f a -> f () -> f [a]
sepBy p1 p2 = ((:) <$> p1 <*> (many (p2 *> p1))) orElse pure []
newline :: Descr f => f ()
newline = primitive "newline" (charP '\n')
And thus we now have our CSV parser/grammar generator:
parseCSV :: Descr f => f [[String]]
parseCSV = many parseLine
where
parseLine = nonTerminal "line" $
parseCell sepBy char ',' <* newline
parseCell = nonTerminal "cell" $
char '"' *> many (primitive "not-quote" (anyCharButP '"')) <* char '"'
We can now use this definition both to parse and to generate grammars:
*Main> putStr $ ppGrammar2 "csv" (parseCSV)
cell = '"', not-quote , '"';
line = (cell, ',', cell ''), newline;
csv = line ;
*Main> parse parseCSV "\"ab\",\"cd\"\n\"\",\"de\"\n\n"
Just [["ab","cd"],["","de"],[]]
The INI file parser and grammar
As a final exercise, let us transform the INI file parser into a combined thing. Here is the parser (another artifact of last week s homework) again using applicative style2:
parseINIP :: Parser INIFile
parseINIP = many1P parseSection
where
parseSection =
(,) <$ charP '['
<*> parseIdent
<* charP ']'
<* charP '\n'
<*> (catMaybes <$> manyP parseLine)
parseIdent = many1P letterOrDigitP
parseLine = parseDecl orElseP parseComment orElseP parseEmpty
parseDecl = Just <$> (
(,) <*> parseIdent
<* manyP (charP ' ')
<* charP '='
<* manyP (charP ' ')
<*> many1P (anyCharButP '\n')
<* charP '\n')
parseComment =
Nothing <$ charP '#'
<* many1P (anyCharButP '\n')
<* charP '\n'
parseEmpty = Nothing <$ charP '\n'
Transforming that to a generic description is quite straightforward. We use primitive
again to wrap letterOrDigitP
:
descrINI :: Descr f => f INIFile
descrINI = many1 parseSection
where
parseSection =
(,) <* char '['
<*> parseIdent
<* char ']'
<* newline
<*> (catMaybes <$> many parseLine)
parseIdent = many1 (primitive "alphanum" letterOrDigitP)
parseLine = parseDecl orElse parseComment orElse parseEmpty
parseDecl = Just <$> (
(,) <*> parseIdent
<* many (char ' ')
<* char '='
<* many (char ' ')
<*> many1 (primitive "non-newline" (anyCharButP '\n'))
<* newline)
parseComment =
Nothing <$ char '#'
<* many1 (primitive "non-newline" (anyCharButP '\n'))
<* newline
parseEmpty = Nothing <$ newline
This yields this not very helpful grammar (abbreviated here):
*Main> putStr $ ppGrammar2 "ini" descrINI
ini = '[', alphanum, alphanum , ']', newline, alphanum, alphanum , ' '
But with a few uses of nonTerminal
, we get something really nice:
descrINI :: Descr f => f INIFile
descrINI = many1 parseSection
where
parseSection = nonTerminal "section" $
(,) <$ char '['
<*> parseIdent
<* char ']'
<* newline
<*> (catMaybes <$> many parseLine)
parseIdent = nonTerminal "identifier" $
many1 (primitive "alphanum" letterOrDigitP)
parseLine = nonTerminal "line" $
parseDecl orElse parseComment orElse parseEmpty
parseDecl = nonTerminal "declaration" $ Just <$> (
(,) <$> parseIdent
<* spaces
<* char '='
<* spaces
<*> remainder)
parseComment = nonTerminal "comment" $
Nothing <$ char '#' <* remainder
remainder = nonTerminal "line-remainder" $
many1 (primitive "non-newline" (anyCharButP '\n')) <* newline
parseEmpty = Nothing <$ newline
spaces = nonTerminal "spaces" $ many (char ' ')
*Main> putStr $ ppGrammar "ini" descrINI
identifier = alphanum, alphanum ;
spaces = ' ' ;
line-remainder = non-newline, non-newline , newline;
declaration = identifier, spaces, '=', spaces, line-remainder;
comment = '#', line-remainder;
line = declaration comment newline;
section = '[', identifier, ']', newline, line ;
ini = section, section ;
Recursion (variant 1)
What if we want to write a parser/grammar-generator that is able to generate the following grammar, which describes terms that are additions and multiplications of natural numbers:
const = digit, digit ;
spaces = ' ' newline ;
atom = const '(', spaces, expr, spaces, ')', spaces;
mult = atom, spaces, '*', spaces, atom , spaces;
plus = mult, spaces, '+', spaces, mult , spaces;
expr = plus;
The production of expr
is recursive (via plus
, mult
, atom
). We have seen above that simply defining a Grammar a
recursively does not go well.
One solution is to add a new combinator for explicit recursion, which replaces nonTerminal
in the method:
class Applicative f => Descr f where
recNonTerminal :: String -> (f a -> f a) -> f a
instance Descr Parser where
recNonTerminal _ p = let r = p r in r
instance Descr Grammar where
recNonTerminal = recNonTerminalG
recNonTerminalG :: String -> (Grammar a -> Grammar a) -> Grammar a
recNonTerminalG name f =
let G (prods, rhs) = f (G ([], NonTerminal name))
in G (prods ++ [(name, rhs)], NonTerminal name)
nonTerminal :: Descr f => String -> f a -> f a
nonTerminal name p = recNonTerminal name (const p)
runGrammer :: String -> Grammar a -> BNF
runGrammer main (G (prods, NonTerminal nt)) main == nt = prods
runGrammer main (G (prods, rhs)) = prods ++ [(main, rhs)]
The change in runGrammer
avoids adding a pointless expr = expr
production to the output.
This lets us define a parser/grammar-generator for the arithmetic expressions given above:
data Expr = Plus Expr Expr Mult Expr Expr Const Integer
deriving Show
mkPlus :: Expr -> [Expr] -> Expr
mkPlus = foldl Plus
mkMult :: Expr -> [Expr] -> Expr
mkMult = foldl Mult
parseExpr :: Descr f => f Expr
parseExpr = recNonTerminal "expr" $ \ exp ->
ePlus exp
ePlus :: Descr f => f Expr -> f Expr
ePlus exp = nonTerminal "plus" $
mkPlus <$> eMult exp
<*> many (spaces *> char '+' *> spaces *> eMult exp)
<* spaces
eMult :: Descr f => f Expr -> f Expr
eMult exp = nonTerminal "mult" $
mkPlus <$> eAtom exp
<*> many (spaces *> char '*' *> spaces *> eAtom exp)
<* spaces
eAtom :: Descr f => f Expr -> f Expr
eAtom exp = nonTerminal "atom" $
aConst orElse eParens exp
aConst :: Descr f => f Expr
aConst = nonTerminal "const" $ Const . read <$> many1 digit
eParens :: Descr f => f a -> f a
eParens inner =
id <$ char '('
<* spaces
<*> inner
<* spaces
<* char ')'
<* spaces
And indeed, this works:
*Main> putStr $ ppGrammar "expr" parseExpr
const = digit, digit ;
spaces = ' ' newline ;
atom = const '(', spaces, expr, spaces, ')', spaces;
mult = atom, spaces, '*', spaces, atom , spaces;
plus = mult, spaces, '+', spaces, mult , spaces;
expr = plus;
Recursion (variant 2)
Interestingly, there is another solution to this problem, which avoids introducing recNonTerminal
and explicitly passing around the recursive call (i.e. the exp
in the example). To implement that we have to adjust our Grammar
type as follows:
newtype Grammar a = G ([String] -> (BNF, RHS))
The idea is that the list of strings is those non-terminals that we are currently defining. So in nonTerminal
, we check if the non-terminal to be introduced is currently in the process of being defined, and then simply ignore the body. This way, the recursion is stopped automatically:
nonTerminalG :: String -> (Grammar a) -> Grammar a
nonTerminalG name (G g) = G $ \seen ->
if name elem seen
then ([], NonTerminal name)
else let (prods, rhs) = g (name : seen)
in (prods ++ [(name, rhs)], NonTerminal name)
After adjusting the other primitives of Grammar
(including the Functor
and Applicative
instances, wich now again have nonTerminal
) to type-check again, we observe that this parser/grammar generator for expressions, with genuine recursion, works now:
parseExp :: Descr f => f Expr
parseExp = nonTerminal "expr" $
ePlus
ePlus :: Descr f => f Expr
ePlus = nonTerminal "plus" $
mkPlus <$> eMult
<*> many (spaces *> char '+' *> spaces *> eMult)
<* spaces
eMult :: Descr f => f Expr
eMult = nonTerminal "mult" $
mkPlus <$> eAtom
<*> many (spaces *> char '*' *> spaces *> eAtom)
<* spaces
eAtom :: Descr f => f Expr
eAtom = nonTerminal "atom" $
aConst orElse eParens parseExp
Note that the recursion is only going to work if there is at least one call to nonTerminal
somewhere around the recursive calls. We still cannot implement many
as naively as above.
Homework
If you want to play more with this: The homework is to define a parser/grammar-generator for EBNF itself, as specified in this variant:
identifier = letter, letter digit '-' ;
spaces = ' ' newline ;
quoted-char = non-quote-or-backslash '\\', '\\' '\\', '\'';
terminal = '\'', quoted-char , '\'', spaces;
non-terminal = identifier, spaces;
option = '[', spaces, rhs, spaces, ']', spaces;
repetition = ' ', spaces, rhs, spaces, ' ', spaces;
group = '(', spaces, rhs, spaces, ')', spaces;
atom = terminal non-terminal option repetition group;
sequence = atom, spaces, ',', spaces, atom , spaces;
choice = sequence, spaces, ' ', spaces, sequence , spaces;
rhs = choice;
production = identifier, spaces, '=', spaces, rhs, ';', spaces;
bnf = production, production ;
This grammar is set up so that the precedence of ,
and
is correctly implemented: a , b c
will parse as (a, b) c
.
In this syntax for BNF, terminal characters are quoted, i.e. inside ' '
, a '
is replaced by \'
and a \
is replaced by \\
this is done by the function quote
in ppRHS
.
If you do this, you should able to round-trip with the pretty-printer, i.e. parse back what it wrote:
*Main> let bnf1 = runGrammer "expr" parseExpr
*Main> let bnf2 = runGrammer "expr" parseBNF
*Main> let f = Data.Maybe.fromJust . parse parseBNF. ppBNF
*Main> f bnf1 == bnf1
True
*Main> f bnf2 == bnf2
True
The last line is quite meta: We are using parseBNF
as a parser on the pretty-printed grammar produced from interpreting parseBNF
as a grammar.
Conclusion
We have again seen an example of the excellent support for abstraction in Haskell: Being able to define so very different things such as a parser and a grammar description with the same code is great. Type classes helped us here.
Note that it was crucial that our combined parser/grammars are only able to use the methods of Applicative
, and not Monad
. Applicative
is less powerful, so by giving less power to the user of our Descr
interface, the other side, i.e. the implementation, can be more powerful.
The reason why Applicative
is ok, but Monad
is not, is that in Applicative
, the results do not affect the shape of the computation, whereas in Monad
, the whole point of the bind operator (>>=)
is that the result of the computation is used to decide the next computation. And while this is perfectly fine for a parser, it just makes no sense for a grammar generator, where there simply are no values around!
We have also seen that a phantom type, namely the parameter of Grammar
, can be useful, as it lets the type system make sure we do not write nonsense. For example, the type of orElseG
ensures that both grammars that are combined here indeed describe something of the same type.
- It seems to be the week of applicative-appraising blog posts: Brent has posted a nice piece about enumerations using
Applicative
yesterday.
- I like how in this alignment of
<*>
and <*
the >
point out where the arguments are that are being passed to the function on the left.
newtype Grammar a = G (BNF, RHS)
runGrammer :: String -> Grammar a -> BNF
runGrammer main (G (prods, rhs)) = prods ++ [(main, rhs)]
ppGrammar :: String -> Grammar a -> String
ppGrammar main g = ppBNF $ runGrammer main g
charG :: Char -> Grammar ()
charG c = G ([], Terminal [c])
anyCharG :: Grammar Char
anyCharG = G ([], NonTerminal "char")
manyG :: Grammar a -> Grammar [a]
manyG (G (prods, rhs)) = G (prods, Repetition rhs)
mergeProds :: [Production] -> [Production] -> [Production]
mergeProds prods1 prods2 = nub $ prods1 ++ prods2
orElseG :: Grammar a -> Grammar a -> Grammar a
orElseG (G (prods1, rhs1)) (G (prods2, rhs2))
= G (mergeProds prods1 prods2, Choice rhs1 rhs2)
instance Functor Grammar where
fmap _ (G bnf) = G bnf
instance Applicative Grammar where
pure x = G ([], Terminal "")
G (prods1, Terminal "") <*> G (prods2, rhs2)
= G (mergeProds prods1 prods2, rhs2)
G (prods1, rhs1) <*> G (prods2, Terminal "")
= G (mergeProds prods1 prods2, rhs1)
G (prods1, rhs1) <*> G (prods2, rhs2)
= G (mergeProds prods1 prods2, Sequence rhs1 rhs2)
primitiveG :: String -> Grammar a
primitiveG s = G (NonTerminal s)
*Main> putStr $ ppGrammar "csv" (parseCSVG)
csv = ('"', not-quote , '"', ',', '"', not-quote , '"' ''), newline ;
nonTerminal :: String -> Grammar a -> Grammar a
nonTerminal name (G (prods, rhs))
= G (prods ++ [(name, rhs)], NonTerminal name)
parseCSVG :: Grammar [[String]]
parseCSVG = manyG parseLine
where
parseLine = nonTerminal "line" $
parseCell sepByG charG ',' <* newline
parseCell = nonTerminal "cell" $
charG '"' *> manyG (primitiveG "not-quote") <* charG '"
*Main> putStr $ ppGrammar "csv" (parseCSVG)
cell = '"', not-quote , '"';
line = (cell, ',', cell ''), newline;
csv = line ;
parseCSVG
and parseCSVP
are! Would it not be great if we could implement that functionality only once, and get both a parser and a grammar description out of it? This way, the two would never be out of sync!
And surely this must be possible. The tool to reach for is of course to define a type class that abstracts over the parts where Parser
and Grammer
differ. So we have to identify all functions that are primitive in one of the two worlds, and turn them into type class methods. This includes char
and orElse
. It includes many
, too: Although manyP
is not primitive, manyG
is. It also includes nonTerminal
, which does not exist in the world of parsers (yet), but we need it for the grammars.
The primitiveG
function is tricky. We use it in grammars when the code that we might use while parsing is not expressible as a grammar. So the solution is to let it take two arguments: A String
, when used as a descriptive non-terminal in a grammar, and a Parser a
, used in the parsing code.
Finally, the type classes that we except, Applicative
(and thus Functor
), are added as constraints on our type class:
class Applicative f => Descr f where
char :: Char -> f ()
many :: f a -> f [a]
orElse :: f a -> f a -> f a
primitive :: String -> Parser a -> f a
nonTerminal :: String -> f a -> f a
The instances are easily written:
instance Descr Parser where
char = charP
many = manyP
orElse = orElseP
primitive _ p = p
nonTerminal _ p = p
instance Descr Grammar where
char = charG
many = manyG
orElse = orElseG
primitive s _ = primitiveG s
nonTerminal s g = nonTerminal s g
And we can now take the derived definitions, of which so far we had two copies, and define them once and for all:
many1 :: Descr f => f a -> f [a]
many1 p = pure (:) <*> p <*> many p
anyChar :: Descr f => f Char
anyChar = primitive "char" anyCharP
dottedWords :: Descr f => f [String]
dottedWords = many1 (many anyChar <* char '.')
sepBy :: Descr f => f a -> f () -> f [a]
sepBy p1 p2 = ((:) <$> p1 <*> (many (p2 *> p1))) orElse pure []
newline :: Descr f => f ()
newline = primitive "newline" (charP '\n')
And thus we now have our CSV parser/grammar generator:
parseCSV :: Descr f => f [[String]]
parseCSV = many parseLine
where
parseLine = nonTerminal "line" $
parseCell sepBy char ',' <* newline
parseCell = nonTerminal "cell" $
char '"' *> many (primitive "not-quote" (anyCharButP '"')) <* char '"'
We can now use this definition both to parse and to generate grammars:
*Main> putStr $ ppGrammar2 "csv" (parseCSV)
cell = '"', not-quote , '"';
line = (cell, ',', cell ''), newline;
csv = line ;
*Main> parse parseCSV "\"ab\",\"cd\"\n\"\",\"de\"\n\n"
Just [["ab","cd"],["","de"],[]]
The INI file parser and grammar
As a final exercise, let us transform the INI file parser into a combined thing. Here is the parser (another artifact of last week s homework) again using applicative style2:
parseINIP :: Parser INIFile
parseINIP = many1P parseSection
where
parseSection =
(,) <$ charP '['
<*> parseIdent
<* charP ']'
<* charP '\n'
<*> (catMaybes <$> manyP parseLine)
parseIdent = many1P letterOrDigitP
parseLine = parseDecl orElseP parseComment orElseP parseEmpty
parseDecl = Just <$> (
(,) <*> parseIdent
<* manyP (charP ' ')
<* charP '='
<* manyP (charP ' ')
<*> many1P (anyCharButP '\n')
<* charP '\n')
parseComment =
Nothing <$ charP '#'
<* many1P (anyCharButP '\n')
<* charP '\n'
parseEmpty = Nothing <$ charP '\n'
Transforming that to a generic description is quite straightforward. We use primitive
again to wrap letterOrDigitP
:
descrINI :: Descr f => f INIFile
descrINI = many1 parseSection
where
parseSection =
(,) <* char '['
<*> parseIdent
<* char ']'
<* newline
<*> (catMaybes <$> many parseLine)
parseIdent = many1 (primitive "alphanum" letterOrDigitP)
parseLine = parseDecl orElse parseComment orElse parseEmpty
parseDecl = Just <$> (
(,) <*> parseIdent
<* many (char ' ')
<* char '='
<* many (char ' ')
<*> many1 (primitive "non-newline" (anyCharButP '\n'))
<* newline)
parseComment =
Nothing <$ char '#'
<* many1 (primitive "non-newline" (anyCharButP '\n'))
<* newline
parseEmpty = Nothing <$ newline
This yields this not very helpful grammar (abbreviated here):
*Main> putStr $ ppGrammar2 "ini" descrINI
ini = '[', alphanum, alphanum , ']', newline, alphanum, alphanum , ' '
But with a few uses of nonTerminal
, we get something really nice:
descrINI :: Descr f => f INIFile
descrINI = many1 parseSection
where
parseSection = nonTerminal "section" $
(,) <$ char '['
<*> parseIdent
<* char ']'
<* newline
<*> (catMaybes <$> many parseLine)
parseIdent = nonTerminal "identifier" $
many1 (primitive "alphanum" letterOrDigitP)
parseLine = nonTerminal "line" $
parseDecl orElse parseComment orElse parseEmpty
parseDecl = nonTerminal "declaration" $ Just <$> (
(,) <$> parseIdent
<* spaces
<* char '='
<* spaces
<*> remainder)
parseComment = nonTerminal "comment" $
Nothing <$ char '#' <* remainder
remainder = nonTerminal "line-remainder" $
many1 (primitive "non-newline" (anyCharButP '\n')) <* newline
parseEmpty = Nothing <$ newline
spaces = nonTerminal "spaces" $ many (char ' ')
*Main> putStr $ ppGrammar "ini" descrINI
identifier = alphanum, alphanum ;
spaces = ' ' ;
line-remainder = non-newline, non-newline , newline;
declaration = identifier, spaces, '=', spaces, line-remainder;
comment = '#', line-remainder;
line = declaration comment newline;
section = '[', identifier, ']', newline, line ;
ini = section, section ;
Recursion (variant 1)
What if we want to write a parser/grammar-generator that is able to generate the following grammar, which describes terms that are additions and multiplications of natural numbers:
const = digit, digit ;
spaces = ' ' newline ;
atom = const '(', spaces, expr, spaces, ')', spaces;
mult = atom, spaces, '*', spaces, atom , spaces;
plus = mult, spaces, '+', spaces, mult , spaces;
expr = plus;
The production of expr
is recursive (via plus
, mult
, atom
). We have seen above that simply defining a Grammar a
recursively does not go well.
One solution is to add a new combinator for explicit recursion, which replaces nonTerminal
in the method:
class Applicative f => Descr f where
recNonTerminal :: String -> (f a -> f a) -> f a
instance Descr Parser where
recNonTerminal _ p = let r = p r in r
instance Descr Grammar where
recNonTerminal = recNonTerminalG
recNonTerminalG :: String -> (Grammar a -> Grammar a) -> Grammar a
recNonTerminalG name f =
let G (prods, rhs) = f (G ([], NonTerminal name))
in G (prods ++ [(name, rhs)], NonTerminal name)
nonTerminal :: Descr f => String -> f a -> f a
nonTerminal name p = recNonTerminal name (const p)
runGrammer :: String -> Grammar a -> BNF
runGrammer main (G (prods, NonTerminal nt)) main == nt = prods
runGrammer main (G (prods, rhs)) = prods ++ [(main, rhs)]
The change in runGrammer
avoids adding a pointless expr = expr
production to the output.
This lets us define a parser/grammar-generator for the arithmetic expressions given above:
data Expr = Plus Expr Expr Mult Expr Expr Const Integer
deriving Show
mkPlus :: Expr -> [Expr] -> Expr
mkPlus = foldl Plus
mkMult :: Expr -> [Expr] -> Expr
mkMult = foldl Mult
parseExpr :: Descr f => f Expr
parseExpr = recNonTerminal "expr" $ \ exp ->
ePlus exp
ePlus :: Descr f => f Expr -> f Expr
ePlus exp = nonTerminal "plus" $
mkPlus <$> eMult exp
<*> many (spaces *> char '+' *> spaces *> eMult exp)
<* spaces
eMult :: Descr f => f Expr -> f Expr
eMult exp = nonTerminal "mult" $
mkPlus <$> eAtom exp
<*> many (spaces *> char '*' *> spaces *> eAtom exp)
<* spaces
eAtom :: Descr f => f Expr -> f Expr
eAtom exp = nonTerminal "atom" $
aConst orElse eParens exp
aConst :: Descr f => f Expr
aConst = nonTerminal "const" $ Const . read <$> many1 digit
eParens :: Descr f => f a -> f a
eParens inner =
id <$ char '('
<* spaces
<*> inner
<* spaces
<* char ')'
<* spaces
And indeed, this works:
*Main> putStr $ ppGrammar "expr" parseExpr
const = digit, digit ;
spaces = ' ' newline ;
atom = const '(', spaces, expr, spaces, ')', spaces;
mult = atom, spaces, '*', spaces, atom , spaces;
plus = mult, spaces, '+', spaces, mult , spaces;
expr = plus;
Recursion (variant 2)
Interestingly, there is another solution to this problem, which avoids introducing recNonTerminal
and explicitly passing around the recursive call (i.e. the exp
in the example). To implement that we have to adjust our Grammar
type as follows:
newtype Grammar a = G ([String] -> (BNF, RHS))
The idea is that the list of strings is those non-terminals that we are currently defining. So in nonTerminal
, we check if the non-terminal to be introduced is currently in the process of being defined, and then simply ignore the body. This way, the recursion is stopped automatically:
nonTerminalG :: String -> (Grammar a) -> Grammar a
nonTerminalG name (G g) = G $ \seen ->
if name elem seen
then ([], NonTerminal name)
else let (prods, rhs) = g (name : seen)
in (prods ++ [(name, rhs)], NonTerminal name)
After adjusting the other primitives of Grammar
(including the Functor
and Applicative
instances, wich now again have nonTerminal
) to type-check again, we observe that this parser/grammar generator for expressions, with genuine recursion, works now:
parseExp :: Descr f => f Expr
parseExp = nonTerminal "expr" $
ePlus
ePlus :: Descr f => f Expr
ePlus = nonTerminal "plus" $
mkPlus <$> eMult
<*> many (spaces *> char '+' *> spaces *> eMult)
<* spaces
eMult :: Descr f => f Expr
eMult = nonTerminal "mult" $
mkPlus <$> eAtom
<*> many (spaces *> char '*' *> spaces *> eAtom)
<* spaces
eAtom :: Descr f => f Expr
eAtom = nonTerminal "atom" $
aConst orElse eParens parseExp
Note that the recursion is only going to work if there is at least one call to nonTerminal
somewhere around the recursive calls. We still cannot implement many
as naively as above.
Homework
If you want to play more with this: The homework is to define a parser/grammar-generator for EBNF itself, as specified in this variant:
identifier = letter, letter digit '-' ;
spaces = ' ' newline ;
quoted-char = non-quote-or-backslash '\\', '\\' '\\', '\'';
terminal = '\'', quoted-char , '\'', spaces;
non-terminal = identifier, spaces;
option = '[', spaces, rhs, spaces, ']', spaces;
repetition = ' ', spaces, rhs, spaces, ' ', spaces;
group = '(', spaces, rhs, spaces, ')', spaces;
atom = terminal non-terminal option repetition group;
sequence = atom, spaces, ',', spaces, atom , spaces;
choice = sequence, spaces, ' ', spaces, sequence , spaces;
rhs = choice;
production = identifier, spaces, '=', spaces, rhs, ';', spaces;
bnf = production, production ;
This grammar is set up so that the precedence of ,
and
is correctly implemented: a , b c
will parse as (a, b) c
.
In this syntax for BNF, terminal characters are quoted, i.e. inside ' '
, a '
is replaced by \'
and a \
is replaced by \\
this is done by the function quote
in ppRHS
.
If you do this, you should able to round-trip with the pretty-printer, i.e. parse back what it wrote:
*Main> let bnf1 = runGrammer "expr" parseExpr
*Main> let bnf2 = runGrammer "expr" parseBNF
*Main> let f = Data.Maybe.fromJust . parse parseBNF. ppBNF
*Main> f bnf1 == bnf1
True
*Main> f bnf2 == bnf2
True
The last line is quite meta: We are using parseBNF
as a parser on the pretty-printed grammar produced from interpreting parseBNF
as a grammar.
Conclusion
We have again seen an example of the excellent support for abstraction in Haskell: Being able to define so very different things such as a parser and a grammar description with the same code is great. Type classes helped us here.
Note that it was crucial that our combined parser/grammars are only able to use the methods of Applicative
, and not Monad
. Applicative
is less powerful, so by giving less power to the user of our Descr
interface, the other side, i.e. the implementation, can be more powerful.
The reason why Applicative
is ok, but Monad
is not, is that in Applicative
, the results do not affect the shape of the computation, whereas in Monad
, the whole point of the bind operator (>>=)
is that the result of the computation is used to decide the next computation. And while this is perfectly fine for a parser, it just makes no sense for a grammar generator, where there simply are no values around!
We have also seen that a phantom type, namely the parameter of Grammar
, can be useful, as it lets the type system make sure we do not write nonsense. For example, the type of orElseG
ensures that both grammars that are combined here indeed describe something of the same type.
- It seems to be the week of applicative-appraising blog posts: Brent has posted a nice piece about enumerations using
Applicative
yesterday.
- I like how in this alignment of
<*>
and <*
the >
point out where the arguments are that are being passed to the function on the left.
parseINIP :: Parser INIFile
parseINIP = many1P parseSection
where
parseSection =
(,) <$ charP '['
<*> parseIdent
<* charP ']'
<* charP '\n'
<*> (catMaybes <$> manyP parseLine)
parseIdent = many1P letterOrDigitP
parseLine = parseDecl orElseP parseComment orElseP parseEmpty
parseDecl = Just <$> (
(,) <*> parseIdent
<* manyP (charP ' ')
<* charP '='
<* manyP (charP ' ')
<*> many1P (anyCharButP '\n')
<* charP '\n')
parseComment =
Nothing <$ charP '#'
<* many1P (anyCharButP '\n')
<* charP '\n'
parseEmpty = Nothing <$ charP '\n'
descrINI :: Descr f => f INIFile
descrINI = many1 parseSection
where
parseSection =
(,) <* char '['
<*> parseIdent
<* char ']'
<* newline
<*> (catMaybes <$> many parseLine)
parseIdent = many1 (primitive "alphanum" letterOrDigitP)
parseLine = parseDecl orElse parseComment orElse parseEmpty
parseDecl = Just <$> (
(,) <*> parseIdent
<* many (char ' ')
<* char '='
<* many (char ' ')
<*> many1 (primitive "non-newline" (anyCharButP '\n'))
<* newline)
parseComment =
Nothing <$ char '#'
<* many1 (primitive "non-newline" (anyCharButP '\n'))
<* newline
parseEmpty = Nothing <$ newline
*Main> putStr $ ppGrammar2 "ini" descrINI
ini = '[', alphanum, alphanum , ']', newline, alphanum, alphanum , ' '
descrINI :: Descr f => f INIFile
descrINI = many1 parseSection
where
parseSection = nonTerminal "section" $
(,) <$ char '['
<*> parseIdent
<* char ']'
<* newline
<*> (catMaybes <$> many parseLine)
parseIdent = nonTerminal "identifier" $
many1 (primitive "alphanum" letterOrDigitP)
parseLine = nonTerminal "line" $
parseDecl orElse parseComment orElse parseEmpty
parseDecl = nonTerminal "declaration" $ Just <$> (
(,) <$> parseIdent
<* spaces
<* char '='
<* spaces
<*> remainder)
parseComment = nonTerminal "comment" $
Nothing <$ char '#' <* remainder
remainder = nonTerminal "line-remainder" $
many1 (primitive "non-newline" (anyCharButP '\n')) <* newline
parseEmpty = Nothing <$ newline
spaces = nonTerminal "spaces" $ many (char ' ')
*Main> putStr $ ppGrammar "ini" descrINI
identifier = alphanum, alphanum ;
spaces = ' ' ;
line-remainder = non-newline, non-newline , newline;
declaration = identifier, spaces, '=', spaces, line-remainder;
comment = '#', line-remainder;
line = declaration comment newline;
section = '[', identifier, ']', newline, line ;
ini = section, section ;
const = digit, digit ;
spaces = ' ' newline ;
atom = const '(', spaces, expr, spaces, ')', spaces;
mult = atom, spaces, '*', spaces, atom , spaces;
plus = mult, spaces, '+', spaces, mult , spaces;
expr = plus;
The production of expr
is recursive (via plus
, mult
, atom
). We have seen above that simply defining a Grammar a
recursively does not go well.
One solution is to add a new combinator for explicit recursion, which replaces nonTerminal
in the method:
class Applicative f => Descr f where
recNonTerminal :: String -> (f a -> f a) -> f a
instance Descr Parser where
recNonTerminal _ p = let r = p r in r
instance Descr Grammar where
recNonTerminal = recNonTerminalG
recNonTerminalG :: String -> (Grammar a -> Grammar a) -> Grammar a
recNonTerminalG name f =
let G (prods, rhs) = f (G ([], NonTerminal name))
in G (prods ++ [(name, rhs)], NonTerminal name)
nonTerminal :: Descr f => String -> f a -> f a
nonTerminal name p = recNonTerminal name (const p)
runGrammer :: String -> Grammar a -> BNF
runGrammer main (G (prods, NonTerminal nt)) main == nt = prods
runGrammer main (G (prods, rhs)) = prods ++ [(main, rhs)]
The change in runGrammer
avoids adding a pointless expr = expr
production to the output.
This lets us define a parser/grammar-generator for the arithmetic expressions given above:
data Expr = Plus Expr Expr Mult Expr Expr Const Integer
deriving Show
mkPlus :: Expr -> [Expr] -> Expr
mkPlus = foldl Plus
mkMult :: Expr -> [Expr] -> Expr
mkMult = foldl Mult
parseExpr :: Descr f => f Expr
parseExpr = recNonTerminal "expr" $ \ exp ->
ePlus exp
ePlus :: Descr f => f Expr -> f Expr
ePlus exp = nonTerminal "plus" $
mkPlus <$> eMult exp
<*> many (spaces *> char '+' *> spaces *> eMult exp)
<* spaces
eMult :: Descr f => f Expr -> f Expr
eMult exp = nonTerminal "mult" $
mkPlus <$> eAtom exp
<*> many (spaces *> char '*' *> spaces *> eAtom exp)
<* spaces
eAtom :: Descr f => f Expr -> f Expr
eAtom exp = nonTerminal "atom" $
aConst orElse eParens exp
aConst :: Descr f => f Expr
aConst = nonTerminal "const" $ Const . read <$> many1 digit
eParens :: Descr f => f a -> f a
eParens inner =
id <$ char '('
<* spaces
<*> inner
<* spaces
<* char ')'
<* spaces
And indeed, this works:
*Main> putStr $ ppGrammar "expr" parseExpr
const = digit, digit ;
spaces = ' ' newline ;
atom = const '(', spaces, expr, spaces, ')', spaces;
mult = atom, spaces, '*', spaces, atom , spaces;
plus = mult, spaces, '+', spaces, mult , spaces;
expr = plus;
Recursion (variant 2)
Interestingly, there is another solution to this problem, which avoids introducing recNonTerminal
and explicitly passing around the recursive call (i.e. the exp
in the example). To implement that we have to adjust our Grammar
type as follows:
newtype Grammar a = G ([String] -> (BNF, RHS))
The idea is that the list of strings is those non-terminals that we are currently defining. So in nonTerminal
, we check if the non-terminal to be introduced is currently in the process of being defined, and then simply ignore the body. This way, the recursion is stopped automatically:
nonTerminalG :: String -> (Grammar a) -> Grammar a
nonTerminalG name (G g) = G $ \seen ->
if name elem seen
then ([], NonTerminal name)
else let (prods, rhs) = g (name : seen)
in (prods ++ [(name, rhs)], NonTerminal name)
After adjusting the other primitives of Grammar
(including the Functor
and Applicative
instances, wich now again have nonTerminal
) to type-check again, we observe that this parser/grammar generator for expressions, with genuine recursion, works now:
parseExp :: Descr f => f Expr
parseExp = nonTerminal "expr" $
ePlus
ePlus :: Descr f => f Expr
ePlus = nonTerminal "plus" $
mkPlus <$> eMult
<*> many (spaces *> char '+' *> spaces *> eMult)
<* spaces
eMult :: Descr f => f Expr
eMult = nonTerminal "mult" $
mkPlus <$> eAtom
<*> many (spaces *> char '*' *> spaces *> eAtom)
<* spaces
eAtom :: Descr f => f Expr
eAtom = nonTerminal "atom" $
aConst orElse eParens parseExp
Note that the recursion is only going to work if there is at least one call to nonTerminal
somewhere around the recursive calls. We still cannot implement many
as naively as above.
Homework
If you want to play more with this: The homework is to define a parser/grammar-generator for EBNF itself, as specified in this variant:
identifier = letter, letter digit '-' ;
spaces = ' ' newline ;
quoted-char = non-quote-or-backslash '\\', '\\' '\\', '\'';
terminal = '\'', quoted-char , '\'', spaces;
non-terminal = identifier, spaces;
option = '[', spaces, rhs, spaces, ']', spaces;
repetition = ' ', spaces, rhs, spaces, ' ', spaces;
group = '(', spaces, rhs, spaces, ')', spaces;
atom = terminal non-terminal option repetition group;
sequence = atom, spaces, ',', spaces, atom , spaces;
choice = sequence, spaces, ' ', spaces, sequence , spaces;
rhs = choice;
production = identifier, spaces, '=', spaces, rhs, ';', spaces;
bnf = production, production ;
This grammar is set up so that the precedence of ,
and
is correctly implemented: a , b c
will parse as (a, b) c
.
In this syntax for BNF, terminal characters are quoted, i.e. inside ' '
, a '
is replaced by \'
and a \
is replaced by \\
this is done by the function quote
in ppRHS
.
If you do this, you should able to round-trip with the pretty-printer, i.e. parse back what it wrote:
*Main> let bnf1 = runGrammer "expr" parseExpr
*Main> let bnf2 = runGrammer "expr" parseBNF
*Main> let f = Data.Maybe.fromJust . parse parseBNF. ppBNF
*Main> f bnf1 == bnf1
True
*Main> f bnf2 == bnf2
True
The last line is quite meta: We are using parseBNF
as a parser on the pretty-printed grammar produced from interpreting parseBNF
as a grammar.
Conclusion
We have again seen an example of the excellent support for abstraction in Haskell: Being able to define so very different things such as a parser and a grammar description with the same code is great. Type classes helped us here.
Note that it was crucial that our combined parser/grammars are only able to use the methods of Applicative
, and not Monad
. Applicative
is less powerful, so by giving less power to the user of our Descr
interface, the other side, i.e. the implementation, can be more powerful.
The reason why Applicative
is ok, but Monad
is not, is that in Applicative
, the results do not affect the shape of the computation, whereas in Monad
, the whole point of the bind operator (>>=)
is that the result of the computation is used to decide the next computation. And while this is perfectly fine for a parser, it just makes no sense for a grammar generator, where there simply are no values around!
We have also seen that a phantom type, namely the parameter of Grammar
, can be useful, as it lets the type system make sure we do not write nonsense. For example, the type of orElseG
ensures that both grammars that are combined here indeed describe something of the same type.
- It seems to be the week of applicative-appraising blog posts: Brent has posted a nice piece about enumerations using
Applicative
yesterday.
- I like how in this alignment of
<*>
and <*
the >
point out where the arguments are that are being passed to the function on the left.
newtype Grammar a = G ([String] -> (BNF, RHS))
nonTerminalG :: String -> (Grammar a) -> Grammar a
nonTerminalG name (G g) = G $ \seen ->
if name elem seen
then ([], NonTerminal name)
else let (prods, rhs) = g (name : seen)
in (prods ++ [(name, rhs)], NonTerminal name)
parseExp :: Descr f => f Expr
parseExp = nonTerminal "expr" $
ePlus
ePlus :: Descr f => f Expr
ePlus = nonTerminal "plus" $
mkPlus <$> eMult
<*> many (spaces *> char '+' *> spaces *> eMult)
<* spaces
eMult :: Descr f => f Expr
eMult = nonTerminal "mult" $
mkPlus <$> eAtom
<*> many (spaces *> char '*' *> spaces *> eAtom)
<* spaces
eAtom :: Descr f => f Expr
eAtom = nonTerminal "atom" $
aConst orElse eParens parseExp
identifier = letter, letter digit '-' ;
spaces = ' ' newline ;
quoted-char = non-quote-or-backslash '\\', '\\' '\\', '\'';
terminal = '\'', quoted-char , '\'', spaces;
non-terminal = identifier, spaces;
option = '[', spaces, rhs, spaces, ']', spaces;
repetition = ' ', spaces, rhs, spaces, ' ', spaces;
group = '(', spaces, rhs, spaces, ')', spaces;
atom = terminal non-terminal option repetition group;
sequence = atom, spaces, ',', spaces, atom , spaces;
choice = sequence, spaces, ' ', spaces, sequence , spaces;
rhs = choice;
production = identifier, spaces, '=', spaces, rhs, ';', spaces;
bnf = production, production ;
This grammar is set up so that the precedence of ,
and
is correctly implemented: a , b c
will parse as (a, b) c
.
In this syntax for BNF, terminal characters are quoted, i.e. inside ' '
, a '
is replaced by \'
and a \
is replaced by \\
this is done by the function quote
in ppRHS
.
If you do this, you should able to round-trip with the pretty-printer, i.e. parse back what it wrote:
*Main> let bnf1 = runGrammer "expr" parseExpr
*Main> let bnf2 = runGrammer "expr" parseBNF
*Main> let f = Data.Maybe.fromJust . parse parseBNF. ppBNF
*Main> f bnf1 == bnf1
True
*Main> f bnf2 == bnf2
True
The last line is quite meta: We are using parseBNF
as a parser on the pretty-printed grammar produced from interpreting parseBNF
as a grammar.
Conclusion
We have again seen an example of the excellent support for abstraction in Haskell: Being able to define so very different things such as a parser and a grammar description with the same code is great. Type classes helped us here.
Note that it was crucial that our combined parser/grammars are only able to use the methods of Applicative
, and not Monad
. Applicative
is less powerful, so by giving less power to the user of our Descr
interface, the other side, i.e. the implementation, can be more powerful.
The reason why Applicative
is ok, but Monad
is not, is that in Applicative
, the results do not affect the shape of the computation, whereas in Monad
, the whole point of the bind operator (>>=)
is that the result of the computation is used to decide the next computation. And while this is perfectly fine for a parser, it just makes no sense for a grammar generator, where there simply are no values around!
We have also seen that a phantom type, namely the parameter of Grammar
, can be useful, as it lets the type system make sure we do not write nonsense. For example, the type of orElseG
ensures that both grammars that are combined here indeed describe something of the same type.
- It seems to be the week of applicative-appraising blog posts: Brent has posted a nice piece about enumerations using
Applicative
yesterday.
- I like how in this alignment of
<*>
and <*
the >
point out where the arguments are that are being passed to the function on the left.
- It seems to be the week of applicative-appraising blog posts: Brent has posted a nice piece about enumerations using
Applicative
yesterday. - I like how in this alignment of
<*>
and<*
the>
point out where the arguments are that are being passed to the function on the left.