Search Results: "prh"

1 November 2016

Thorsten Alteholz: My Debian Activities in October 2016

FTP assistant This month I caught up from last month and marked 317 packages for accept and rejected 23 packages. I also sent 5 emails to maintainers asking questions. Debian LTS This was my twenty-eighth month that I did some work for the Debian LTS initiative, started by Raphael Hertzog at Freexian. This month my all in all workload has been 13h. During that time I did uploads of The second upload of bind was an embargoed one. Other stuff I uploaded a new version of greylistd and fixed RC bug #837501. A new version of highlight.js fixed RC bug #830189. With a new upstream version of chktex I could close bugs #782342, #782343 and #819885. I also uploaded the new package node-random-bytes and new upstream versions of alljoyn-core-1604 and duktape Finally, after about 4 years, I managed to upload entropybroker and instantly had to deal with #840018, #840019 and #840020. One cannot overemphasize the importance of our QA stuff! I also uploaded a new version of libctl to solve the -fPIC issue but was asked short time after to revert that again :-(. As already mentioned some days ago I adopted libmatthew-java. At that time about 956 package were orphaned and I asked everybody to adopt one of these packages. Unfortunately now there are 982 package orphaned. I guess I have to clear up a misunderstanding. You should adopt those packages and not oprhan more of them!

26 October 2016

Joachim Breitner: Showcasing Applicative

My plan for this week s lecture of the CIS 194 Haskell course at the University of Pennsylvania is to dwell a bit on the concept of Functor, Applicative and Monad, and to highlight the value of the Applicative abstraction. I quite like the example that I came up with, so I want to share it here. In the interest of long-term archival and stand-alone presentation, I include all the material in this post.1

Imports In case you want to follow along, start with these imports:
import Data.Char
import Data.Maybe
import Data.List
import System.Environment
import System.IO
import System.Exit

The parser The starting point for this exercise is a fairly standard parser-combinator monad, which happens to be the result of the student s homework from last week:
newtype Parser a = P (String -> Maybe (a, String))
runParser :: Parser t -> String -> Maybe (t, String)
runParser (P p) = p
parse :: Parser a -> String -> Maybe a
parse p input = case runParser p input of
    Just (result, "") -> Just result
    _ -> Nothing -- handles both no result and leftover input
noParserP :: Parser a
noParserP = P (\_ -> Nothing)
pureParserP :: a -> Parser a
pureParserP x = P (\input -> Just (x,input))
instance Functor Parser where
    fmap f p = P $ \input -> do
	(x, rest) <- runParser p input
	return (f x, rest)
instance Applicative Parser where
    pure = pureParserP
    p1 <*> p2 = P $ \input -> do
        (f, rest1) <- runParser p1 input
        (x, rest2) <- runParser p2 rest1
        return (f x, rest2)
instance Monad Parser where
    return = pure
    p1 >>= k = P $ \input -> do
        (x, rest1) <- runParser p1 input
        runParser (k x) rest1
anyCharP :: Parser Char
anyCharP = P $ \input -> case input of
    (c:rest) -> Just (c, rest)
    []       -> Nothing
charP :: Char -> Parser ()
charP c = do
    c' <- anyCharP
    if c == c' then return ()
               else noParserP
anyCharButP :: Char -> Parser Char
anyCharButP c = do
    c' <- anyCharP
    if c /= c' then return c'
               else noParserP
letterOrDigitP :: Parser Char
letterOrDigitP = do
    c <- anyCharP
    if isAlphaNum c then return c else noParserP
orElseP :: Parser a -> Parser a -> Parser a
orElseP p1 p2 = P $ \input -> case runParser p1 input of
    Just r -> Just r
    Nothing -> runParser p2 input
manyP :: Parser a -> Parser [a]
manyP p = (pure (:) <*> p <*> manyP p)  orElseP  pure []
many1P :: Parser a -> Parser [a]
many1P p = pure (:) <*> p <*> manyP p
sepByP :: Parser a -> Parser () -> Parser [a]
sepByP p1 p2 = (pure (:) <*> p1 <*> (manyP (p2 *> p1)))  orElseP  pure []
A parser using this library for, for example, CSV files could take this form:
parseCSVP :: Parser [[String]]
parseCSVP = manyP parseLine
  where
    parseLine = parseCell  sepByP  charP ',' <* charP '\n'
    parseCell = do
        charP '"'
        content <- manyP (anyCharButP '"')
        charP '"'
        return content

We want EBNF Often when we write a parser for a file format, we might also want to have a formal specification of the format. A common form for such a specification is EBNF. This might look as follows, for a CSV file:
cell = '"',  not-quote , '"';
line = (cell,  ',', cell    ''), newline;
csv  =  line ;
It is straightforward to create a Haskell data type to represent an EBNF syntax description. Here is a simple EBNF library (data type and pretty-printer) for your convenience:
data RHS
  = Terminal String
    NonTerminal String
    Choice RHS RHS
    Sequence RHS RHS
    Optional RHS
    Repetition RHS
  deriving (Show, Eq)
ppRHS :: RHS -> String
ppRHS = go 0
  where
    go _ (Terminal s)     = surround "'" "'" $ concatMap quote s
    go _ (NonTerminal s)  = s
    go a (Choice x1 x2)   = p a 1 $ go 1 x1 ++ "   " ++ go 1 x2
    go a (Sequence x1 x2) = p a 2 $ go 2 x1 ++ ", "  ++ go 2 x2
    go _ (Optional x)     = surround "[" "]" $ go 0 x
    go _ (Repetition x)   = surround " " " " $ go 0 x
    surround c1 c2 x = c1 ++ x ++ c2
    p a n   a > n     = surround "(" ")"
            otherwise = id
    quote '\'' = "\\'"
    quote '\\' = "\\\\"
    quote c    = [c]
type Production = (String, RHS)
type BNF = [Production]
ppBNF :: BNF -> String
ppBNF = unlines . map (\(i,rhs) -> i ++ " = " ++ ppRHS rhs ++ ";")

Code to produce EBNF We had a good time writing combinators that create complex parsers from primitive pieces. Let us do the same for EBNF grammars. We could simply work on the RHS type directly, but we can do something more nifty: We create a data type that keeps track, via a phantom type parameter, of what Haskell type the given EBNF syntax is the specification:
newtype Grammar a = G RHS
ppGrammar :: Grammar a -> String
ppGrammar (G rhs) = ppRHS rhs
So a value of type Grammar t is a description of the textual representation of the Haskell type t. Here is one simple example:
anyCharG :: Grammar Char
anyCharG = G (NonTerminal "char")
Here is another one. This one does not describe any interesting Haskell type, but is useful when spelling out the special characters in the syntax described by the grammar:
charG :: Char -> Grammar ()
charG c = G (Terminal [c])
A combinator that creates new grammar from two existing grammars:
orElseG :: Grammar a -> Grammar a -> Grammar a
orElseG (G rhs1) (G rhs2) = G (Choice rhs1 rhs2)
We want the convenience of our well-known type classes in order to combine these values some more:
instance Functor Grammar where
    fmap _ (G rhs) = G rhs
instance Applicative Grammar where
    pure x = G (Terminal "")
    (G rhs1) <*> (G rhs2) = G (Sequence rhs1 rhs2)
Note how the Functor instance does not actually use the function. How should it? There are no values inside a Grammar! We cannot define a Monad instance for Grammar: We would start with (G rhs1) >>= k = , but there is simply no way of getting a value of type a that we can feed to k. So we will do without a Monad instance. This is interesting, and we will come back to that later. Like with the parser, we can now begin to build on the primitive example to build more complicated combinators:
manyG :: Grammar a -> Grammar [a]
manyG p = (pure (:) <*> p <*> manyG p)  orElseG  pure []
many1G :: Grammar a -> Grammar [a]
many1G p = pure (:) <*> p <*> manyG p
sepByG :: Grammar a -> Grammar () -> Grammar [a]
sepByG p1 p2 = ((:) <$> p1 <*> (manyG (p2 *> p1)))  orElseG  pure []
Let us run a small example:
dottedWordsG :: Grammar [String]
dottedWordsG = many1G (manyG anyCharG <* charG '.')
*Main> putStrLn $ ppGrammar dottedWordsG
'', ('', char, ('', char, ('', char, ('', char, ('', char, ('',  
Oh my, that is not good. Looks like the recursion in manyG does not work well, so we need to avoid that. But anyways we want to be explicit in the EBNF grammars about where something can be repeated, so let us just make many a primitive:
manyG :: Grammar a -> Grammar [a]
manyG (G rhs) = G (Repetition rhs)
With this definition, we already get a simple grammar for dottedWordsG:
*Main> putStrLn $ ppGrammar dottedWordsG
'',  char , '.',  char , '.' 
This already looks like a proper EBNF grammar. One thing that is not nice about it is that there is an empty string ('') in a sequence ( , ). We do not want that. Why is it there in the first place? Because our Applicative instance is not lawful! Remember that pure id <*> g == g should hold. One way to achieve that is to improve the Applicative instance to optimize this case away:
instance Applicative Grammar where
    pure x = G (Terminal "")
    G (Terminal "") <*> G rhs2 = G rhs2
    G rhs1 <*> G (Terminal "") = G rhs1
    (G rhs1) <*> (G rhs2) = G (Sequence rhs1 rhs2)
	 
Now we get what we want:
*Main> putStrLn $ ppGrammar dottedWordsG
 char , '.',  char , '.' 
Remember our parser for CSV files above? Let me repeat it here, this time using only Applicative combinators, i.e. avoiding (>>=), (>>), return and do-notation:
parseCSVP :: Parser [[String]]
parseCSVP = manyP parseLine
  where
    parseLine = parseCell  sepByP  charG ',' <* charP '\n'
    parseCell = charP '"' *> manyP (anyCharButP '"') <* charP '"'
And now we try to rewrite the code to produce Grammar instead of Parser. This is straightforward with the exception of anyCharButP. The parser code for that inherently monadic, and we just do not have a monad instance. So we work around the issue by making that a primitive grammar, i.e. introducing a non-terminal in the EBNF without a production rule pretty much like we did for anyCharG:
primitiveG :: String -> Grammar a
primitiveG s = G (NonTerminal s)
parseCSVG :: Grammar [[String]]
parseCSVG = manyG parseLine
  where
    parseLine = parseCell  sepByG  charG ',' <* charG '\n'
    parseCell = charG '"' *> manyG (primitiveG "not-quote") <* charG '"'
Of course the names parse are not quite right any more, but let us just leave that for now. Here is the result:
*Main> putStrLn $ ppGrammar parseCSVG
 ('"',  not-quote , '"',  ',', '"',  not-quote , '"'    ''), '
' 
The line break is weird. We do not really want newlines in the grammar. So let us make that primitive as well, and replace charG '\n' with newlineG:
newlineG :: Grammar ()
newlineG = primitiveG "newline"
Now we get
*Main> putStrLn $ ppGrammar parseCSVG
 ('"',  not-quote , '"',  ',', '"',  not-quote , '"'    ''), newline 
which is nice and correct, but still not quite the easily readable EBNF that we saw further up.

Code to produce EBNF, with productions We currently let our grammars produce only the right-hand side of one EBNF production, but really, we want to produce a RHS that may refer to other productions. So let us change the type accordingly:
newtype Grammar a = G (BNF, RHS)
runGrammer :: String -> Grammar a -> BNF
runGrammer main (G (prods, rhs)) = prods ++ [(main, rhs)]
ppGrammar :: String -> Grammar a -> String
ppGrammar main g = ppBNF $ runGrammer main g
Now we have to adjust all our primitive combinators (but not the derived ones!):
charG :: Char -> Grammar ()
charG c = G ([], Terminal [c])
anyCharG :: Grammar Char
anyCharG = G ([], NonTerminal "char")
manyG :: Grammar a -> Grammar [a]
manyG (G (prods, rhs)) = G (prods, Repetition rhs)
mergeProds :: [Production] -> [Production] -> [Production]
mergeProds prods1 prods2 = nub $ prods1 ++ prods2
orElseG :: Grammar a -> Grammar a -> Grammar a
orElseG (G (prods1, rhs1)) (G (prods2, rhs2))
    = G (mergeProds prods1 prods2, Choice rhs1 rhs2)
instance Functor Grammar where
    fmap _ (G bnf) = G bnf
instance Applicative Grammar where
    pure x = G ([], Terminal "")
    G (prods1, Terminal "") <*> G (prods2, rhs2)
        = G (mergeProds prods1 prods2, rhs2)
    G (prods1, rhs1) <*> G (prods2, Terminal "")
        = G (mergeProds prods1 prods2, rhs1)
    G (prods1, rhs1) <*> G (prods2, rhs2)
        = G (mergeProds prods1 prods2, Sequence rhs1 rhs2)
primitiveG :: String -> Grammar a
primitiveG s = G (NonTerminal s)
The use of nub when combining productions removes duplicates that might be used in different parts of the grammar. Not efficient, but good enough for now. Did we gain anything? Not yet:
*Main> putStr $ ppGrammar "csv" (parseCSVG)
csv =  ('"',  not-quote , '"',  ',', '"',  not-quote , '"'    ''), newline ;
But we can now introduce a function that lets us tell the system where to give names to a piece of grammar:
nonTerminal :: String -> Grammar a -> Grammar a
nonTerminal name (G (prods, rhs))
  = G (prods ++ [(name, rhs)], NonTerminal name)
Ample use of this in parseCSVG yields the desired result:
parseCSVG :: Grammar [[String]]
parseCSVG = manyG parseLine
  where
    parseLine = nonTerminal "line" $
        parseCell  sepByG  charG ',' <* newline
    parseCell = nonTerminal "cell" $
        charG '"' *> manyG (primitiveG "not-quote") <* charG '"
*Main> putStr $ ppGrammar "csv" (parseCSVG)
cell = '"',  not-quote , '"';
line = (cell,  ',', cell    ''), newline;
csv =  line ;
This is great!

Unifying parsing and grammar-generating Note how simliar parseCSVG and parseCSVP are! Would it not be great if we could implement that functionality only once, and get both a parser and a grammar description out of it? This way, the two would never be out of sync! And surely this must be possible. The tool to reach for is of course to define a type class that abstracts over the parts where Parser and Grammer differ. So we have to identify all functions that are primitive in one of the two worlds, and turn them into type class methods. This includes char and orElse. It includes many, too: Although manyP is not primitive, manyG is. It also includes nonTerminal, which does not exist in the world of parsers (yet), but we need it for the grammars. The primitiveG function is tricky. We use it in grammars when the code that we might use while parsing is not expressible as a grammar. So the solution is to let it take two arguments: A String, when used as a descriptive non-terminal in a grammar, and a Parser a, used in the parsing code. Finally, the type classes that we except, Applicative (and thus Functor), are added as constraints on our type class:
class Applicative f => Descr f where
    char :: Char -> f ()
    many :: f a -> f [a]
    orElse :: f a -> f a -> f a
    primitive :: String -> Parser a -> f a
    nonTerminal :: String -> f a -> f a
The instances are easily written:
instance Descr Parser where
    char = charP
    many = manyP
    orElse = orElseP
    primitive _ p = p
    nonTerminal _ p = p
instance Descr Grammar where
    char = charG
    many = manyG
    orElse = orElseG
    primitive s _ = primitiveG s
    nonTerminal s g = nonTerminal s g
And we can now take the derived definitions, of which so far we had two copies, and define them once and for all:
many1 :: Descr f => f a -> f [a]
many1 p = pure (:) <*> p <*> many p
anyChar :: Descr f => f Char
anyChar = primitive "char" anyCharP
dottedWords :: Descr f => f [String]
dottedWords = many1 (many anyChar <* char '.')
sepBy :: Descr f => f a -> f () -> f [a]
sepBy p1 p2 = ((:) <$> p1 <*> (many (p2 *> p1)))  orElse  pure []
newline :: Descr f => f ()
newline = primitive "newline" (charP '\n')
And thus we now have our CSV parser/grammar generator:
parseCSV :: Descr f => f [[String]]
parseCSV = many parseLine
  where
    parseLine = nonTerminal "line" $
        parseCell  sepBy  char ',' <* newline
    parseCell = nonTerminal "cell" $
        char '"' *> many (primitive "not-quote" (anyCharButP '"')) <* char '"'
We can now use this definition both to parse and to generate grammars:
*Main> putStr $ ppGrammar2 "csv" (parseCSV)
cell = '"',  not-quote , '"';
line = (cell,  ',', cell    ''), newline;
csv =  line ;
*Main> parse parseCSV "\"ab\",\"cd\"\n\"\",\"de\"\n\n"
Just [["ab","cd"],["","de"],[]]

The INI file parser and grammar As a final exercise, let us transform the INI file parser into a combined thing. Here is the parser (another artifact of last week s homework) again using applicative style2:
parseINIP :: Parser INIFile
parseINIP = many1P parseSection
  where
    parseSection =
        (,) <$  charP '['
            <*> parseIdent
            <*  charP ']'
            <*  charP '\n'
            <*> (catMaybes <$> manyP parseLine)
    parseIdent = many1P letterOrDigitP
    parseLine = parseDecl  orElseP  parseComment  orElseP  parseEmpty
    parseDecl = Just <$> (
        (,) <*> parseIdent
            <*  manyP (charP ' ')
            <*  charP '='
            <*  manyP (charP ' ')
            <*> many1P (anyCharButP '\n')
            <*  charP '\n')
    parseComment =
        Nothing <$ charP '#'
                <* many1P (anyCharButP '\n')
                <* charP '\n'
    parseEmpty = Nothing <$ charP '\n'
Transforming that to a generic description is quite straightforward. We use primitive again to wrap letterOrDigitP:
descrINI :: Descr f => f INIFile
descrINI = many1 parseSection
  where
    parseSection =
        (,) <*  char '['
            <*> parseIdent
            <*  char ']'
            <*  newline
            <*> (catMaybes <$> many parseLine)
    parseIdent = many1 (primitive "alphanum" letterOrDigitP)
    parseLine = parseDecl  orElse  parseComment  orElse  parseEmpty
    parseDecl = Just <$> (
        (,) <*> parseIdent
            <*  many (char ' ')
            <*  char '='
            <*  many (char ' ')
            <*> many1 (primitive "non-newline" (anyCharButP '\n'))
	    <*  newline)
    parseComment =
        Nothing <$ char '#'
                <* many1 (primitive "non-newline" (anyCharButP '\n'))
		<* newline
    parseEmpty = Nothing <$ newline
This yields this not very helpful grammar (abbreviated here):
*Main> putStr $ ppGrammar2 "ini" descrINI
ini = '[', alphanum,  alphanum , ']', newline,  alphanum,  alphanum ,  ' ' 
But with a few uses of nonTerminal, we get something really nice:
descrINI :: Descr f => f INIFile
descrINI = many1 parseSection
  where
    parseSection = nonTerminal "section" $
        (,) <$  char '['
            <*> parseIdent
            <*  char ']'
            <*  newline
            <*> (catMaybes <$> many parseLine)
    parseIdent = nonTerminal "identifier" $
        many1 (primitive "alphanum" letterOrDigitP)
    parseLine = nonTerminal "line" $
        parseDecl  orElse  parseComment  orElse  parseEmpty
    parseDecl = nonTerminal "declaration" $ Just <$> (
        (,) <$> parseIdent
            <*  spaces
            <*  char '='
            <*  spaces
            <*> remainder)
    parseComment = nonTerminal "comment" $
        Nothing <$ char '#' <* remainder
    remainder = nonTerminal "line-remainder" $
        many1 (primitive "non-newline" (anyCharButP '\n')) <* newline
    parseEmpty = Nothing <$ newline
    spaces = nonTerminal "spaces" $ many (char ' ')
*Main> putStr $ ppGrammar "ini" descrINI
identifier = alphanum,  alphanum ;
spaces =  ' ' ;
line-remainder = non-newline,  non-newline , newline;
declaration = identifier, spaces, '=', spaces, line-remainder;
comment = '#', line-remainder;
line = declaration   comment   newline;
section = '[', identifier, ']', newline,  line ;
ini = section,  section ;

Recursion (variant 1) What if we want to write a parser/grammar-generator that is able to generate the following grammar, which describes terms that are additions and multiplications of natural numbers:
const = digit,  digit ;
spaces =  ' '   newline ;
atom = const   '(', spaces, expr, spaces, ')', spaces;
mult = atom,  spaces, '*', spaces, atom , spaces;
plus = mult,  spaces, '+', spaces, mult , spaces;
expr = plus;
The production of expr is recursive (via plus, mult, atom). We have seen above that simply defining a Grammar a recursively does not go well. One solution is to add a new combinator for explicit recursion, which replaces nonTerminal in the method:
class Applicative f => Descr f where
     
    recNonTerminal :: String -> (f a -> f a) -> f a
instance Descr Parser where
     
    recNonTerminal _ p = let r = p r in r
instance Descr Grammar where
     
    recNonTerminal = recNonTerminalG
recNonTerminalG :: String -> (Grammar a -> Grammar a) -> Grammar a
recNonTerminalG name f =
    let G (prods, rhs) = f (G ([], NonTerminal name))
    in G (prods ++ [(name, rhs)], NonTerminal name)
nonTerminal :: Descr f => String -> f a -> f a
nonTerminal name p = recNonTerminal name (const p)
runGrammer :: String -> Grammar a -> BNF
runGrammer main (G (prods, NonTerminal nt))   main == nt = prods
runGrammer main (G (prods, rhs)) = prods ++ [(main, rhs)]
The change in runGrammer avoids adding a pointless expr = expr production to the output. This lets us define a parser/grammar-generator for the arithmetic expressions given above:
data Expr = Plus Expr Expr   Mult Expr Expr   Const Integer
    deriving Show
mkPlus :: Expr -> [Expr] -> Expr
mkPlus = foldl Plus
mkMult :: Expr -> [Expr] -> Expr
mkMult = foldl Mult
parseExpr :: Descr f => f Expr
parseExpr = recNonTerminal "expr" $ \ exp ->
    ePlus exp
ePlus :: Descr f => f Expr -> f Expr
ePlus exp = nonTerminal "plus" $
    mkPlus <$> eMult exp
           <*> many (spaces *> char '+' *> spaces *> eMult exp)
           <*  spaces
eMult :: Descr f => f Expr -> f Expr
eMult exp = nonTerminal "mult" $
    mkPlus <$> eAtom exp
           <*> many (spaces *> char '*' *> spaces *> eAtom exp)
           <*  spaces
eAtom :: Descr f => f Expr -> f Expr
eAtom exp = nonTerminal "atom" $
    aConst  orElse  eParens exp
aConst :: Descr f => f Expr
aConst = nonTerminal "const" $ Const . read <$> many1 digit
eParens :: Descr f => f a -> f a
eParens inner =
    id <$  char '('
       <*  spaces
       <*> inner
       <*  spaces
       <*  char ')'
       <*  spaces
And indeed, this works:
*Main> putStr $ ppGrammar "expr" parseExpr
const = digit,  digit ;
spaces =  ' '   newline ;
atom = const   '(', spaces, expr, spaces, ')', spaces;
mult = atom,  spaces, '*', spaces, atom , spaces;
plus = mult,  spaces, '+', spaces, mult , spaces;
expr = plus;

Recursion (variant 2) Interestingly, there is another solution to this problem, which avoids introducing recNonTerminal and explicitly passing around the recursive call (i.e. the exp in the example). To implement that we have to adjust our Grammar type as follows:
newtype Grammar a = G ([String] -> (BNF, RHS))
The idea is that the list of strings is those non-terminals that we are currently defining. So in nonTerminal, we check if the non-terminal to be introduced is currently in the process of being defined, and then simply ignore the body. This way, the recursion is stopped automatically:
nonTerminalG :: String -> (Grammar a) -> Grammar a
nonTerminalG name (G g) = G $ \seen ->
    if name  elem  seen
    then ([], NonTerminal name)
    else let (prods, rhs) = g (name : seen)
         in (prods ++ [(name, rhs)], NonTerminal name)
After adjusting the other primitives of Grammar (including the Functor and Applicative instances, wich now again have nonTerminal) to type-check again, we observe that this parser/grammar generator for expressions, with genuine recursion, works now:
parseExp :: Descr f => f Expr
parseExp = nonTerminal "expr" $
    ePlus
ePlus :: Descr f => f Expr
ePlus = nonTerminal "plus" $
    mkPlus <$> eMult
           <*> many (spaces *> char '+' *> spaces *> eMult)
           <*  spaces
eMult :: Descr f => f Expr
eMult = nonTerminal "mult" $
    mkPlus <$> eAtom
           <*> many (spaces *> char '*' *> spaces *> eAtom)
           <*  spaces
eAtom :: Descr f => f Expr
eAtom = nonTerminal "atom" $
    aConst  orElse  eParens parseExp
Note that the recursion is only going to work if there is at least one call to nonTerminal somewhere around the recursive calls. We still cannot implement many as naively as above.

Homework If you want to play more with this: The homework is to define a parser/grammar-generator for EBNF itself, as specified in this variant:
identifier = letter,  letter   digit   '-' ;
spaces =  ' '   newline ;
quoted-char = non-quote-or-backslash   '\\', '\\'   '\\', '\'';
terminal = '\'',  quoted-char , '\'', spaces;
non-terminal = identifier, spaces;
option = '[', spaces, rhs, spaces, ']', spaces;
repetition = ' ', spaces, rhs, spaces, ' ', spaces;
group = '(', spaces, rhs, spaces, ')', spaces;
atom = terminal   non-terminal   option   repetition   group;
sequence = atom,  spaces, ',', spaces, atom , spaces;
choice = sequence,  spaces, ' ', spaces, sequence , spaces;
rhs = choice;
production = identifier, spaces, '=', spaces, rhs, ';', spaces;
bnf = production,  production ;
This grammar is set up so that the precedence of , and is correctly implemented: a , b c will parse as (a, b) c. In this syntax for BNF, terminal characters are quoted, i.e. inside ' ', a ' is replaced by \' and a \ is replaced by \\ this is done by the function quote in ppRHS. If you do this, you should able to round-trip with the pretty-printer, i.e. parse back what it wrote:
*Main> let bnf1 = runGrammer "expr" parseExpr
*Main> let bnf2 = runGrammer "expr" parseBNF
*Main> let f = Data.Maybe.fromJust . parse parseBNF. ppBNF
*Main> f bnf1 == bnf1
True
*Main> f bnf2 == bnf2
True
The last line is quite meta: We are using parseBNF as a parser on the pretty-printed grammar produced from interpreting parseBNF as a grammar.

Conclusion We have again seen an example of the excellent support for abstraction in Haskell: Being able to define so very different things such as a parser and a grammar description with the same code is great. Type classes helped us here. Note that it was crucial that our combined parser/grammars are only able to use the methods of Applicative, and not Monad. Applicative is less powerful, so by giving less power to the user of our Descr interface, the other side, i.e. the implementation, can be more powerful. The reason why Applicative is ok, but Monad is not, is that in Applicative, the results do not affect the shape of the computation, whereas in Monad, the whole point of the bind operator (>>=) is that the result of the computation is used to decide the next computation. And while this is perfectly fine for a parser, it just makes no sense for a grammar generator, where there simply are no values around! We have also seen that a phantom type, namely the parameter of Grammar, can be useful, as it lets the type system make sure we do not write nonsense. For example, the type of orElseG ensures that both grammars that are combined here indeed describe something of the same type.

  1. It seems to be the week of applicative-appraising blog posts: Brent has posted a nice piece about enumerations using Applicative yesterday.
  2. I like how in this alignment of <*> and <* the > point out where the arguments are that are being passed to the function on the left.

23 October 2016

Vincent Sanders: Rabbit of Caerbannog

Subsequent to my previous use of American Fuzzy Lop (AFL) on the NetSurf bitmap image library I applied it to the gif library which, after fixing the test runner, failed to produce any crashes but did result in a better test corpus improving coverage above 90%

I then turned my attention to the SVG processing library. This was different to the bitmap libraries in that it required parsing a much lower density text format and performing operations on the resulting tree representation.

The test program for the SVG library needed some improvement but is very basic in operation. It takes the test SVG, parses it using libsvgtiny and then uses the parsed output to write out an imagemagick mvg file.

The libsvg processing uses the NetSurf DOM library which in turn uses an expat binding to parse the SVG XML text. To process this with AFL required instrumenting not only the XVG library but the DOM library. I did not initially understand this and my first run resulted in a "map coverage" indicating an issue. Helpfully the AFL docs do cover this so it was straightforward to rectify.

Once the test program was written and environment set up an AFL run was started and left to run. The next day I was somewhat alarmed to discover the fuzzer had made almost no progress and was running very slowly. I asked for help on the AFL mailing list and got a polite and helpful response, basically I needed to RTFM.

I must thank the members of the AFL mailing list for being so helpful and tolerating someone who ought to know better asking dumb questions.

After reading the fine manual I understood I needed to ensure all my test cases were as small as possible and further that the fuzzer needed a dictionary as a hint to the file format because the text file was of such low data density compared to binary formats.

Rabbit of Caerbannog. Death awaits you with pointy teeth
I crafted an SVG dictionary based on the XML one, ensured all the seed SVG files were as small as possible and tried again. The immediate result was thousands of crashes, nothing like being savaged by a rabbit to cause a surprise.

Not being in possession of the appropriate holy hand grenade I resorted instead to GDB and electric fence. Unlike the bitmap library crashes memory bounds issues simply did not feature in the crashes.Instead they mainly centered around actual logic errors when constructing and traversing the data structures.

For example Daniel Silverstone fixed an interesting bug where the XML parser binding would try and go "above" the root node in the tree if the source closed more tags than it opened which resulted in wild pointers and NULL references.

I found and squashed several others including dealing with SVG which has no valid root element and division by zero errors when things like colour gradients have no points.

I find it interesting that the type and texture of the crashes completely changed between the SVG and binary formats. Perhaps it is just the nature of the textural formats that causes this although it might be due to the techniques used to parse the formats.

Once all the immediately reproducible crashes were dealt with I performed a longer run. I used my monster system as previously described and ran the fuzzer for a whole week.

Summary stats
=============

Fuzzers alive : 10
Total run time : 68 days, 7 hours
Total execs : 9268 million
Cumulative speed : 15698 execs/sec
Pending paths : 0 faves, 2501 total
Pending per fuzzer : 0 faves, 250 total (on average)
Crashes found : 9 locally unique

After burning almost seventy days of processor time AFL found me another nine crashes and possibly more importantly a test corpus that generates over 90% coverage.

A useful tool that AFL provides is afl-cmin. This reduces the number of test files in a corpus to only those that are required to exercise all the code paths reached by the test set. In this case it reduced the number of files from 8242 to 2612

afl-cmin -i queue_all/ -o queue_cmin -- test_decode_svg @@ 1.0 /dev/null
corpus minimization tool for afl-fuzz by

[+] OK, 1447 tuples recorded.
[*] Obtaining traces for input files in 'queue_all/'...
Processing file 8242/8242...
[*] Sorting trace sets (this may take a while)...
[+] Found 23812 unique tuples across 8242 files.
[*] Finding best candidates for each tuple...
Processing file 8242/8242...
[*] Sorting candidate list (be patient)...
[*] Processing candidates and writing output files...
Processing tuple 23812/23812...
[+] Narrowed down to 2612 files, saved in 'queue_cmin'.

Additionally the actual information within the test files can be minimised with the afl-tmin tool. This must be run on each file individually and can take a relatively long time. Fortunately with GNU parallel one can run many of these jobs simultaneously which merely required another three days of CPU time to process. The resulting test corpus weighs in at a svelte 15 Megabytes or so against the 25 Megabytes before minimisation.

The result is yet another NetSurf library significantly improved by the use of AFL both from finding and squashing crashing bugs and from having a greatly improved test corpus to allow future library changes with a high confidence there will not be any regressions.

3 July 2014

Hideki Yamane: Open Source Conference 2014 Hokkaido

Oh, time flies... (= I'm lazy)

14th June, I've participated to OSC (Open Source Conference) 2014 Hokkaido in Sapporo, Hokkaido (sorry openSUSE folks, OSC does not mean openSUSE Conference ;) OSC has 10 years history in Japan, so don't blame me...)

Hokkaido is northan island of Japan (it has 4 major islands - Hokkaido, Honshu, Shikoku and Kyushu), takes 1.5 hours from Tokyo (HND-CTS) and debian-mirror.sakura.ne.jp is also there.

As always, we show the Debian booth with Debian lovers, Squeeze, Woody and Jessie.


And I gave talk about Debian a little,
mostly how it is developed and distribute, and shapes in Jessie at that time (PDF/odf is my page on Debian Wiki as usual).

Does Cowgirl Dream of Red Swirl? from Hideki Yamane


After that, Enjoyed food, beer (sure! :) and chatting in party.


Folks, see you in #osc15do again!

21 April 2014

John Goerzen: A Dry Spring

Spring in the prairie is a bit of an odd thing this year. Here and there, near ditches and creeks, a short, soft blanket of lush green grass covers the ground. A few feet away, patches of green are visible between the brown shoots of last year s grass. Some trees are already turning green, purple, red, and white, while others stand still and brown, stubbornly insisting that spring is not here yet. To look at the thermometer may not be much guide either; two days after the temperature was nearly 90, we woke to see a dusting of snow on the ground. It s been dry, terribly try in Kansas. Grass next to a gravel driveway or road often has a chalkish layer of dust on it, kicked up by passing cars or even a stiff wind. The earth thirsts. It is somehow fitting to celebrate Easter, that spring holiday, in the midst of the dry ground, to remember that water is not the only thing that can quench thirst. Easter morning began sleepily, as we got up early to head to a sunrise service. It was in a pasture just outside a small Kansas town, and we gathered there at about 6:15, wearing only light jackets against the breeze. A fire was burning, and there was water on hand to quickly douse any grass that caught first that wasn t supposed to and it was occasionally used. I was doing the prelude for the service, playing on my penny whistle. I enjoyed being able to do that, and was glad that the wind was calm enough that it didn t interfere too much with the music. We sang some hymns, listened to some Bible readings, and just stood in silence, listening to the crackle of the fire, some country dogs playing, and watching the sky to the east transform as the sun came up. Then it was on to church for breakfast, and a break before the Easter service the pipe organ ringing, piano playing with it, and deep trombone and full sanctuary of people singing our 4-part Easter hymns celebrating the day. Laura had the idea of pinning carnations onto the cross, and we got to watch everyone come up and add theirs. Jacob and Oliver enjoyed the sunrise service. They decided they would keep a watchful eye on the first and the dogs, they enjoyed muffins at breakfast and playing in the church after that. But if you are 4 or 7, what is Easter without an Easter egg hunt? And they got in several. Laura and I hid some eggs around the yard. Jacob asked me to use a radio to tell them when the eggs were ready. Here they are, bounding out the door to begin the hunt! And, of course, if you are 4 or 7 and have a geek for a dad, you will naturally think to bring radios with you to the next hunt. To tell your brother what you re finding, of course. It was a good weekend, and in fact, Jacob even volunteered to put up a wet floor sign after he spilled some water: On the last car ride of the day, Jacob decided he would write a story about his Easter. He decided he would publish a big book, and be a famous author and make other children happy. Oliver, of course, decided he needed an Easter story also. We couldn t very well publish a book in the car, but I did manage to use my phone to capture their stories. It s been a long and busy week, but there is much to be joyful about, even when tired.

11 November 2009

Yves-Alexis Perez: Key transition (this is _not_ a meme)

Ok, so following the trend, I created some time ago a new GPG key, which I'm now transitioning too. I've set up a transition document, available at http://molly.corsac.net/~corsac/key-transition.txt. It's signed by both the old and the new keys and is reproduced below:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: RIPEMD160,SHA512
Wed, 11 Nov 2009 13:44:05 +0100
I've recently set up a new RSA-based GPG key, and will be transitioning away
from my old DSA-based one.  The old key will be revoked soon, so I prefer all
future correspondence to use the new one.  I would also like to ensure that
this new key is well-integrated into the web of trust.  This message is signed
by both keys to certify the transition.
The old DSA key was:
pub   1024D/C5C05BAE 2004-11-11
      Key fingerprint = DE26 2FC4 7097 FFC6 DE2C  D8C0 4D44 C020 C5C0 5BAE
The new RSA key is:
pub   4096R/71EF0BA8 2009-05-06
      Key fingerprint = 4510 DCB5 7ED4 7040 60C6  6476 3055 0F78 71EF 0BA8
If you already know my old key, you can verify that the new key is
signed by the old one:
  gpg --check-sigs 71EF0BA8
If you don't already know my old key, or if you're extra-paranoid, you
can check the fingerprint against the one given above:
  gpg --fingerprint 71EF0BA8
If you have previously signed my old DSA key, and if you're satisfied
that you've got the correct new RSA key, then I'd appreciate it if you
would sign my new key as well:
  caff 71EF0BA8
The caff program is in the signing-party package in Debian.  Please be careful
to generate signatures that don't rely on the weakening SHA-1 hash algorithm,
which requires some careful configuration even if you've already configured
gpg correctly.  See http://www.gag.com/bdale/blog/posts/Strong_Keys.html for
the gory details.
Thanks,
- --
Yves-Alexis Perez
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
iEYEAREDAAYFAkr6sqQACgkQTUTAIMXAW64HiACeIyabQueDHAeiAX8EkIeApiDj
++UAn2z7YkjHx0lQh0+s5WdhikG0YztiiQIcBAEBCgAGBQJK+rKkAAoJEDBVD3hx
7wuodUcQAKMbG9Rehxz+uZ6fST99cHt5Fjnv9TorY4hQaQK+85ZgiwPaHMHfYM1G
5hcrXI+JFUpz8j40deZuaWuspOdHBHwnHNQril8MqT0CJgtB6HFTo+w/7Lmmui5M
DDMMed39UJl7bF73hV9ywGecxPpeh+dtoVnh0VT16uK2xTvW6ICEZgaPw1xfPUHS
+jxQ7I05X1OWQkPpmhxXJqGclDyO+qx4CJZsOxUAvt2LphHxhZxB3QE5OUdudGKQ
AH6KhC4rpNQdJVMX20SG8PybL/AipN3Y8N/63VkoqVC2heRlaQ69HjsuqIAkIyan
hHnqmJH8Q+TDTbdKZvOQv6jcd4o3VSibz0T9MwnOfqQ0uRYyTpaXC0vLUH6lXaC4
eK+VVWbY8vCAFHR3h80Q61i2me2HU5ly7a/W22dz19zzDNNC5q9MO78uIYkUK78N
Z0wzJrmOxRyhvs5DOSOpNVlXZhffNQM1f42xxG8cUDaIf7pR5jK+xqHV7tIBQE1D
CrD0mt+YQCnngK0i4wQTO7VT/vjypf4A9W+VSsoJJpRhBbngU4pHu9JWqO84/7AA
j5FN8ug15MWysaS+FQ/EqzHmT7BGBWaTPv3yGlHKUjx0w4bPEpbH7y3fwHAcmOFf
xFRzvZFQ03zeer06yAqTVNuwr77HZgrCzgyQVgIkegAg6iUPiZcs
=CBT+
-----END PGP SIGNATURE-----

12 February 2006

Paul van Tilburg: GNOME Bashing & Early Rising

Early Rising Simon, I have been following the early riser program for 8 months now. I wrote about my experience some time ago. It still is working well for me, although I had a little trouble with it when I moved to Edinburgh. However, my headaches warn me and make sure I stay in the program. GNOME It has been quite a meme to bash GNOME the last few weeks. Since bashing generally doesn’t help the motivation of any hackers and people working hard to accomplish a goal, I want to rebuke it a bit. Two and a half years ago I got my PowerBook. After five years of using a Pentium-233 with 64MB I suddenly had some CPU power. Since my old environment (WindowMaker) was broken on PPC at the moment, I decided to go with GNOME. Coming from WindowMaker it really felt like things were done quite differently, but being open-minded I tried to let it all go. The thing is, do I really want to configure everything? I liked configuring a lot back then, it was like an addiction. Now that I have gotten used to GNOME (let’s call it The GNOME way™), it just feels bothersome, getting in my way. When I encounter a GNOME desktop I immediately can work with it and don’t feel the need to redo the complete setup. When I look back, the only things I really want to change are the programs I use, the information I see and the way I control it all. This basic stuff I easily can do with launchers, applets and keybindings. Also have I been observing and helping some new users. And somehow I still keep reaching the moment that I can hear myself say, “Ow, hehe, yeah… that doesn’t work quite right yet. Just open a terminal (eek!) and edit this and that file.” So, we are not there yet. For me GNOME offers innovation, sufficient configurability, and an efficient do-not-get-in-my-way workspace. The change from WindowMaker to GNOME has been to me as my switch from Windows to Linux, a thing I ventured into open-minded and worked out quite well. Won’t this result into a dull monoculture of desktop setups? Maybe so, but I don’t have to be different in every way just to make the statement that I am different. OK guys, bash away now… oh, and thanks GNOME people!