Hello
The Config::Model::Dpkg project (a Debian source package model based on Config::Model) is partly based on a Parse::RecDescent grammar. This grammar is used to parse the dependencies of a Debian source package.
This article shows how such a grammar is written, its limitations regarding error handling and how to improve the situation.
The main data of a Debian package is described in the debian/control file. This file can feature a list of dependencies, i.e. a list of packages that must be installed for the package to work. These dependencies are declared in fields like Build-Depends or Depends as a list of packages. For the purpose of the Dpkg model, I only needed to parse one item of a dependency list at a time.
This dependency item can be a simple package name:
foo
or a package name with a version requirement:
foo ( > 1.24 )
or a package name with architectures restrictions:
foo [alpha amd64 hurd-arm linux-armeb]
or both:
foo ( > 1.24 ) [alpha amd64 hurd-arm linux-armeb]
or a list of alternate choices combining the possibilities above:
foo ( > 1.24 ) | bar [ linux-any] | baz ( << 3.14 ) [ ! hurd-armel !hurd-armeb ]
or a variable that is replaced during package build:
${perl-depends}
Writing a Parse::RecDescent grammar to parse this is relatively straightforward.
The first production handles alternate dependencies separated by '|' and raises an error if some text was not consumed by the dependencies:
dependency_item: depend(s /\|/) eofile | <error>
A dependency, as explained above, is expressed as:
depend: pkg_dep | variable
A variable like ${foo} or ${bar}-1.24~ is parsed with:
variable: /\$\{[\w:\-]+\}[\w\.\-~+]*/
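The variable pattern can be tried outside of the grammar with plain Perl. This is a minimal sketch: the helper name is_variable is hypothetical, and the \A and \z anchors are an addition for this demo, so that the whole string has to match.

```perl
use strict;
use warnings;

# Same character classes as the 'variable' terminal of the grammar.
# The anchors are added for this demo only.
my $variable_re = qr/\A\$\{[\w:\-]+\}[\w.\-~+]*\z/;

# Hypothetical helper: returns 1 if $text is a substitution variable,
# optionally followed by a version suffix, and 0 otherwise.
sub is_variable {
    my ($text) = @_;
    return $text =~ $variable_re ? 1 : 0;
}

for my $candidate ('${foo}', '${bar}-1.24~', '$foo', 'foo') {
    printf "%-14s %s\n", $candidate,
        is_variable($candidate) ? 'variable' : 'not a variable';
}
```

The strings are single-quoted so that Perl does not try to interpolate ${foo} itself.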
This rule handles a package name with optional version or arch restriction:
pkg_dep: pkg_name dep_version(?) arch_restriction(?)
pkg_name: /[a-z0-9][a-z0-9\+\-\.]+/
The remaining rules are quite simple:
dep_version: '(' oper version ')'
oper: '<<' | '<=' | '=' | '>=' | '>>'
version: variable | /[\w\.\-~:+]+/
arch_restriction: '[' arch(s) ']'
arch: /!?[\w-]+/
eofile: /^\Z/
The grammar above works well to parse the dependency. You can test it with this small Perl script:
#!/usr/bin/perl
use strict;
use warnings;
use 5.010 ;
use Parse::RecDescent ;
# the grammar is read from the __DATA__ section below
my $parser = Parse::RecDescent->new(join('', <DATA>));
my $dep = shift ;
say "parsing '$dep'";
my $ret = $parser->dependency_item($dep) ;
say "result is ", $ret if $ret ;
__DATA__
# insert grammar here !!!
Unfortunately, any error in the optional parts (i.e. version requirement and arch restriction) leads to an error message which is not very helpful. The error message only mentions that some text could not be parsed:
parsing 'foo ( != 1.24 ) | bar'
ERROR (line 1): Invalid dependency item: Was expecting /\|/ but found
"( != 1.24 ) | bar" instead
or
parsing 'foo [ arm & armel] | bar'
ERROR (line 1): Invalid dependency item: Was expecting /\|/ but found
"[ arm & armel] | bar" instead
The problem comes from the fact that version requirements and arch restrictions are optional. For instance, if a version requirement has a syntax error, Parse::RecDescent gives up on it and tries to parse the same text as an arch restriction. The arch restriction rule also fails, and so does the last terminal (eofile). The error message therefore does not hint at the actual syntax problem.
To generate better error messages, I improved on the suggestion made in the Parse::RecDescent FAQ.
Instead of calling a plain subroutine, I use a sub reference that stores the error messages in a closure. This sub ref is declared in a start-up action. Note that the sub ref explicitly returns undef. I'll explain why later.
{
    my @dep_errors ;
    my $add_error = sub {
        my ($err, $txt) = @_ ;
        push @dep_errors, "$err: '$txt'" ;
        return ;
    } ;
}
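The behaviour of this sub reference can be checked in plain Perl, outside of any grammar. The sketch below reuses the same names; the call at the end and its arguments are only an illustration.

```perl
use strict;
use warnings;

my @dep_errors;

# Closure over @dep_errors: every call records one message.
my $add_error = sub {
    my ($err, $txt) = @_;
    push @dep_errors, "$err: '$txt'";
    return;    # bare return: undef when called in scalar context
};

# Illustrative call, as a grammar action would do it.
my $result = $add_error->('bad package name', 'Foo');

printf "returned %s, stored message: %s\n",
    defined $result ? 'a value' : 'undef', $dep_errors[0];
```

The message is kept in the array while the caller gets an undefined value back.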
The following production always fails while ensuring that the error list is reset. This production is always run at the beginning of the dependency parsing:
dependency: { @dep_errors = (); } <reject>
Here's the actual dependency production that is run when the dependency method is called on the parser. It returns an array ref containing (1, data) if the dependency is valid or (0, errors) otherwise:
dependency: depend(s /\|/) eofile
    {
        $return = [ 1 , @{$item[1]} ] ;
    }
    | {
        push( @dep_errors, "Cannot parse: '$text'" ) unless @dep_errors ;
        $return = [ 0, @dep_errors ];
    }
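Calling code can rely on this return convention: the first element is the status, the rest is either data or error messages. Here is a small sketch of how a caller might unpack it. The helper name describe_result and the sample values are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: turns the array ref returned by the parser
# into a one-line report.
sub describe_result {
    my ($ret) = @_;
    my ($ok, @rest) = @$ret;    # status first, then data or messages
    return $ok
        ? "valid: @rest"
        : 'invalid: ' . join('; ', @rest);
}

# Sample values shaped like the two possible return values.
print describe_result([ 1, 'foo', 'bar' ]), "\n";
print describe_result([ 0, "bad package name: 'Foo'" ]), "\n";
```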
The following productions don't change much:
depend: pkg_dep | variable
variable: /\$\{[\w:\-]+\}[\w\.\-~+]*/
pkg_dep: pkg_name dep_version(?) arch_restriction(?)
dep_version: '(' oper version ')'
The first alternative of this rule parses the package name, which must be followed by a space, the end of the string, '(' or '['. A positive look-ahead assertion is used so that only the package name is consumed. If the first alternative fails, the second one provides a meaningful error message: it matches anything which is not a space and creates an error message. Since $add_error returns undef, the action returns undef and the whole rule fails, so the text stored in the error message is not consumed:
pkg_name: /[a-z0-9][a-z0-9\+\-\.]+(?=\s|\Z|\(|\[)/
    | /\S+/ { $add_error->("bad package name", $item[1]) ; }
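The effect of the look-ahead can be observed with a plain Perl regular expression. This is only a sketch: the helper name split_pkg_name is hypothetical, and the pattern is anchored with \A, which is what a terminal effectively does at the current parse position.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (name, remaining text) when $text starts
# with a package name followed by a space, end of string, '(' or '['.
# Returns an empty list otherwise.
sub split_pkg_name {
    my ($text) = @_;
    if ($text =~ /\A([a-z0-9][a-z0-9+\-.]+)(?=\s|\Z|\(|\[)/) {
        # The look-ahead is zero-width: the delimiter stays in the rest.
        return ($1, substr($text, length $1));
    }
    return;
}

for my $input ('foo ( >= 1.0 )', 'libfoo-perl[amd64]', 'foo_bar') {
    my ($name, $rest) = split_pkg_name($input);
    print defined $name
        ? "name '$name', rest '$rest'\n"
        : "no package name in '$input'\n";
}
```

The delimiter is left in place for the next rule, and a name such as foo_bar is rejected as a whole instead of being cut at the underscore.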
The same trick is used with these rules:
oper: '<<' | '<=' | '=' | '>=' | '>>'
    | /\S+/ { $add_error->("bad dependency version operator", $item[1]) ; }
version: variable | /[\w\.\-~:+]+(?=\s|\)|\Z)/
    | /\S+/ { $add_error->("bad dependency version", $item[1]) ; }
The action of this production is a little bit more tricky. It ensures that '!' is either added before all arch names or not at all. Otherwise an error message is generated and added to the list of errors:
arch_restriction: '[' osarch(s) ']'
    {
        my $mismatch = 0;
        # $ref contains ['!',os,arch] or ['',os,arch]
        my $ref = $item[2] ;
        # compare each entry with the next one
        for (my $i = 0; $i < $#$ref ; $i++ ) {
            $mismatch ||= ($ref->[$i][0] xor $ref->[$i+1][0]) ;
        }
        my @a = map { ($_->[0] || '') . ($_->[1] || '') . $_->[2] } @$ref ;
        if ($mismatch) {
            $add_error->("some names are prepended with '!' while others aren't.", "@a") ;
        }
        else {
            $return = 1 ;
        }
    }
The check above is possible only if the "osarch" rule returns an array ref containing something like ('!','linux','any') for "!linux-any" or ('','linux','any') for "linux-any":
osarch: not(?) os(?) arch
    {
        $return = [ $item[1][0], $item[2][0], $item[3] ];
    }
    | /.?(?=\s|\]|\Z)/ { $add_error->("bad arch specification: ", $item[1]) ; }
not: '!'
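The consistency check done on these triples can be tried in isolation. The sketch below is plain Perl with a hypothetical helper name, has_mixed_negation, and hand-written sample triples. It compares every entry with its neighbour, including the last pair.

```perl
use strict;
use warnings;

# Each entry is [ $not, $os, $arch ] where $not is '!', '' or undef.
# Hypothetical helper: returns 1 if some entries are negated while
# others are not, 0 otherwise.
sub has_mixed_negation {
    my (@entries) = @_;
    for my $i (0 .. $#entries - 1) {
        return 1 if ($entries[$i][0] xor $entries[$i + 1][0]);
    }
    return 0;
}

# Hand-written samples for "!hurd-armel !hurd-armeb" and "!linux-any amd64".
my @all_negated = ([ '!', 'hurd-', 'armel' ], [ '!', 'hurd-', 'armeb' ]);
my @mixed       = ([ '!', 'linux-', 'any' ], [ undef, undef, 'amd64' ]);

print 'all negated: ', has_mixed_negation(@all_negated), "\n";
print 'mixed:       ', has_mixed_negation(@mixed), "\n";
```

A list with fewer than two entries has no pair to compare, so it is never reported as mixed.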
Here's the rest of the grammar:
os: /(any|uclibc-linux|linux|kfreebsd|knetbsd|etc...)-/
    | /\w+/ '-' { $add_error->("bad os in architecture specification", $item[1]) ; }
arch: / (any|alpha|amd64|arm\b|arm64|etc... )
        (?=(\]| ))
      /x
    | /\w+/ { $add_error->("bad arch in architecture specification", $item[1]) ; }
eofile: /^\Z/
That's all for grammar 2.0...
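To experiment with the technique without the whole Debian grammar, here is a reduced, self-contained sketch. It assumes Parse::RecDescent is installed from CPAN. The rule names dep and version_req, the helper check_dep and the shortened messages are made up for this example and are not those of Config::Model::Dpkg. Only a package name and an optional version requirement are handled.

```perl
use strict;
use warnings;
use Parse::RecDescent;

my $grammar = <<'END_GRAMMAR';
{
    # start-up action: lexical variables shared by all actions
    my @errors;
    my $add_error = sub {
        my ($err, $txt) = @_;
        push @errors, "$err: '$txt'";
        return;    # undef, so the calling production fails
    };
}

# this production resets the error list and then always fails
dep: { @errors = (); } <reject>

dep: pkg_name version_req(?) eofile
        { $return = [ 1, $item[1] ]; }
    | {
        push @errors, "Cannot parse: '$text'" unless @errors;
        $return = [ 0, @errors ];
      }

pkg_name: /[a-z0-9][a-z0-9+\-.]+(?=\s|\Z|\()/
    | /\S+/ { $add_error->("bad package name", $item[1]); }

version_req: '(' oper /[\w.\-~:+]+/ ')'

oper: '<<' | '<=' | '=' | '>=' | '>>'
    | /\S+/ { $add_error->("bad version operator", $item[1]); }

eofile: /^\Z/
END_GRAMMAR

my $parser = Parse::RecDescent->new($grammar) or die "bad grammar";

# Hypothetical helper: returns [ 1, name ] or [ 0, error messages ].
sub check_dep {
    my ($text) = @_;
    return $parser->dep($text);
}

for my $input ('foo ( >= 1.0 )', 'foo ( != 1.0 )', 'Foo') {
    my ($ok, @rest) = @{ check_dep($input) };
    print "$input => ", ($ok ? "ok: @rest" : "error: @rest"), "\n";
}
```

The fallback alternatives never succeed. They only record what they saw, which lets the last production of dep report the recorded messages instead of a generic one.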
Before someone yells: "Show me the messages!", here are some examples of bad dependencies and the error messages generated by the parser:
parsing 'foo ( != 1.24 ) | bar'
result is: 0 bad dependency version operator: '!='
parsing 'foo [ arm & armel] | bar'
result is: 0 bad arch specification: : '&'
parsing 'foo [ arm armel ] | bar [!moo]'
result is: 0 bad arch specification: : ']' bad arch in architecture specification: 'moo'
The first two error messages are spot on the actual error. The third one contains a false positive (']' is correct) but correctly highlights the wrong arch name ('moo').
Mission accomplished.
In order to keep this post (relatively) simple, I've removed the parts that actually store the parsed data. They don't really matter for error handling. Nevertheless, you can see the whole grammar in the Config::Model::Dpkg::Dependency module.
All the best