
For a while I have been pondering over a problem that arises when your functionally written program has some state with cross references for example a list of users, each of which uses a number of computers, and a list of computers, each having an owner.
Implicit referencing
For doing queries on such data, it would be convenient if every reference is just the referenced object itself. Although we would visualize this as a graph, semantically, it is more like an infinite tree. This is possible in Haskell, due to laziness, and if you create the data structure cleverly, it even uses constant memory, no matter how deep you enter this infinite tree (
a recent post of mine talks about this). A possible data definition would be:
data User = User
userName :: String,
uses :: [Computer]
data Computer = Computer
computerName :: String,
owner :: User -- references the Users
data State = State [User] [Computer]
testState = let user = User "Conrad" [computer]
computer = Computer "Z3" user
in State [user] [computer]
Explicit referencing
But such a representation is very unsuitable for updates (at least I can t think if a nice way of updating such a graph without breaking the internal cross-references) and serialization, which is a must for a
HAppS based application. So what one would probably start with is this data structure:
data User = User
userId :: Int,
userName :: String,
uses :: [Int] -- references the Computers
data Computer = Computer
computerId :: Int,
computerName :: String,
owner :: Int -- references the Users
data State = State [User] [Computer]
testState = State
[User 0 "Conrad" [1]]
[Computer 1 "Z3" 0]
I think the semantics of this are clear. Note that the referencing is currently not type-safe, but this can be provided by phantom types. Maybe I ll write more about that later.
Now imaging you want to display the information about the first computer with your web application. You extract the information with
let State _ cs = testState in head cs and pass that to your templating engine. But what if your template wants to display the name of the owner? It only has access to his
userId. You would either need to know what information the template will ever need, extract that from the state beforehand and pass it along, or give the template access to the whole state. In that case, though, there has to be lookup-logic in your output code, which is also not nice.
Woudln t it be nice if you could, in your application logic, work with the explicit references, which are easy to modify and store, but somehow turn that into the implicit referencing?
Duplicated representation
One way would be to have two unrelated sets of data structures,
ExplicitState,
ExplicitUser,
ExplicitComputer, which use explicit identifiers to reference each other, and
ImplicitState,... which are defined as the first representation of our state. It is then mostly trivial to write a function that converts
ExplicitState to
ImplicitState.
The big disadvantage of this is that you have to maintain these two different hierarchies. It also means that every function on the state has to be defined twice, which often almost identical code. Clearly, this is not desirable.
Annotated representation
It would be more elegant if the state is stored in one data type that, controlled by a type parameter, comes in the one or the other representation. To do that, we need two types: One that contains a value, and one that contains just a reference:
newtype Id v = Id v deriving (Show, Typeable, Data)
newtype Ref v = Ref Int deriving (Show, Typeable, Data)
Then we need to adjust our data definitions, to make use of these. (I ll leave out the names, to keep the code smaller)
data User ref = User
userId :: Int,
uses :: [ref (Computer ref)]
data Computer ref = Computer
computerId :: Int,
owner :: ref (User ref)
data State ref = State [User ref] [Computer ref]
Here we introduce a type parameter ref , which will later be either
Id or
Ref. Note that now a reference also states the object it is a reference for, which greatly increases type safety. Functions on these data types that don t work with the references will be polymorphic in the ref type parameter, so only need to be written once. A
User Id is a complete user with all related data, while
User Ref is a user with only references. And a
Ref (User Ref) is reference to a user, which contains references...
Not so kind kinds
Did you notice the lack of a deriving clause? Our data structures have the relatively peculiar kind (
(* -> *) -> *), which makes it hard for the compiler to derive instances for things like
Show. But we already know that we will only use
Id or
Ref for the type variable, so we can use ghc s
StandaloneDeriving language extension and have these instances created:
deriving instance Show (User Id)
deriving instance Show (User Ref)
deriving instance Show (Computer Id)
deriving instance Show (Computer Ref)
deriving instance Show (State Id)
deriving instance Show (State Ref)
Toggling a type parameter
The next step is to write the conversion function. It will have type
unrefState :: State Ref -> State Id
For that, and for later, we need lookup functions:
unrefUserRef :: State Id -> Ref (User Ref) -> Id (User Id)
unrefUserRef (State l _) (Ref i) = Id $ fromJust $
find (\u@(User i' _) -> i == i') l
unrefComputerRef :: State Id -> Ref (Computer Ref) -> Id (Computer Id)
unrefComputerRef (State _ l) (Ref i) = Id $ fromJust $
find (\u@(Computer i' _) -> i == i') l
These expect a State (with implicit referencing) and a reference, and look up this reference. The function
unrefState then looks like this:
unrefState :: State Ref -> State Id
unrefState (State us cs) =
let unrefState = State (map (unrefUser unrefState) us)
(map (unrefComp unrefState) cs)
in unrefState
where unrefUser :: State Id -> User Ref -> User Id
unrefUser s (User i refs) = User i (map (unrefComputerRef s) refs)
unrefComp :: State Id -> Computer Ref -> Computer Id
unrefComp s (Computer i ref) = Computer i (unrefUserRef s ref)
Note how we tie the knot in the
let expression. This is the trick that ensures constant memory consumption, because every reference points back to the same place in memory.
Satisfied already?
So what do we have? We have no duplication of data types, we can write general functions, and we can resolve the explicit referencing. We can also easily write functions like
unrefUser :: State Ref -> User Ref -> User Id, which transform just a part of the state.
But writing
unrefState is very tedious when the State becomes more complex. Each of the other
unrefSomething functions are very similar, but need to be written anyways. This is unsatisfactory. What we want, is a generic function, something like
gunref :: State Ref -> a Ref -> a Id
which, given a state with explicit references, replaces all explicit references in the first argument (which could be
State Ref or
User Ref or anything like that) with implicit ones. This function can not exist, because we would not know anything about
a and
b. But maybe we can do this:
gunref :: (Data (a Id), Data (a Ref)) => State Ref -> a Ref -> a Id
Typeable and Data
Before being able to do so, we need to derive
Data for our types. We can start with
deriving instance Data (User Id)
deriving instance Data (User Ref)
deriving instance Data (Computer Id)
deriving instance Data (Computer Ref)
deriving instance Data (State Id)
deriving instance Data (State Ref
but that will complain about missing
Typeable instances. Unfortunately, ghc s deriver for
Typeable (even the stand-alone-one), does not handle our peculiar kind, so we need to do it by hand. With some help from quicksilver on #haskell, I got it to work:
instance Typeable1 ref => Typeable (User ref) where
typeOf _ = mkTyConApp (mkTyCon "User") [typeOf1 (undefined :: ref ())]
instance Typeable1 ref => Typeable (Computer ref) where
typeOf _ = mkTyConApp (mkTyCon "Computer") [typeOf1 (undefined :: ref ())]
instance Typeable1 ref => Typeable (State ref) where
typeOf _ = mkTyConApp (mkTyCon "State") [typeOf1 (undefined :: ref ())]
everywhere is not enough
Turning to the documentation of
Data.Generics, I notice with some disappointment that there is no function that is able to
change a type they all seem to replace a value by another value of the same type. But the functions
gfoldl and
gunfold sounded like they could be used for this.
Warning: What comes now is a very non-haskellish hack that subverts the type system, just to get the job done. Please read it with a healthy portion of distrust. If you know of a cleaner way of doing that, please tell me!
Wrapped Data
I want to do some heavy type hackery, so I need to disable haskell s type system. There is
Data.Dynamic, but not even that is enough, as we need to carry a type s
Data instance around as well. So let s wrap that up:
data AData where AData :: Data a => a -> AData
instance Show AData where show (AData a) = "<" ++ show (typeOf a) ++ ">"
fromADataE :: forall b. Data b => AData -> b
fromADataE (AData d) = case cast d of
Just v -> v
Nothing -> error $ "Type error, trying to convert " ++
show (typeOf d) ++ " to " ++
show (typeOf (undefined :: b)) ++ "."
There is also a function that converts an
AData back to a normal type, if possible. If it s not possible, then there is a bug in our code, so we give an informative error message.
AData transformers
Similar to
everywhere, we want to have transformers that combinable. They need to have the change to convert an arbitrary value, but also signal that they could not convert something (and this something has to be recursed into). I came up with this:
type ADataT = AData -> Maybe AData
extADT :: forall a b. (Data a, Data b) => ADataT -> (a -> b) -> ADataT
extADT at t a@(AData v) = case cast v of
Just x -> Just (AData (t x))
Nothing -> at a
doNothing :: ADataT
doNothing = const Nothing
ADataT is the type for such a transformer.
doNothing will not transform anything, and
extADT can be used to add any function to the list of tried transformers, in the spirit of
extT.
The ugly part
To apply such a transformer, I want to use this function, which I ll describe in the code comments:
everywhereADT :: forall a b. (Data a, Data b) => ADataT -> a -> b
-- first check if we can transform this value already
everywhereADT f v = case f (AData v) of
-- if so, coerce it to the users requested type, which hopefully goes well
Just r -> fromADataE r
-- if not, we need to recurse into the data structure
Nothing -> recursed
-- for that, we first need to figure out the arguments to the
-- constructor. We store them in the untyped list
where args :: [AData]
(Const args) = gfoldl k z v
-- gfoldl lets us have a look at each argument. We wrap it in AData
-- and append it to the list
k (Const args) arg = Const (AData arg : args)
z start = Const []
-- We need the data constructor of the input data type. If the user did not want
-- it to be transformed, it will be the same
c = toConstr v
-- To give better error messages, we make sure that the outmost type constructor
-- of the requested type actually has the data constructor we were given. Otherwise
-- gunfold will complain in a non-helpful way
input_type = dataTypeRep (constrType c)
output_type = dataTypeRep (dataTypeOf (undefined :: b))
recursed = if input_type /= output_type
then error $ "Can not convert <" ++ show input_type ++ ">"++
" to <" ++ show output_type ++ ">."
-- the types match, so we assemble the output type, using gunfold
else snd (gunfold k' z' c)
k' :: forall b r . Data b => ([AData], b -> r) -> ([AData],r)
-- we start by reversing the input list
z' start = (reverse args,start)
-- then we call us recursively on the argument and feed the result
-- to the output constructor
k' ((AData a) : args, append) = (args, append (everywhereADT f a))
-- Used for folding (we don t need to retain the original constructor)
data Const a b = Const a
What a beast! But surprisingly, it works. Here are some examples. Note that I always have to specify the requested output type:
bool2Int :: Bool -> Int
bool2Int False = 0
bool2Int True = 1
*Main> everywhereADT (doNothing extADT bool2Int) True :: Int
1
*Main> everywhereADT (doNothing extADT bool2Int) True :: ()
*** Exception: Type error, trying to convert Int to ().
*Main> everywhereADT (doNothing extADT bool2Int) (True,False) :: (Int,Int)
(1,0)
*Main> everywhereADT (doNothing extADT bool2Int) ([True,False],True,()) :: ([Int],Int,())
([1,0],1,())
*Main> everywhereADT (doNothing extADT bool2Int) ([True,False],True,()) :: [()]
*** Exception: Can not convert to .
*Main> everywhereADT (doNothing extADT bool2Int) [True] :: [Bool]
[*** Exception: Type error, trying to convert Int to Bool.
I hope this code does not inflict too much pain on any Haskell-loving reader. I know I violated the language, but I didn t know how to do it better (at least not without using Template Haskell). I also know that this is not very good performance wide: Every single value in the input will be deconstructed, type-compared several times and re-assembled. If that is an issue, then this function should only be used for prototyping.
Almost done
To apply this to our state, we only need to glue it to our lookup functions from above:
gunref :: (Data (a Id), Data (a Ref)) => State Ref -> a Ref -> a Id
gunref s w = let unrefState = gunref' unrefState s
in gunref' unrefState w
gunref' :: (Data (a Id), Data (a Ref)) => State Id -> a Ref -> a Id
gunref' unrefState = everywhereADT unref'
where unref' = doNothing extADT
unrefUserRef unrefState extADT
unrefComputerRef unrefState
Now we have a generic unreferencer. I set the type a bit more specific than necessary, to make it safe to use (under the assumption that the list of lookup functions is complete and will not leave any
Ref in the output).
*Main> testState
State [User userId = 0, uses = [Ref 1] ] [Computer computerId = 1, owner = Ref 0 ]
*Main> gunref testState testState
State [User userId = 0, uses = [Id (Computer computerId = 1, owner = Id (User userId = 0,
uses = [Id (Computer computerId = 1, owner = Id (User userId = 0, uses = ..
Oh, and by the way, if you want to test this code, you ll need at least:
-# LANGUAGE DeriveDataTypeable, FlexibleInstances, GADTs,
FlexibleContexts, StandaloneDeriving, ScopedTypeVariables #-