Writing a Haskell version of Peter Norvig's spelling corrector is now considered as a standard Haskell exercise. There has been several solutions, of Bryan O'Sullivan, Tim Robinson, or Marco Sero.

My solution gives a slightly different perspective. The emphasis is on

• Writing type classes
• Polymorphic functions

The type class and monad ideas are already present in Norvig's code! Indeed, list comprehension in Python is essentially using the monadic structure of lists. The type class (admittedly here completely overkill) simulates the or function in Python.

Finally, the code is written with polymorphic functions. The reason is that there is less possible mistakes when writing such a code. In most of what follows, the code works not only for strings, but for lists of any type.

The code is on GitHub.

Design choices

Some comments about some design choices. First, I don't think that sets are necessary here. Everything is much simpler with lists, although we'll have to carry around duplicates. This is a price to pay, the alternative being the costly procedure of eliminating all duplicates.

The other choice is to use the module Data.Text instead of Data.ByteString, as this seems to be more suitable for, well, text. This is only relevant in the construction of the dictionary, as the rest of the code, being polymorphic, has nothing to do with strings.

The code

At this point, you should take another look at Norvig's Python implementation, to understand the basic principle of how this all works.

The editors

First, prepare all the “editors”:

transpose :: [a] -> [a]
transpose [] = []
transpose [_] = []
transpose (a:b:xs) = b:a:xs

delete :: [a] -> [a]
delete [] = []
delete (_:xs) = xs

insert :: [a] -> [a] -> [[a]]
insert letters word =
do -- List
l <- letters
return (l:word)

replace :: [a] -> [a] -> [[a]]
replace _ [] = []
replace letters (_:xs) = insert letters xs


Gather all the editors

allEditors :: [a] -> [[a] -> [[a]]]
allEditors letters =  [return . transpose, replace letters, return . delete, insert letters]


Gather all the possible editing (one edit)

We need to define all the possible splits of a word:

splits :: [a] -> [([a],[a])]
splits word = zip (inits word) (tails word)


Note how this method avoids using splitAt and a length calculation, which is error-prone.

Finally, we can define the editOnce function:

editsOnceWith :: [[a]->[[a]]] -> [a] -> [[a]]
editsOnceWith editors word = do -- List
(begin,end) <- splits word
editor <- editors
endedit <- editor end
return (begin ++ endedit)


I really like the do notation for the list monad, but you could also write it in a big list comprehension:

[begin ++ endedit |
(begin,end) <- splits word,
editor <- editors,
endedit <- editor end]


Read from the dictionary of known words

We are now ready to gather all the known elements from the dictionary of known words:

inDict :: (Eq k, Ord k) => M.Map k v -> [k] -> [(k,v)]
inDict dict = mapMaybe myLookup
where
myLookup w = -- returns either Nothing or Just (w,f) if w is found in dict (f is the frequency)
do -- Maybe
f <- M.lookup w dict
return (w,f)


Python's or function

Before building the list of known edited words, we implement a Haskell version of Python's or function.

We need to define what it means to be “Bool-like”:

class BoolLike a where
falthy :: a -> Bool
bempty :: a


We are going to use that a list is “truthy“ if it is not empty:

instance BoolLike [a] where
bempty = []
falthy = null


Now we define a monoid structure which selects the first truthy element:

newtype First a = MkFirst { getFirst :: a } deriving (Show, Eq)

instance BoolLike a => Monoid (First a) where
mempty = MkFirst bempty
mappend (MkFirst l) (MkFirst r) = MkFirst (if falthy l then r else l)


Notice how editOnce has (essentially) the type signature

[a] -> [[a]]


It means that in order to edit a word twice, we need to compose that function with itself. This is a textbook example of a monadic composition (here with the list monad). Monadic (or Kleisli) composition is obtained with (<=<) in the following code snippet.

Minimal number of edits to get some known words

We now use the First monoid structure to select the first group of known words which is not empty:

allChoices :: ([a] -> [(a,b)]) -> (a -> [a]) -> a -> [(a,b)]
allChoices inDict' edits1 word = getFirst possibilities
where
possibilities =
mkFirst return
<>  mkFirst edits1
<>  mkFirst (edits1 <=< edits1)
mkFirst edit = (MkFirst . inDict' . edit) word


Choose the best candidate

In order to choose the best candidate, we need a function which is not in the Haskell library, so we implement it! It is simply a variant of maximumBy which works on empty lists as well. In order to work, such a function needs a default value. The implementation is straightforward:

maxByOrDefault :: (Foldable t) => (a -> a -> Ordering) -> a -> t a -> a
maxByOrDefault comp = foldl' (\ e e' -> if comp e' e == GT then e' else e)


And we now choose the best of all the candidates:

chooseBest :: (Ord k, Ord v, Num v) => k -> [(k,v)] -> k
chooseBest nothing choices' = fst bestPair
where bestPair = maxByOrDefault (comparing snd) (nothing, 0) choices'


Piecing all together

Piecing all together (and specializing to the case of strings) is now very simple:

alphabet :: String
alphabet = ['a' .. 'z']

correct :: TrainingDict -> String -> String -> String
correct dict notfound word = chooseBest notfound choices
where
choices = allChoices (inDict dict) (editsOnceWith (allEditors alphabet)) word


The training dictionary

We need to creating a big hash map from a text file. I have hardly changed anything from the original code except that I use Text.unpack and Data.Text.IO.readfile to read the file (ByteString does not sound like such a good idea).

nWords :: IO TrainingDict
nWords = do -- IO
fileName <- getDataFileName "big.txt"
return ((train . lowerWords . unpack) ws)

lowerWords :: String -> [String]
lowerWords = words . fmap normalize
where normalize c = if isAlpha c then toLower c else ' '

train :: (Ord k, Num v) => [k] -> M.Map k v
train trainWords = foldl' increment M.empty trainWords
where
increment dict x = M.insertWith (+) x 1 dict


Things to do

One remaining thing to do is to write proper tests, using a dummy dictionary.

Edit: The code now follows the excellent readability advice of Gabriel Gonzalez.