Tuesday, December 12, 2006

Shell Tasks? In Haskell?

No, your eyes are not deceiving you: I am in fact suggesting that Haskell is suitable for tasks that are normally relegated to shell scripts.

Recently, a colleague asked me to come up with a simple shell script to rename some files. The files were being moved from a Windows machine to a *nix environment, which meant that case sensitivity was going to become an issue. The request was simple enough: we needed to rename all the specified items to lowercase names.

You might be thinking, ``classic tr territory here.'' And, of course, you'd be absolutely correct to go in this direction. The important bits go something like this:

for file in "$@"; do
  downcase=$(echo "${file}" | tr 'A-Z' 'a-z')
  if ! [[ "${downcase}" == "${file}" ]]; then
    echo "Moving ${file} to ${downcase}"
    mv "${file}" "${downcase}"
  fi
done

So what's wrong with that?

Absolutely nothing. It's both simple and effective. So why bother writing it in any other language, least of all Haskell? Because it's not beautiful.

The great thing about the above code is that you don't really need to put it into a script. You can just bang away at it from the command line -- yes, you can type loops on the command line, *smirks*.

When at the command line, however, you just do the simplest thing that works: you aren't worried about wrapping the variable names in curly brackets, and so on. If you do want to keep scripts like this around, though, after a while you'll start to see them grow ugly. Nobody likes ugly code, so let's see how it looks if we port it to Haskell.

First, let's start by importing a few functions:

module Main (main) where
import System.Environment (getArgs)  -- so we can get the file names
import Data.Char (toLower)           -- will do 'tr's work
import System.Directory (renameFile) -- so we can do the renames
import Control.Monad                 -- when and liftM2, used in main

Those will definitely come in handy later. Okay, so let's talk about the data for this ``application.'' Obviously we're going to have a bunch of filepaths, but we also need to keep track of the downcased names as well. We'll gather these together with the original names in pairs. Since there will be multiple such pairings, they will be gathered together into a list, which looks like this: [(String,String)].
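To make that data shape concrete, here is a minimal sketch of building the list of pairs. The name pairUp is made up for illustration and is not part of the script itself:

```haskell
import Data.Char (toLower)

-- Hypothetical helper: pair each original name with its lowercase form.
pairUp :: [String] -> [(String, String)]
pairUp files = zip files (map (map toLower) files)
```

In GHCi, pairUp ["README.TXT", "notes.txt"] evaluates to [("README.TXT","readme.txt"),("notes.txt","notes.txt")].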

There are two actions to perform for each pair:

  1. Each pair of names should be printed to the screen, and
  2. The file should be moved from the first name to the second.

Let's define a function to let the user know what's going on. This is easy enough, and as the type of this function suggests, we take a pair and do some IO:

putDirs :: (String,String) -> IO ()
putDirs pair = putStrLn $ "Moving: " ++ fst pair ++ " " ++ snd pair

The second action is so simple that we won't even make a function for it. It's by-and-large already done for us -- remember the renameFile import?

One other thing the original bash version did is check that the file actually has to be moved. This is simple in Haskell. We'll use the higher order function filter to remove the pairs where the two elements are already equal:

rmDups :: [(String,String)] -> [(String,String)]
rmDups = filter (uncurry (/=))

Here uncurry takes a two-argument function and returns a function which acts on a pair, so uncurry (/=) is True exactly when the two names in a pair differ.

Now all that is left is the main part: check the parameters and loop our actions over each of the files.
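If the point-free style hides what uncurry is doing, this small sketch may help. It repeats rmDups with an explicit type signature and puts a spelled-out predicate next to it; the name differs is mine, purely for comparison:

```haskell
-- Same definition as above, written with an explicit type signature.
rmDups :: [(String, String)] -> [(String, String)]
rmDups = filter (uncurry (/=))

-- What uncurry (/=) amounts to, written out by hand (illustrative name).
differs :: (String, String) -> Bool
differs (old, new) = old /= new
```

So rmDups [("Foo.TXT","foo.txt"),("bar","bar")] keeps only ("Foo.TXT","foo.txt"), and filter differs would give the same answer.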

main :: IO ()
main = do
 files <- getArgs
 let usage = "Usage: dcfiles [files]"
 -- complain if no file names were given
 when (null files) $
   putStrLn ("Argument Error: no arguments specified\n" ++ usage)
 -- pair names with their lowercase forms, drop no-ops, then announce and rename
 let pairs = (rmDups . zip files) $ map (map toLower) files
 mapM_ (liftM2 (>>) putDirs $ uncurry renameFile) pairs

I think all of this reads very well. The only tricky part is the last two lines. A lot is going on in a small amount of space, but it's still not too hard. toLower takes a character and returns its lowercase equivalent, so it just gets mapped over each character of a string, and we do this for each of the file names passed in. Next we use zip to form our pairs from the original list and the lowercased list, and finally we filter the result using our rmDups function.

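The last line is just a fancy way of saying that we need to run our two actions one after the other: liftM2 (>>) hands the same pair to both putDirs and uncurry renameFile and sequences the results. The mapM_ construct just says that we'll be mapping actions rather than regular functions, and throwing away their results.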
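For anyone who finds the liftM2 trick opaque, here is a sketch that puts the point-free form next to a spelled-out one. To keep it safe to run, announce and fakeRename are stand-ins I made up that only print; they are not the script's putDirs and renameFile:

```haskell
import Control.Monad (liftM2)

-- Hypothetical stand-ins that only print, so nothing on disk is touched.
announce :: (String, String) -> IO ()
announce (old, new) = putStrLn ("Moving: " ++ old ++ " " ++ new)

fakeRename :: String -> String -> IO ()
fakeRename old new = putStrLn ("rename " ++ old ++ " -> " ++ new)

-- Point-free form: liftM2 is used at the function monad here, so the
-- same pair is passed to both functions and the two results are
-- combined with (>>).
stepPointFree :: (String, String) -> IO ()
stepPointFree = liftM2 (>>) announce (uncurry fakeRename)

-- The same behaviour with the pair named explicitly.
stepExplicit :: (String, String) -> IO ()
stepExplicit pair = announce pair >> uncurry fakeRename pair
```

Both functions should behave identically; which one reads better is mostly a matter of taste.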

So, now that that's done, what have we gained? Well, if we never added onto the script, it probably would have been just as well to leave it alone. However, if we wanted to make it a little more robust, that would be very easy to do, and we'd have all the power of Haskell at our disposal.
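As one possible example of "a little more robust" (my own assumption about what might be worth checking, not something the script above does), we could look for inputs that would collide after downcasing before renaming anything. The helper name collisions is hypothetical:

```haskell
import Data.Char (toLower)
import Data.List (group, sort)

-- Hypothetical helper: lowercase target names that more than one
-- input name maps to. Renaming those blindly could overwrite a file.
collisions :: [String] -> [String]
collisions files =
  [ name | (name : _ : _) <- group (sort (map (map toLower) files)) ]
```

For instance, collisions ["Foo.txt", "FOO.TXT", "bar"] gives ["foo.txt"], while a list with no clashes gives []. Note that passing the same file name twice would also be reported, which may or may not be what you want.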

What I wanted to show here is that Haskell needn't be reserved for massively complex projects. It doesn't get any simpler than this. Don't fear Haskell, love it :)
