Haskell Code Katas: Counting Duplicates

17 Dec 2017 ~ 2 minute read

For the past few weeks, I’ve been starting off my days with Haskell flavored code katas from Codewars. Today I started with the kata below and figured it would be a good exercise to walk through my solution.

Write a function that will return the count of distinct case-insensitive alphabetic characters and numeric digits that occur more than once in the input string. The input string can be assumed to contain only alphabets (both uppercase and lowercase) and numeric digits.

To help clarify the specifications for this kata, the Hspec test suite is below:

module Codwars.Kata.Duplicates.Test where

import Codwars.Kata.Duplicates (duplicateCount)
import Data.List (nub)
import Test.Hspec
import Test.QuickCheck

main = hspec $ do
  describe "duplicateCount" $ do
    it "should work for some small tests" $ do
      duplicateCount ""                         =?= 0
      duplicateCount "abcde"                    =?= 0
      duplicateCount "aabbcde"                  =?= 2
      duplicateCount "aaBbcde"                  =?= 2
      duplicateCount "Indivisibility"           =?= 1
      duplicateCount "Indivisibilities"         =?= 2
      duplicateCount ['a'..'z']                 =?= 0
      duplicateCount (['a'..'z'] ++ ['A'..'Z']) =?= 26
    it "should work for some random lists" $ do
      property $ forAll (listOf $ elements ['a'..'z']) $ \x ->
        let xs = nub x
        in duplicateCount (concatMap (replicate 2) xs) =?= length xs
  where (=?=) = shouldBe

Sorting & Grouping

To start things off, we are given the following snippet:

module Codwars.Kata.Duplicates where

duplicateCount :: String -> Int
duplicateCount = undefined

My first step is to figure out how to deal with case-insensitivity. Within Data.Char is toLower, which can be used to map over each character in the input String.

Prelude> x = "aaBbcde"
Prelude> x
"aaBbcde"
Prelude> import Data.Char
Prelude Data.Char> :t toLower
toLower :: Char -> Char
Prelude Data.Char> map toLower x
"aabbcde"

Next, I want to group like characters together. To do that, I need to sort and then group the characters together.

Prelude Data.Char> import Data.List
Prelude Data.Char Data.List> :t sort
sort :: Ord a => [a] -> [a]
Prelude Data.Char Data.List> sort . map toLower $ x
"aabbcde"

The sort doesn’t do very much in this case because the input string was already sorted. Either way, now we can work on grouping like characters with group:

Prelude Data.Char Data.List> :t group
group :: Eq a => [a] -> [[a]]
Prelude Data.Char Data.List> group . sort . map toLower $ x
["aa","bb","c","d","e"]

Home Stretch

Now, how do we go from a list of [Char] to an Int length that can be used for filtering characters that only occur once? filter, with a >1 condition applied to the length, should get us there.

Prelude Data.Char Data.List> z = group . sort . map toLower $ x
Prelude Data.Char Data.List> filter ((>1) . length) z
["aa","bb"]

Here, the . allows us to compose length and >1 together so that both can be applied to the [Char] provided to filter. The result rids the list of any characters that only occur once in the original input.

Lastly, we need the count of distinct characters from the input String that occur more than one, which is as simple as getting the length of the filtered list.

Prelude Data.Char Data.List> f = filter ((>1) . length) z
Prelude Data.Char Data.List> length f
2

Putting it all together, and breaking out some of the pipelined functions into a variable in the where clause, we get the duplicateCount function below.

module Codwars.Kata.Duplicates where

import Data.List (group, sort)
import Data.Char (toLower)

duplicateCount :: String -> Int
duplicateCount = length . filter ((>1) . length) . grouped
  where
    grouped = group . sort . map toLower