Uniquely ranked

Weekly challenge 260 — 11 March 2024

Week 260 - 11 Mar 2024

Task 2

You are given a word, $word. Write a script to compute the dictionary rank of the given word.

Example 1Input: $word = 'CAT' Output: 3 All possible combinations of the letters: CAT, CTA, ATC, TCA, ACT, TAC Arrange them in alphabetical order: ACT, ATC, CAT, CTA, TAC, TCA CAT is the 3rd in the list. Therefore the dictionary rank of CAT is 3.Example 2Input: $word = 'GOOGLE' Output: 88Example 3Input: $word = 'SECRET' Output: 255

This is an excellent challenge: easy to state and understand but not so easy to solve.

The first solution that occurred to me was:

- Generate every permutation of the letters of
`$word`

- Eliminate duplicates (eg SECRET and SECRET) are the same word)
- Sort the remaining words
- Count down until we find
`$word`

That can be speeded up somewhat by using a permutation generator such as
Algorithm::Combinatorics, which delivers the permutations one by one and does
so in lexicographic order. So using that, we can stop when it returns
`$word`

. That's the solution I have submitted.

However, if `$word`

is long, say more than 10 characters, this is
quite slow because there are (typically) a lot of permutations to generate and count.
I have added the time taken (on my Raspberry Pi), and you can see that a 10-letter word took 7 seconds, but an 11-letter one took over 2 minutes.

So is there a better method?

We could try computing, rather than counting, the number of permutations that would precede
`$word`

in the ordered list, and as an example suppose the word we are given is MARE.
Clearly any permutation that precedes MARE will have to start with A, E or possibly
M, but we'll come to that in a minute.

Valid words starting with A will be followed by M, R and E in any order: and the number of such permutations of 3 characters is 3!, ie 3 x 2 x 1 = 6, and they are AEMR, AERM, AMER, AMRE, AREM and ARME. Similarly, there will be 6 starting with E, so there are 12 words that will precede the first Mxxx word in an alphabetical list. That's the easy bit.

There may be some permutations that start with M but are alphabetically before MARE. There is no immediately obvious way to compute how many of those there are, although in the case of MARE there is only one, and that's MAER. So we have 13 words that precede MARE, and MARE is therefore the 14th, which happily agrees with the brute force counting method.

But in the general case, a way to compute the number of words starting with M but preceding MARE alphabetically does not spring to my mind.

And there is another issue. If there are repeated letters in `$word`

,
in words such as SECRET or WOOLLEN, then there will be duplicated permutations, and again, I
cannot see an easy way to compute how many there are.

I didn't pursue that line of thought, but maybe someone else has done so.

#!/usr/bin/perl # Blog: http://ccgi.campbellsmiths.force9.co.uk/challenge use v5.26; # The Weekly Challenge - 2024-03-11 use utf8; # Week 260 - task 2 - Dictionary rank use warnings; # Peter Campbell Smith binmode STDOUT, ':utf8'; use Algorithm::Combinatorics 'permutations'; use Time::HiRes 'time'; dictionary_rank('CAT'); dictionary_rank('GOOGLE');; dictionary_rank('SECRET'); dictionary_rank('MARE'); dictionary_rank('ZYMURGY'); dictionary_rank('PENGUINS'); dictionary_rank('CAMBRIDGE'); dictionary_rank('DICTIONARY'); dictionary_rank('FABRICATING'); sub dictionary_rank { my ($word, @letters, $iter, $test, $count, $this, %seen, $start); # initialise $start = time; $word = uc($_[0]); @letters = split('', $word); @letters = sort @letters; # iterate over permutations in lexicographic order $iter = permutations(\@letters); while ($test = $iter->next) { $this = join('', @$test); # eliminate duplicates next if $seen{$this}; $seen{$this} = 1; # count them until we find $word $count ++; last if $this eq $word; } say qq[\nInput: \$word = '$word']; say sprintf(qq[Output: $count (%.2f seconds)], time - $start); }

Input: $word = 'CAT' Output: 3 (0.00 seconds) Input: $word = 'GOOGLE' Output: 88 (0.00 seconds) Input: $word = 'SECRET' Output: 255 (0.00 seconds) Input: $word = 'MARE' Output: 14 (0.00 seconds) Input: $word = 'ZYMURGY' Output: 2439 (0.03 seconds) Input: $word = 'PENGUINS' Output: 12739 (0.19 seconds) Input: $word = 'CAMBRIDGE' Output: 84356 (0.74 seconds) Input: $word = 'DICTIONARY' Output: 412015 (6.95 seconds) Input: $word = 'FABRICATING' Output: 3665302 (124.26 seconds)

Peter Campbell Smith is hereby placed in the public domain