Peter
Peter Campbell Smith

Uniquely ranked

Weekly challenge 260 — 11 March 2024

Week 260 - 11 Mar 2024

Task 2

Task — Dictionary rank

You are given a word, $word. Write a script to compute the dictionary rank of the given word.

Examples


Example 1
Input: $word = 'CAT'
Output: 3

All possible combinations of the letters:
CAT, CTA, ATC, TCA, ACT, TAC

Arrange them in alphabetical order:
ACT, ATC, CAT, CTA, TAC, TCA

CAT is the 3rd in the list.
Therefore the dictionary rank of CAT is 3.

Example 2
Input: $word = 'GOOGLE'
Output: 88

Example 3
Input: $word = 'SECRET'
Output: 255

Analysis

This is an excellent challenge: easy to state and understand but not so easy to solve.

The first solution that occurred to me was:

  • Generate every permutation of the letters of $word
  • Eliminate duplicates (eg SECRET and SECRET) are the same word)
  • Sort the remaining words
  • Count down until we find $word

That can be speeded up somewhat by using a permutation generator such as Algorithm::Combinatorics, which delivers the permutations one by one and does so in lexicographic order. So using that, we can stop when it returns $word. That's the solution I have submitted.

However, if $word is long, say more than 10 characters, this is quite slow because there are (typically) a lot of permutations to generate and count. I have added the time taken (on my Raspberry Pi), and you can see that a 10-letter word took 7 seconds, but an 11-letter one took over 2 minutes.

So is there a better method?

We could try computing, rather than counting, the number of permutations that would precede $word in the ordered list, and as an example suppose the word we are given is MARE. Clearly any permutation that precedes MARE will have to start with A, E or possibly M, but we'll come to that in a minute.

Valid words starting with A will be followed by M, R and E in any order: and the number of such permutations of 3 characters is 3!, ie 3 x 2 x 1 = 6, and they are AEMR, AERM, AMER, AMRE, AREM and ARME. Similarly, there will be 6 starting with E, so there are 12 words that will precede the first Mxxx word in an alphabetical list. That's the easy bit.

There may be some permutations that start with M but are alphabetically before MARE. There is no immediately obvious way to compute how many of those there are, although in the case of MARE there is only one, and that's MAER. So we have 13 words that precede MARE, and MARE is therefore the 14th, which happily agrees with the brute force counting method.

But in the general case, a way to compute the number of words starting with M but preceding MARE alphabetically does not spring to my mind.

And there is another issue. If there are repeated letters in $word, in words such as SECRET or WOOLLEN, then there will be duplicated permutations, and again, I cannot see an easy way to compute how many there are.

I didn't pursue that line of thought, but maybe someone else has done so.

Try it 

Try running the script with any input:



example: CHALLENGE

Script


#!/usr/bin/perl

# Blog: http://ccgi.campbellsmiths.force9.co.uk/challenge

use v5.26;    # The Weekly Challenge - 2024-03-11
use utf8;     # Week 260 - task 2 - Dictionary rank
use warnings; # Peter Campbell Smith
binmode STDOUT, ':utf8';

use Algorithm::Combinatorics 'permutations';
use Time::HiRes 'time';

dictionary_rank('CAT');
dictionary_rank('GOOGLE');;
dictionary_rank('SECRET');
dictionary_rank('MARE');
dictionary_rank('ZYMURGY');
dictionary_rank('PENGUINS');
dictionary_rank('CAMBRIDGE');
dictionary_rank('DICTIONARY');
dictionary_rank('FABRICATING');

sub dictionary_rank {
    
    my ($word, @letters, $iter, $test, $count, $this, %seen, $start);
    
    # initialise
    $start = time;
    $word = uc($_[0]);
    @letters = split('', $word);
    @letters = sort @letters;
    
    # iterate over permutations in lexicographic order
    $iter = permutations(\@letters);
    while ($test = $iter->next) {
        $this = join('', @$test);
        
        # eliminate duplicates
        next if $seen{$this};
        $seen{$this} = 1;
        
        # count them until we find $word
        $count ++;
        last if $this eq $word;
    }
    
    say qq[\nInput:  \$word = '$word'];
    say sprintf(qq[Output: $count (%.2f seconds)], time - $start);
}

Output


Input:  $word = 'CAT'
Output: 3 (0.00 seconds)

Input:  $word = 'GOOGLE'
Output: 88 (0.00 seconds)

Input:  $word = 'SECRET'
Output: 255 (0.00 seconds)

Input:  $word = 'MARE'
Output: 14 (0.00 seconds)

Input:  $word = 'ZYMURGY'
Output: 2439 (0.03 seconds)

Input:  $word = 'PENGUINS'
Output: 12739 (0.19 seconds)

Input:  $word = 'CAMBRIDGE'
Output: 84356 (0.74 seconds)

Input:  $word = 'DICTIONARY'
Output: 412015 (6.95 seconds)

Input:  $word = 'FABRICATING'
Output: 3665302 (124.26 seconds)