Peter
Peter Campbell Smith

Santa’s letters

Weekly challenge 247 — 11 December 2023

Week 247 - 11 Dec 2023

Task 2

Task — Most frequent letter pair

You are given a string S of lower case letters 'a'..'z'. Write a script that finds the pair of consecutive letters in S that appears most frequently. If there is more than one such pair, chose the one that is the lexicographically first.

Examples


Example 1
Input: $s = 'abcdbca'
Output: 'bc'

'bc' appears twice in `$s`

Example 2
Input: $s = 'cdeabeabfcdfabgcd'
Output: 'ab'

'ab' and 'cd' both appear three times in $s and 'ab' is 
lexicographically smaller than 'cd'.

Analysis

This task wasn't so mind-bending as task 1, especially when I dredged a little trick from the recesses of my aging memory. I looped over all 25 possible pairs of consecutive letters - ab to yz - and counted their frequency like this:

$count = () = $string =~ m|$pair|g

This works because $string =~ m|$pair|g returns an array of 1's - one for each match of $pair - which is assigned to an anonymous initially empty array (). That array is then assigned in scalar context to $count which results in the count of members of (), which is the number of occurences of $pair in $string, as desired.

Then it's just a case of keeping track of which $count is the (first) maximum.

Note that this method will cope with overlaps liike ababcbcbc.

Try it 

Try running the script with any input:



example: abcbcdefefefg

Script


#!/usr/bin/perl

use v5.26;    # The Weekly Challenge - 2023-12-11
use utf8;     # Week 247 task 2 - Most frequent letter pair
use strict;   # Peter Campbell Smith
use warnings; # Blog: http://ccgi.campbellsmiths.force9.co.uk/challenge

most_frequent('cdeabeabfcdfabgcd');
most_frequent('abcdefghijklmnopqrstuvwxyz');
most_frequent('nowisthetimeforallgoodmentocometotheaidoftheparty');
most_frequent('zyxwvutsrqponmlkjihgfedcba');
most_frequent('ghghijijij');

sub most_frequent {
    
    my ($x, $string, $pair, $count, $best, $most);
    
    # initialise
    $string = $_[0];
    $best = '';
    $most = 0;
    
    # loop over ab, bc, cd ...
    for $x ('a' .. 'y') {
        $pair = $x . chr(ord($x) + 1);
        
        # count matches in string
        $count = () = $string =~ m|$pair|g;
        if ($count > $most) {
            $best = $pair;
            $most = $count;
        }
    }
    
    # output results
    say qq[\nInput:  \$s = '$string'\nOutput: ] . ($most ? qq['$best' x $most] : 'none');
}

Output


Input:  $s = 'cdeabeabfcdfabgcd'
Output: 'ab' x 3

Input:  $s = 'abcdefghijklmnopqrstuvwxyz'
Output: 'ab' x 1

Input:  $s = 'nowisthetimeforallgoodmentocometotheaidoftheparty'
Output: 'ef' x 1

Input:  $s = 'zyxwvutsrqponmlkjihgfedcba'
Output: none

Input:  $s = 'ghghijijij'
Output: 'ij' x 3