Peter’s blog ✴ Week 247 ✴ 11 December 2023

THE WEEKLY CHALLENGE
Santa’s letters

The Perl Camel

Task 2

Most frequent letter pair

You are given a string S of lower case letters 'a'..'z'. Write a script that finds the pair of consecutive letters in S that appears most frequently. If there is more than one such pair, chose the one that is the lexicographically first.

Examples


Example 1
Input: $s = 'abcdbca'
Output: 'bc'

'bc' appears twice in `$s`

Example 2
Input: $s = 'cdeabeabfcdfabgcd'
Output: 'ab'

'ab' and 'cd' both appear three times in $s and 'ab' is 
lexicographically smaller than 'cd'.

Analysis

This task wasn't so mind-bending as task 1, especially when I dredged a little trick from the recesses of my aging memory. I looped over all 25 possible pairs of consecutive letters - ab to yz - and counted their frequency like this:

$count = () = $string =~ m|$pair|g

This works because $string =~ m|$pair|g returns an array of 1's - one for each match of $pair - which is assigned to an anonymous initially empty array (). That array is then assigned in scalar context to $count which results in the count of members of (), which is the number of occurences of $pair in $string, as desired.

Then it's just a case of keeping track of which $count is the (first) maximum.

Note that this method will cope with overlaps liike ababcbcbc.

Perl Weekly’s review

from PW issue 647

Sort hash value in Perl is the core. Also some Perl magic in display. Keep it up great work.

Try it 

Try running the script with any input:



example: abcbcdefefefg

Script


#!/usr/bin/perl

use v5.26;    # The Weekly Challenge - 2023-12-11
use utf8;     # Week 247 task 2 - Most frequent letter pair
use strict;   # Peter Campbell Smith
use warnings; # Blog: http://ccgi.campbellsmiths.force9.co.uk/challenge

most_frequent('cdeabeabfcdfabgcd');
most_frequent('abcdefghijklmnopqrstuvwxyz');
most_frequent('nowisthetimeforallgoodmentocometotheaidoftheparty');
most_frequent('zyxwvutsrqponmlkjihgfedcba');
most_frequent('ghghijijij');

sub most_frequent {
    
    my ($x, $string, $pair, $count, $best, $most);
    
    # initialise
    $string = $_[0];
    $best = '';
    $most = 0;
    
    # loop over ab, bc, cd ...
    for $x ('a' .. 'y') {
        $pair = $x . chr(ord($x) + 1);
        
        # count matches in string
        $count = () = $string =~ m|$pair|g;
        if ($count > $most) {
            $best = $pair;
            $most = $count;
        }
    }
    
    # output results
    say qq[\nInput:  \$s = '$string'\nOutput: ] . ($most ? qq['$best' x $most] : 'none');
}

12 lines of code

Output from script


Input:  $s = 'cdeabeabfcdfabgcd'
Output: 'ab' x 3

Input:  $s = 'abcdefghijklmnopqrstuvwxyz'
Output: 'ab' x 1

Input:  $s = 'nowisthetimeforallgoodmentocometotheaidoftheparty'
Output: 'ef' x 1

Input:  $s = 'zyxwvutsrqponmlkjihgfedcba'
Output: none

Input:  $s = 'ghghijijij'
Output: 'ij' x 3

 

Any content of this website which has been created by Peter Campbell Smith is in the public domain