Peter
Peter Campbell Smith

An odd character’s
nearly best word

Weekly challenge 255 — 5 February 2024

Week 255 - 5 Feb 2024

Task 2

Task — Most frequent word

You are given a paragraph $p and a banned word $w. Write a script to return the most frequent word that is not banned.

Examples


Input: $p = "Joe hit a ball, the hit ball flew far after 
             it was hit."
       $w = "hit"
Output: "ball"

The banned word "hit" occurs 3 times.
The other word "ball" occurs 2 times.

Example 2
Input: $p = "Perl and Raku belong to the same family. 
             Perl is the most popular language in the 
             weekly challenge."
       $w = "the"
Output: "Perl"

The banned word "the" occurs 3 times.
The other word "Perl" occurs 2 times.

Analysis

My solution is perhaps the most obvious one - and perhaps also not the briefest or most efficient.

I iterate over all the words in $p, counting their frequency. If the word isn't the banned word, I note it and its frequency if it is the most frequent unbanned word to date.

It is of course possible that there will be more than one such word, or even none, and the banned word may also not appear, so all those cases need to be covered.

Try it 

Try running the script with any input:



example: toot, toot, toot went the choo choo



example: toot

Script


#!/usr/bin/perl

# Blog: http://ccgi.campbellsmiths.force9.co.uk/challenge

use v5.26;    # The Weekly Challenge - 2024-02-05
use utf8;     # Week 255 task 2 - Most frequent word
use strict;   # Peter Campbell Smith
use warnings; 
binmode STDOUT, ':utf8';

most_frequent_word('Joe hit a ball, the hit ball flew far 
    after it was hit by Joe.', 'hit');

most_frequent_word('Perl and Raku belong to the same 
    family. Perl is the most popular language in the 
    weekly challenge.', 'the');
    
most_frequent_word('banned banned allowed allowed allowed',
    'banned');

sub most_frequent_word {
    
    my ($p, $q, $w, $most, $word, %times);
    
    # initialise
    $p = shift;
    $w = lc(shift);
    say qq[\nInput:  \$p = '$p'\n        \$w = '$w'];
    $p = lc($p);
    
    # do it
    $most = 0;
    $times{$w} = 0;
    
    # split $p into words and discard any that eq $w
    while ($p =~ m|([a-z]+)|gs) {
        $q = $1;
        
        # record one use of this word
        $times{$q} ++;
        next if $q eq $w;
        
        # if more than the previous best, make it the new best
        if ($times{$q} > $most) {
            $most = $times{$q};
            $word = $q;
            
        # if the same as the previous best, record it as equal best 
        } elsif ($times{$q} == $most) {
            $word .= qq[, $q];
        }
    }
    
    # deliver the answer
    say qq[Output: '$word' - $most times];
    say qq[        '$w' (banned) - $times{$w} times];
}

Output


Input:  $p = 'Joe hit a ball, the hit ball flew far
        after it was hit by Joe.'
        $w = 'hit'
Output: 'ball, joe' - 2 times
        'hit' (banned) - 3 times

Input:  $p = 'Perl and Raku belong to the same
        family. Perl is the most popular language in the
        weekly challenge.'
        $w = 'the'
Output: 'perl' - 2 times
        'the' (banned) - 3 times

Input:  $p = 'banned banned allowed allowed allowed'
        $w = 'banned'
Output: 'allowed' - 3 times
        'banned' (banned) - 2 times