Peter Campbell Smith

Words and more words

Weekly challenge 370 — 20 April 2026

Task 1

Task — Popular word

You are given a string containing a paragraph and an array of banned words.

Write a script to return the most popular word that is not banned. It is guaranteed that there is at least one word that is not banned and that the answer is unique. The words in the paragraph may be in mixed case and the answer should be in lower case. The words do not contain punctuation symbols.

Examples


Example 1
Input: $paragraph = 'Bob hit a ball,
   the hit BALL flew far after it was hit.'
       @banned = ('hit')
Output: 'ball'
After removing punctuation and converting to lowercase,
   the word 'hit' appears 3 times,
   and 'ball' appears 2 times.
Since 'hit' is on the banned list, we ignore it.

Example 2
Input: $paragraph = 'Apple? apple! Apple, pear, orange,
   pear, apple, orange.'
       @banned = ('apple', 'pear')
Output: 'orange'
'apple'  appears 4 times.
'pear'   appears 2 times.
'orange' appears 2 times.
'apple' and 'pear' are both banned.
Even though 'orange' has the same frequency as 'pear',
   it is the only non-banned word with the highest 
   frequency.

Example 3
Input: $paragraph = 'A. a, a! A. B. b. b.'
       @banned = ('b')
Output: 'a'
'a' appears 4 times.
'b' appears 3 times.
The input has mixed casing and heavy punctuation.
Once normalised, 'a' is the clear winner:
   since 'b' is banned, 'a' is the only choice.

Example 4
Input: $paragraph = 'Ball.ball,ball:apple!apple.banana'
       @banned = ('ball')
Output: 'apple'
Here the punctuation acts as a delimiter.
'ball'   appears 3 times.
'apple'  appears 2 times.
'banana' appears 1 time.

Example 5
Input: $paragraph = 'The dog chased the cat,
   but the dog was faster than the cat.'
       @banned = ('the', 'dog')
Output: 'cat'
'the' appears 4 times.
'dog' appears 2 times.
'cat' appears 2 times.
'chased', 'but', 'was', 'faster',
   'than' appear 1 time each.
'the' is the most frequent but is banned.
'dog' is the next most frequent but is also banned.
The next most frequent non-banned word is 'cat'.

Analysis

There is a slight dilemma in challenges like this: whether to treat them as an intellectual puzzle to be solved as concisely as possible, or as a function in some larger production system where accuracy and speed may be requirements.

I tend to the former interpretation, and have done so here.

Perhaps the best example of the concise approach is that I first count the frequencies of all the words, storing those in a hash: $count{$word} = $frequency.
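
As a minimal sketch of that counting step, assuming that a word is any run of ASCII letters and that everything else acts as a separator (which is what Example 4 requires), the hash can be built like this. The name word_counts is hypothetical and is not part of the script below.

```perl
use strict;
use warnings;

# Hypothetical helper: return a reference to a hash of word => frequency.
# Assumption: a word is a run of ASCII letters; any other character
# (space, punctuation, newline) separates words.
sub word_counts {
    my ($para) = @_;

    # Lower-case into a variable first: matching with /g against the
    # temporary returned by lc() would restart the search every time.
    my $text = lc $para;

    my %count;
    $count{$1}++ while $text =~ /([a-z]+)/g;
    return \%count;
}

my $count = word_counts('Ball.ball,ball:apple!apple.banana');
printf "%-6s %d\n", $_, $count->{$_} for sort keys %$count;
# prints:
# apple  2
# ball   3
# banana 1
```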

I can then easily handle the banned words by deleting the corresponding hash elements:

delete $count{$_} for @$banned;

And then it's just a case of finding the largest remaining count.

This solution avoids sorting anything: a single pass over the remaining counts finds the maximum, which could be an advantage if the volume of words were large, say in the millions.
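
The three steps described above (count, delete the banned words, take the maximum in one pass) can be sketched as a function that returns the word instead of printing it, which makes it easy to check against the five examples in the task. This is an illustration only: most_popular is a hypothetical name, not part of the script below, and it assumes words are runs of ASCII letters.

```perl
use strict;
use warnings;

# Hypothetical variant that returns the most popular non-banned word.
sub most_popular {
    my ($para, $banned) = @_;

    # Count every run of letters, case-insensitively.
    my $text = lc $para;
    my %count;
    $count{$1}++ while $text =~ /([a-z]+)/g;

    # Banned words simply vanish from the hash.
    delete $count{ lc $_ } for @$banned;

    # One pass over the remaining counts; no sort needed.
    my ($best_word, $best) = ('', 0);
    for my $word (keys %count) {
        ($best_word, $best) = ($word, $count{$word}) if $count{$word} > $best;
    }
    return $best_word;
}

print most_popular('Ball.ball,ball:apple!apple.banana', ['ball']), "\n";   # prints: apple
print most_popular('A. a, a! A. B. b. b.', ['b']), "\n";                   # prints: a
```

A sort-based version (sort the keys by descending count and take the first) would give the same answer, but it does O(n log n) work on the distinct words, whereas the loop above is linear in their number.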

Script


#!/usr/bin/perl

# Blog: http://ccgi.campbellsmiths.force9.co.uk/challenge

use v5.26;    # The Weekly Challenge - 2026-04-20
use utf8;     # Week 370 - task 1 - Popular word
use warnings; # Peter Campbell Smith
binmode STDOUT, ':utf8';
use Encode;

popular_word('Bob hit a ball, the hit BALL flew far after it was hit.', ['hit']);
popular_word('Egg, chips and beans; egg and beans; or beans and chips',
    ['chips', 'egg', 'beans']);
popular_word('Write a script to return the most popular 
    word that is not banned. It is guaranteed there is at
    least one word that is not banned and the answer is 
    unique. The words in paragraph are case-insensitive 
    and the answer should be in lower case. The words 
    cannot contain punctuation symbols.', ['the', 'is']);


sub popular_word {
    
    my ($para, $banned, %count, $best, $best_word, $word);
    
    # initialise
    ($para, $banned) = @_;
    
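    # clean up $para: lower-case it and turn each run of non-letters
    # into a space, so that punctuation separates words (as in Example 4)
    $para = lc($para);
    $para =~ s|[^a-z]+| |g;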
    
    # count frequency of words
    $count{$1} ++ while $para =~ m|([a-z]+)|g;
    
    # eliminate banned words
    delete $count{$_} for @$banned;
    
    # find most frequent
    $best = -1;
    for $word (keys %count) {
        next unless $count{$word} > $best;
        $best = $count{$word};
        $best_word = $word;
    }
    
    say qq[\nInput:  \$para = '$_[0]'];
    say qq[        \$banned = '] . join(q[', '], @$banned) . q['];
    say qq[Output: '$best_word' occurs $best times];
}

15 lines of code
Completed after the closing date and not submitted to GitHub

Output


Input:  $para = 'Bob hit a ball,
   the hit BALL flew far after it was hit.'
        $banned = 'hit'
Output: 'ball' occurs 2 times

Input:  $para = 'Egg,
   chips and beans; egg and beans; or beans and chips'
        $banned = 'chips', 'egg', 'beans'
Output: 'and' occurs 3 times

Input:  $para = 'Write a script to return the most 
	popular word that is not banned. It is guaranteed 
	there is at least one word that is not banned and 
	the answer is unique. The words in paragraph are 
	case-insensitive and the answer should be in lower
	case. The words cannot contain punctuation 
	symbols.'
        $banned = 'the', 'is'
Output: 'words' occurs 2 times

 

Any content of this website which has been created by Peter Campbell Smith is in the public domain