Words and more words
Weekly challenge 370 — 20 April 2026
You are given a string containing a paragraph and an array of banned words.
Write a script to return the most popular word that is not banned. It is guaranteed that there is at least one word that is not banned and that the answer is unique. The words in the paragraph may be in mixed case and the answer should be in lower case. The words do not contain punctuation symbols.
Example 1
Input: $paragraph = 'Bob hit a ball, the hit BALL flew far after it was hit.'
       @banned = ('hit')
Output: 'ball'
After removing punctuation and converting to lowercase, the word 'hit' appears 3 times, and 'ball' appears 2 times. Since 'hit' is on the banned list, we ignore it.

Example 2
Input: $paragraph = 'Apple? apple! Apple, pear, orange, pear, apple, orange.'
       @banned = ('apple', 'pear')
Output: 'orange'
'apple' appears 4 times. 'pear' appears 2 times. 'orange' appears 2 times. 'apple' and 'pear' are both banned. Even though 'orange' has the same frequency as 'pear', it is the only non-banned word with the highest frequency.

Example 3
Input: $paragraph = 'A. a, a! A. B. b. b.'
       @banned = ('b')
Output: 'a'
'a' appears 4 times. 'b' appears 3 times. The input has mixed casing and heavy punctuation. Once normalised, 'a' is the clear winner and, since 'b' is banned, 'a' is the only choice.

Example 4
Input: $paragraph = 'Ball.ball,ball:apple!apple.banana'
       @banned = ('ball')
Output: 'apple'
Here the punctuation acts as a delimiter. 'ball' appears 3 times. 'apple' appears 2 times. 'banana' appears 1 time.

Example 5
Input: $paragraph = 'The dog chased the cat, but the dog was faster than the cat.'
       @banned = ('the', 'dog')
Output: 'cat'
'the' appears 4 times. 'dog' appears 2 times. 'cat' appears 2 times. 'chased', 'but', 'was', 'faster', 'than' appear 1 time each. 'the' is the most frequent but is banned. 'dog' is the next most frequent but is also banned. The next most frequent non-banned word is 'cat'.
There is a slight dilemma in challenges like this: should they be treated as an intellectual puzzle to be solved as concisely as possible, or as a function in some larger production system where accuracy and speed may be requirements?
I tend to the former interpretation, and have done so here.
Perhaps the best example of the concise approach is that I first count the frequencies of all the words, storing those in a hash: $count{$word} = $frequency.
I can then easily handle the banned words by deleting the corresponding hash elements:
delete $count{$_} for @$banned;
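The count-then-delete idea can be sketched on its own. This is only an illustration: the helper name count_unbanned is hypothetical, and it assumes that a word is any run of the letters a to z and that banned words may arrive in any case.

```perl
use v5.26;
use warnings;

# Hypothetical helper (not part of the original script):
# count every word in $para, then drop the banned ones.
sub count_unbanned {
    my ($para, $banned) = @_;
    my %count;

    # lower-case into a variable first: m//g in scalar context keeps its
    # position on the variable, so matching against lc($para) directly
    # would restart from the beginning on every iteration
    my $text = lc $para;

    # assumption: a word is any run of letters, anything else is a delimiter
    $count{$1}++ while $text =~ m|([a-z]+)|g;

    # deleting a key that does not exist is harmless
    delete $count{lc $_} for @$banned;

    return \%count;
}

my $count = count_unbanned('Ball.ball,ball:apple!apple.banana', ['ball']);
say "$_ => $count->{$_}" for sort keys %$count;
```

Deleting from the hash once, up front, means the later search never has to test each word against the banned list.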
And then it's just a case of finding the largest remaining count.
This solution avoids sorting anything, which could be an advantage if the volume of words were large, say in the millions.
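To make the comparison concrete, here is a rough sketch of the single-pass search next to a sort-based equivalent. Both sub names (most_frequent and most_frequent_sorted) are hypothetical and the sample counts are made up for illustration. The single pass looks at each distinct word once, whereas the sort orders every key just to read off the first one.

```perl
use v5.26;
use warnings;

# Hypothetical helper: one pass over the hash, no sorting.
sub most_frequent {
    my ($count) = @_;
    my ($best_word, $best) = (undef, -1);
    for my $word (keys %$count) {
        next unless $count->{$word} > $best;
        ($best_word, $best) = ($word, $count->{$word});
    }
    return ($best_word, $best);
}

# Sort-based equivalent, shown only for comparison:
# it orders all the keys by descending count and keeps the first.
sub most_frequent_sorted {
    my ($count) = @_;
    my ($top) = sort { $count->{$b} <=> $count->{$a} } keys %$count;
    return ($top, $count->{$top});
}

# illustrative counts with a unique maximum, as the task guarantees
my %count = (ball => 2, a => 1, the => 1, flew => 1);
my ($word, $n) = most_frequent(\%count);
say "'$word' occurs $n times";    # 'ball' occurs 2 times
```

If two words tied for the highest count, which one either version returns would depend on Perl's hash ordering, so both rely on the task's guarantee of a unique answer.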
#!/usr/bin/perl

# Blog: http://ccgi.campbellsmiths.force9.co.uk/challenge

use v5.26;    # The Weekly Challenge - 2026-04-20
use utf8;     # Week 370 - task 1 - Popular word
use warnings; # Peter Campbell Smith
binmode STDOUT, ':utf8';
use Encode;

popular_word('Bob hit a ball, the hit BALL flew far after it was hit.', ['hit']);
popular_word('Egg, chips and beans; egg and beans; or beans and chips',
    ['chips', 'egg', 'beans']);
popular_word('Write a script to return the most popular word that is not banned. It is guaranteed there is at least one word that is not banned and the answer is unique. The words in paragraph are case-insensitive and the answer should be in lower case. The words cannot contain punctuation symbols.', ['the', 'is']);

sub popular_word {

    my ($para, $banned, %count, $best, $best_word, $word);

    # initialise
    ($para, $banned) = @_;

    # clean up $para: each run of non-letters becomes a single space, so that
    # punctuation acts as a delimiter ('Ball.ball' is two words, as in Example 4)
    $para = lc($para);
    $para =~ s|[^a-z]+| |g;

    # count frequency of words
    $count{$1} ++ while $para =~ m|([a-z]+)|g;

    # eliminate banned words
    delete $count{$_} for @$banned;

    # find most frequent
    $best = -1;
    for $word (keys %count) {
        next unless $count{$word} > $best;
        $best = $count{$word};
        $best_word = $word;
    }

    say qq[\nInput: \$para = '$_[0]'];
    say qq[ \$banned = '] . join(q[', '], @$banned) . q['];
    say qq[Output: '$best_word' occurs $best times];
}
15 lines of code
Completed after the closing date and not submitted to GitHub
Input: $para = 'Bob hit a ball, the hit BALL flew far after it was hit.'
 $banned = 'hit'
Output: 'ball' occurs 2 times

Input: $para = 'Egg, chips and beans; egg and beans; or beans and chips'
 $banned = 'chips', 'egg', 'beans'
Output: 'and' occurs 3 times

Input: $para = 'Write a script to return the most popular word that is not banned. It is guaranteed there is at least one word that is not banned and the answer is unique. The words in paragraph are case-insensitive and the answer should be in lower case. The words cannot contain punctuation symbols.'
 $banned = 'the', 'is'
Output: 'words' occurs 2 times
Any content of this website which has been created by Peter Campbell Smith is in the public domain