Bits of strings and
strings of bits
Weekly challenge 352 — 15 December 2025
Week 352: 15 Dec 2025
You are given an array of strings. Write a script to return all strings that are a substring of another word in the given array, in the order they occur.
Example 1 Input: @words = ('cat', 'cats', 'dog', 'dogcat', 'dogcat', 'rat', 'ratcatdogcat') Output: ('cat', 'dog', 'dogcat', 'rat') Example 2 Input: @words = ('hello', 'hell', 'world', 'wor', 'ellow', 'elloworld') Output: ('hell', 'world', 'wor', 'ellow') Example 3 Input: @words = ('a', 'aa', 'aaa', 'aaaa') Output: ('a', 'aa', 'aaa') Example 4 Input: @words = ('flower', 'flow', 'flight', 'fl', 'fli', 'ig', 'ght') Output: ('flow', 'fl', 'fli', 'ig', 'ght') Example 5 Input: @words = ('car', 'carpet', 'carpenter', 'pet', 'enter', 'pen', 'pent') Output: ('car', 'pet', 'enter', 'pen', 'pent')
The obvious solution to this is to have two nested loops over all the words and test for:
$words[$j] =~ m|$words[$i]|
That works, but a regular expression with an argument only known at run time is quite inefficient in Perl as it has to be recompiled every time it is executed. Of course that hardly matters for short lists of words, but we can at least add a few optimisations:
$words[$i].
$words[$j] is longer than
$words[$i], which conveniently also skips the case where
$i == $j.I tried my solution on the content of today's featured Wikipedia entry - Simon Cameron - which is several thousand words long, and it completed in under 10 seconds, finding about 400 valid subset words. So I think that's good enough.
#!/usr/bin/perl # Blog: http://ccgi.campbellsmiths.force9.co.uk/challenge use v5.26; # The Weekly Challenge - 2025-12-15 use utf8; # Week 352 - task 1 - Match string use warnings; # Peter Campbell Smith binmode STDOUT, ':utf8'; use Encode; match_string('cat', 'cats', 'dog', 'dogcat', 'dogcat', 'rat', 'ratcatdogcat'); match_string('hello', 'hell', 'world', 'wor', 'ellow', 'elloworld'); match_string('cat', 'dog', 'mouse', 'elephant', 'giraffe', 'alligator'); match_string('cat', 'cat', 'cat', 'cat', 'cat', 'cat'); match_string(qw[Write a script to return all strings that are a substring of another word in the given array in the order they occur]); sub match_string { my (@words, $n, $output, %seen, @length, $i, $j); # initialise @words = @_; $n = @words - 1; $output = ''; # record lengths push(@length, length($_)) for @words; # loop over source words and discard any seen already WORD: for $i (0 .. $n) { next if $seen{$words[$i]}; $seen{$words[$i]} = 1; # loop over target words for $j (0 .. $n) { # discard unless source is subset of target next if ($length[$j] <= $length[$i] or $words[$j] !~ m|$words[$i]|); # record, and skip to next source word $output .= qq['$words[$i]', ]; next WORD; } } say qq[\nInput: '] . join(qq[', '], @words) . q[']; say qq[Output: ] . ($output ? substr($output, 0, -2) : 'none'); }
Input: 'cat', 'cats', 'dog', 'dogcat', 'dogcat', 'rat', 'ratcatdogcat' Output: 'cat', 'dog', 'dogcat', 'rat' Input: 'hello', 'hell', 'world', 'wor', 'ellow', 'elloworld' Output: 'hell', 'world', 'wor', 'ellow' Input: 'cat', 'dog', 'mouse', 'elephant', 'giraffe', 'alligator' Output: none Input: 'cat', 'cat', 'cat', 'cat', 'cat', 'cat' Output: none Input: 'Write', 'a', 'script', 'to', 'return', 'all', 'strings', 'that', 'are', 'a', 'substring', 'of', 'another', 'word', 'in', 'the', 'given', 'array', 'in', 'the', 'order', 'they', 'occur' Output: 'a', 'in', 'the'
Any content of this website which has been created by Peter Campbell Smith is in the public domain