Camel
Peter
Peter Campbell Smith

Bits of strings and
strings of bits

Weekly challenge 352 — 15 December 2025

Week 352: 15 Dec 2025

Task 1

Task — Match string

You are given an array of strings. Write a script to return all strings that are a substring of another word in the given array, in the order they occur.

Examples


Example 1
Input: @words = ('cat', 'cats', 'dog', 'dogcat', 'dogcat',
   'rat', 'ratcatdogcat')
Output: ('cat', 'dog', 'dogcat', 'rat')

Example 2
Input: @words = ('hello', 'hell', 'world', 'wor', 'ellow',
   'elloworld')
Output: ('hell', 'world', 'wor', 'ellow')

Example 3
Input: @words = ('a', 'aa', 'aaa', 'aaaa')
Output: ('a', 'aa', 'aaa')

Example 4
Input: @words = ('flower', 'flow', 'flight', 'fl', 'fli',
   'ig', 'ght')
Output: ('flow', 'fl', 'fli', 'ig', 'ght')

Example 5
Input: @words = ('car', 'carpet', 'carpenter', 'pet',
   'enter', 'pen', 'pent')
Output: ('car', 'pet', 'enter', 'pen', 'pent')

Analysis

The obvious solution to this is to have two nested loops over all the words and test for:

$words[$j] =~ m|$words[$i]|

That works, but a regular expression with an argument only known at run time is quite inefficient in Perl as it has to be recompiled every time it is executed. Of course that hardly matters for short lists of words, but we can at least add a few optimisations:

  • Calculate the lengths of all the words first, as we're going to need those repeatedly.
  • Skip the inner loop if we've already seen an instance of $words[$i].
  • Skip the regular expression test unless $words[$j] is longer than $words[$i], which conveniently also skips the case where $i == $j.

I tried my solution on the content of today's featured Wikipedia entry - Simon Cameron - which is several thousand words long, and it completed in under 10 seconds, finding about 400 valid subset words. So I think that's good enough.

Try it 

Try running the script with any input:



example: cathode, scatter, bobcat, cat

Script


#!/usr/bin/perl

# Blog: http://ccgi.campbellsmiths.force9.co.uk/challenge

use v5.26;    # The Weekly Challenge - 2025-12-15
use utf8;     # Week 352 - task 1 - Match string
use warnings; # Peter Campbell Smith
binmode STDOUT, ':utf8';
use Encode;

match_string('cat', 'cats', 'dog', 'dogcat', 'dogcat', 'rat', 'ratcatdogcat');
match_string('hello', 'hell', 'world', 'wor', 'ellow', 'elloworld');
match_string('cat', 'dog', 'mouse', 'elephant', 'giraffe', 'alligator');
match_string('cat', 'cat', 'cat', 'cat', 'cat', 'cat');
match_string(qw[Write a script to return all strings that 
    are a substring of another word in the given array 
    in the order they occur]);

sub match_string {
    
    my (@words, $n, $output, %seen, @length, $i, $j);
    
    # initialise
    @words = @_;
    $n = @words - 1;
    $output = '';
    
    # record lengths
    push(@length, length($_)) for @words;
    
    # loop over source words and discard any seen already
    WORD: for $i (0 .. $n) {
        next if $seen{$words[$i]};
        $seen{$words[$i]} = 1;
        
        # loop over target words
        for $j (0 .. $n) {
            
            # discard unless source is subset of target 
            next if ($length[$j] <= $length[$i] or $words[$j] !~ m|$words[$i]|);
            
            # record, and skip to next source word
            $output .= qq['$words[$i]', ];
            next WORD;
        }
    }
    
    say qq[\nInput:  '] . join(qq[', '], @words) . q['];
    say qq[Output: ] . ($output ? substr($output, 0, -2) : 'none');
}

Output


Input:  'cat', 'cats', 'dog', 'dogcat', 'dogcat', 'rat',
   'ratcatdogcat'
Output: 'cat', 'dog', 'dogcat', 'rat'

Input:  'hello', 'hell', 'world', 'wor', 'ellow',
   'elloworld'
Output: 'hell', 'world', 'wor', 'ellow'

Input:  'cat', 'dog', 'mouse', 'elephant', 'giraffe',
   'alligator'
Output: none

Input:  'cat', 'cat', 'cat', 'cat', 'cat', 'cat'
Output: none

Input:  'Write', 'a', 'script', 'to', 'return', 'all',
   'strings', 'that', 'are', 'a', 'substring', 'of',
   'another', 'word', 'in', 'the', 'given', 'array', 'in',
   'the', 'order', 'they', 'occur'
Output: 'a', 'in', 'the'

 

Any content of this website which has been created by Peter Campbell Smith is in the public domain