Peter
Peter Campbell Smith

Matching strings and missing numbers

Weekly challenge 208 — 13 March 2023

Week 208: 13 Mar 2023

Task 1

Task — Minimum index sum

You are given two arrays of strings. Write a script to find out all common strings in the given two arrays with minimum index sum. If no common strings are found, return an empty list.

Analysis

On the face of it, this could be solved by using two nested loops, the outer loop over @list1 and the inner over @list2. If the entries are identical we compare their sum-of-indices to the best sum-of-indices found so far, and if so mark it as the new winner.

That will work, but for a long list - say 100 words - we will execute the inside of the inner loop 100^2 = 10 000 times, which could take a while.

But there's a better way. Lets start by going through @list1 and creating a hash, %index, where the hash index is the word and the hash value is the word's array index in @list1. So if @list1 is ('Tom', 'Dick, 'Harry') we end up with $index{Tom} == 0, $index{Dick} == 1 and $index{Harry} == 2. Note that if any word occurs more than once in @list1 we only want to record its first index.

Now let's loop over @list2. Before we do that we'll set $least - which is going be the least sum-of-indices we find - to a big number, and $rubric - where we'll put the answer - to an empty string.

As we loop over @list2, we can immediately stop if the index we've got to is more than $least, because this this one or any beyond it can't sum with a @list1 index to less than $least.

But if we haven't stopped, let's take the $word we've got to in @list2 and look at $index{$word}, which is the index of the word in @list1. So if $word is 'Harry', we immediately know that its index in @list1 is 2. We add that to the index in @list2, and see how it compares to $least. If it's greater, then we move to the next item in @list2, if it's less, then we have a new winner and can update $least and $rubric. If they are equal then we have a tie, so we can add this new answer to $rubric and leave $least unchanged.

And there we have it. All we need to do is to output the result. Note that if none of the words in @list2 is also in @list1, then $rubric will be an empty string, just as Mohammad requires.

You'll see that in my script I've generated a case where each of @list1 and @list2 contains 100 random lower-case letters, and that runs in negligible time.

Try it 

@list1:

@list2:

Example: Tom Dick Harry

Script


#!/usr/bin/perl

# Peter Campbell Smith - 2023-03-13

use v5.28;
use utf8;
use warnings;

# You are given two arrays of strings. Write a script to find out all 
# common strings in the given two arrays with minimum index sum. If no 
# common strings are found, return an empty list.

# Blog: http://ccgi.campbellsmiths.force9.co.uk/challenge/208/1

my ($j, @list1, @list2);

min_index_sum(["Perl", "Raku", "Love"], ["Raku", "Perl", "Hate"]);
min_index_sum(['A', 'B', 'C'], ['D', 'E', 'F']);
min_index_sum(['A', 'B', 'C'], ['C', 'A', 'B']);

# make a longer example - 100 random letters in each list
for $j (0 .. 99) {
    $list1[$j] = chr(ord('a') + int(rand(26)));
    $list2[$j] = chr(ord('a') + int(rand(26)));
}   
min_index_sum(\@list1, \@list2);
    
sub min_index_sum {

    my (@list1, @list2, $j, %index, $least, $rubric1, $sum, $index1, $index2);
    
    @list1 = @{$_[0]};
    @list2 = @{$_[1]};
    
    # make index of first occurrence of word in @list1 - eg $index{"Raku"} == 1
    for $j (0 .. scalar @list1 - 1) {
        $index{$list1[$j]} = $j unless $index{$list1[$j]};
    }
    
    # scan through @list2
    $least = 10^6;
    $rubric1 = '';
    for $index2 (0 .. scalar @list2 - 1) {
        last if $index2 > $least;
        $index1 = $index{$list2[$index2]};
        
        # is this $list2 word in @list1?
        if (defined $index1) {
            $sum = $index1 + $index2;
            next if $sum > $least;
            
            # found a lesser sum of indices
            $rubric1 = '' if $sum < $least;
            $least = $sum;
            $rubric1 .= qq["$list2[$index2]", ];
        }
    }
    
    # show the answers
    say qq[\nInput:  \@list1 = ("] . join('", "', @list1) . q[")];
    say qq[        \@list2 = ("] . join('", "', @list2) . q[")];
    say qq[Output: (] . substr($rubric1, 0, -2) . q[)]; 
}   


Output


Input:  @list1 = ("Perl", "Raku", "Love")
        @list2 = ("Raku", "Perl", "Hate")
Output: ("Raku", "Perl")

Input:  @list1 = ("A", "B", "C")
        @list2 = ("D", "E", "F")
Output: ()

Input:  @list1 = ("A", "B", "C")
        @list2 = ("C", "A", "B")
Output: ("A")

Input:  @list1 = ("s", "w", "x", "q", "z", "h", "e", "j", "f", "b", "z", "u", "z", "l", "p", "n", "t", "l", "u", "e", "x", "b", "z", "s", "q", "w", "p", "u", "n", "j", "v", "c", "x", "y", "c", "a", "j", "a", "q", "a", "j", "v", "i", "l", "p", "t", "y", "z", "m", "c", "l", "c", "x", "o", "a", "w", "i", "y", "o", "z", "e", "v", "d", "g", "i", "l", "a", "s", "p", "m", "o", "r", "a", "x", "t", "x", "g", "x", "x", "c", "z", "f", "q", "e", "t", "j", "f", "y", "f", "z", "a", "l", "u", "b", "a", "w", "d", "j", "u", "r")
        @list2 = ("u", "d", "y", "x", "k", "f", "z", "l", "x", "d", "n", "r", "g", "v", "f", "k", "j", "t", "a", "x", "o", "b", "n", "a", "w", "l", "r", "q", "r", "t", "f", "d", "c", "l", "m", "n", "m", "t", "y", "v", "a", "h", "y", "y", "z", "p", "i", "y", "u", "h", "n", "g", "b", "k", "k", "k", "w", "t", "k", "u", "c", "v", "x", "w", "h", "r", "p", "q", "u", "f", "p", "j", "o", "z", "r", "t", "b", "v", "s", "a", "e", "a", "o", "a", "x", "g", "n", "z", "m", "u", "c", "x", "b", "a", "i", "j", "d", "l", "d", "g")
Output: ("x")

 

Any content of this website which has been created by Peter Campbell Smith is in the public domain