The Weekly Challenge

Peter Campbell Smith

Deduplicate
and going up

Weekly challenge 340 — 22 September 2025

Week 340: 22 Sep 2025

Task 1

Task — Duplicate removals

You are given a string, $str, consisting of lowercase English letters. Write a script to return the final string after all duplicate removals have been made. Repeat duplicate removals on the given string until you no longer can. A duplicate removal consists of choosing two adjacent and identical letters and removing them.

Examples


Example 1
Input: $str = 'abbaca'
Output: 'ca'
Step 1: Remove 'bb' => 'aaca'
Step 2: Remove 'aa' => 'ca'

Example 2
Input: $str = 'azxxzy'
Output: 'ay'
Step 1: Remove 'xx' => 'azzy'
Step 2: Remove 'zz' => 'ay'

Example 3
Input: $str = 'aaaaaaaa'
Output: ''
Step 1: Remove 'aa' => 'aaaaaa'
Step 2: Remove 'aa' => 'aaaa'
Step 3: Remove 'aa' => 'aa'
Step 4: Remove 'aa' => ''

Example 4
Input: $str = 'aabccba'
Output: 'a'
Step 1: Remove 'aa' => 'bccba'
Step 2: Remove 'cc' => 'bba'
Step 3: Remove 'bb' => 'a'

Example 5
Input: $str = 'abcddcba'
Output: ''
Step 1: Remove 'dd' => 'abccba'
Step 2: Remove 'cc' => 'abba'
Step 3: Remove 'bb' => 'aa'
Step 4: Remove 'aa' => ''

Analysis

I have provided two solutions for this challenge. The first is concise, though perhaps a little obscure and the second is longer but much faster.

Method 1 hinges on this statement:

$string =~ s|$1$1|| while $string =~ m|([a-z])\1|g;

In words, for as long as there is a doubled letter, delete that pair from the string. So what's wrong with that?

Firstly, it may not be obvious to the unwary that \1 and $1 refer to the same value, but that's Perl's convention. Secondly, I was a bit surprised to find that m|([a-z])\1|g backtracks. Given abba it will first find bb, and in a more conventional context it won't backtrack and match the two occurrences of a. But in this context, it has clearly 'noticed' that $string has changed between each execution of the m||g.

Interestingly, it works just as well without the g modifier and both take about the same time, so I deduce that it ignores the g in this context.

Method 1 starts each pass of the s||| and the m|| at the start of the string, and if the string is long, that's going to waste time looking at the stretch that has already been de-duplicated.

So method 2 makes a single pass along the input string. If the next character in the input string matches the last character in the output string, it discards both, or if they don't match it appends the input character to the output string. That's a lot more lines of code, but for a string of 10^5 random characters in the range [a-e] method 1 takes 87 seconds and method 2 takes 0.03 seconds on my (slowish) machine.

Try it

Script


#!/usr/bin/perl

# Blog: http://ccgi.campbellsmiths.force9.co.uk/challenge

use v5.26;    # The Weekly Challenge - 2025-09-22
use utf8;     # Week 340 - task 1 - Duplicate removals
use warnings; # Peter Campbell Smith
binmode STDOUT, ':utf8';
use Encode;

duplicate_removals('abbaca');
duplicate_removals('aaaaaaaa');
duplicate_removals('abcdabcd');
duplicate_removals('abcdeedcbaf');
duplicate_removals('oodrrunggnpeellyyligoogcccazztfghhgfe');
duplicate_removals('xabcdeedcbay');

sub duplicate_removals {
    
    my ($string);
    say qq[\nInput:   '$_[0]'];
    
    # imethod 1 - concise
    $string = $_[0];
    $string =~ s|$1$1|| while $string =~ m|([a-z])\1|;
    say qq[Output1: '$string'];
    
    # method 2 - faster?
    my ($last, $output, $j, $char);
    
    $string = $_[0];
    $last = '#';
    $output = '';
    for $j (0 .. length($string) - 1) {
        $char = substr($string, $j, 1);
        
        # this char same as last - discard both
        if ($char eq $last) {
            $output = substr($output, 0, -1);
            $last = substr($output, -1);
        
        # this char differs from last - append it
        } else {
            $output .= $char;
            $last = $char;
        }
    }
    say qq[Output2: '$output'];
}

Output


Input:   'abbaca'
Output1: 'ca'
Output2: 'ca'

Input:   'aaaaaaaa'
Output1: ''
Output2: ''

Input:   'abcdabcd'
Output1: 'abcdabcd'
Output2: 'abcdabcd'

Input:   'abcdeedcbaf'
Output1: 'f'
Output2: 'f'

Input:   'oodrrunggnpeellyyligoogcccazztfghhgfe'
Output1: 'duplicate'
Output2: 'duplicate'

Any content of this website which has been created by Peter Campbell Smith is in the public domain