Deduplicate
and going up
Weekly challenge 340 — 22 September 2025
Week 340: 22 Sep 2025
You are given a string, $str
, consisting of lowercase English letters.
Write a script to return the final string after all duplicate removals have been made. Repeat duplicate removals on the given string until you no longer can.
A duplicate removal consists of choosing two adjacent and identical letters and removing them.
Example 1 Input: $str = 'abbaca' Output: 'ca' Step 1: Remove 'bb' => 'aaca' Step 2: Remove 'aa' => 'ca' Example 2 Input: $str = 'azxxzy' Output: 'ay' Step 1: Remove 'xx' => 'azzy' Step 2: Remove 'zz' => 'ay' Example 3 Input: $str = 'aaaaaaaa' Output: '' Step 1: Remove 'aa' => 'aaaaaa' Step 2: Remove 'aa' => 'aaaa' Step 3: Remove 'aa' => 'aa' Step 4: Remove 'aa' => '' Example 4 Input: $str = 'aabccba' Output: 'a' Step 1: Remove 'aa' => 'bccba' Step 2: Remove 'cc' => 'bba' Step 3: Remove 'bb' => 'a' Example 5 Input: $str = 'abcddcba' Output: '' Step 1: Remove 'dd' => 'abccba' Step 2: Remove 'cc' => 'abba' Step 3: Remove 'bb' => 'aa' Step 4: Remove 'aa' => ''
I have provided two solutions for this challenge. The first is concise, though perhaps a little obscure and the second is longer but much faster.
Method 1 hinges on this statement:
$string =~ s|$1$1|| while $string =~ m|([a-z])\1|g;
In words, for as long as there is a doubled letter, delete that pair from the string. So what's wrong with that?
Firstly, it may not be obvious to the unwary that
\1
and $1
refer to the same value, but that's Perl's convention.
Secondly, I was a bit surprised to find that m|([a-z])\1|g
backtracks. Given abba
it will first find bb
,
and in a more conventional context it won't backtrack
and match the two occurrences of a
. But in this
context, it has clearly 'noticed' that $string
has
changed between each execution of the m||g
.
Interestingly, it works just as well without the g
modifier and both take about the same time, so I deduce
that it ignores the g
in this context.
Method 1 starts each pass of the
s|||
and the m||
at the start of the string,
and if the string is long, that's going to waste time
looking at the stretch that has already been de-duplicated.
So method 2 makes a single pass along the input string.
If the next character in the input string matches the
last character in the output string, it discards both, or
if they don't match it appends the input character to the
output string. That's a lot more lines of code, but for
a string of 10^5 random characters in the range [a-e]
method 1 takes 87 seconds and method 2 takes 0.03 seconds
on my (slowish) machine.
#!/usr/bin/perl # Blog: http://ccgi.campbellsmiths.force9.co.uk/challenge use v5.26; # The Weekly Challenge - 2025-09-22 use utf8; # Week 340 - task 1 - Duplicate removals use warnings; # Peter Campbell Smith binmode STDOUT, ':utf8'; use Encode; duplicate_removals('abbaca'); duplicate_removals('aaaaaaaa'); duplicate_removals('abcdabcd'); duplicate_removals('abcdeedcbaf'); duplicate_removals('oodrrunggnpeellyyligoogcccazztfghhgfe'); duplicate_removals('xabcdeedcbay'); sub duplicate_removals { my ($string); say qq[\nInput: '$_[0]']; # imethod 1 - concise $string = $_[0]; $string =~ s|$1$1|| while $string =~ m|([a-z])\1|; say qq[Output1: '$string']; # method 2 - faster? my ($last, $output, $j, $char); $string = $_[0]; $last = '#'; $output = ''; for $j (0 .. length($string) - 1) { $char = substr($string, $j, 1); # this char same as last - discard both if ($char eq $last) { $output = substr($output, 0, -1); $last = substr($output, -1); # this char differs from last - append it } else { $output .= $char; $last = $char; } } say qq[Output2: '$output']; }
Input: 'abbaca' Output1: 'ca' Output2: 'ca' Input: 'aaaaaaaa' Output1: '' Output2: '' Input: 'abcdabcd' Output1: 'abcdabcd' Output2: 'abcdabcd' Input: 'abcdeedcbaf' Output1: 'f' Output2: 'f' Input: 'oodrrunggnpeellyyligoogcccazztfghhgfe' Output1: 'duplicate' Output2: 'duplicate'
Any content of this website which has been created by Peter Campbell Smith is in the public domain