Squeezing and Squaring
Weekly challenge 296 — 18 November 2024
Week 296: 18 Nov 2024
You are given a string of alphabetic characters, $chars
.
Write a script to compress the string with run-length encoding, as shown in the examples.
A compressed unit can be either a single character or a count followed by a character.
BONUS: Write a decompression function.
Example 1 Input: $chars = "abbc" Output: "a2bc" Example 2 Input: $chars = "aaabccc" Output: "3ab3c" Example 3 Input: $chars = "abcc" Output: "ab2c"
This could be done in a 2-liner:
for ($j = length($chars) - 1; $j >= 1; $j --) { $chars =~ s|(.)\1{$j}|($j + 1) . $1|ge; }
and you could probably squeeze that into a single line.
However, it isn't very efficient. It works well for up
to around a few hundred characters in $chars
, but
with 5000 characters it takes over 30 seconds, because
the regex is executed 5000 times, and worse, because it
contains variables I think the regex gets compiled 5000
times.
So I have submitted an alternative, longer solution that works in negligible time even with 10000 characters.
And in order to show that it works I have added a
decompression function, and run it for every example,
showing that the decompressed string matches
$chars
and calculating the %age compression.
That %age shows that the compression algorithm is hardly worth doing! Even a string comprised of random a and b only achieves around 20% compression, and strings with more different letters hover around 5-10%.
#!/usr/bin/perl # Blog: http://ccgi.campbellsmiths.force9.co.uk/challenge use v5.26; # The Weekly Challenge - 2024-11-18 use utf8; # Week 296 - task 1 - String compression use warnings; # Peter Campbell Smith binmode STDOUT, ':utf8'; string_compression('aaaaabbcdeffffghiiikklmnopp'); string_compression('a committee of bookkeepers'); string_compression('abbbbbbbbbbbbbbbbbbbbc'); # try a longer string of random a and b my $chars; $chars .= chr(int(rand(2) + ord('a'))) for 0 .. 199; string_compression($chars); sub string_compression { my ($chars, $chars2, $char, $count, $c, $de); # initialise $chars = shift; say qq[\nInput: \$chars = '$chars']; $chars .= chr(0); $chars2 = ''; $char = ''; $count = 0; # loop over characters in $chars for $c (split('', $chars)) { # same as previous if ($c eq $char) { $count ++; # different from previous } else { $chars2 .= ($count > 0 ? $count + 1 : '') . $char; $count = 0; $char = $c; } } $chars = substr($chars, 0, -1); # report $de = string_decompression($chars2); say qq[Compressed: \$chars = '$chars2']; say qq[Decompressed: \$chars = '$de']; say qq[Result: ] . ($de eq $chars ? 'good - ' : 'bad - ') . 'compressed to ' . (int(length($chars2) / length($chars) * 100)) . '%'; } sub string_decompression { my ($chars, $chars2, $count, $c); # initialise $chars = shift; $chars2 = ''; $count = 0; $chars .= chr(0); # loop over chars in $chars for $c (split('', $chars)) { # a number (might be > 9) if ($c =~ m|\d|) { $count = $count * 10 + $c; # a letter } else { $count = 1 unless $count; $chars2 .= $c for 1 .. $count; $count = 0; } } return substr($chars2, 0, -1); }
Input: $chars = 'aaaaabbcdeffffghiiikklmnopp' Compressed: $chars = '5a2bcde4fgh3i2klmno2p' Decompressed: $chars = 'aaaaabbcdeffffghiiikklmnopp' Result: good - compressed to 77% Input: $chars = 'a committee of bookkeepers' Compressed: $chars = 'a co2mi2t2e of b2o2k2epers' Decompressed: $chars = 'a committee of bookkeepers' Result: good - compressed to 100% Input: $chars = 'abbbbbbbbbbbbbbbbbbbbc' Compressed: $chars = 'a20bc' Decompressed: $chars = 'abbbbbbbbbbbbbbbbbbbbc' Result: good - compressed to 22% Input: $chars = 'baabbabbbaaabaabbababababbbbbbabb bbbbbbbbaaabbbbabbbbbaabbaabaaaaaaabbbbbabababaaabaababba babaaaababbbbabaaabbbabaaaaaabbabaabbbaabbababaaabbbaaabb baababaaaababbbababaababaabbaabababbaaaabaaaaabbaabaa' Compressed: $chars = 'b2a2ba3b3ab2a2babababa6ba10b3a4ba 5b2a2b2ab7a5bababab3ab2aba2babab4aba4bab3a3bab6a2bab2a3b2 a2babab3a3b3a3b2abab4aba3babab2abab2a2b2ababa2b4ab5a2b2ab 2a' Decompressed: $chars = 'baabbabbbaaabaabbababababbbbbbabb bbbbbbbbaaabbbbabbbbbaabbaabaaaaaaabbbbbabababaaabaababba babaaaababbbbabaaabbbabaaaaaabbabaabbbaabbababaaabbbaaabb baababaaaababbbababaababaabbaabababbaaaabaaaaabbaabaa' Result: good - compressed to 74%
Any content of this website which has been created by Peter Campbell Smith is in the public domain