Split the weakest
Weekly challenge 253 — 22 January 2024
You are given an array of strings and a character separator. Write a script to return all words separated by the given character, excluding empty ones.
Example 1
Input: @words = ("one.two.three","four.five","six")
       $separator = "."
Output: "one","two","three","four","five","six"

Example 2
Input: @words = ("$perl$$", "$$raku$")
       $separator = "$"
Output: "perl","raku"
So this looks easy:
    $text = join($separator, @words);
    $text =~ s|$separator+|$separator|g;
    @output = split(/$separator/, $text);
looks as if it will do the job. But there are a couple of complications! Firstly, @output may contain an empty word at the beginning if, as in Example 2, there are one or more separators at the beginning of $text. Perl's split discards trailing empty fields by default, so an empty word at the end should not normally appear, but to be safe we need a shift and a pop on @output to remove any that do.
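To illustrate the empty-word behaviour, here is a minimal sketch, not part of my solution. It assumes $text already looks the way Example 2 would after joining and collapsing the separators, and it escapes the dollar sign by hand.

```perl
use strict;
use warnings;

# Example 2 after joining and collapsing runs of separators (assumed input).
my $text = '$perl$raku$';

# Split on a literal dollar sign, escaped by hand for this sketch.
my @output = split(/\$/, $text);

# split keeps the empty leading field but strips trailing empty fields.
my $fields_before = scalar @output;
print "$fields_before fields: [", join('|', @output), "]\n";

# Remove an empty first word if there is one.
shift @output if @output and $output[0] eq '';
print join(',', @output), "\n";
```

The first print shows three fields, the first of them empty; after the shift only the real words remain.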
Secondly, Mr Anwar has cunningly given us examples where $separator has a special meaning in Perl code ($) or in a regular expression (.), so we need to be a bit careful with the split and use split(/\Q$separator\E+/, $text) because the usual interpretation of (most) special characters is suppressed between \Q and \E. Note that the + has to come after the \E because we do want it to mean one or more separators, not just a literal '+'.
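The difference that \Q and \E make can be seen in a small sketch. The sample string and the variable names @naive and @quoted are only for illustration.

```perl
use strict;
use warnings;

my $text      = 'one.two..three';
my $separator = '.';

# Unquoted: '.' matches any character, so '.+' swallows the whole string
# and every resulting field is empty; split then returns nothing.
my @naive = split(/$separator+/, $text);

# Quoted: \Q...\E makes the dot literal, and the + after \E still means
# "one or more" of it.
my @quoted = split(/\Q$separator\E+/, $text);

print scalar(@naive), " words from the unquoted pattern\n";
print join(',', @quoted), "\n";
```

With the unquoted pattern no words come back at all, whereas the quoted pattern yields the three words and treats the doubled dot as a single separator.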
I then wondered what happens if $separator is a backslash. As you'll see from the examples in my code, that works provided you enter it as a double backslash in the code ('\\'). That's only the case if, as I have done, you are supplying the data within your Perl code: if you were reading it in, a single backslash should work ... and yes, I just tried it and it does work.
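A short sketch of the backslash case, assuming the data is written as single-quoted literals in the code as described above (the sample string is made up for illustration):

```perl
use strict;
use warnings;

# In a single-quoted literal, '\\' is one backslash character.
my $separator = '\\';
my $text      = '\\three\\blind\\mice\\';

print "separator length: ", length($separator), "\n";

# \Q...\E escapes the backslash so that it matches literally.
my @output = split(/\Q$separator\E+/, $text);

# Drop the empty leading word caused by the separator at the start.
shift @output if @output and $output[0] eq '';
print join(',', @output), "\n";
```

The separator has length 1 even though it is typed as two characters, and the split returns the three words.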
And it works too if any of the characters are utf8 multibyte ones - see my last example.
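For the multibyte case, the sketch below assumes the source file is saved as UTF-8 and relies on the utf8 pragma, as my full code does. With it, the separator is treated as one character rather than two bytes.

```perl
use strict;
use warnings;
use utf8;                     # decode source literals as characters
binmode STDOUT, ':utf8';      # encode characters correctly on output

my $separator = 'ŕ';
my $text      = 'ŐőŕŒœŔŕŖ';

# With 'use utf8' this reports one character, not two bytes.
print "separator length: ", length($separator), "\n";

my @output = split(/\Q$separator\E+/, $text);
print join(', ', @output), "\n";
```

Without use utf8 the literal would be seen as a sequence of bytes, and taking the first "character" of the separator with substr would then give only the first byte.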
#!/usr/bin/perl

# Blog: http://ccgi.campbellsmiths.force9.co.uk/challenge

use v5.26;    # The Weekly Challenge - 2024-01-22
use utf8;     # Week 253 task 1 - Split strings
use strict;   # Peter Campbell Smith
use warnings;
binmode STDOUT, ':utf8';

split_strings(['one.two.three','four.five','six'], '.');
split_strings(['$perl$$', '$$raku$'], '$');
split_strings(['xonex', 'xtwox'], 'x');

# some edge cases
split_strings([',,,,,'], ',');
split_strings(['\\three\\blind\\', '\\mice\\'], '\\');
split_strings(['ŐőŕŒœŔŕŖ', 'ŗŘřŚŕ'], 'ŕ');

sub split_strings {

    my (@words, $separator, $text, @output);

    # initialise
    @words = @{$_[0]};
    $separator = substr($_[1] . ' ', 0, 1);   # default is blank

    # join the input strings together with single separators
    $text = join($separator, @words);
    $text =~ s|\Q$separator\E+|$separator|g;

    # split that into individual words
    @output = split(/\Q$separator\E+/, $text);

    # remove an empty first or last word
    shift @output if (@output > 0 and $output[0] eq '');
    pop @output if (@output > 0 and $output[-1] eq '');

    # publish results
    say qq[\nInput: \@words = ('] . join(qq[', '], @words) . qq[')];
    say qq[ \$separator = '$separator'];
    say qq[Output: ('] . join(q[', '], @output) . qq[')];
}
Input: @words = ('one.two.three', 'four.five', 'six')
 $separator = '.'
Output: ('one', 'two', 'three', 'four', 'five', 'six')

Input: @words = ('$perl$$', '$$raku$')
 $separator = '$'
Output: ('perl', 'raku')

Input: @words = ('xonex', 'xtwox')
 $separator = 'x'
Output: ('one', 'two')

Input: @words = (',,,,,')
 $separator = ','
Output: ('')

Input: @words = ('\three\blind\', '\mice\')
 $separator = '\'
Output: ('three', 'blind', 'mice')

Input: @words = ('ŐőŕŒœŔŕŖ', 'ŗŘřŚŕ')
 $separator = 'ŕ'
Output: ('Őő', 'ŒœŔ', 'Ŗ', 'ŗŘřŚ')
Any content of this website which has been created by Peter Campbell Smith is in the public domain