The Weekly Challenge

Peter’s blog ✴ Week 253 ✴ 22 January 2024

Split the weakest

Task 1

Split strings

You are given an array of strings and a character separator. Write a script to return all words separated by the given character, excluding empty ones.

Examples


Example 1
Input: @words = ("one.two.three","four.five","six")
       $separator = "."
Output: "one","two","three","four","five","six"

Example 2
Input: @words = ("$perl$$", "$$raku$")
       $separator = "$"
Output: "perl","raku"

Analysis

So this looks easy:

$text = join($separator, @words);
$text =~ s|$separator+|$separator|g;
@output = split(/$separator/, $text);

That looks as if it will do the job, but there are a couple of complications. Firstly, @output may contain an empty word at the beginning if - as in Example 2 - $text starts with one or more separators, so we need a shift on @output to remove it. Perl's split discards trailing empty fields by default, so an empty word at the end should not actually appear; the pop in my script is just a precaution.
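
To see where the empty words come from, here is a minimal sketch of what split returns for Example 2 once the two words have been joined with the separator. The helper name split_on_dollars is made up for illustration and is not part of my script; the separator is hard-coded and escaped by hand here.

```perl
use strict;
use warnings;

# Hypothetical helper: split $text on runs of a literal dollar sign
sub split_on_dollars {
    my ($text) = @_;
    return split(/\$+/, $text);
}

# Example 2 joined with '$': '$perl$$' . '$' . '$$raku$'
my @fields = split_on_dollars('$perl$$$$$raku$');

# The leading empty field survives; the trailing one has already been dropped
print join('|', map { "[$_]" } @fields), "\n";   # []|[perl]|[raku]

# Remove the leading empty field, if there is one
shift @fields if @fields and $fields[0] eq '';
print join(',', @fields), "\n";                  # perl,raku
```

So only the front of the list needs tidying when split is called without a limit.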

Secondly, Mr Anwar has cunningly given us examples where $separator has a special meaning in Perl code ($) or in a regular expression (. and $), so we need to be a bit careful with the split. Using split(/\Q$separator\E+/, $text) does the trick, because the usual interpretation of (most) special characters is suppressed between \Q and \E. Note that the + has to come after the \E, because we want it to mean one or more separators, not a literal '+'. The same quoting is needed in the substitution that collapses repeated separators.
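
The difference the quoting makes can be sketched as follows, using Example 1's separator. Both helper names (split_literal and split_as_regex) are hypothetical and exist only for this comparison; quotemeta is the built-in function that does the same escaping as \Q...\E.

```perl
use strict;
use warnings;

# Hypothetical helper: split on runs of $separator, treated as literal text
sub split_literal {
    my ($separator, $text) = @_;
    return split(/\Q$separator\E+/, $text);
}

# Hypothetical helper: the naive version, with $separator used as a regex
sub split_as_regex {
    my ($separator, $text) = @_;
    return split(/$separator+/, $text);
}

my @good = split_literal('.', 'one.two.three');
my @bad  = split_as_regex('.', 'one.two.three');

print join(',', @good), "\n";                     # one,two,three
print scalar(grep { length } @bad), "\n";         # 0: '.' matched everything
print quotemeta('.'), ' ', quotemeta('$'), "\n";  # \. \$
```

With the unquoted dot, every character counts as a separator, so no words are left at all.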

I then wondered what happens if $separator is a backslash. As you'll see from the examples in my code, that works provided you enter it as a double backslash in the code ('\\'). The doubling is needed only because, as I have done, the data is supplied as string literals within the Perl code: if you were reading it in, a single backslash should work ... and yes, I just tried it and it does work.
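
A short sketch of the backslash case, assuming the data is written as single-quoted literals as in my script. The helper name words_between is invented for this example, and it uses grep to drop empty words rather than the shift in the script.

```perl
use strict;
use warnings;

# Hypothetical helper: split on runs of a literal separator, dropping empties
sub words_between {
    my ($separator, $text) = @_;
    return grep { length } split(/\Q$separator\E+/, $text);
}

my $separator = '\\';                # two characters typed, one backslash stored
my $text      = '\\three\\blind\\';  # stored as \three\blind\

print length($separator), "\n";                           # 1
print join(',', words_between($separator, $text)), "\n";  # three,blind
```

The doubling is purely a matter of how the literal is written: the string that reaches split holds single backslashes.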

And it also works if any of the characters are multibyte UTF-8 ones - see my last example.
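
Here is a minimal sketch of why the multibyte case behaves. Once Perl treats the strings as characters rather than bytes (which is what use utf8 does for literals in the source), the separator is a single character even though it takes two bytes in UTF-8. The sketch uses \x{...} escapes instead of literal characters, purely so that it does not depend on how the file is saved; that choice is mine and not taken from the script.

```perl
use strict;
use warnings;
use Encode qw(encode);

binmode STDOUT, ':utf8';

# Escapes stand in for the literal characters
my $separator = "\x{155}";                              # ŕ
my $text      = "\x{150}\x{151}\x{155}\x{152}\x{153}";  # ŐőŕŒœ

# One character, but two bytes once encoded as UTF-8
print length($separator), "\n";                    # 1
print length(encode('UTF-8', $separator)), "\n";   # 2

# Splitting works on characters, so the words come out whole
my @words = split(/\Q$separator\E+/, $text);
print join(',', @words), "\n";                     # Őő,Œœ
```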

Script


#!/usr/bin/perl

# Blog: http://ccgi.campbellsmiths.force9.co.uk/challenge

use v5.26;    # The Weekly Challenge - 2024-01-22
use utf8;     # Week 253 task 1 - Split strings
use strict;   # Peter Campbell Smith
use warnings; 
binmode STDOUT, ':utf8';

split_strings(['one.two.three','four.five','six'], '.');
split_strings(['$perl$$', '$$raku$'], '$');
split_strings(['xonex', 'xtwox'], 'x');

# some edge cases
split_strings([',,,,,'], ',');
split_strings(['\\three\\blind\\', '\\mice\\'], '\\');
split_strings(['ŐőŕŒœŔŕŖ', 'ŗŘřŚŕ'], 'ŕ');

sub split_strings {
    
    my (@words, $separator, $text, @output);
    
    # initialise
    @words = @{$_[0]};
    $separator = substr($_[1] . ' ', 0, 1); # default is blank
    
    # join the input strings together with single separators
    $text = join($separator, @words);
    $text =~ s|\Q$separator\E+|$separator|g;
    
    # split that into individual words
    @output = split(/\Q$separator\E+/, $text);
    
    # remove an empty first or last word
    shift @output if (@output > 0 and $output[0] eq '');
    pop @output if (@output > 0 and $output[-1] eq '');

    # publish results
    say qq[\nInput:  \@words = ('] . join(qq[', '], @words) . qq[')];
    say qq[        \$separator = '$separator'];
    say qq[Output: ('] . join(q[', '], @output) . qq[')];   
}

12 lines of code

Output from script


Input:  @words = ('one.two.three', 'four.five', 'six')
        $separator = '.'
Output: ('one', 'two', 'three', 'four', 'five', 'six')

Input:  @words = ('$perl$$', '$$raku$')
        $separator = '$'
Output: ('perl', 'raku')

Input:  @words = ('xonex', 'xtwox')
        $separator = 'x'
Output: ('one', 'two')

Input:  @words = (',,,,,')
        $separator = ','
Output: ('')

Input:  @words = ('\three\blind\', '\mice\')
        $separator = '\'
Output: ('three', 'blind', 'mice')

Input:  @words = ('ŐőŕŒœŔŕŖ', 'ŗŘřŚŕ')
        $separator = 'ŕ'
Output: ('Őő', 'ŒœŔ', 'Ŗ', 'ŗŘřŚ')

Any content of this website which has been created by Peter Campbell Smith is in the public domain