Merge like a zip and Unidecode
Weekly challenge 186 — 10 October 2022
You are given a string with possible Unicode characters.
Create a subroutine sub makeover($str) that replaces the Unicode characters with the ASCII equivalent. For this task, let us assume it only contains alphabets.

Example 1
Input: $str = 'ÃÊÍÒÙ';
Output: 'AEIOU'

Example 2
Input: $str = 'âÊíÒÙ';
Output: 'aEiOU'
This is an interesting challenge in that there is no way (that I know of) to inspect the shape of a character and work out that Ã is printed as an A with a tilde above it.
One possibility would be to go through the Unicode code pages and manually create a translation table, e.g. $plain{'Ã'} = 'A'. But that would be painful because, aside from the ones we probably know about from French and German, there are dozens more.
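To make the manual option concrete, here is a minimal sketch of what such a hand-built table might look like. The hash name %plain comes from the text above; the function name makeover_table and the handful of entries are assumptions chosen only to cover the two task examples.

```perl
#!/usr/bin/perl
use v5.28;
use utf8;
use warnings;
binmode(STDOUT, ':utf8');

# hypothetical hand-built table: only the letters needed for the examples
my %plain = (
    'Ã' => 'A', 'Ê' => 'E', 'Í' => 'I', 'Ò' => 'O', 'Ù' => 'U',
    'â' => 'a', 'í' => 'i',
);

# replace each character that has a table entry, copy the rest unchanged
sub makeover_table {
    my ($str) = @_;
    return join '', map { $plain{$_} // $_ } split //, $str;
}

say makeover_table('ÃÊÍÒÙ');    # AEIOU
say makeover_table('âÊíÒÙ');    # aEiOU
```

Any letter missing from the table passes through untouched, which is exactly why this approach scales badly: every accented letter has to be listed by hand.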
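Fortunately, Perl's charnames module can help: charnames::viacode returns the official Unicode name of a character, for example LATIN CAPITAL LETTER A WITH TILDE for Ã, and the unmodified letter can be picked out of that name: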
#!/usr/bin/perl

# Peter Campbell Smith - 2022-10-10
# PWC 186 task 2

use v5.28;
use utf8;
use warnings;
use charnames ':full';
binmode(STDOUT, ':utf8');

my (@tests, $test);

@tests = ('ÃÊÍÒÙ', 'âÊíÒÙ', 'ĐđĒēĔĕĖėĘęĚěĜĝĞğĠġĢģĤĥĦħĨĩĪīĬĭĮįİŁłŃńŐőŔŕŖŗŘřŚśŜŝŞş');

# loop over tests
for $test (@tests) {
    say qq[\nInput: $test\nOutput: ] . makeover($test);
}

sub makeover {

    my ($result, $char, $name);

    # loop over characters within test
    while ($_[0] =~ m|(.)|g) {
        $char = $1;

        # get Unicode name for character (empty string if it has no name)
        $name = charnames::viacode(ord($char)) // '';

        # check if it is a modified latin letter and if so substitute unmodified letter
        if ($name =~ m|^LATIN CAPITAL LETTER (.)|) {
            $result .= $1;
        } elsif ($name =~ m|^LATIN SMALL LETTER (.)|) {
            $result .= lc($1);

        # or if not just copy input to output
        } else {
            $result .= $char;
        }
    }
    return $result;
}
Input: ÃÊÍÒÙ
Output: AEIOU

Input: âÊíÒÙ
Output: aEiOU

Input: ĐđĒēĔĕĖėĘęĚěĜĝĞğĠġĢģĤĥĦħĨĩĪīĬĭĮįİŁłŃńŐőŔŕŖŗŘřŚśŜŝŞş
Output: DdEeEeEeEeEeGgGgGgGgHhHhIiIiIiIiILlNnOoRrRrRrSsSsSs
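For comparison, here is a sketch of a different technique, not the one used above: decompose each character with the core Unicode::Normalize module and then delete the combining marks. The function name makeover_nfd is an assumption for illustration. It handles both task examples, but letters with no canonical decomposition, such as Đ and Ł, come back unchanged, whereas the name-based solution above does convert them.

```perl
#!/usr/bin/perl
use v5.28;
use utf8;
use warnings;
use Unicode::Normalize qw(NFD);
binmode(STDOUT, ':utf8');

# hypothetical alternative: canonical decomposition, then strip the marks
sub makeover_nfd {
    my ($str) = @_;
    my $decomposed = NFD($str);     # 'Ã' becomes 'A' followed by U+0303 COMBINING TILDE
    $decomposed =~ s/\p{Mn}//g;     # remove nonspacing combining marks
    return $decomposed;
}

say makeover_nfd('ÃÊÍÒÙ');    # AEIOU
say makeover_nfd('âÊíÒÙ');    # aEiOU
say makeover_nfd('ĐŁ');       # ĐŁ (no decomposition, so unchanged)
```

Which approach is preferable depends on the input: decomposition follows the Unicode data exactly, while matching on the character name also catches stroked letters at the cost of relying on a naming convention.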
Any content of this website which has been created by Peter Campbell Smith is in the public domain