Peter Campbell Smith

Big ones and jelmbud wrods

Weekly challenge 289 — 30 September 2024

Task 2: Jumbled letters

An Internet legend dating back to at least 2001 goes something like this:

Aoccdrnig to a rscheearch at Cmabrigde Uinervtisy, it deosn’t mttaer in waht oredr the ltteers in a wrod are, the olny iprmoetnt tihng is taht the frist and lsat ltteer be at the rghit pclae. The rset can be a toatl mses and you can sitll raed it wouthit porbelm. Tihs is bcuseae the huamn mnid deos not raed ervey lteter by istlef, but the wrod as a wlohe.

This supposed Cambridge research is unfortunately an urban legend. However, the effect has been studied. For example (and with a title that probably made the journal’s editor a little nervous), “Raeding wrods with jubmled lettres: there is a cost” by Rayner, White, et al. looked at reading speed and comprehension of jumbled text.

Your task is to write a program that takes English text as its input and outputs a jumbled version as follows:

  • The first and last letter of every word must stay the same
  • The remaining letters in the word are scrambled in a random order (if that happens to be the original order, that is OK).
  • Whitespace, punctuation, and capitalization must stay the same
  • The order of words does not change, only the letters inside the word

I don’t know if this effect has been studied in other languages besides English, but please consider sharing your results if you try!

Examples


Example 1:
“Perl” could become “Prel”, or stay as “Perl”, but it could not become “Pelr” or “lreP”.
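
The rules in Example 1 can be turned into a small check. This is only a sketch: is_valid_jumble is a hypothetical helper, not part of my solution, and it assumes a plain word with no punctuation.

```perl
use v5.26;
use warnings;

# Hypothetical helper: true if $jumbled is a permitted jumbling of $word.
# Assumes a plain word with no embedded punctuation.
sub is_valid_jumble {
    my ($word, $jumbled) = @_;
    return 0 unless length($word) == length($jumbled);

    # with fewer than 4 letters nothing is free to move
    return ($word eq $jumbled ? 1 : 0) if length($word) < 4;

    # first and last characters must be unchanged
    return 0 unless substr($word, 0, 1) eq substr($jumbled, 0, 1);
    return 0 unless substr($word, -1)   eq substr($jumbled, -1);

    # the interior must hold the same letters, in any order
    my $inner = sub { join '', sort split //, substr($_[0], 1, -1) };
    return $inner->($word) eq $inner->($jumbled) ? 1 : 0;
}

for my $candidate (qw(Prel Perl Pelr lreP)) {
    say "$candidate: ", is_valid_jumble('Perl', $candidate) ? 'allowed' : 'not allowed';
}
```

With these inputs the first two candidates are reported as allowed and the last two as not allowed, which agrees with the example.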

Analysis

The task description is silent on words with embedded punctuation such as Don't or half-baked, so I have assumed that the punctuation stays where it is but the letters on either side of it can be jumbled together, so for example half-baked could become heka-flabd.

So my algorithm looks like this:

  • break the text into 'words', where a word is a string of non-blanks, eg (half-baked,)
  • for each word, divide it into
    • a beginning, comprising zero or more non-letters followed by a single letter, eg (h
    • an end, comprising a letter followed by zero or more non-letters eg d,)
    • a middle, which is everything between the beginning and end and may comprise letters and non-letters, eg alf-bake
  • extract just the letters from the middle, eg alfbake
  • swap them around randomly resulting in eg flbakea
  • replace each letter in middle with its swapped version,
    eg flb-akea
  • concatenate the beginning, the middle and the end,
    eg (hflb-akead,)
  • and add that and a space to the result.

That leaves any embedded non-letters where they are, but jumbles the letters before and after them.
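
The splitting and refilling steps can be tried on their own with the (half-baked,) example. In this sketch split_word and refill are hypothetical names used only here; the two patterns in split_word are the ones in the script further down, and the 'shuffled' letters are fixed at flbakea so that the result is repeatable. It assumes the word contains at least two letters.

```perl
use v5.26;
use warnings;

# split_word and refill are hypothetical names used only for this sketch;
# the two patterns in split_word are the ones used in the full script.
sub split_word {
    my ($word) = @_;
    my ($before, $rest)  = $word =~ m|([^\w]*\w)(.*)|;
    my ($middle, $after) = $rest =~ m|(.*?)(\w[^\w]*)$|;
    return ($before, $middle, $after);
}

# replace each letter of $middle, in order, with the next character of
# $letters, leaving non-letters where they are
sub refill {
    my ($middle, $letters) = @_;
    my $i = 0;
    $middle =~ s|\w|substr($letters, $i++, 1)|ge;
    return $middle;
}

my ($before, $middle, $after) = split_word('(half-baked,)');
say "$before | $middle | $after";            # (h | alf-bake | d,)

my $refilled = refill($middle, 'flbakea');   # a fixed 'shuffle' for the demo
say $before . $refilled . $after;            # (hflb-akead,)
```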

Conclusions

If the text mainly comprises common words with up to six or seven letters, then I would say the assertion that it is easily read is true. However, with longer words, unfamiliar words and technical words it becomes steadily less comprehensible.

I tried German, French, Greek and Russian examples (see below). These demonstrate that Perl's \w correctly matches letters with accents and letters in (at least some) non-Latin alphabets. Note, though, that \w also matches digits and the underscore, so a digit inside a word, such as the 2 in propan-2-ol, is treated as a letter and could be moved. I would say that the comprehensibility of the French and German phrases is about the same as the English ones, but German is given to long compound words which will probably be incomprehensible when scrambled. I don't speak Greek or Russian, so can't comment on those.
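
To see what \w accepts, here is a small sketch comparing it with the Unicode property \p{L} (letters only). The helper name classify is hypothetical, and \p{L} is shown only as a possible alternative; my script uses \w.

```perl
use v5.26;
use utf8;       # the source contains non-ASCII literals
use warnings;
binmode STDOUT, ':utf8';

# Hypothetical helper: returns (matches \w, matches \p{L}) as 1 or 0
sub classify {
    my ($char) = @_;
    return ($char =~ m|\w| ? 1 : 0, $char =~ m|\p{L}| ? 1 : 0);
}

for my $char ('e', 'é', 'ẞ', 'ή', 'ы', '2', '_', '-') {
    my ($is_w, $is_l) = classify($char);
    say "$char  \\w: $is_w  \\p{L}: $is_l";
}
```

The accented, Greek and Cyrillic letters match both patterns, the digit and the underscore match only \w, and the hyphen matches neither.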

Script


#!/usr/bin/perl

# Blog: http://ccgi.campbellsmiths.force9.co.uk/challenge

use v5.26;    # The Weekly Challenge - 2024-09-30
use utf8;     # Week 289 - task 2 - Jumbled letters
use warnings; # Peter Campbell Smith
binmode STDOUT, ':utf8';

jumbled_letters(qq[The quick brown fox jumps over the lazy dog.]);
jumbled_letters(qq[The X-factor's inventor was Bloggs-Jones, who said 'Hello!']);
jumbled_letters(qq[Psychological Abstracts contains nonevaluative abstracts of literature in psychology and related disciplines, grouped into 22 major classification categories]);
jumbled_letters(qq[Deoxyribonucleic acid, mucopolysaccharides and propan-2-ol are organic chemicals.]);
jumbled_letters(qq[Das Mädchen möchte die Straẞe früh überqueren.]);
jumbled_letters(qq[L'accent circonflexe va disparaître des manuels scolaires à la rentrée: que s'est-il passé, et qu'en pense le ministre de l'Éducation nationale Najat Vallaud Belkacem?]);
jumbled_letters(qq[Η γρήγορη καφετιά αλεπού πηδά πάνω από το τεμπέλικο σκυλί]);
jumbled_letters(qq[Быстрая бурая лиса перепрыгивает через ленивую собаку.]);

sub jumbled_letters {
    
    my ($str, $before, $rest, $middle, $after, $one, $two, $x, $letters, $count, $length, $word, $lm, $s, $m, $jumbled);
    
    $str = $_[0] . ' ';
    $jumbled = '';
    
    # loop over 'words'
    while ($str =~ m|([^\s]*)\s+|gi) {
        
        # split word into $before, $middle and $after
        $word = $1;
        # need at least two word characters, or the split below would fail
        if ($word =~ m|\w.*\w| and length($word) >= 4) {
            ($before, $rest) = $word =~ m|([^\w]*\w)(.*)|; 
            ($middle, $after) = $rest =~ m|(.*?)(\w[^\w]*)$|;

            # put just the word characters (\w) from $middle into $letters
            $lm = length($middle);                      
            $letters = $middle;
            $letters =~ s|[^\w]||g;
            $count = length($letters);
            
            # swap letters around randomly lots of times
            if ($count > 1) {
                for (0 .. $count + rand(7)) {
                    do {
                        $one = int(rand($count));
                        $two = int(rand($count));
                    } until $one != $two;
                    $x = substr($letters, $one, 1);
                    substr($letters, $one, 1) = substr($letters, $two, 1);
                    substr($letters, $two, 1) = $x;
                }
                
                # now put the jumbled letters in place of the originals
                $s = 0;
                for $m (0 .. length($middle) - 1) {
                    if (substr($middle, $m, 1) =~ m|\w|) {
                        substr($middle, $m, 1) = substr($letters, $s ++, 1);
                    }
                }
            }
            
            # reassemble the word
            $word = $before . $middle . $after;
        }
        
        # and add it to the jumbled output
        $jumbled .= $word . ' ';
    }
    
    say qq[\nInput:  $str];
    say qq[Output: $jumbled];
}
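
As a possible alternative to the loop of random swaps, the letters could be shuffled in one step with shuffle from the core List::Util module. This is a sketch of a different technique, not what the script above does, and shuffle_letters is a hypothetical name.

```perl
use v5.26;
use warnings;
use List::Util qw(shuffle);

# Hypothetical alternative to the swap loop: split the string into
# characters, shuffle them, and join them back together.
sub shuffle_letters {
    my ($letters) = @_;
    return join '', shuffle split //, $letters;
}

my $letters  = 'alfbake';
my $shuffled = shuffle_letters($letters);
say "$letters -> $shuffled";   # the output varies from run to run
```

The result could replace $letters before the jumbled letters are put back in place of the originals; the rest of the script would not need to change.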

Output


Input:  The quick brown fox jumps over the lazy dog.
Output: The qcuik brown fox jmups oevr the lazy dog.

Input:  The X-factor's inventor was Bloggs-Jones, who 
   said 'Hello!'
Output: The X-cafort's invoetnr was BsngJl-ooges, who 
   siad 'Hlleo!'

Input:  Psychological Abstracts contains nonevaluative 
   abstracts of literature in psychology and related 
   disciplines, grouped into 22 major classification 
   categories
Output: Poihcsocalygl Atabtrscs cnitoans noavavnlteiue 
   aatsbtcrs of lteaitrure in pcyolgsohy and retaled 
   depiiisncls, geuprod into 22 maojr clctfoissiiaan 
   cgaeiortes

Input:  Deoxyribonucleic acid, mucopolysaccharides and 
   propan-2-ol are organic chemicals.
Output: Dinooeuilyrcebxc aicd, mlaiadcrhoocycsueps and 
   ppanor-2-ol are onagirc clmiahces.

Input:  Das Mädchen möchte die Straẞe früh überqueren.
Output: Das Mhcäedn mtcöhe die Sẞrate früh üererubqen.

Input:  L'accent circonflexe va disparaître des manuels 
   scolaires à la rentrée: que s'est-il passé, et qu'en 
   pense le ministre de l'Éducation nationale Najat 
   Vallaud Belkacem?
Output: L'acenct croxlecfine va dtrpiîarsae des maleuns 
   solaciers à la rétrnee: que s'sti-el pssaé, et qe'un 
   pnsee le mtisinre de l'tiacuÉdon nnitaloae Njaat 
   Vaualld Balecekm?

Input:  Η γρήγορη καφετιά αλεπού πηδά πάνω από το 
   τεμπέλικο σκυλί
Output: Η γροήργη κιατεφά αεπολύ πηδά πνάω από το 
   τειλκέμπο συλκί

Input:  Быстрая бурая лиса перепрыгивает через ленивую 
   собаку.
Output: Баысртя буаря лсиа пыреапвигреет через лвиуеню 
   собаку.
 

 

Any content of this website which has been created by Peter Campbell Smith is in the public domain