Peter’s blog ✴ Week 111 ✴ 3 May 2021
THE WEEKLY CHALLENGE
Search matrix and culprit
Given a word, you can sort its letters alphabetically (case insensitive). For example, “beekeeper” becomes “beeeeekpr” and “dictionary” becomes “acdiinorty”.
Write a script to find the longest English words that don’t change when their letters are sorted.
Example 1 access
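To make the sorting step in the task concrete, here is a minimal sketch. The name `sorted_letters` is my own, chosen for illustration; it lower-cases a word and returns its letters in alphabetical order.

```perl
use v5.26;
use warnings;

# Hypothetical helper (not part of the solution below): returns the letters
# of a word, lower-cased and sorted alphabetically.
sub sorted_letters {
    my ($word) = @_;
    return join('', sort { $a cmp $b } split(//, lc($word)));
}

say sorted_letters('beekeeper');     # beeeeekpr
say sorted_letters('dictionary');    # acdiinorty
say sorted_letters('Access');        # access
```

A word meets the condition when `sorted_letters($word)` is the same as the lower-cased word, as it is for 'access'.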
First find your words.
There are a few internet sites that offer 'n English words' including github.com/dwyl/english-word that lists 479,000 of them. The Oxford English Dictionary reckons that there are only (!) about 179,000 words currently in use, so the dwyl list clearly contains a lot of non-words - abbreviations, initialisms, misprints and so on, as is clear on inspection. The first few words, for example, are: 'aaa aah aahs aal aals aam aaru aas abb abbe abbey' of which only 'abbey' is a word in common use.
The next list I tried is from MIT and is 10,000 words long. Sadly, that too contains many non-words, but it seems to have a somewhat better fraction of real words.
Lastly, I took some real text: the novel 'The Sign of the Four', one of Arthur Conan Doyle's Sherlock Holmes books. It was published in 1890, so is a little dated, but more importantly is out of copyright. Most of its 5753 unique words are still in current English usage.
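One way to turn running text into a list of unique words is sketched below. This is an assumption about how such a list could be built, not a record of how the Conan Doyle list was actually made; the helper name `unique_words` and the sample sentence are invented for illustration.

```perl
use v5.26;
use warnings;

# Hypothetical helper: returns the unique lower-cased words in a block of
# text, in order of first appearance. Any run of characters other than a-z
# is treated as a separator (an assumption made for this sketch).
sub unique_words {
    my ($text) = @_;
    my %seen;
    return grep { length($_) && !$seen{$_}++ } split(/[^a-z]+/, lc($text));
}

my $sample = "The Sign of the Four: the sign, the four.";
say join(' ', unique_words($sample));    # the sign of four
```

Splitting on every non-letter is crude: a contraction such as "don't" would come out as 'don' and 't', so a more careful version might treat apostrophes specially.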
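So now, how do we find the ones which have their letters in alphabetical order? It's relatively expensive - if you have to do it 479,000 times - to sort every word into alphabetical order, so I first filtered each list, keeping only words whose last letter is the same as or later in the alphabet than the first, which contain at least one of a, e, i, o, u or y, and which have at least 3 letters.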
That eliminated about 40% of each list.
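If the cost of sorting is the concern, an alternative is to skip the sort and compare each letter with the one before it, stopping at the first pair that is out of order. This is a different technique from the sort-and-compare used in the script below, and `is_ordered` is a name made up for this sketch.

```perl
use v5.26;
use warnings;

# Hypothetical alternative to sorting: returns 1 if every letter is the same
# as or later in the alphabet than the letter before it, otherwise 0.
sub is_ordered {
    my ($word) = @_;
    $word = lc($word);
    for my $i (1 .. length($word) - 1) {
        return 0 if substr($word, $i - 1, 1) gt substr($word, $i, 1);
    }
    return 1;
}

say is_ordered('almost')    ? 'ordered' : 'not ordered';    # ordered
say is_ordered('beekeeper') ? 'ordered' : 'not ordered';    # not ordered
```

Most words fail within the first few letters, so the loop usually exits early. Whether that is measurably faster than sorting short strings would need timing on the real lists.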
The MIT and Conan Doyle lists each have about 2% of their unique words in alphabetical order, with the dwyl list having only 0.3%. I am inclined to conclude that 'opt' is the alphabetically last proper word to meet the condition, though the dwyl list includes 'tux', a commonly-used US abbreviation of 'tuxedo' - a dinner jacket in British English.
And as for the longest, all the lists have a few 6-letter words including accent, accept, access, almost and effort. Dwyl does suggest the 8-letter 'aegilops', which is a genus of Eurasian and North American plants in the grass family, but genus names are Latin so I don't think it counts.
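The script below remembers only the first word of the greatest length it meets, which is why its output names a single 6-letter word for each of the smaller lists. A minimal sketch of collecting every word that ties for longest follows; `longest_words` and the short sample list are illustrative assumptions.

```perl
use v5.26;
use warnings;

# Hypothetical helper: returns every word that ties for the greatest length,
# in the order in which the words were supplied.
sub longest_words {
    my @words = @_;
    my ($max, @longest) = (0);
    for my $word (@words) {
        my $len = length($word);
        if ($len > $max) {
            $max     = $len;
            @longest = ($word);
        }
        elsif ($len == $max) {
            push @longest, $word;
        }
    }
    return @longest;
}

say join(' ', longest_words(qw(abbey accent accept access all almost effort opt)));
# accent accept access almost effort
```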
#!/usr/bin/perl

# Blog: http://ccgi.campbellsmiths.force9.co.uk/challenge

use v5.26;    # The Weekly Challenge - 2021-05-03
use utf8;     # Week 111 - task 2 - Ordered letters
use warnings; # Peter Campbell Smith
binmode STDOUT, ':utf8';
use Encode;

ordered_letters('../data/dwyl.txt');
ordered_letters('../data/MIT10000.txt');
ordered_letters('../data/conan_doyle.txt');

sub ordered_letters {

    my ($F, $word, $sorted, $s, @l, $j, $length, @count, $line, @text, $file, $longest);

    # initialise
    $file = $_[0];
    open ($F, '<:utf8', $file) or die 'cannot open word list';
    $line = '';
    $longest = '';
    say qq[\nWords from: $file];

    # loop over words
    while ($word = <$F>) {
        chomp $word;
        $word = lc($word);
        $count[0] ++;

        # last letter must come after 1st
        next unless substr($word, 0, 1) le substr($word, -1);
        $count[1] ++;

        # must contain at least one aeiouy
        next unless $word =~ m|[aeiouy]|;
        $count[2] ++;

        # at least 3 letters
        $length = length($word);
        next if $length < 3;
        $count[3] ++;

        # now check whether letters in alpha order
        $sorted = join('', sort (split('', $word)));
        next unless $word eq $sorted;
        $count[4] ++;

        # show words
        if (length($line . $word) >= 120) {
            say $line;
            $line = '';
        }
        $line .= $word . ' ';
        $longest = $word if length($word) > length($longest);
    }
    close $F;
    say $line if $line;
    say '';

    # stats
    @text = ('In dictionary', 'Last after first', 'Have vowel or y',
        'At least 3 letters', 'In alphabetical order');
    for $j (0 .. 4) {
        printf(qq[%26s: %6d = %3.2f%s\n], $text[$j], $count[$j],
            $count[$j] * 100 / $count[0], '%');
    }
    say qq[ Longest alphabetical word: $longest];
}
35 lines of code
Completed after the closing date and not submitted to GitHub
Words from: ../data/dwyl.txt
aaa aah aahs aal aals aam aaru aas abb abbe abbey abbes abbess abbest abby abbot abbott abbr abc abcess abd abdest abe
abey abel aberr abet abhor abhors aby abilo abir abit abl ably ablow abn abo abort abos abow abox abp abr abs abt abu
abuzz abv acc acce accel accent accept access accloy accoy accost acct ace acer aces ach achy achill achoo achor acy
acis ack acknow aclu acop acor acost acpt acrux act actu add addeem adder adders addy addio addis addn addr adds addu
ade adeem adeems adeep adelops adempt aden adeps adept adet adfix adhort ady adin adios adipsy adit adj adjt adm admov
admrx ado adoors adopt ador ados adoxy adp adry ads adv adz aegilops aegir aegis aeq aer aery aes aet aff affy affix
afflux afft aflow afoot aft aggry agy agin agio agios agist agit agly aglow agnosy ago agos agr agst agt ahi ahint ahir
aho ahoy ahs aht ahu aik ail aillt ails aim aims ain ains aint ainu air airy airs airt ais ait aix aknow ako akov aku
aly all ally allo alloy alloo alloquy allot allow alloxy alls almost alms aln alo alop alow alp alps als alt alw amy
ammo ammos ammu amoy amor amort amos amp amps amt amu any ann anno annoy annot anopsy ans ansu ant antu aor app appt
apr apt apx ary arr arry ars art arty aru arx ass asst ast att atty aux bde bee beef beefy beefily beefin beefs beek
been beent beep beeps beer beery beers bees beest beet beety bef befist befit befop beg begin begins begirt bego
begorry begot begs behint behn behoot bey bein bekko beknot beknow bel bely bell belly bello belloot bellow bells below
bels belt ben benn benny beno bens bent benty benu ber berry bert bes bess bessy best bet betty bevy bevvy bhikku bhil
bhoy bhoot bijou bijoux bill billy billot billow billowy bills bilo bilos bim bin binny bino bins bint bio biopsy bios
birr birrs birsy birt bis bist bit bitt bitty bivvy biz bizz blo bloop bloops blot blotty blow blowy boy boo boor boors
boort boos boosy boost boot booty bop bops bor bors bort borty bortz bos boss bossy bot bott bottu bouw bow box boxy
bruzz btu buy buz buzz ccitt cee cees ceil ceils ceint cell cello cellos cells celt cen cent cep ceps cert certy cess
cest cestuy cfi chi chil chill chilly chillo chills chimp chimps chimu chin chinny chino chinos chins chint chintz
chiot chip chippy chips chirr chirrs chirt chiru chis chit chitty chiv chivy chivvy chivw chizz chlor cho choy choop
choosy chop choppy chops chort chott chou choux chow chry cill cir cis cissy cist cit city civ civy civvy cly clo cloy
cloop cloot clop clops clos clot clotty clou clow coy coo coop coops coopt coos coost coot cooty cop copy coppy copps
copr cops copsy copt cor cory corr corsy cort corv cos cosy coss cost cot cott cotty cow cowy cox coxy coz cpu cry cru
crux cuvy dee deek deem deems deeny deep deeps deer deers dees deess def defi defy defis defix deflow deflux defs deft
deg deglory degu dehors dehort dei dey deil deils deimos deino deinos deis deist deity dekko dekkos del dely dell delly
dells deloo dels dem demy demo demos demot den deny dens dent denty dep depr dept der derry derv des dess det deux dev
dew dewy dex dhikr dhikrs dhoty dhow dhu dikkop dil dill dilly dills dilo dilos diluvy dim dimmy dimps dimpsy dims din
dino dinos dins dint dioxy dip dippy dipppy dips dipsy dipt dir dirt dirty dis diss dist disty dit ditt ditty div divvy
dix dixy dizz dlvy doo door doors dop dopy dor dory dorr dorrs dors dort dorty dos doss dossy dost dot doty dotty doux
dow dowy doxy doz dry druxy dux eel eely eels een eer eery eff efflux effort effs efik efl efs eft egg egghot eggy eggs
egilops egis ego egos eir eiry ejoo ell ellops ells elm elmy elms elops els elt emm emmy emory emp empt empty ems emu
ennoy enos enow ens env envy eos eppy err errs ers erst ess est esu ety fil fill filly fills film filmy films filo fils
filt fin finn finny finns fino fins fiot fip fir firy firry firs first fist fisty fit fitty fitz fix fiz fizz fly flo
floor floors floosy flop floppy flops flor flory floss flossy flot flow flu flux foy foo foot footy fop foppy fops for
forst fort forty forz foss fot fou fow fox foxy fry fruz frwy fuzz ghi ghis ghost ghosty ghuz gil gill gilly gillot
gills gilo gilpy gils gilt gilty gim gimmor gimp gimpy gimps gin ginn ginny gins gio gip gippy gips gipsy girr girt gis
gist git gizz gloy glop gloppy glops glor glory glos gloss glossy glost glow gnow gnu goy goo goop goopy goops goos
goosy gor gory gorry gorsy gorst gos goss gossy got gou gov gox gry guy guv guz hill hilly hillo hillos hills hilt him
himp hin hinny hins hint hip hippy hips hir hirst his hiss hissy hist hit hizz hny hoy hoo hoop hoops hoot hop hoppy
hops hor hory horry hors horsy horst hort hoss host hot how hox huzz hvy hwy iii ijo ill illy ills ilot immy immov imp
impy imps impv imu inn inns ino ins inst int inv ios iou ipr ips iqs irs ist isz ivy joy jos joss jot jotty jovy jow
juv klop klops knop knoppy knops knorr knot knotty know knox kop kops kor kory kors kos koss kou kru loy loo loop loopy
loops loory loos loot lop loppy lops loq lor lory lorry lors loss lossy lost lot lou low lowy lox lux moy moo moop moor
moory moors moos moost moot mop mopy moppy mops mopsy mor mors mort morw mos moss mossy most mot mott motty mou mow mru
mux muzz noy noo noop nor norry nos nosy nosu not nou nov now nowy nox oooo oops oory oos oot opp ops opsy opt ory ors
ort oxy pry pty puy puxy qty rux rwy ssu sty stu suu suz swy tty tuy tux xyz
In dictionary: 370105 = 100.00%
Last after first: 213920 = 57.80%
Have vowel or y: 213536 = 57.70%
At least 3 letters: 213423 = 57.67%
In alphabetical order: 1060 = 0.29%
Longest alphabetical word: aegilops
Words from: ../data/MIT10000.txt
aaa abc abs abu acc accent accept access ace acer acm act add adds adopt ads ago aim aims air all allow alloy almost
alot alt amp amy ann ant any app apps apr apt art ass bee beef been beer begin begins bell belly below belt ben benz
berry best bet betty bill bills billy bin bio bios bit biz blow boost boot booty boss bow box boy buy buzz cell cells
cent ceo cest cet cgi chi chip chips cho cio cir city cop copy cos cost cow cox cpu cruz cry dee deep deer def del dell
dem demo den deny dept der des dev dim dip dir dirt dirty dis dist div divx diy door doors dos dot dow dry effort egg
eggs empty ent eos est fill film films fin first fist fit fix floor floors floppy flow flu flux fly foo foot for fort
forty fox ghost gis glory glow gnu got gov guy hill hills him hint hip his hist hit hiv hop host hot how hwy iii ill
inn inns ins int ips irs ist joy know loop loops los loss lost lot lou low moss most nor nos not nov now ooo oops opt
pty qty
In dictionary: 10000 = 100.00%
Last after first: 6262 = 62.62%
Have vowel or y: 6059 = 60.59%
At least 3 letters: 5958 = 59.58%
In alphabetical order: 205 = 2.05%
Longest alphabetical word: accent
Words from: ../data/conan_doyle.txt
all below best city denn dirty door dost elms first for got his how iii lor lost most not now abhor accept act afoot
ago ain air airs all allow ally almost annoy any art been bees begin begins bein bell below bent best bet bit blow boot
bow box boy cell ceux chill chills chin chins city cost cry deep del deny der des dip dips dir dirt dirty door doors
dry effort eggs empty err fir first fit floor foot for fort forty girt glow got guv hill him his hit hops hot how ill
imp joy know loot lop loss lost lot low moss most nor not now
In dictionary: 5753 = 100.00%
Last after first: 3403 = 59.15%
Have vowel or y: 3384 = 58.82%
At least 3 letters: 3338 = 58.02%
In alphabetical order: 111 = 1.93%
Longest alphabetical word: accept
Any content of this website which has been created by Peter Campbell Smith is in the public domain