Search matrix and culprit
Weekly challenge 111 — 3 May 2021
Given a word, you can sort its letters alphabetically (case insensitive). For example, “beekeeper” becomes “beeeeekpr” and “dictionary” becomes “acdiinorty”.
Write a script to find the longest English words that don’t change when their letters are sorted.
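As a minimal sketch of the sorting step just described, the letters of a word can be split apart, sorted and rejoined. The helper name sort_letters is made up for illustration and is not part of the challenge.

```perl
use v5.26;
use warnings;

# hypothetical helper: return the letters of a word in alphabetical
# order, ignoring case
sub sort_letters {
    my ($word) = @_;
    return join '', sort { $a cmp $b } split //, lc $word;
}

say sort_letters('beekeeper');     # beeeeekpr
say sort_letters('dictionary');    # acdiinorty
```

A word meets the condition when sort_letters($word) is equal to the lower-cased word itself.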
Example 1 access
First find your words.
There are a few internet sites that offer 'n English words' including github.com/dwyl/english-word that lists 479,000 of them. The Oxford English Dictionary reckons that there are only (!) about 179,000 words currently in use, so the dwyl list clearly contains a lot of non-words (abbreviations, initialisms, misprints and so on), as is clear on inspection. The first few words, for example, are 'aaa aah aahs aal aals aam aaru aas abb abbe abbey', of which only 'abbey' is a word in common use.
The next list I tried is from MIT and is 10,000 words long. Sadly, that too contains many non-words, but it seems to have a somewhat better fraction of real words.
Lastly, I took some real text: the novel 'The Sign of the Four', one of Arthur Conan Doyle's Sherlock Holmes books. It was published in 1890, so is a little dated, but more importantly is out of copyright. Most of its 5753 unique words are still in current English usage.
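So now, how do we find the ones which have their letters in alphabetical order? It's relatively expensive - if you have to do it 479,000 times - to sort every word into alphabetical order, so I first filtered each list with three cheap tests, discarding any word whose last letter comes before its first, which contains none of a, e, i, o, u or y, or which has fewer than 3 letters.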
That eliminated about 40% of each list.
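An alternative to sorting at all, and not what my script does, is to walk along the word and stop at the first letter that is followed by an earlier one. The sketch below assumes a made-up helper name, is_ordered. Whether it is really faster in Perl than the built-in sort would need measuring.

```perl
use v5.26;
use warnings;

# hypothetical helper: true if no letter is followed by an earlier one
sub is_ordered {
    my ($word) = @_;
    $word = lc $word;
    for my $i (1 .. length($word) - 1) {
        # compare each letter with the one before it
        return 0 if substr($word, $i - 1, 1) gt substr($word, $i, 1);
    }
    return 1;
}

say is_ordered('access')    ? 'ordered' : 'not ordered';    # ordered
say is_ordered('beekeeper') ? 'ordered' : 'not ordered';    # not ordered
```

Most words fail within their first few letters, so the loop usually exits early.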
The MIT and Conan Doyle lists each have about 2% of their unique words in alphabetical order, with the dwyl list having only 0.3%. I am inclined to conclude that 'opt' is the alphabetically last proper word to meet the condition, though the dwyl list includes 'tux', a commonly-used US abbreviation of 'tuxedo' - a dinner jacket in British English.
And as for the longest, all the lists have a few 6-letter words including accent, accept, access, almost and effort. Dwyl does suggest the 8-letter 'aegilops', which is a genus of Eurasian and North American plants in the grass family, but genus names are Latin so I don't think it counts.
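The task asks for the longest words, plural, and several words tie at 6 letters. As a sketch only, with a made-up helper name longest_words, all the words tied for the greatest length could be collected like this:

```perl
use v5.26;
use warnings;

# hypothetical helper: return every word tied for the greatest length
sub longest_words {
    my @words = @_;
    my $max = 0;
    my @longest;
    for my $word (@words) {
        my $len = length $word;
        if ($len > $max) {
            # a new longest word: start the list again
            $max = $len;
            @longest = ($word);
        }
        elsif ($len == $max) {
            push @longest, $word;
        }
    }
    return @longest;
}

# sample input: a few ordered words taken from the lists discussed above
say join ' ', longest_words(qw(opt accent best accept almost tux));
# accent accept almost
```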
#!/usr/bin/perl

# Blog: http://ccgi.campbellsmiths.force9.co.uk/challenge

use v5.26;    # The Weekly Challenge - 2021-05-03
use utf8;     # Week 111 - task 2 - Ordered letters
use warnings; # Peter Campbell Smith
binmode STDOUT, ':utf8';
use Encode;

ordered_letters('../data/dwyl.txt');
ordered_letters('../data/MIT10000.txt');
ordered_letters('../data/conan_doyle.txt');

sub ordered_letters {

    my ($F, $word, $sorted, $s, @l, $j, $length, @count, $line, @text, $file, $longest);

    # initialise
    $file = $_[0];
    open ($F, '<:utf8', $file) or die 'cannot open word list';
    $line = '';
    $longest = '';
    say qq[\nWords from: $file];

    # loop over words
    while ($word = <$F>) {
        chomp $word;
        $word = lc($word);
        $count[0] ++;

        # last letter must come after 1st
        next unless substr($word, 0, 1) le substr($word, -1);
        $count[1] ++;

        # must contain at least one aeiouy
        next unless $word =~ m|[aeiouy]|;
        $count[2] ++;

        # at least 3 letters
        $length = length($word);
        next if $length < 3;
        $count[3] ++;

        # now check whether letters in alpha order
        $sorted = join('', sort (split('', $word)));
        next unless $word eq $sorted;
        $count[4] ++;

        # show words
        if (length($line . $word) > 115) {
            say $line;
            $line = '';
        }
        $line .= $word . ' ';
        $longest = $word if length($word) > length($longest);
    }
    close $F;
    say $line if $line;
    say '';

    # stats
    @text = ('In dictionary', 'Last after first', 'Have vowel or y', 'At least 3 letters', 'In alphabetical order');
    for $j (0 .. 4) {
        printf(qq[%26s: %6d = %3.2f%s\n], $text[$j], $count[$j], $count[$j] * 100 / $count[0], '%');
    }
    say qq[ Longest alphabetical word: $longest];
}
35 lines of code
I completed this challenge after the closing date and it has not been submitted to GitHub.
Words from: ../data/dwyl.txt
aaa aah aahs aal aals aam aaru aas abb abbe abbey abbes abbess abbest abby abbot abbott abbr abc abcess abd abdest
abe abey abel aberr abet abhor abhors aby abilo abir abit abl ably ablow abn abo abort abos abow abox abp abr abs
abt abu abuzz abv acc acce accel accent accept access accloy accoy accost acct ace acer aces ach achy achill achoo
achor acy acis ack acknow aclu acop acor acost acpt acrux act actu add addeem adder adders addy addio addis addn
addr adds addu ade adeem adeems adeep adelops adempt aden adeps adept adet adfix adhort ady adin adios adipsy adit
adj adjt adm admov admrx ado adoors adopt ador ados adoxy adp adry ads adv adz aegilops aegir aegis aeq aer aery
aes aet aff affy affix afflux afft aflow afoot aft aggry agy agin agio agios agist agit agly aglow agnosy ago agos
agr agst agt ahi ahint ahir aho ahoy ahs aht ahu aik ail aillt ails aim aims ain ains aint ainu air airy airs airt
ais ait aix aknow ako akov aku aly all ally allo alloy alloo alloquy allot allow alloxy alls almost alms aln alo
alop alow alp alps als alt alw amy ammo ammos ammu amoy amor amort amos amp amps amt amu any ann anno annoy annot
anopsy ans ansu ant antu aor app appt apr apt apx ary arr arry ars art arty aru arx ass asst ast att atty aux bde
bee beef beefy beefily beefin beefs beek been beent beep beeps beer beery beers bees beest beet beety bef befist
befit befop beg begin begins begirt bego begorry begot begs behint behn behoot bey bein bekko beknot beknow bel
bely bell belly bello belloot bellow bells below bels belt ben benn benny beno bens bent benty benu ber berry bert
bes bess bessy best bet betty bevy bevvy bhikku bhil bhoy bhoot bijou bijoux bill billy billot billow billowy bills
bilo bilos bim bin binny bino bins bint bio biopsy bios birr birrs birsy birt bis bist bit bitt bitty bivvy biz
bizz blo bloop bloops blot blotty blow blowy boy boo boor boors boort boos boosy boost boot booty bop bops bor bors
bort borty bortz bos boss bossy bot bott bottu bouw bow box boxy bruzz btu buy buz buzz ccitt cee cees ceil ceils
ceint cell cello cellos cells celt cen cent cep ceps cert certy cess cest cestuy cfi chi chil chill chilly chillo
chills chimp chimps chimu chin chinny chino chinos chins chint chintz chiot chip chippy chips chirr chirrs chirt
chiru chis chit chitty chiv chivy chivvy chivw chizz chlor cho choy choop choosy chop choppy chops chort chott chou
choux chow chry cill cir cis cissy cist cit city civ civy civvy cly clo cloy cloop cloot clop clops clos clot
clotty clou clow coy coo coop coops coopt coos coost coot cooty cop copy coppy copps copr cops copsy copt cor cory
corr corsy cort corv cos cosy coss cost cot cott cotty cow cowy cox coxy coz cpu cry cru crux cuvy dee deek deem
deems deeny deep deeps deer deers dees deess def defi defy defis defix deflow deflux defs deft deg deglory degu
dehors dehort dei dey deil deils deimos deino deinos deis deist deity dekko dekkos del dely dell delly dells deloo
dels dem demy demo demos demot den deny dens dent denty dep depr dept der derry derv des dess det deux dev dew dewy
dex dhikr dhikrs dhoty dhow dhu dikkop dil dill dilly dills dilo dilos diluvy dim dimmy dimps dimpsy dims din dino
dinos dins dint dioxy dip dippy dipppy dips dipsy dipt dir dirt dirty dis diss dist disty dit ditt ditty div divvy
dix dixy dizz dlvy doo door doors dop dopy dor dory dorr dorrs dors dort dorty dos doss dossy dost dot doty dotty
doux dow dowy doxy doz dry druxy dux eel eely eels een eer eery eff efflux effort effs efik efl efs eft egg egghot
eggy eggs egilops egis ego egos eir eiry ejoo ell ellops ells elm elmy elms elops els elt emm emmy emory emp empt
empty ems emu ennoy enos enow ens env envy eos eppy err errs ers erst ess est esu ety fil fill filly fills film
filmy films filo fils filt fin finn finny finns fino fins fiot fip fir firy firry firs first fist fisty fit fitty
fitz fix fiz fizz fly flo floor floors floosy flop floppy flops flor flory floss flossy flot flow flu flux foy foo
foot footy fop foppy fops for forst fort forty forz foss fot fou fow fox foxy fry fruz frwy fuzz ghi ghis ghost
ghosty ghuz gil gill gilly gillot gills gilo gilpy gils gilt gilty gim gimmor gimp gimpy gimps gin ginn ginny gins
gio gip gippy gips gipsy girr girt gis gist git gizz gloy glop gloppy glops glor glory glos gloss glossy glost glow
gnow gnu goy goo goop goopy goops goos goosy gor gory gorry gorsy gorst gos goss gossy got gou gov gox gry guy guv
guz hill hilly hillo hillos hills hilt him himp hin hinny hins hint hip hippy hips hir hirst his hiss hissy hist
hit hizz hny hoy hoo hoop hoops hoot hop hoppy hops hor hory horry hors horsy horst hort hoss host hot how hox huzz
hvy hwy iii ijo ill illy ills ilot immy immov imp impy imps impv imu inn inns ino ins inst int inv ios iou ipr ips
iqs irs ist isz ivy joy jos joss jot jotty jovy jow juv klop klops knop knoppy knops knorr knot knotty know knox
kop kops kor kory kors kos koss kou kru loy loo loop loopy loops loory loos loot lop loppy lops loq lor lory lorry
lors loss lossy lost lot lou low lowy lox lux moy moo moop moor moory moors moos moost moot mop mopy moppy mops
mopsy mor mors mort morw mos moss mossy most mot mott motty mou mow mru mux muzz noy noo noop nor norry nos nosy
nosu not nou nov now nowy nox oooo oops oory oos oot opp ops opsy opt ory ors ort oxy pry pty puy puxy qty rux rwy
ssu sty stu suu suz swy tty tuy tux xyz
In dictionary: 370105 = 100.00%
Last after first: 213920 = 57.80%
Have vowel or y: 213536 = 57.70%
At least 3 letters: 213423 = 57.67%
In alphabetical order: 1060 = 0.29%
Longest alphabetical word: aegilops
Words from: ../data/MIT10000.txt
aaa abc abs abu acc accent accept access ace acer acm act add adds adopt ads ago aim aims air all allow alloy
almost alot alt amp amy ann ant any app apps apr apt art ass bee beef been beer begin begins bell belly below belt
ben benz berry best bet betty bill bills billy bin bio bios bit biz blow boost boot booty boss bow box boy buy buzz
cell cells cent ceo cest cet cgi chi chip chips cho cio cir city cop copy cos cost cow cox cpu cruz cry dee deep
deer def del dell dem demo den deny dept der des dev dim dip dir dirt dirty dis dist div divx diy door doors dos
dot dow dry effort egg eggs empty ent eos est fill film films fin first fist fit fix floor floors floppy flow flu
flux fly foo foot for fort forty fox ghost gis glory glow gnu got gov guy hill hills him hint hip his hist hit hiv
hop host hot how hwy iii ill inn inns ins int ips irs ist joy know loop loops los loss lost lot lou low moss most
nor nos not nov now ooo oops opt pty qty
In dictionary: 10000 = 100.00%
Last after first: 6262 = 62.62%
Have vowel or y: 6059 = 60.59%
At least 3 letters: 5958 = 59.58%
In alphabetical order: 205 = 2.05%
Longest alphabetical word: accent
Words from: ../data/conan_doyle.txt
all below best city denn dirty door dost elms first for got his how iii lor lost most not now abhor accept act
afoot ago ain air airs all allow ally almost annoy any art been bees begin begins bein bell below bent best bet bit
blow boot bow box boy cell ceux chill chills chin chins city cost cry deep del deny der des dip dips dir dirt dirty
door doors dry effort eggs empty err fir first fit floor foot for fort forty girt glow got guv hill him his hit
hops hot how ill imp joy know loot lop loss lost lot low moss most nor not now
In dictionary: 5753 = 100.00%
Last after first: 3403 = 59.15%
Have vowel or y: 3384 = 58.82%
At least 3 letters: 3338 = 58.02%
In alphabetical order: 111 = 1.93%
Longest alphabetical word: accept
Any content of this website which has been created by Peter Campbell Smith is in the public domain