Peter
Peter Campbell Smith

Dot product and palindromes

Weekly challenge 145 — 27 December 2021

Week 145 - 27 Dec 2021

Task 2

Task — Palindromic tree

You are given a string $s. Write a script to create a Palindromic Tree for the given string. I found this blog which explains Palindromic Tree in detail.

Examples


Example 1:
Input: $s = 'redivider'
Output: r redivider e edivide d divid i ivi v

Example 2:
Input: $s = 'deific'
Output: d e i ifi f c

Example 3:
Input: $s = 'rotors'
Output: r rotor o oto t s

Example 4:
Input: $s = 'challenge'
Output: c h a l ll e n g

Example 5:
Input: $s = 'champion'
Output: c h a m p i o n

Example 6:
Input: $s = 'christmas'
Output: c h r i s t m a

Analysis

I found the analysis in the referenced paper a little hard to follow, but maybe that was down to having eaten too much over Christmas. So I developed my own algorithm which at least gives the right answer.

I generated all possible substrings from the given string. So for a string of length n there are n substrings which are 1 character long, n-1 which are 2 characters long and so on up to 1 string which is n characters long. So there are n * (n + 1) / 2 substrings in all. For example, if the string is 'perl' the 10 substrings are p pe per perl e er erl r rl l.

Then I looped over these and checked (a) is it palindromic and (b) have I seen it before. If the answer is (a) yes and (b) no, I record it as a valid item for the answer.

The slightly tricky bit is the 'is it palindromic' question, because that's going to be asked n * (n + 1) / 2 times. My first attempt at the is_palindromic function reversed the supplied substring and compared it with the original. That works, but isn't very fast.

But since most substrings of real words are not palindromic, I reckoned it would pay to optimise the detection of those. I did it like this:

while ($string =~ s|^(.)(.*)(.)$|$2|g) {
	return 0 if $1 ne $3;
}
return 1;

The s||| construct isolates the first and last characters as $1 and $3, and the middle bit as $2. If $1 and $3 differ, the string can't be a palindrome and we can immediately return false. If $1 and $3 are the same, the while loop repeats looking at what were originally the second and second last characters, and so on.

If the original string is a palindrome and has an even number of characters, the last successful iteration of the while will result in $2 being an empty string, which will then fail to match the s||| the next time and the function will return true. If it is a palindrome and has an odd number of characters then the last $2 will be a single character which will again fail to match the s||| because $1 and $3 have to be distinct characters, and any single character is of course a palindrome.

So that's very neat in Perl and pretty efficient: compared to reversing the whole string and comparing it with the original it takes about 85% less time for my 1000 character test string. But it doesn't involve creating the actual palindromic tree.

Try it 

Try running the script with any input:



example: racecar

Script


#!/usr/bin/perl

# Peter Campbell Smith - 2021-12-27
# PWC 145 task 2

use v5.20;
use warnings;
use strict;

my (@strings, $string, $length, $sub_length, $start_at, $test, %seen, $very_long, $j);

# words
@strings = qw[redivider deific rotors challenge champion christmas 
    supercalifragilisticexpialidocious antidisestablishmentarianism abcdedcbfffffgfffffzz];

# add a random 1000-char string
$very_long = '';
for $j (0 ..999) {
    $very_long .= chr(ord('a') + rand(26));
}
push @strings, $very_long;

# loop over test strings
for $string (@strings) {
    print qq[\nInput:  $string\nOutput: ];
    %seen = ();
    
    # generate all possible substrings of $string
    $length = length($string);
    for $start_at (0 .. $length - 1) {
        for $sub_length (1 .. $length - $start_at) {
            
            # check if palindromic and not already seen
            $test = substr($string, $start_at, $sub_length);
            print qq[$test ] if (not $seen{$test} and is_palindromic($test));
            $seen{$test} = 1;
        }
    }
    say '';
}

sub is_palindromic {   #  ($string) and returns true/false if $string is/isn't palindromic

    my $string = $_[0];
    
    # compare 1st and last characters, and if the same, strip them off and repeat
    while ($string =~ s|^(.)(.*)(.)$|$2|g) {
        return 0 if $1 ne $3;
    }
    return 1;   
}

Output


Input:  redivider
Output: r redivider e edivide d divid i ivi v 

Input:  deific
Output: d e i ifi f c 

Input:  rotors
Output: r rotor o oto t s 

Input:  challenge
Output: c h a l ll e n g 

Input:  champion
Output: c h a m p i o n 

Input:  christmas
Output: c h r i s t m a 

Input:  supercalifragilisticexpialidocious
Output: s u p e r c a l i f g ili t x d o 

Input:  antidisestablishmentarianism
Output: a n t i idi d s ses e b l h m r 

Input:  abcdedcbfffffgfffffzz
Output: a b bcdedcb c cdedc d ded e f ff fff ffff fffff fffffgfffff ffffgffff fffgfff ffgff fgf g z zz 

Input:  anjhihmzeaxltuxxlvekpfldcfdxygjpwzoptucklyzaeqdiyccvjiktaunutviemfdcpnhiajymntdcebvwwcdquxbvlnkeesoezstnnmbebtxvyqupgxqqltmqvodcpqkvdppizzefjrnxlmjnnukhtltzwomwrbcqtycjhgcxzujrogmizjrcdhuilwsxfzagvetdenhsbuyzsopxtjewdfvjeemyzqmhvenbxtmhzxjzqysqkvpgfsyfoeecjbxufztendtmtwacoouxvgkbzkiyffqqnkvkcsdmyjptowuisysarsqkrdqpoxxzxrkuprksmzfodvbkcnylrzzokjdlhfgonriuexxzfwabhesujaiyenxdhkvupcsktqlcmnctrfgxztbfdujsmipljsnezpkihqtjkovnqotcadbdqzjavzqevcuzyumkdmhywuqddbfdwdasimorpjcvkhvjewduqdbecapqkgajhavncgxppmkyoulrrjvlmdabjdlwgxthdinimonooeqztrqjjgwxshhrxxfgjauevwvaofgnhrsqbnhenfaswebxeiyuugyxeglukswwpkveqxxrcgvjybxboucdrpftniotscxfnhwpzqbleskppxzlibblbgeoulditvqmhmemaesnybfhgkfdwmiztrvwjpmgmpqyylbwjuydogqrddqtiklbnagjcfoteseciswbfdvgkmafqawftaeshuclaxhuxtihwvrxegazptsrftjlazfbptjxidbnvudytwqjdguvispcxkaixoxyxaitffwgesizymwrxjbfbjrvfbrppvsppmrhhvxhbhzojzbhrnkigabdpvnsjasonotkstseadvyeolypkcnvfbmqtyuxhwldwcrnvpeiroaefwkbfyalulaixcomhczmtzcmluydruoeqwxbnzagvhwhsqodghbnhgrjzyghcyjpgwvacfmsypiehkyfhgc
Output: a n j h hih i m z e x l t u xx v k p f d c y g w o q cc unu b ww ee s nn beb qq pp zz r tlt tmt oo ff kvk sys xzx dbd dd dwd rr ini ono jj hh vwv uu bxb bb blb mhm mem pmgmp mgm yy ese xox xyx jbfbj bfb hbh sts alula lul hwh