Peter
Peter Campbell Smith

Special and frequent numbers

Weekly challenge 195 — 12 December 2022

Week 195 - 12 Dec 2022

Task 1

Task — Special integers

You are given a positive integer, $n > 0. Write a script to print the count of all special integers between 1 and $n. An integer is special when all of its digits are unique.

Examples


Example 1:
Input: $n = 15
Output: 14 as except 11 all other integers between 1 and 
   15 are special.

Example 2:
Input: $n = 35
Output: 32 as except 11, 22, 33 all others are special.

Analysis

Let's start by thinking of a clever way to quickly identify special numbers in a given range. If you've come up with that - well done! After trying a number of leads I came to the conclusion that none was much better - and most were more complicated - than simply testing all the numbers in the range.

How do you test a number for being special? This was my initial thought:

$is_special = 1;
for $digit (0..9) {
    if ($j =~ m|$digit.*$digit|) {
        $is_special = 0;
        last;
    }
}

The regular expression in there says if there is a digit, followed by anything or nothing, followed by the same digit again, then $j is not special. So we can wrap that in a loop of $j going from 1 to $n and count the specials.

And that does work. But if $n is large, it takes a while. On my (quite slow) computer using that method (method 1), if $n is 1 million, it takes 55 seconds, and for 5 million, 223 seconds. So I applied a couple of techniques to speed it up.

The first is to note that as we go from 1 to $n we will hit some numbers with trailing zeroes, such as 10, 200, 55000 and so on. Let's split these numbers into the bit before the zeroes (call it A), and the zeroes themselves (B).

So here is my observation: if A isn't special then we can quickly jump to the number that starts with A + 1 and follows that with B. So for example when checking 55000, we can skip 55001 to 55999 (because they all start with that unspecial 55) and continue at 56000. So my loop now looks like this:

$j = 0;
while (1) {
    $j ++;
    last if $j > $test;
    if ($j =~ m|(.+?)(0+)$|) {
        $j = ($1 + 1) * 10 ** length($2) - 1 
          if unspecial($1);
    }
    $count ++ unless unspecial($j);	
}

So for my 55000 example, the first if splits that into $1 = 55 and $2 = 000, and the next lines jumps $j to 56 * 10^3 - ie 56000 (actually to 55999, because the loop increments $j at the top.) You'll note too that I have taken the test for specialness out to a subroutine, but it's the same method as I showed as my first code example above. Doing that roughly halves the time: for 1 million it now takes 24 seconds. But that's still quite a long time to wait. So for my next trick I unfolded sub unspecial into 10 lines such as:

return 1 if $_[0] =~ m|1.*1|;

Now you might think 'why do that?' Well, my first answer is that for $n = 5 million, this (method 2) now takes just 7 seconds as opposed to 223 seconds for method 1, so clearly it's worth it.

But why that's so much more efficient lies in the guts of Perl, about which I know very little. I am assuming that there is a lot of overhead in working with a regular expression containing a variable like m|$digit.*$digit| because the regular expression has to be reconsidered 5 million times at run time, whereas in the unfolded version it can be done just once at compile time.

So all in all, a very interesting task.

Try it 

Try running the script with any input:



example: 54321 (max of 1000000, please)

Script


#!/usr/bin/perl

# Peter Campbell Smith - 2022-12-12
# PWC 195 task 1

use v5.28;
use utf8;
use warnings;

my (@tests, $test, $count, $j, $digit, $start);
@tests = (15, 34, 77, 123, 1234, 12345, 1000000, 5000000);

# loop over tests
say qq[\n--- METHOD 1 ---];
for $test (@tests) {
    
    # loop over 1 to $n
    $start = time;
    $count = 0;
    TEST: for $j (1 .. $test) {
        
        # check for 2 of each digit
        for $digit (0..9) {
            
            # don't count if digit repeated
            next TEST if $j =~ m|$digit.*$digit|;
        }
        
        # is good
        $count ++;
    }
    say qq[\nInput:  $test\nOutput: $count (] . (time - $start) . qq[ secs)];
}

# loop over tests
say qq[\n--- METHOD 2 ---];
for $test (@tests) {
    
    # loop over 1 to $n
    $start = time;
    $count = 0;
    $j = 0;
    
    while (1) {
        
        $j ++;
        last if $j > $test;
        if ($j =~ m|(.+?)(0+)$|) {
            $j = ($1 + 1) * 10 ** length($2) - 1 if unspecial($1);
        }
        $count ++ unless unspecial($j); 
    }
    say qq[\nInput:  $test\nOutput: $count (] . (time - $start) . qq[ secs)];
}

sub unspecial {
    
    # returns 0 if special, 1 if not
    return 1 if $_[0] =~ m|1.*1|;
    return 1 if $_[0] =~ m|2.*2|;
    return 1 if $_[0] =~ m|3.*3|;
    return 1 if $_[0] =~ m|4.*4|;
    return 1 if $_[0] =~ m|5.*5|;
    return 1 if $_[0] =~ m|6.*6|;
    return 1 if $_[0] =~ m|7.*7|;
    return 1 if $_[0] =~ m|8.*8|;
    return 1 if $_[0] =~ m|9.*9|;
    return 1 if $_[0] =~ m|0.*0|;

    return 0;
}

Output


--- METHOD 1 ---

Input:  15
Output: 14 (0 secs)

Input:  34
Output: 31 (0 secs)

Input:  77
Output: 70 (0 secs)

Input:  123
Output: 100 (0 secs)

Input:  1234
Output: 803 (0 secs)

Input:  12345
Output: 5660 (1 secs)

Input:  1000000
Output: 168570 (55 secs)

Input:  5000000
Output: 410490 (223 secs)

--- METHOD 2 ---

Input:  15
Output: 14 (0 secs)

Input:  34
Output: 31 (0 secs)

Input:  77
Output: 70 (0 secs)

Input:  123
Output: 100 (0 secs)

Input:  1234
Output: 803 (0 secs)

Input:  12345
Output: 5660 (0 secs)

Input:  1000000
Output: 168570 (2 secs)

Input:  5000000
Output: 410490 (7 secs)

Note: the running times above are on my slow development computer and are not comparable with those generated by 'Try it', which run on my external webserver.