Peter Campbell Smith

Tricky partitions and easy stats

Weekly challenge 172 — 4 July 2022


Task 2

Task — Five-number summary

You are given an array of integers. Write a script to compute the five-number summary of the given set of integers. The five-number summary comprises the minimum, 1st quartile, median, 3rd quartile and maximum. More information is given on the Wikipedia page for the five-number summary.

Examples


Example 1: (from the Wikipedia page)
@array = (0, 0, 1, 2, 63, 61, 27, 13)
Five-number summary = 0,  0.5, 7.5, 44, 63
                      min q1   mdn  q3  max

Analysis

To calculate the five-number summary, we first sort the array. The minimum and maximum values are then the first and last elements. If the array has an odd number of elements, the middle one is the median; if it has an even number, the median is the average of the two elements that straddle the middle.
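The median rule above can be sketched in a few lines of Perl. This is a minimal illustration, assuming the list has already been sorted numerically; the sub name `median_of_sorted` is hypothetical and is not part of the script below, which uses a position-based `get_median` instead.

```perl
use strict;
use warnings;

# Hypothetical helper: median of a list that is already sorted numerically.
sub median_of_sorted {
    my @s = @_;
    my $n = @s;                      # element count (scalar context)
    die "empty list\n" unless $n;
    my $mid = int($n / 2);
    # odd count: the middle element; even count: mean of the two middle elements
    return $n % 2 ? $s[$mid] : ($s[$mid - 1] + $s[$mid]) / 2;
}

my @sorted = sort { $a <=> $b } (0, 0, 1, 2, 63, 61, 27, 13);
print "min: $sorted[0], median: ", median_of_sorted(@sorted), ", max: $sorted[-1]\n";
# prints "min: 0, median: 7.5, max: 63"
```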

For the quartiles, I have used Method 1 from this page, which appears to be the method used on the Wikipedia page referenced in the task statement. In that method, if the array has an even number of members, the quartiles are the medians of the first and second halves of the data; if it has an odd number, they are the medians of the two sub-arrays left after the middle member is deleted.
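Method 1 can also be written without separate odd and even branches. The sketch below is an illustration rather than the script's own approach: `quartiles_method1` and `median_of_sorted` are hypothetical names, and at least two values are assumed. Starting the upper half at index `$n - $half` skips the middle element automatically when the count is odd and splits exactly in half when it is even.

```perl
use strict;
use warnings;

# Hypothetical helper: median of a list that is already sorted numerically.
sub median_of_sorted {
    my @s = @_;
    my $n = @s;
    my $mid = int($n / 2);
    return $n % 2 ? $s[$mid] : ($s[$mid - 1] + $s[$mid]) / 2;
}

# Hypothetical helper: (q1, median, q3) by Method 1,
# which leaves out the middle element when the count is odd.
sub quartiles_method1 {
    my @s = sort { $a <=> $b } @_;
    my $n = @s;
    die "need at least two values\n" if $n < 2;
    my $half  = int($n / 2);                 # size of each half
    my @lower = @s[0 .. $half - 1];          # first half
    my @upper = @s[$n - $half .. $n - 1];    # second half (middle skipped if $n is odd)
    return (median_of_sorted(@lower), median_of_sorted(@s), median_of_sorted(@upper));
}

my ($q1, $q2, $q3) = quartiles_method1(1, 2, 3, 4, 5, 6, 7, 8, 1000);
print "q1: $q1, median: $q2, q3: $q3\n";
# prints "q1: 2.5, median: 5, q3: 7.5"
```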

Of course there are modules that do all this, but it takes only a dozen lines of simple code to do it from first principles.


Script


#!/usr/bin/perl

# Peter Campbell Smith - 2022-07-07
# PWC 172 task 2

use v5.28;
use strict;
use warnings;
use utf8;
binmode(STDOUT, ':utf8');

my (@tests, $test, @random, @sorted, $count, $median, $first_quartile, $third_quartile, @sub);

@tests = ([0, 0, 1, 2, 63, 61, 27, 13],
    [8, 42, -3, 0, 99, 66, 21, 100], 
    [1, 1, 1, 1, 1, 1, 1],
    [1, 2, 3, 4, 5, 6, 7, 8, 1000]);
push @random, int(rand(101)) for 0 .. 99;
push @tests, \@random;

# loop over tests
for $test (@tests) {
    
    # sort numerically and count
    @sorted = sort {$a <=> $b} @$test;
    $count = scalar @sorted;

    # the median is the value at position ($count - 1) / 2,
    # which falls between two elements when $count is even
    $median = get_median(\@sorted, ($count - 1) / 2);
    if (($count & 1) == 0) {
        # even count: the quartiles are the medians of the two halves
        @sub = @sorted[0 .. ($count / 2 - 1)];
        $first_quartile = get_median(\@sub, ($count / 2 - 1) / 2);
        @sub = @sorted[($count / 2) .. ($count - 1)];
        $third_quartile = get_median(\@sub, ($count / 2 - 1) / 2);
    } else {
        # odd count: leave out the middle element, then take the medians of what remains
        @sub = @sorted[0 .. (($count - 1) / 2) - 1];
        $first_quartile = get_median(\@sub, (($count - 1) / 2 - 1) / 2);
        @sub = @sorted[(($count - 1) / 2) + 1 .. ($count - 1)];
        $third_quartile = get_median(\@sub, (($count - 1) / 2 - 1) / 2);
    }
    
    printf(qq[\nInput: \@array = (%s)\n], join(', ', @$test));
    say qq[Output: minimum: $sorted[0], first quartile: $first_quartile, median: $median, ] .
        qq[third quartile: $third_quartile, maximum: $sorted[-1]];  
}

sub get_median {
    
    my (@array, $position, $lower, $upper, $fraction);
    
    # returns the value at the given position
    # if position is non-integral returns the weighted intermediate value
    
    @array = @{$_[0]};
    $position = $_[1];
    
    # integral position
    return $array[$position] if $position == int($position);
    
    # find integral position below and above given position and 
    # calculate weighted intermediate value
    $lower = int($position);
    $upper = $lower + 1;
    $fraction = $position - $lower;
    return $array[$lower] * (1 - $fraction) + $array[$upper] * $fraction;
}   
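`get_median` works with a possibly fractional position rather than testing for odd and even counts. A quick check, written as a standalone sketch, shows why this gives the same answer as the averaging rule: at a position ending in .5 the weights are 0.5 and 0.5, so the weighted value is just the mean of the two neighbours. The sub `value_at` is a hypothetical re-implementation of the same weighting and assumes a non-negative position within the array.

```perl
use strict;
use warnings;

# Hypothetical re-implementation of the weighting in get_median:
# value at a (possibly fractional, non-negative) position in a sorted array.
sub value_at {
    my ($array, $position) = @_;
    my $lower    = int($position);
    my $fraction = $position - $lower;
    return $array->[$lower] if $fraction == 0;
    return $array->[$lower] * (1 - $fraction) + $array->[$lower + 1] * $fraction;
}

my @sorted = (0, 0, 1, 2, 13, 27, 61, 63);
my $n = @sorted;
# even count: position ($n - 1) / 2 = 3.5, halfway between indices 3 and 4
my $by_position = value_at(\@sorted, ($n - 1) / 2);
my $by_average  = ($sorted[$n / 2 - 1] + $sorted[$n / 2]) / 2;
print "$by_position $by_average\n";
# prints "7.5 7.5"
```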

Output



Input: @array = (0, 0, 1, 2, 63, 61, 27, 13)
Output: minimum: 0, first quartile: 0.5, median: 7.5, 
  third quartile: 44, maximum: 63

Input: @array = (8, 42, -3, 0, 99, 66, 21, 100)
Output: minimum: -3, first quartile: 4, median: 31.5, 
  third quartile: 82.5, maximum: 100

Input: @array = (1, 1, 1, 1, 1, 1, 1)
Output: minimum: 1, first quartile: 1, median: 1, 
  third quartile: 1, maximum: 1

Input: @array = (1, 2, 3, 4, 5, 6, 7, 8, 1000)
Output: minimum: 1, first quartile: 2.5, median: 5, 
  third quartile: 7.5, maximum: 1000

Input: @array = (31, 18, 73, 45, 75, 54, 62, 84, 53, 60,
41, 71, 40, 10, 87, 46, 14, 34, 29, 7, 62, 77, 48, 95, 22,
10, 5, 83, 50, 96, 4, 14, 75, 96, 75, 27, 63, 68, 49, 7,
95, 0, 0, 92, 32, 68, 40, 22, 17, 77, 43, 29, 58, 88, 96,
20, 50, 97, 3, 40, 84, 92, 87, 61, 9, 77, 5, 45, 100, 61,
76, 8, 95, 90, 94, 69, 5, 50, 32, 34, 25, 78, 21, 18, 46,
26, 59, 82, 14, 37, 57, 33, 42, 39, 32, 70, 89, 92, 71, 2)
Output: minimum: 0, first quartile: 25.5, median: 49.5, 
  third quartile: 76.5, maximum: 100