Tricky partitions and easy stats
Weekly challenge 172 — 4 July 2022
You are given an array of integers. Write a script to compute the five-number summary of the given set of integers. The five-number summary comprises the minimum, 1st quartile, median, 3rd quartile and maximum. More information is given in this Wikipedia page.
Example 1: (from the Wikipedia page)
@array = (0, 0, 1, 2, 63, 61, 27, 13)
Five number summary = 0, 0.5, 7.5, 44, 63 (min, q1, mdn, q3, max)
To calculate the 5-number summary we first sort the array. The minimum and maximum values are now the first and last elements. If there is an odd number of elements in the array, the middle one is the median; if there is an even number then the average of the two elements that straddle the middle is the median.
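The sort-then-pick steps above can be sketched as follows. This is a minimal illustration, not the script used for the task: the helper name `median_of_sorted` is hypothetical, and it assumes a non-empty list that is already sorted numerically.

```perl
use strict;
use warnings;

# Hypothetical helper: median of a list that is already sorted numerically.
# Assumes at least one element.
sub median_of_sorted {
    my @s   = @_;
    my $n   = scalar @s;
    my $mid = int($n / 2);

    # odd count: the middle element; even count: mean of the two that straddle the middle
    return $n % 2 ? $s[$mid] : ($s[$mid - 1] + $s[$mid]) / 2;
}

my @sorted = sort { $a <=> $b } (0, 0, 1, 2, 63, 61, 27, 13);
my ($min, $max) = @sorted[0, -1];    # first and last elements after sorting
my $median = median_of_sorted(@sorted);

print "min $min, median $median, max $max\n";    # min 0, median 7.5, max 63
```

The numeric comparator `{ $a <=> $b }` matters here: Perl's default sort compares values as strings, which would place 13 before 2.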
For the quartiles, I have used Method 1 from this page, which appears to be the method used on the page referenced in the task statement. Under this method, if the array has an even number of members, the quartiles are the medians of the first and second halves of the data; if it has an odd number, they are the medians of the two sub-arrays left after the middle member is removed.
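As a rough sketch of that splitting rule, the two halves can be taken as array slices and each passed to a median routine. The helper names `median_sorted` and `quartiles_method1` are hypothetical, and the sketch assumes a numerically sorted list with at least two elements.

```perl
use strict;
use warnings;

# Hypothetical helper: median of an already-sorted, non-empty list.
sub median_sorted {
    my @s   = @_;
    my $n   = scalar @s;
    my $mid = int($n / 2);
    return $n % 2 ? $s[$mid] : ($s[$mid - 1] + $s[$mid]) / 2;
}

# Hypothetical helper: returns (Q1, Q3) of a sorted list using Method 1.
# Assumes at least two elements.
sub quartiles_method1 {
    my @s    = @_;
    my $n    = scalar @s;
    my $half = int($n / 2);                    # size of each half

    my @lower = @s[0 .. $half - 1];
    my @upper = @s[$n - $half .. $n - 1];      # starts past the middle element when $n is odd

    return (median_sorted(@lower), median_sorted(@upper));
}

my ($q1, $q3) = quartiles_method1(1, 2, 3, 4, 5, 6, 7, 8, 1000);
print "q1 $q1, q3 $q3\n";    # q1 2.5, q3 7.5
```

Because `$half` is rounded down, the same two slices cover both cases: for an odd count the middle element falls between them and is left out of both halves.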
Of course there are modules to do all this, but it's only a dozen lines of simple coding to do it from first principles.
#!/usr/bin/perl

# Peter Campbell Smith - 2022-07-07
# PWC 172 task 2

use v5.28;
use strict;
use warnings;
use utf8;
binmode(STDOUT, ':utf8');

my (@tests, $test, @random, @sorted, $count, $median, $first_quartile, $third_quartile, @sub);

@tests = ([0, 0, 1, 2, 63, 61, 27, 13], [8, 42, -3, 0, 99, 66, 21, 100], [1, 1, 1, 1, 1, 1, 1],
    [1, 2, 3, 4, 5, 6, 7, 8, 1000]);
push @random, int(rand(101)) for 0 .. 99;
push @tests, \@random;

# loop over tests
for $test (@tests) {

    # sort numerically and count
    @sorted = sort {$a <=> $b} @$test;
    $count = scalar @sorted;

    # determine value at a position (which might not be integral)
    $median = get_median(\@sorted, ($count - 1) / 2);
    if (($count & 1) == 0) {
        @sub = @sorted[0 .. ($count / 2 - 1)];
        $first_quartile = get_median(\@sub, ($count / 2 - 1) / 2);
        @sub = @sorted[($count / 2) .. ($count - 1)];
        $third_quartile = get_median(\@sub, ($count / 2 - 1) / 2);
    } else {
        @sub = @sorted[0 .. (($count - 1)/ 2) - 1];
        $first_quartile = get_median(\@sub, (($count - 1) / 2 - 1) / 2);
        @sub = @sorted[(($count - 1)/ 2) + 1 .. ($count - 1)];
        $third_quartile = get_median(\@sub, (($count - 1) / 2 - 1) / 2);
    }

    printf(qq[\nInput: \@array = (%s)\n], join(', ', @$test));
    say qq[Output: minimum: $sorted[0], first quartile: $first_quartile, median: $median, ] .
        qq[third quartile: $third_quartile, maximum: $sorted[-1]];
}

sub get_median {

    my (@array, $position, $lower, $upper, $fraction);

    # returns the value at the given position
    # if position is non-integral returns the weighted intermediate value
    @array = @{$_[0]};
    $position = $_[1];

    # integral position
    return $array[$position] if $position == int($position);

    # find integral position below and above given position and
    # calculate weighted intermediate value
    $lower = int($position);
    $upper = $lower + 1;
    $fraction = $position - $lower;
    return $array[$lower] * (1 - $fraction) + $array[$upper] * $fraction;
}
Input: @array = (0, 0, 1, 2, 63, 61, 27, 13)
Output: minimum: 0, first quartile: 0.5, median: 7.5, third quartile: 44, maximum: 63

Input: @array = (8, 42, -3, 0, 99, 66, 21, 100)
Output: minimum: -3, first quartile: 4, median: 31.5, third quartile: 82.5, maximum: 100

Input: @array = (1, 1, 1, 1, 1, 1, 1)
Output: minimum: 1, first quartile: 1, median: 1, third quartile: 1, maximum: 1

Input: @array = (1, 2, 3, 4, 5, 6, 7, 8, 1000)
Output: minimum: 1, first quartile: 2.5, median: 5, third quartile: 7.5, maximum: 1000

Input: @array = (31, 18, 73, 45, 75, 54, 62, 84, 53, 60, 41, 71, 40, 10, 87, 46, 14, 34, 29, 7, 62, 77, 48, 95, 22, 10, 5, 83, 50, 96, 4, 14, 75, 96, 75, 27, 63, 68, 49, 7, 95, 0, 0, 92, 32, 68, 40, 22, 17, 77, 43, 29, 58, 88, 96, 20, 50, 97, 3, 40, 84, 92, 87, 61, 9, 77, 5, 45, 100, 61, 76, 8, 95, 90, 94, 69, 5, 50, 32, 34, 25, 78, 21, 18, 46, 26, 59, 82, 14, 37, 57, 33, 42, 39, 32, 70, 89, 92, 71, 2)
Output: minimum: 0, first quartile: 25.5, median: 49.5, third quartile: 76.5, maximum: 100
Any content of this website which has been created by Peter Campbell Smith is in the public domain