Peter
Peter Campbell Smith

SVG picture and least squares

Weekly challenge 165 — 16 May 2022

Week 165 - 16 May 2022

Task 2

Task — Line of best fit

When you have a scatter plot of points, a line of best fit is the line that best describes the relationship between the points, and is very useful in statistics. Otherwise known as linear regression, here is an example of what such a line might look like:

Graph

The method most often used is known as the least squares method, as it is straightforward and efficient, but you may use any method that generates the correct result.

Calculate the line of best fit for the following 48 points:

333,129 39,189 140,156 292,134 393,52, 160,166
362,122 13,193 341,104 320,113 109,177 203,152 343,100
225,110 23,186 282,102 284,98 205,133 297,114 292,126 
339,112 327,79 253,136 61,169 128,176 346,72 316,103 
124,162 65,181 159,137 212,116 337,86 215,136 153,137
390,104 100,180 76,188 77,181 69,195 92,186 275,96
250,147 34,174 213,134 186,129 189,154 361,82 363,89

Using your rudimentary graphing engine from Task 1, graph all points, as well as the line of best fit.

Analysis

We are asked to plot points and a least-squares best fit line. There are several ways to derive such a line, but I used the one most-commonly used experimentally, which is to assume that the errors - that is, the amount by which the points deviate from the line - are in y and not x. So if you draw a vertical line from each point to the least-squares line, we are looking for the line that minimises the sum of the squares of the lengths of these lines.

If the equation of that line is y = a + bx, it can be shown that:

     n * sum(x[i] * y[i]) - sum(x[i]) * sum(y[i])
b =  --------------------------------------------
           (sum(x[i]^2) - sum(x[i])^2) / n

a = (sum(y[i] - b * sum(x[i]) / n

where n is the number of points, and the points have coordinates x[i], y[i].

Again I chose to draw with y increasing in the upward direction and I have shown a rectangle which just encloses the supplied points - ie the bottom is not y = 0 and the left side is not x = 0:

Try it 

Try running the script with any input:



example:
23,186 282,102 284,98 205,133 297,114 292,126 339,112 327,79 253,136 61,169 128,176 346,72 316,103 124,162 65,181 159,137 212,116 337,86

Script


#!/usr/bin/perl

# Peter Campbell Smith - 2022-05-16
# PWC 165 task 2

use v5.26;
use strict;
use warnings;
use utf8;
use SVG;

my (@points, @points1, $n, $i, $x, $y, $sum_x, $sum_xy, $sum_x2, $sum_y, $a, $b,
    $svg, $min_x, $max_x, $min_y, $max_y, $grp, $padding, $width, $height,
    $grp2);

# given data
@points = (333, 129,  39, 189, 140, 156, 292, 134, 393,  52, 160, 166, 362, 122,  13, 193, 
           341, 104, 320, 113, 109, 177, 203, 152, 343, 100, 225, 110,  23, 186, 282, 102, 
           284,  98, 205, 133, 297, 114, 292, 126, 339, 112, 327,  79, 253, 136,  61, 169, 
           128, 176, 346,  72, 316, 103, 124, 162,  65, 181, 159, 137, 212, 116, 337,  86, 
           215, 136, 153, 137, 390, 104, 100, 180,  76, 188,  77, 181,  69, 195,  92, 186, 
           275,  96, 250, 147,  34, 174, 213, 134, 186, 129, 189, 154, 361,  82, 363,  89);
       
# calculate a and b
@points1 = @points;
$n = scalar(@points1) / 2;
$min_x = $min_y = 1000;
$max_x = $max_y = -1000;
for $i (1 .. $n) {
    $x = shift @points1;
    $y = shift @points1;
    $sum_x += $x;
    $sum_y += $y;
    $sum_xy += $x * $y;
    $sum_x2 += $x * $x;
    
    $min_x = $x if $x < $min_x;
    $max_x = $x if $x > $max_x;
    $min_y = $y if $y < $min_y;
    $max_y = $y if $y > $max_y;
}

$b = ($n * $sum_xy - $sum_x * $sum_y) / ($n * $sum_x2 - $sum_x * $sum_x);
$a = ($sum_y - $b * $sum_x) / $n;

# calculate size of box
$padding = 10;
$width = $max_x - $min_x + 2 * $padding;
$height = $max_y - $min_y + 2 * $padding;

# create box
$svg = SVG->new(width => $width, height => $height);

# draw least squares line - nb: SVG's y axis increases in value downwards so invert it
$grp = $svg->group(id => 'line_group',
    style => {stroke => 'blue', fill => 'blue'});
$grp->line(id => 'line', 
    x1 => $padding,                     y1 => $height - ($padding + $a + $b * $min_x) + $min_y, 
    x2 => ($max_x - $min_x + $padding), y2 => $height - ($padding + $a + $b * $max_x) + $min_y);

# draw the dots 
$grp2 = $svg->group(id => 'dots_group',
    style => {stroke => 'orange', fill => 'orange'});
@points1 = @points;
for $i (1 .. $n) {
    $x = shift @points1;
    $y = shift @points1;
    $grp2->circle(id => "dot$i", cx => ($x - $min_x + $padding), cy => ($height - ($padding + $y) + $min_y), r => 2);
}

# export the svg code
open OUT, '>pjcs_week165_task2.svg' or die $!;open OUT, '>utf8:', qq[/var/www/pwc/public_html/images/svg_165_2.svg] or die $!;
print OUT $svg->xmlify;
close OUT;
say qq[\nThe least squares line is y = $a + $b x\n];
        

Output


The least squares line is y = 200.132272535582 + -0.299956500261231 x week 165 task 2