Dates and parsing
Weekly challenge 259 — 4 March 2024
You are given a line like below:
{% id field1="value1" field2="value2" field3=42 %}
Where
The line parser should return structure like below:
{
    name => id,
    fields => {
        field1 => value1,
        field2 => value2,
        field3 => value3,
    }
}
It should be able to parse the following edge cases too:
{% youtube title="Title \"quoted\" done" %}
{% youtube title="Title with escaped backslash \\" %}
BONUS: Extend it to be able to handle multiline tags:
{% id field1="value1" ... %}
LINES
{% endid %}
You should expect the following structure from your line parser:
{
    name => id,
    fields => {
        field1 => value1,
        field2 => value2,
        field3 => value3,
    }
    text => LINES
}
Well, this is a little out of the ordinary. In real life I would be tempted to use a parser generator such as yacc, but in the interests of the challenge, here is my Perl solution.
The comments in the code more or less explain the logic. First, I detach the 'bonus' part for later consideration. Second, I hide any backslash-escaped characters as ¬nn¬, where nn is their decimal ordinal value. Third, I convert any unquoted numeric fields such as field=123 to field="123" to make the upcoming regular expression more manageable. Fourth, I extract the name field.
That leaves me with the other fields, and I use a repeated regular expression to extract them one at a time, and then reverse the ¬nn¬ encoding. And lastly, if there is a 'bonus' text, I extract that. As I have extracted each item I've reformatted it in the requested output format, so all that's left is to output it.
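The hide-and-restore step can be tried on its own. What follows is only a minimal sketch: the function names hide_escapes and show_escapes are made up for this illustration, and the pattern used to pull out the quoted value is looser than the one in my solution.

```perl
#!/usr/bin/perl
use v5.26;
use utf8;
use warnings;
binmode STDOUT, ':utf8';

# hide each backslash-escaped character as ¬nn¬ (nn = decimal ordinal)
# (hypothetical helper, named for this sketch only)
sub hide_escapes {
    my ($text) = @_;
    $text =~ s|\\(.)|'¬' . ord($1) . '¬'|ge;
    return $text;
}

# reverse the encoding, leaving just the character that was escaped
# (hypothetical helper, named for this sketch only)
sub show_escapes {
    my ($text) = @_;
    $text =~ s|¬(\d+)¬|chr($1)|ge;
    return $text;
}

my $raw    = 'title="Title \"quoted\" done"';
my $hidden = hide_escapes($raw);
say $hidden;                  # title="Title ¬34¬quoted¬34¬ done"

# with the escaped quotes hidden, the value is simply
# whatever sits between the two remaining double quotes
my ($value) = $hidden =~ m|"([^"]*)"|;
say show_escapes($value);     # Title "quoted" done
```

The point of the detour is that once the escaped quotes are out of the way, the field-matching expression never has to think about backslashes at all.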
There is a slight wrinkle to this that I don't think I've ever noticed in almost 30 years' use of Perl. It is that Perl interprets '\\' as '\', even in a single-quoted string. Try executing say '\\'; and you'll see what I mean: compare with say '\n'; where Perl doesn't interpret the \n as a newline.
For that reason, you'll see that in my demo code I've had to represent "Title with escaped backslash \\" as "Title with escaped backslash \\\\" in the function call.
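A quick way to see the effect is to look at string lengths. This is just an illustrative sketch, and the variable names are arbitrary:

```perl
use v5.26;
use warnings;

my $two  = '\\';      # two characters typed, one backslash stored
my $four = '\\\\';    # four characters typed, two backslashes stored
my $n    = '\n';      # stored as a backslash and an 'n', not a newline

say length $two;      # 1
say length $four;     # 2
say length $n;        # 2
```

In a single-quoted string a backslash is only special when it comes before another backslash or before the closing quote, which is why '\n' keeps both of its characters.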
#!/usr/bin/perl

# Blog: http://ccgi.campbellsmiths.force9.co.uk/challenge

use v5.26;    # The Weekly Challenge - 2024-03-04
use utf8;     # Week 259 - task 2 - Line parser
use warnings; # Peter Campbell Smith
binmode STDOUT, ':utf8';

line_parser('{% id field1="value1" field2="value2" field3=42 %}');
line_parser('% youtube title="Title \"quoted\" done" %}');
line_parser('{% youtube title="Title with escaped backslash \\\\" %}');
line_parser('{% id field1="value1" field2="value2" %}
LINES
{% endid %}');

sub line_parser {

    my ($input, $id, $output, $field, $value, $first, $rest);

    # initialise
    $input = shift;
    say qq[\nInput: ] . $input;

    # detach the 'bonus' part
    ($input, $rest) = ($1, $2) if $input =~ m|(.*?)\n(.*)|s;

    # encode \x characters as ¬nn¬
    $input =~ s|\\(.)|'¬' . ord($1) . '¬'|ge;

    # change eg field=22 to field="22"
    $input =~ s|=(\d+)([ %])|="$1"$2|g;

    # extract id
    $input =~ m|(\w+)(.*)|;
    $id = $1;
    $input = $2;
    $output = qq[{\n    name => $id,\n    fields => {\n];

    # extract fields
    while ($input =~ m|([\w\d]+)\s*=\s*"([\w\d¬ ]+)"|g) {
        $field = $1;
        $value = $2;

        # decode ¬nn¬
        $value =~ s|¬(\d+)¬|chr($1)|ge;
        $output .= qq[        $field => $value,\n];
    }
    $output .= qq[    }\n];

    # extract bonus text
    if (defined $rest and $rest =~ m|(.*)\{% endid %\}|s) {
        $output .= qq[    text => $1];
    }
    $output .= qq[}\n];

    say qq[Output: $output];
}
Input: {% id field1="value1" field2="value2" field3=42 %}
Output: {
    name => id,
    fields => {
        field1 => value1,
        field2 => value2,
        field3 => 42,
    }
}

Input: % youtube title="Title \"quoted\" done" %}
Output: {
    name => youtube,
    fields => {
        title => Title "quoted" done,
    }
}

Input: {% youtube title="Title with escaped backslash \\" %}
Output: {
    name => youtube,
    fields => {
        title => Title with escaped backslash \,
    }
}

Input: {% id field1="value1" field2="value2" %}
LINES
{% endid %}
Output: {
    name => id,
    fields => {
        field1 => value1,
        field2 => value2,
    }
    text => LINES
}
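The task asks for a structure, whereas the solution above prints formatted text. As a hedged sketch of an alternative, the variant below returns a hash reference. It is not the method used above: instead of the ¬nn¬ encoding it matches escaped characters directly inside the regular expression, and the name parse_tag is hypothetical. It assumes that the sequence %} never appears inside a quoted value.

```perl
use v5.26;
use warnings;

# Hypothetical variant: returns a hash reference rather than printing
sub parse_tag {
    my ($input) = @_;

    # opening tag: the name, then everything up to the first %}
    my ($name, $attrs, $after) =
        $input =~ m|^\s*\{%\s*(\w+)(.*?)%\}(.*)|s
        or return;

    my %fields;

    # a value is either a bare number or a double-quoted string in
    # which a backslash escapes the character that follows it
    while ($attrs =~ m/(\w+)\s*=\s*(?:(\d+)|"((?:[^"\\]|\\.)*)")/g) {
        my ($field, $number, $string) = ($1, $2, $3);
        if (defined $number) {
            $fields{$field} = $number;
        }
        else {
            $string =~ s|\\(.)|$1|g;    # drop the escaping backslashes
            $fields{$field} = $string;
        }
    }

    my $parsed = { name => $name, fields => \%fields };

    # bonus: text between the opening tag and a matching {% endNAME %}
    if ($after =~ m|^\n(.*)\{%\s*end\Q$name\E\s*%\}|s) {
        $parsed->{text} = $1;
    }
    return $parsed;
}

my $tag = parse_tag('{% youtube title="Title \"quoted\" done" %}');
say $tag->{name};             # youtube
say $tag->{fields}{title};    # Title "quoted" done
```

Returning a hash reference leaves the choice of output format to the caller, at the cost of a slightly busier field-matching expression.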
Any content of this website which has been created by Peter Campbell Smith is in the public domain