Parsing text with Erlang: pattern matching and guards

June 22, 2016

A simple example of parsing a time string that illustrates some basics of Erlang pattern matching and guards.

Spray painted letters MNOPQ on a city wall.
© 2016 Jon Tyson for Unsplash

Let’s begin at the end.


-module(m1).

-export([parse_time_part/1]).

% Given a string (e.g., "9:30 pm"), return the hour and minute as a list of
% two integers.  If no meridian given, assume pm.
parse_time_part(S) when is_list(S) ->
    {ok, HourMinute} = parse_time_parts(strip_tokens(S, ": ")),
    HourMinute.

parse_time_parts([H, M, Meridian])
    when Meridian =:= "AM";
         Meridian =:= "am";
         Meridian =:= "a.m.";
         Meridian =:= "A.M." ->
    {ok, [HI, MI]} = parse_time_parts([H, M]),
    {ok, [HI - 12, MI]};
parse_time_parts([H, M, Meridian])
    when Meridian =:= "PM";
         Meridian =:= "pm";
         Meridian =:= "p.m.";
         Meridian =:= "P.M." ->
    parse_time_parts([H, M]);
parse_time_parts([H, M]) ->
    {ok, [HI, 0]} = parse_time_parts([H]),
    {I, _} = string:to_integer(M),
    {ok, [HI, I]};
parse_time_parts([H]) ->
    {I, _} = string:to_integer(H),
    if
       I < 12 -> {ok, [I + 12, 0]};
       I >= 12 -> {ok, [I, 0]}
    end.

% Tokenize based on characters in the string Sep and then strip any
% trailing or leading spaces and commas from the tokens.
strip_tokens(S, Sep) ->
    [string:strip(string:strip(X, both), both, $,)
         || X <- string:tokens(S, Sep)].
         
-ifdef(TEST).
-include_lib("eunit/include/eunit.hrl").

strip_tokens_test() ->
    ?assertEqual(["Monday, April 11, 2016", "9:00 pm"],
                 (strip_tokens("Monday, April 11, 2016 ~ 9:00 pm", "~"))).

parse_time_part_test() ->
    ?assertEqual([19, 0], (parse_time_part("7"))),
    ?assertEqual([17, 0], (parse_time_part("5:00"))),
    ?assertEqual([21, 30], (parse_time_part("9:30"))),
    ?assertEqual([17, 30], (parse_time_part("5:30 PM"))),
    ?assertEqual([9, 30], (parse_time_part("9:30 AM"))).

-endif.

The tests pass:

$ erlc -DTEST m1.erl
$ erl -run m1 test -run init stop -noshell
  Test passed.
$

Function guards assert what arguments the function accepts.

parse_time_part(S) when is_list(S) ->
    {ok, HourMinute} = parse_time_parts(strip_tokens(S, ": ")),
    HourMinute.

The guard is when is_list(S). This is a type test that declares that the argument must be a list. (In Erlang, strings are lists of integers.) The guard is both a declaration and an assertion:

~$ erl
Erlang/OTP 18 [erts-7.3] [source-84db331] [64-bit] [smp:8:8] [async-threads:10] [hipe] [kernel-poll:false]

Eshell V7.3  (abort with ^G)
1> c(m1).
{ok,m1}
2> m1:parse_time_part(an_atom). 
** exception error: no function clause matching 
                    m1:parse_time_part(an_atom) (m1.erl, line 7)
3> q().
ok
5> ~$ 
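To see how type-test guards pick between clauses, here is a minimal sketch. The module name guard_types and the function describe/1 are made up for illustration and are not part of m1.

```erlang
-module(guard_types).

-export([describe/1]).

% Hypothetical example: each clause has a type-test guard.  Erlang runs
% the first clause whose pattern matches and whose guard succeeds.
describe(X) when is_list(X) -> list;
describe(X) when is_atom(X) -> atom;
describe(X) when is_integer(X) -> integer.
```

A string such as "9:30 pm" lands in the first clause because it is a list. A float matches none of the guards, so the call fails with the same kind of "no function clause matching" error shown in the shell session above.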

Guards are limited to the following:

  • type tests
  • boolean operators
  • bitwise operators
  • arithmetic operators
  • relational operators
  • and a few built-in functions (aka "BIFs")
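As a rough sketch of several of these in use, the module below combines type tests, a boolean operator, relational and arithmetic operators, and one BIF. The names guard_kinds, to_24h/1, on_quarter_hour/1 and short_string/1 are hypothetical and exist only for this example.

```erlang
-module(guard_kinds).

-export([to_24h/1, on_quarter_hour/1, short_string/1]).

% Type test + boolean operator (andalso) + relational operator.
% Same logic as the if-expression in parse_time_parts([H]).
to_24h(I) when is_integer(I) andalso I < 12 -> I + 12;
to_24h(I) when is_integer(I) -> I.

% Arithmetic operator (rem) inside a guard.
on_quarter_hour(M) when is_integer(M) andalso M rem 15 =:= 0 -> true;
on_quarter_hour(M) when is_integer(M) -> false.

% A BIF (length/1) inside a guard.
short_string(S) when is_list(S) andalso length(S) =< 8 -> true;
short_string(S) when is_list(S) -> false.
```

Calls to functions you write yourself are not allowed in a guard, which is why only built-in functions appear here.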

In a guard, a semicolon means OR.

parse_time_parts([H, M, Meridian])
    when Meridian =:= "AM";
         Meridian =:= "am";
         Meridian =:= "a.m.";
         Meridian =:= "A.M." ->
    {ok, [HI, MI]} = parse_time_parts([H, M]),
    {ok, [HI - 12, MI]};

There are four values of Meridian this clause will accept: AM, am, a.m. and A.M.

To AND together guards, separate them with a comma.
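Here is a small sketch of comma-separated guards, assuming we want to check that an hour and minute pair is in range. The module guard_and and the function valid_time/1 are hypothetical names, not part of m1.

```erlang
-module(guard_and).

-export([valid_time/1]).

% Hypothetical example: every comma-separated test must succeed
% for the first clause to be chosen.
valid_time([H, M])
    when is_integer(H), H >= 0, H < 24,
         is_integer(M), M >= 0, M < 60 ->
    true;
valid_time(_) ->
    false.
```

With this, valid_time([21, 30]) returns true, while valid_time([24, 0]) and valid_time([9, 60]) fall through to the catch-all clause and return false.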

The order of function clauses matters.

parse_time_parts([H, M, Meridian])
    when Meridian =:= "AM";
         Meridian =:= "am";
         Meridian =:= "a.m.";
         Meridian =:= "A.M." ->
    {ok, [HI, MI]} = parse_time_parts([H, M]),
    {ok, [HI - 12, MI]};
parse_time_parts([H, M, Meridian])
    when Meridian =:= "PM";
         Meridian =:= "pm";
         Meridian =:= "p.m.";
         Meridian =:= "P.M." ->
    parse_time_parts([H, M]);
    

Erlang tries the AM clause before the PM clause simply because it occurs first in the source. The order doesn’t matter in this code, because no value of Meridian satisfies both guards, but it does matter whenever more than one clause could match the same arguments.
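For a case where the order does change the result, consider this sketch. The module clause_order and the function label/1 are invented for illustration.

```erlang
-module(clause_order).

-export([label/1]).

% Hypothetical example: 12 matches both the first and the second
% clause, so whichever comes first in the source wins.
label(12) -> noon;
label(H) when is_integer(H), H >= 12 -> afternoon_or_evening;
label(H) when is_integer(H) -> morning.
```

As written, label(12) returns noon. If the first clause were moved below the second, label(12) would return afternoon_or_evening and the noon clause would never be reached.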


If you see an error or something that could be improved, please let me know. This is a blog about me learning, so I expect I will get some stuff wrong. The best way to reach me is by email: mkbucc1234@gmail.com (after deleting all the numbers).

To make a comment, check for a thread on the erlang subreddit and if there isn't one, then start one up.

Follow on Twitter: @mbucc
