Parsing text with Erlang pattern matching and guards

June 22, 2016

A simple example of parsing a time string that illustrates some Erlang pattern matching and guard basics.

Let’s begin at the end.


    
    -module(m1).
    
    -export([parse_time_part/1]).
    
    % Given a string (e.g., "9:30 pm"), return the hour and minute as a list of
    % two integers.  If no meridian given, assume pm.
    parse_time_part(S) when is_list(S) ->
        {ok, HourMinute} = parse_time_parts(strip_tokens(S, ": ")),
        HourMinute.
    
    parse_time_parts([H, M, Meridian])
        when Meridian =:= "AM";
             Meridian =:= "am";
             Meridian =:= "a.m.";
             Meridian =:= "A.M." ->
        {ok, [HI, MI]} = parse_time_parts([H, M]),
        {ok, [HI - 12, MI]};
    parse_time_parts([H, M, Meridian])
        when Meridian =:= "PM";
             Meridian =:= "pm";
             Meridian =:= "p.m.";
             Meridian =:= "P.M." ->
        parse_time_parts([H, M]);
    parse_time_parts([H, M]) ->
        {ok, [HI, 0]} = parse_time_parts([H]),
        {I, _} = string:to_integer(M),
        {ok, [HI, I]};
    parse_time_parts([H]) ->
        {I, _} = string:to_integer(H),
        if
           I < 12 -> {ok, [I + 12, 0]};
           I >= 12 -> {ok, [I, 0]}
        end.
    
    % Tokenize based on characters in the string Sep and then strip any
    % trailing or leading spaces and commas from the tokens.
    strip_tokens(S, Sep) ->
        [string:strip(string:strip(X, both), both, $,)
             || X <- string:tokens(S, Sep)].
             
    -ifdef(TEST).
    -include_lib("eunit/include/eunit.hrl").
    
    strip_tokens_test() ->
        ?assertEqual(["Monday, April 11, 2016", "9:00 pm"],
                     (strip_tokens("Monday, April 11, 2016 ~ 9:00 pm", "~"))).
    
    parse_time_part_test() ->
        ?assertEqual([19, 0], (parse_time_part("7"))),
        ?assertEqual([17, 0], (parse_time_part("5:00"))),
        ?assertEqual([21, 30], (parse_time_part("9:30"))),
        ?assertEqual([17, 30], (parse_time_part("5:30 PM"))),
        ?assertEqual([9, 30], (parse_time_part("9:30 AM"))).
    
    -endif.
    

The tests pass:


    $ erlc -DTEST m1.erl
    $ erl -run m1 test -run init stop -noshell
      Test passed.
    $
    

Function guards assert what arguments the function accepts.


    parse_time_part(S) when is_list(S) ->
        {ok, HourMinute} = parse_time_parts(strip_tokens(S, ": ")),
        HourMinute.
    

The guard is when is_list(S). This is a type test that declares that the argument must be a list. (In Erlang, strings are lists of integers.) The guard is both a declaration and an assertion:


    ~$ erl
    Erlang/OTP 18 [erts-7.3] [source-84db331] [64-bit] [smp:8:8] [async-threads:10] [hipe] [kernel-poll:false]
    
    Eshell V7.3  (abort with ^G)
    1> c(m1).
    {ok,m1}
    2> m1:parse_time_part(an_atom). 
    ** exception error: no function clause matching 
                        m1:parse_time_part(an_atom) (m1.erl, line 7)
    3> q().
    ok
    5> ~$ 
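
A failed guard surfaces as an ordinary function_clause error, so a caller can catch it if a crash is not wanted. The following is a minimal sketch, not part of m1; the module guard_assert and both function names are made up for illustration.

```erlang
-module(guard_assert).
-export([char_count/1, safe_char_count/1]).

%% Hypothetical example: the guard admits only lists (strings).
char_count(S) when is_list(S) ->
    length(S).

%% Turn the failed guard into a tagged value instead of a crash.
safe_char_count(X) ->
    try
        {ok, char_count(X)}
    catch
        error:function_clause ->
            {error, not_a_list}
    end.
```

Calling guard_assert:safe_char_count(an_atom) returns {error, not_a_list}, while char_count(an_atom) raises the same kind of exception shown in the shell session.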
    

Guards are limited to the following:

  • type tests
  • boolean operators
  • bitwise operators
  • arithmetic operators
  • relational operators
  • and a few built-in functions (aka "BIFs")
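
To make the list concrete, here is a small sketch with one guard of each kind. The module guard_kinds and the function describe/1 are invented for this example and do not appear in m1. Note that functions you define yourself cannot be called from a guard.

```erlang
-module(guard_kinds).
-export([describe/1]).

%% Each clause uses a different kind of guard expression.
describe(X) when is_atom(X) -> atom;                            % type test
describe(X) when is_integer(X), X band 1 =:= 1 -> odd_integer;  % bitwise and relational operators
describe(X) when is_integer(X), X rem 2 =:= 0 -> even_integer;  % arithmetic operator
describe(X) when is_list(X), length(X) > 3 -> long_list;        % length/1 is a BIF allowed in guards
describe(X) when is_list(X) -> short_list;
describe(X) when is_float(X) andalso X > 0.0 -> positive_float; % boolean operator
describe(_) -> other.
```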

In a guard, a semicolon means OR.


    parse_time_parts([H, M, Meridian])
        when Meridian =:= "AM";
             Meridian =:= "am";
             Meridian =:= "a.m.";
             Meridian =:= "A.M." ->
        {ok, [HI, MI]} = parse_time_parts([H, M]),
        {ok, [HI - 12, MI]};
    

This clause accepts exactly four values of Meridian: AM, am, a.m. and A.M. Any other spelling, such as Am, falls through to the clauses that follow.

To AND together guards, separate them with a comma.
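
A short sketch of comma-separated guards, and of commas mixed with semicolons. The module guard_and and its functions are hypothetical and not part of m1.

```erlang
-module(guard_and).
-export([valid_hour/1, twelve_hour/1]).

%% All three tests must succeed: H is an integer AND H >= 0 AND H =< 23.
valid_hour(H) when is_integer(H), H >= 0, H =< 23 -> true;
valid_hour(_) -> false.

%% Commas bind tighter than semicolons, so this guard reads as
%% (integer AND in 1..12) OR (the atom noon) OR (the atom midnight).
twelve_hour(H) when is_integer(H), H >= 1, H =< 12;
                    H =:= noon;
                    H =:= midnight -> true;
twelve_hour(_) -> false.
```

Here valid_hour("9") is false because the is_list argument fails the first test, and twelve_hour(13) is false because none of the three alternatives succeeds.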

The order of function clauses matters.


    parse_time_parts([H, M, Meridian])
        when Meridian =:= "AM";
             Meridian =:= "am";
             Meridian =:= "a.m.";
             Meridian =:= "A.M." ->
        {ok, [HI, MI]} = parse_time_parts([H, M]),
        {ok, [HI - 12, MI]};
    parse_time_parts([H, M, Meridian])
        when Meridian =:= "PM";
             Meridian =:= "pm";
             Meridian =:= "p.m.";
             Meridian =:= "P.M." ->
        parse_time_parts([H, M]);
    

Erlang tries the AM clause before the PM clause simply because it comes first in the source. It doesn’t matter in this code, since no value of Meridian satisfies both guards, but the ordering is often useful.
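
Ordering does matter once clauses overlap. As a sketch, the hypothetical module clause_order below defines the same three clauses twice, once with the specific cases first and once with the general case first.

```erlang
-module(clause_order).
-export([label/1, label_shadowed/1]).

%% Specific clauses first: 0 and 12 get their own labels.
label(0) -> midnight;
label(12) -> noon;
label(H) when is_integer(H), H >= 0, H =< 23 -> hour.

%% General clause first: it also matches 0 and 12, so the
%% two clauses after it are never reached for valid hours.
label_shadowed(H) when is_integer(H), H >= 0, H =< 23 -> hour;
label_shadowed(0) -> midnight;
label_shadowed(12) -> noon.
```

With these definitions, label(12) returns noon but label_shadowed(12) returns hour, because the first clause that matches wins.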

Tags: erlang

© 2016 - 2022 Mark Bucciarelli