Beyond Pattern Matching: Multiple Return Values in Scheme

2024/10/20

This post describes some Scheme macros I have written for destructuring values. The first few sections describe the motivation for writing them. The third section describes why I designed the macros to be the way they are, and the fourth section is a practical example of their usage.

The core macros will work with R5RS. Some of the example code uses features from R6RS/R7RS and from SRFI-1.

All code in this article is licensed under the Apache 2.0 license.

Update 10/21/2024: Interesting article about type-checked expanded conditionals with binding and pattern matching in a variant of SML: Ultimate Conditional Syntax.

Why Pattern Matching?

When I started writing Scheme, one of the things I missed from Standard ML was pattern matching. For those not in the know, pattern matching destructures a variable and executes code based on which destructuring succeeds. It can make for very elegant code. Here is MAP as an example:

(define (map f l)
  (cond
    ((pair? l) (cons (f (car l)) (map f (cdr l))))
    ((null? l) '())
    (else (error "invalid list" l))))

fun map f (h::t) = (f h)::(map f t)
  | map f [] = []
  ;

Map with pattern matching is less noisy (helped by Standard ML’s cleaner syntax and strict type system) than the Scheme version.

Standard ML requires pattern matching: the core feature of the language is sum types where each case is labeled by a constructor, and the only way to handle these types is with case analysis:

(* type of lambda calculus expressions in Standard ML *)

type variable = string;

datatype expr
= LAMBDA of (variable * expr)
| APPLICATION of (expr * expr)
| VARIABLE of variable
;

(* Collect all free and bound variables in EXPR. *)
fun get_variables (free_vars, bound_vars) (LAMBDA (var, expr))
    = get_variables (free_vars, add var bound_vars) expr
  | get_variables (free_vars, bound_vars) (APPLICATION (left, right))
    = let val (free_vars_left, _)
              = get_variables (free_vars, bound_vars) left
      in
        get_variables (free_vars_left, bound_vars) right
      end
  | get_variables (free_vars, bound_vars) (VARIABLE x)
    = if in x free_vars then
        (free_vars, bound_vars)
      else
        (add x free_vars, bound_vars)
  ;

Why Not Pattern Matching?

Scheme is a dynamic language, not a static language like Standard ML. It doesn’t require pattern matching as a fundamental construct, and does not offer it in the standard. Evidently it is quite popular, as there are multiple SRFIs about pattern matching.

After looking through the options, I decided that I didn’t like any of them. Issues I saw were

  1. They were too complicated. For exaple, Alex Shinn’s matcher is hundreds of lines long. I like my Scheme simple to understand.
  2. The emphasis on structure is misplaced. Often times something with the same type (like a list representing a function application) will need a different branch depending on what is at the head of the list.
  3. Too static. I want to be able to add cases dynamically, and I want the cases to change what they destructure (and potentially bind) on the fly.

I realized that I didn’t want “pattern matching” after all. I wanted destructuring, which is closely related but not the same thing.

Destructuring in Standard Scheme and with SRFIs

Throughout this section I will use transformations of Scheme programs as an example of destructuring. This Scheme syntax consists of

EXPR ::= LAMBDA | IF | APP | ATOM
LAMBDA ::= (lambda FORMALS EXPR EXPR ...)
FORMALS ::= VARIABLE | (VARIABLE ...)
IF ::= (if EXPR EXPR EXPR)
LET ::= (let (BINDINGS) EXPR EXPR ...)
BINDINGS ::= ((VARIABLE EXPR) ...)
APP ::= (EXPR EXPR ...)
ATOM ::= VARIABLE | NUMBER

The transformation will turn LET bindings into LAMBDAs.

Standard Scheme has support for destructuring, but it isn’t that good. The COND macro has => clauses, which pass the returned value of the head clause when the head clause is not #F. Here’s an incomplete example:

(define (remove-let expr)
  (cond
    ((let-clause expr) => handle-let)
    ((if-clause expr) => handle-if)
    ((application expr) => handle-function)
    ((lambda expr) => handle-lambda)
    ((atom? expr) expr)
    (else (error "invalid expression" expr))))

The principal limitation of the COND clause is that each function on the right hand side of => takes exactly one argument. There is no way to actually destructure the argument.

The solution is to allow the heads of a COND clause to return multiple values. There is an SRFI that does just this: SRFI-61. A modified version of the above then becomes

(define (remove-let expr)
  (cond
    ((let-clause expr) proj0
     => (lambda (is-let? bindings . body)
          ...))
    ((if-clause expr) proj0 => handle-if)
    ((application expr) proj0 => handle-function)
    ((lambda expr) proj0 => handle-lambda)
    ((atom? expr) expr)
    (else (error "invalid expression" expr))))

There’s now a smaller issue, and that’s that the binding syntax is ugly. Each multiple value expression requires a guard clause, which in my case will be most branches. This also doesn’t work well with multiple sequential tests: there might be a way to do it with SRFI-210, but I haven’t looked into it.

For an ergonomic COND clause with multiple values, I want something that is

  1. Simple to understand.
  2. Easy to read.
  3. Makes the common case easy.

My Solution: COND-VALUES and AFTER

My solution is the AFTER macro and the concept of a “destructing function”.

A destructuring function is a function that returns no values on failure (using (values)) and one or more values on success. These functions use an empty return as “failure” instead of returning #f because if there is no returned value, there was nothing to destructure in the first place. The idea comes from MiniKanren.

Destructuring functions usually have => appended to them. For instance, the destructuring function for a pair is pair=>.

The AFTER macro makes writing destructuring functions easier. The average use of the AFTER macro looks like:

(after ((let (destructure=> x) => (y z))
        (when (predicate? y)))
 (do-something-with z))

The first element of the AFTER macro is a list of test clauses. The test clauses are executed sequentially. If one of them fails then the AFTER macro fails, and when they succeed the body of the AFTER macro is executed. The two types of test clause are:

I chose to add a keyword to each clause to visually break up the test clauses. Without them, the macro would look like

(after (((destructure=> x) => (y z))
        ((predicate? y)))
  (do-something-with z))

This is harder to read (for me), and it also makes it difficult to differentiate between a binding clause and predicate test clause.

Here is an example of destructuring a pair.

(define (pair=> x)
  (after ((when (pair? x)))
    (values (car x) (cdr x))))

If PAIR=> were passed any other type than a pair, it would return no values. Otherwise it will return the CAR and CDR of the pair.

So far I have only shown you destructuring functions. Most of the time you will write code to try multiple destructurings in a sequence and accept the first one that will work. The easiest way to do this is with the COND-VALUES macro. An invocation looks like

(cond-values
  values
  values
  (else ...))

Each element of COND-VALUES is evaluated in order. If one evaluates to any values, then that is returned as the result of COND-VALUES. Otherwise the next clause is executed. An ELSE clause is executed when no other clause succeeds. If there is no ELSE clause, COND-VALUES returns no values.

With COND-VALUES one can write more involved destructuring functions:

(define (length-at-least=> whole-list num)
  (when (<= num 0)
    (error "invalid number" 'length-at-least whole-list num))
  (let length-at-least=> ((lst whole-list)
                          (num num))
    (cond-values
      (after ((when (= num 0))
              (when (or (null? lst)
                        (pair? lst))))
        (apply values whole-list))
      (after ((when (> num 0))
              (when (pair? lst)))
        (length-at-least=> (cdr lst) (- num 1))))))

The above function destructures the input list X and checks that it is exactly LEN elements long. If it isn’t, then it returns no values.

Example: Scheme Transformations

None of the examples shown thus far have used the binding form of AFTER. Here I will do that by returning to the example of Scheme program transformations. First thing is the matcher:

(define (remove-let expr)
  (cond-values
    (after ((let (let=> expr) => (bindings body)))
      `((lambda ,(map car bindings) ,(un-body body)) ,(map cadr bindings)))
    (after ((let (if=> expr) => (test true false)))
      `(if ,(remove-let test) ,(remove-let true) ,(remove-let false)))
    (after ((let (lambda=> expr) => (formal body)))
      `(lambda ,formal ,(un-body body)))
    (after ((when (application? expr)))
      (map remove-let expr))
    (after ((when (atom? expr)))
      expr)
    (else (error "invalid expression" expr))))

(define (un-body exprs)
  (cond-values
    (after ((let (pair=> exprs) => (expr rest)))
      (if (null? rest)
          expr
          `((lambda <unused> ,(un-body rest)) ,(remove-let expr))))
    (else (error "invalid body" exprs))))

The REMOVE-LET function will try LET=>, IF=>, LAMBDA=>, etc. in order, and the first one that runs will cause the body to be returned.

Now the destructuring functions:

(define (let=> expr)
  (after ((let (length-at-least=> expr 3) => (head bindings . body))
          (when (eq? head 'let)))
    (values bindings body)))

This function mixes a LET clause with a WHEN clause. The WHEN clause is executed after the LET clause, and the bindings of the LET clause are in scope for the WHEN clause. If the WHEN clause fails after the LET clause suceeds, the whole AFTER macro fails and the function returns nothing.

You can see how the AFTER macro destructures the expression in a declarative way:

  1. The expression is a list with a length of at least 3,
  2. with its first element let.

The other destructuring functions are similar:

(define (if=> expr)
  (after ((let (length=> expr 4) => (head test true false))
          (when (eq? head 'if)))
    (values test true false)))

(define (lambda=> expr)
  (after ((let (length-at-least=> expr 3) => (head formal . body))
          (when (eq? head 'lambda)))
    (values formal body)))

(define application? pair?)
(define (atom? x) (or (symbol? x) (number? x)))

The transformer now works. Here are some examples:

(remove-let '(let ((x 5)) (f x))) ; => ((lambda (x) (f x)) ((5)))
(remove-let '(let ((x 5) (y 6)) (f x) (g y)))
;    => ((lambda (x y) ((lambda <unused> (g y)) (f x))) (5 6))

The matcher abstracts matching on the expression from REMOVE-LET. The destructurers could add extra checks (for instance, checking that the LET expression is a list of lists of two elements) without modifying the main loop at all. The matched expression doesn’t even have to be a list: the expression could be written in a vector or a record, and as long as LET=>, LAMBDA=>, etc. could match it and destructure it, then REMOVE-LET can handle it without modification.

Overall I think it is a more “Scheme” way to destructure values than using pattern matching. Instead of dealing with concrete terms, AFTER uses procedures, and instead of a complicated system figuring out which values to bind in a list, AFTER uses a more Lisp-looking block scope.