The Many Nulls of Scheme

2024/10/16

NULL is a confusing concept. It can mean, among other things,

  1. A pointer to nothing (C, Java).
  2. Missing data (SQL).
  3. A special character marking the end of a sequence (the NUL terminator of C strings).

All of these nulls act differently. Sometimes NULL values are equal, sometimes they aren’t. Sometimes null and 0 and false are the same, sometimes they are not, sometimes they are kinda the same. In JavaScript, there’s also undefined, where undefined == null is true but undefined === null is false.

So why do they share the same name in the first place? Generally speaking, a null value represents the absence of a value, and it can be mixed in with ordinary values. Under this meaning, a language can have multiple null values. JavaScript has undefined and null, for instance.

I’ve been learning Scheme lately. Scheme implementations happen to have 3 null values, two of which suck.

1. The Empty List

The empty list was the original null value in LISP. It is called NIL in Common Lisp, and the Scheme function to test if a value is the empty list is called NULL?.

The “empty list” is a list (in the sense of LIST?) but has many special properties:

  1. It is not a cons pair: every other list is.
  2. An unquoted empty list () is not a valid expression in a Scheme program, while a non-empty list can be.
  3. Every non-empty list is a pair whose cdr is a list. The empty list is the end of a list.
  4. All empty lists are the same.

So the empty list is a null in the same way that the NUL terminator is a null: it marks the end of a sequence.
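The properties listed above can be checked at a REPL. This is a minimal sketch that assumes an R7RS-style implementation and uses only standard predicates:

```scheme
;; The empty list is a list, but it is not a pair.
(list? '())             ; => #t
(pair? '())             ; => #f
(null? '())             ; => #t

;; Every non-empty list is a pair whose cdr is a list.
(pair? '(1 2 3))        ; => #t
(list? (cdr '(1 2 3)))  ; => #t

;; The last cdr of a proper list is the empty list.
(null? (cdr '(3)))      ; => #t

;; All empty lists are the same object.
(eq? '() (cdr '(3)))    ; => #t
(eq? '() (list))        ; => #t
```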

2, kinda: “Unspecified value”(s)

Many procedures in Scheme return an “unspecified value”. This means that such a procedure may return any value at all. There is no portable information about the “unspecified” value, except that it is a value.[1] A “portable” way to get an unspecified value is to evaluate a one-armed if whose test is false:

(define x (if #f #f))

In Chicken, Chez, and MIT-Scheme, there is one “unspecified” value. There is no type predicate for it. Chicken and Chez have the procedure VOID that returns this value:

(eq? x (void)) ; => #t

In Chicken and Chez, it can be displayed, but not read:

;; Chicken
(display x) ; => #<unspecified>
#<unspecified> ; => error

;; Chez
(display x) ; => #<void>
#<void> ; => error

In MIT-Scheme, the reader supports #!UNSPECIFIC, which is the unspecified value. It does not have VOID:

(eq? (if #f #f) #!unspecific) ; => #t

Although all implementations that I know of have at most one unspecified value, nothing requires that there be only one. Since using an unspecified value is almost always an error, an unspecified value could store a source location and a stack trace for debugging purposes. The unspecified value could also be #f or '().

In my opinion, unspecified values are a mistake. A procedure that returns nothing should return nothing. The following code should be an error:

(let ((x (display "hello, world")))
  (set! other-value x))

3. The EOF Object(s)

Functions that read from a port return an EOF object when there are no more objects to read. An EOF object can be created with the EOF-OBJECT procedure (R6RS and R7RS) and tested with the EOF-OBJECT? predicate. The standard dictates:

The precise set of end-of-file objects will vary among implementations, but in any case no end-of-file object will ever be an object that can be read in using read.

– R7RS

Although you might be able to print an eof object, there should be no way to read one.

Chicken, Chez, and MIT-Scheme all have the form #!EOF to represent an EOF object. This allows you to completely break EOF detection:

(define x (open-input-string "5 #!eof 6"))
(eof-object? (read x)) ; => #f
(eof-object? (read x)) ; => #t
(eof-object? (read x)) ; => #f
(eof-object? (read x)) ; => #t

In addition, typing #!eof into Chicken and Chez will quit the REPL. MIT-Scheme will just print the eof object. All three have only one EOF object.

The EOF object is another mistake. Better alternatives include

  1. raising an exception on EOF
  2. emitting wrapped objects, like syntax objects
  3. having a second EOF test function that takes the port as an input, like feof in C

Since there can be multiple EOF objects (according to the standard), it’s possible to have a “real” EOF object that actually denotes the end of input, and fake EOF objects generated from input. (Then someone will want to store a “real” EOF object, meaning that there will have to be a “really real” EOF object, and so on…)
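The third alternative can be approximated in portable code. The sketch below assumes an R7RS-style implementation. The names port-at-eof? and read-all are hypothetical helpers made up for illustration, not part of any standard. The idea is to ask the port whether input is exhausted rather than inspect the datum that READ returned:

```scheme
;; Hypothetical helper (not part of any standard): asks the port itself
;; whether input is exhausted, instead of inspecting the datum READ returned.
;; PEEK-CHAR yields either a character or an EOF object, and no character
;; is an EOF object, so a datum such as #!eof in the input cannot fool it.
(define (port-at-eof? port)
  (eof-object? (peek-char port)))

;; Hypothetical helper: collect every datum from PORT, stopping only at
;; the real end of input.
;; Limitation: trailing whitespace or comments after the last datum make
;; PORT-AT-EOF? answer #f, so the final READ may still return an EOF object.
(define (read-all port)
  (let loop ((acc '()))
    (if (port-at-eof? port)
        (reverse acc)
        (loop (cons (read port) acc)))))

(read-all (open-input-string "5 6 7")) ; => (5 6 7)
```

On an implementation whose reader accepts #!eof, the string from the earlier example should come back as a three-element list instead of stopping after the 5. A sturdier version would skip trailing whitespace before peeking.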

Honorable Mention: False

Once upon a time, Scheme’s “false” object was the empty list. The authors of the Scheme standards were not happy about this:

The empty list counts as false for historical reasons only, and programs should not rely on this because future versions of Scheme will probably do away with this nonsense.

– R2RS, 1985

This was changed in R5RS. The empty list became a truthy value, and #f became the only falsy value.

False isn’t really a null value. It doesn’t denote the absence of something, like the other things I mentioned. But it is a distinguished value that acts differently from any other value in conditionals:

(not #f) ; => #t
(not any-other-value) ; => #f

(if #f 'truthy 'falsy) ; => falsy
(if any-other-value 'truthy 'falsy) ; => truthy

So false can be used like a null value with read-write invariance:

(define falsy? not)

(import srfi-1)
(any falsy? '(1 2 3 4)) ; => #f
(any falsy? '(1 2 #f 4)) ; => #t

(define (truthy? x) (not (falsy? x)))
(filter truthy? '(1 2 #f call/cc)) ; => (1 2 call/cc)

Since false is just “false” without any other interpretation (unlike the empty list), it can be used for null-like situations.
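Standard procedures already use #f this way. As a small sketch, assuming an R7RS-style implementation: ASSQ returns #f when a key is missing, and a cond clause with => consumes that result directly. The names settings and setting are hypothetical, chosen only for this example:

```scheme
;; Example data: an association list of made-up settings.
(define settings '((width . 80) (height . 24)))

;; Hypothetical helper: look up KEY in SETTINGS, falling back to DEFAULT.
;; ASSQ returns the matching pair, or #f when the key is absent.
;; The => clause passes the pair found by ASSQ to CDR.
(define (setting key default)
  (cond ((assq key settings) => cdr)
        (else default)))

(setting 'width 100) ; => 80
(setting 'depth 100) ; => 100
```

Because ASSQ returns the whole pair and not just the value, a stored #f is still distinguishable from a missing key.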

Should you? Maybe, depending on your application. The benefit is that conditionals will take the false branch with #f. This is Scheme, so you could also use a symbol:

(define (no-value? x) (eq? x 'null))

(import srfi-1)
(any no-value? '(1 2 3 4)) ; => #f
(any no-value? '(1 2 #f 4)) ; => #f
(any no-value? '(1 2 null 4)) ; => #t

(define (value? x) (not (no-value? x)))
(filter value? '(1 2 #f null call/cc)) ; => (1 2 #f call/cc)

  [1] The R7RS-Small standardization group considered and rejected standardizing the behavior of undefined values (see #49).