NULL is a confusing concept. It can mean, among other things,
- A pointer to nothing (C, Java).
- Missing data (SQL).
- A special character marking the end of a list (C strings).
All of these nulls act differently. Sometimes NULL values are equal,
sometimes they aren’t. Sometimes null and 0 and false are the same,
sometimes they are not, sometimes they are kinda the same.
In JavaScript, there’s undefined, and undefined == null but !(undefined === null).
So why do they share the same name in the first place? Generally speaking,
null values represent the absence of a value that can be mixed in with
other values. With this meaning of null, a language can have multiple
null values. JavaScript has undefined and null, for instance.
I’ve been learning Scheme lately. Scheme implementations happen to have 3 null values, two of which suck.
1. The Empty List
The empty list was the original null value in LISP. It is called NIL
in Common Lisp, and the Scheme function to test if a value is the empty
list is called NULL?.
The “empty list” is a list (in the sense of LIST?) but has many special
properties:
- It is not a cons pair: every other list is.
- The empty list () is not valid syntax for a Scheme program, while every non-empty list is.
- Every non-empty list is a pair whose cdr is a list. The empty list is the end of a list.
- All empty lists are the same.
So the empty list is a null in the same sense that the NUL terminator is a null: it marks where a sequence ends.
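The properties above can be checked directly. This is a minimal sketch, assuming an R7RS implementation (the import line is the R7RS spelling and may differ elsewhere); the name empty is only for illustration.

```scheme
;; Assumption: an R7RS implementation.
(import (scheme base) (scheme write))

(define empty '())                        ; hypothetical name, for illustration

(display (null? empty)) (newline)         ; #t: it is the empty list
(display (list? empty)) (newline)         ; #t: it is a list
(display (pair? empty)) (newline)         ; #f: but it is not a cons pair
(display (pair? '(1))) (newline)          ; #t: every non-empty list is a pair
(display (null? (cdr '(1)))) (newline)    ; #t: the last cdr of a list is the empty list
(display (eq? empty (cdr '(1)))) (newline) ; #t: all empty lists are the same object
```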
2, kinda: “Unspecified value”(s)
Many procedures in Scheme return an “unspecified value”. This means that the procedure may return any value at all. There is no portable information about the “unspecified” value, except that it is a value. [1] A “portable” way to get an unspecified value is to do
(define x (if #f #f))
In Chicken, Chez,
and MIT-Scheme, there is one
“unspecified” value. There is no type predicate for it.
Chicken and Chez have the procedure VOID
that returns this value:
(eq? x (void)) ; => #t
In Chicken and Chez, it can be displayed, but not read:
;; Chicken
(display x) ; => #<unspecified>
#<unspecified> ; => error
;; Chez
(display x) ; => #<null>
#<null> ; => error
In MIT-Scheme, the reader supports #!UNSPECIFIC, which is the
unspecified value. It does not have VOID:
(eq? (if #f #f) #!unspecific) ; => #t
Although all implementations that I know of have at most one unspecified
value, there is no reason there has to be only one. Since using an
unspecified value is almost always an error, an unspecified value could
store a source location and a stack trace for debugging purposes. The
unspecified value could also be #f or '().
In my opinion, unspecified values are a mistake. A procedure that returns nothing should return nothing. The following code should be an error:
(let ((x (display "hello, world")))
  (set! other-value x))
3. The EOF Object(s)
Functions that read from a port return an EOF object when there are no
more objects to read. They can be created with the EOF-OBJECT procedure
(R6RS) and tested with the EOF-OBJECT? predicate. The standard dictates:
The precise set of end-of-file objects will vary among implementations, but in any case no end-of-file object will ever be an object that can be read in using read.
– R7RS
Although you might be able to print an eof object, there should be no way to read one.
Chicken, Chez, and MIT-Scheme all have the form #!EOF to represent an
EOF object. This allows you to completely break EOF detection:
(define x (open-input-string "5 #!eof 6"))
(eof-object? (read x)) ; => #f
(eof-object? (read x)) ; => #t
(eof-object? (read x)) ; => #f
(eof-object? (read x)) ; => #t
In addition, typing #!eof into Chicken and Chez will quit the REPL.
MIT-Scheme will just print the eof object. All three have only one EOF
object.
The EOF object is another mistake. Better alternatives include
- raising an exception on EOF
- emitting wrapped objects, like syntax objects
- having a second EOF test function that takes the port as an input, like feof in C
Since there can be multiple EOF objects (according to the standard), it’s possible to have a “real” EOF object that actually denotes the end of input, and fake EOF objects generated from input. (Then someone will want to store a “real” EOF object, meaning that there will have to be a “really real” EOF object, and so on…)
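As a rough sketch of the first alternative, here is what the interface could look like, assuming an R7RS implementation. The name read-or-raise is hypothetical and not part of any standard. Because it is built on top of READ, it still can't tell a literal #!eof in the input from the real end of input; a real fix would have to live in the reader itself.

```scheme
;; Assumption: an R7RS implementation (GUARD and ERROR come from (scheme base)).
(import (scheme base) (scheme read))

;; Hypothetical helper: like READ, but raises an error at end of input
;; instead of returning an EOF object mixed in with ordinary data.
(define (read-or-raise port)
  (let ((datum (read port)))
    (if (eof-object? datum)
        (error "read-or-raise: end of input" port)
        datum)))

(define port (open-input-string "5 6"))
(read-or-raise port)                           ; => 5
(read-or-raise port)                           ; => 6
(guard (e (#t 'done)) (read-or-raise port))    ; => done
```

With this shape, every value the caller gets back is a datum, and end of input is handled in one place by the GUARD clause.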
Honorable Mention: False
Once upon a time, Scheme’s “false” object was the empty list. The authors of the Scheme standards were not happy about this:
The empty list counts as false for historical reasons only, and programs should not rely on this because future versions of Scheme will probably do away with this nonsense.
– R2RS, 1985
This was changed in R5RS. The empty list became a truthy value, and
#f became the only falsy value.
False isn’t really a null value. It doesn’t denote the absence of something, like the other things I mentioned. But it is a distinguished value that acts differently from any other value in conditionals:
(not #f) ; => #t
(not any-other-value) ; => #f
(if #f 'truthy 'falsy) ; => falsy
(if any-other-value 'truthy 'falsy) ; => truthy
So false can be used like a null value with read-write invariance:
(define falsy? not)
(import srfi-1)
(any falsy? '(1 2 3 4)) ; => #f
(any falsy? '(1 2 #f 4)) ; => #t
(define (truthy? x) (not (falsy? x)))
(filter truthy? '(1 2 #f call/cc)) ; => (1 2 call/cc)
Since false is just “false” without any other interpretation (unlike the empty list), it can be used for null-like situations.
Should you? Maybe, depending on your application. The benefit is that
conditionals will take the false branch with #f. This is Scheme, so
you could also use a symbol:
(define (no-value? x) (eq? x 'null))
(import srfi-1)
(any no-value? '(1 2 3 4)) ; => #f
(any no-value? '(1 2 #f 4)) ; => #f
(any no-value? '(1 2 null 4)) ; => #t
(define (value? x) (not (no-value? x)))
(filter value? '(1 2 #f null call/cc)) ; => (1 2 #f call/cc)
[1] The R7RS-Small standardization group considered and rejected standardizing the behavior of undefined values (see #49).