A New Take on Scheme Records, Inspired by C Structs

2024/12/24

Records in Scheme have a contentious history. Scheme didn’t have a standardized system for creating record types until 1999. Another, incompatible record system was standardized in 2007, which was very controversial1. R7RS decided to adopt SRFI-9 instead of R6RS’s system, and ever since people have been trying to combine the two in a way that makes everyone equally happy.2

Record types in Scheme serve two purposes:

  1. They make a type disjoint from all other types.
  2. They allow for values to be stored and referenced by name in an object efficiently.

All of the different proposals accomplish these goals3, and differ in how much control the programmer has in creating and dealing with records.

I’m not going to compare the multiple different proposals, mostly because I only use the standard R7RS system, and also because other people with more experience than me have argued about this for decades.

Instead, I want to discuss an alternative way of designing records that is inspired by C structs. This design would

  1. Allow records to be accessed by name (a symbol) so that records with a common field do not need dedicated procedures to extract the value stored in that field.
  2. Throw away the idea of inheritance entirely in favor of embedding. This allows for composite records like R6RS’s composite conditions. Acccessing a field in an embedded record does not require any more indirection than accessing a regular field. Passing an embedded record as a parameter can be as simple as pointer arithmetic.
  3. Give fine-grained control over opaque records. Accessing the fields of a record and embedding a record type into other record types requires a record type descriptor (rtd). Opaque records do not carry their record type descriptor, but can be manipulated if their rtd is supplied. This means opaque-ness can be controlled using scope. This also allows for implementing “sealed” records, although the concept is not needed because subtyping is not allowed.
  4. Support non-generative records. Two non-generative records are of the same type if
    1. They both have the same number of fields,
    2. The nth field in the first type has the same name as the nth field in the second type,
    3. If the nth field in the first type is an embedded record, the nth field in the second type is an embedded record with the same type.

The Problem of Field Names

Let’s write some code to handle an online shop. The two principal objects we deal with are customers and items. First we create our customer record:

(define-record-type <customer>
  (make-customer name current-cart order-history)
  customer?
  (name get-name)
  (cart get-current-cart set-current-cart!)
  (order-history get-order-history set-order-history!))

The getters and setters for each field of the record are pretty verbose. We run into an issue when we try to make our item record:

(define-record-type <item>
  (make-item name price quantity)
  item?
  (name get-name)
  (price get-price set-price!)
  (quantity get-quantity set-quantity!))

We now have a name clash between get-name for items and get-name for customers. The idiomatic way to fix this is to prefix the getters and setters with the name of the record. Let’s try again:

(define-record-type <customer>
  (make-customer name current-cart order-history)
  customer?
  (name customer-name)
  (cart customer-current-cart set-customer-current-cart!)
  (order-history customer-order-history set-customer-order-history!))

(define-record-type <item>
  (make-item name price quantity)
  item?
  (name item-name)
  (price item-price set-item-price!)
  (quantity item-quantity set-item-quantity!))

Now we have a mouthful on our hands (set-customer-order-history!). Of course we might be able to get away with shortening these names if we know there will never be a conflict, but the idiomatic way is the best way to ensure no conflict will ever occur.

The reason for this verbosity is because field getters and setters are all procedures. Using a procedure to abstract access is, of course, the Scheme Way. If I want to refactor the code so that the current cart is in an SQL database cached by redis valkey, then a change to customer-current-cart (and the <customer> record) is all that’s needed.

In most cases, however, these getters and setters will be for record types. A get in this case is a typecheck followed by a dereference. An optimizing Scheme compiler should be able to merge typechecks and inline dereferences, which it cannot do if the procedure can truly be anything.

I think that in the case of records, it is better to have special syntax for getting and setting fields, just like in other languages. For instance, to get the name field of a <customer> or <item>, you would write

(get customer name)
(get item name)

This would emphasize that the object is a record, and that the operation is almost side-effect-free (except for exceptions). If the type of the object is already known, then this reduces to a dereference. If the type is not known, then the operation turns into a dereference, a table lookup, and a dereference. Most records only have a few fields so the table lookup will consist of a few comparisons. All records have static fields so those comparisons can be compiled ahead of time, and records with a large number of fields can employ a static perfect hash function.

Inheritance and Compound Records: The Example of C Structs

With that construct in hand, let’s look to C structs. Structs are continguous blocks of memory declared ahead of time4:

struct customer {
    char *name;
    struct item **current_cart;
    struct order **order_history;
};

Let’s say I have a child account that has a parent account that can control it. The child account has a separate order history and cart. How would I make a struct that extends struct customer? In standard C I would put a struct (as opposed to a struct pointer) at the start of the struct:

struct child_customer {
    struct customer customer;
    struct customer *parent;
};

All struct child_customer * are now castable to struct customer *. In some dialects, like Plan 9 C, an anonymous tagged struct can be placed into a struct description, which causes all the fields of a named structure to be placed in the new struct:

struct child_customer {
    struct customer;
    struct customer *parent;
}

  1. “Over 30% of those who voted against ratification [of R6RS] mentioned the record systems as one of their reasons.” (SRFI-99) ↩︎

  2. SRFI 256, 240, 237, 222, 150, 136, 131, 99, 76, 57, and 9 all concern records. There is even arguments about Scheme records in mailing list archives prior to the SRFI system. ↩︎

  3. With the notable mention of SRFI-137 as a low-level type-creation and subtype-creation system. ↩︎

  4. current_cart and order_history are NULL terminated arrays of pointers for simplicity. In a real program I would use an array of structs with an explicit size_t length. ↩︎