Modern C++ and Lisp Programming Style
Modern C++, as defined in the C++11 standard has evolved to become an algorithmic language. This represents as much an idiomatic departure from Object Orientation as C++ was a departure from C’s procedural top-down composition. “C++11 feels like a new language,” says Bjarne Stroustrup [Stroustrup]. The primary focus of modern C++ has become algorithms and generic programming. To this end, the language absorbs a number of idioms from other languages and paradigms, ranging from functional to meta programming concepts. Incidentally this is also the realm in which a much older language has done exceedingly well — since the 1950s: Lisp. Lisp is not only the first implementation of the Lambda Calculus but its entire structure is optimised to make generic programming easier. This paper will explore how the high level approach to developing programs in C++ is increasingly shaped by some of the same forces that dominate in developing programs using Lisp.
Given that our premise is that modern C++ has departed from an object-oriented, type based focus and refocused on algorithm and generic programming, I thought it appropriate to start with some quotes from one the key author of the STL, Alexander Stepanov. In an interview on stlport.org, Stepanov is attributed the following statements [Stepanov]:
“STL, in a sense, is an application of generic programming techniques that Dave Musser and I have been developing to a C machine model.“
The interesting point to note is that the basis of his design is C, not C++ and therefore not Object Orientation. Stepanov is indeed somewhat hostile towards the latter.
“Yes. STL is not object-oriented. I think that object orientedness is almost as much of a hoax as Artificial Intelligence.“
Stephanov’s position may be summarized as: instead of finding algorithms to embed in types, abstract the algorithms to operate across types.
“Yes. Always start with algorithms.“
Incidentally, algorithms are the idiomatic starting point when developing systems in Lisp. Scheme, a dialect of Lisp, in particular is said to be an “algorithmic language.” In fact Scheme’s standard documents are entitled “Revised Report on the Algorithmic Language Scheme.” [Scheme] It borrowed this idea from Algol standards documents.
“I suddenly realized that the ability to add numbers in parallel depends on the fact that addition is associative … In other words, I realized that a parallel reduction algorithm is associated with a semigroup structure type. That is the fundamental point: algorithms are defined on algebraic structures.“
Algebraic structure of types in particular is of course one of the tenets of functional programming languages, such as Haskell or the Hindler-Milner family of languages. Stepanov has thus set a course for C++ that departs from both object orientation and implicitly from imperative thinking. A certain degree of impedance mismatch between the various concepts and tenets that make up C++ should therefore come as no surprise.
Since Stepanov’s inspiration started with something as simple as addition, we shall follow suit. We will take a simple generic algorithm in both C++ and Lisp as a starting point for our discussion: add 1 to a number x. This is simpler than adding two numbers, but it will suffice for our discussion.
This reads as “define a function” (defun) named “xplusone” which takes one argument “x.” Its value is represented by (equals to) the expression “add 1 to x.” Firstly we note that the algorithmic structure follows the English language structure. This is owing to the use of prefix notation over infix notation. While in English we may talk about the sum of “a” and “b”, in arithmetic we would say “a sum b” or “a + b“. Most programming languages follow the latter, but it’s really a question of your point of reference. Lisp leans towards the meta language coined by the mathematician Haskell Curry: U-language [U-language], an abstraction representing communication from one person to another. This natural language heritage has made Lisp particularly suitable in natural language processing and symbolic computation and one might say that the root of algorithms is the manipulation of symbols [Algorithm]. Consequently, we might expect the synergy between Lisp and algorithmic programming to be a deep one. With that difference out-of-the-way, let us introduce the C++ equivalent, so we can proceed to more interesting discourse.
The above implements a functor whose operator() is implemented as a template function. Note how this differs from templating the entire struct and allows us to follow immediately with an instantiation of xplusone without providing the template argument. We can now call xplusone() as we would any ordinary function. The return type of the function is inferred as the result of the expression (x + 1 ). Two observations are striking: 1) The count of curly braces and parenthesis is greater than the Lisp version even though Lisp is known for its pervasive parenthesis. 2) The amount of syntax makes it all but impossible for the untrained eye to see the semantics. A detailed explanation of this C++ code snippet is beyond the scope of this discussion. Instead let us focus on how the Lisp version and C++ versions are used.
Here we define a variable x (defvar) to assume the value 2. We then call xplusone with the value x as argument. Unsurprisingly the value of this expression is 3.
Evaluation proceeds as follows. 2 is an integer. Therefore x is an integer. The sum of 1 ( an integer ) and x is also an integer. Therefore the resulting sum is an integer.
Note how the type has been inferred by way of the type resulting from applying the operator to the operands. Now let us examine the analog in C++.
2 is an integer. Therefore x is an integer. The decltype of ( 2 + 1 ) is also an integer. Therefore y is an integer.
Both C++ and Lisp have proceeded in much the same way. Both implementations represent a generic algorithm that works so long as + is defined for the type of the argument ‘x’ and 1.
Latent Type Errors Deferred to the User
We were fortunate in that there was no need to store state connected with the templated type inside our C++ struct. Had that been the case, we would have had to template the struct in its entirety. Had we, for example, discovered the need to store the value of our calculation inside our struct, our code might have changed thusly.
The implications are profound. The template<> expression has been lifted outside of the struct. We can no longer template just the operator(). This has yielded an incomplete type. xplusone can no longer be instantiated on its own — not without specializing the type. As a consequence the below code compiles perfectly happily.
With the instantiation of xplusone<int> commented out, the compiler has been deprived of the ability to infer if BLAH_BLAH_BLAH… is valid C++. The compiler simply didn’t expand that part of the code. As anyone who has done any complex meta programming in C++ can attest to, this dynamic can play itself out in the most obscure ways. It is often users of the abstractions created through template meta programs who find distant edge cases and are puzzled by compiler errors in subsystems not of their own making. The greater the abstractions, the more this applies. It is especially true when constructing domain specific languages using vehicles like BOOST Proto or working at a similar level of abstraction where incomplete types are layered to form abstractions. Yet even mainstream C++ programmers will attest to the obscurity of error messages deep from the bowels of the STL template library. This latent emergence of type system errors is something ordinarily characteristic of dynamic programming languages. Notable is the difference between run time error detection and compile time error detection. In any case, mechanisms are needed to address this latent emergence of type system errors. If shipping a library of code to a user, it is undesirable for that user to find errors in one’s own product, be they static or dynamic in nature.
We might expect that some measure of effort will be devoted to recover from latent emergence of type errors when developing template meta programs. A discussion of these follows below.
Type Safety Recovery Mechanisms
Let us return to our original examples. In both instances of our definition of xplusone, C++ and Lisp alike, type inference in the compiler has been invoked to complete our program. But the story does not end here. It does not end here, in part, because types have been *implied* through the operators. The loose coupling we have attained in C++ by decoupling ourselves from a polymorphic type hierarchy has been bought at a price: a measure of type safety has been lost. As we shall see, the way the C++ programmer copes with this is much the same as the way the Lisp programmer copes with it.
What do we mean by implied types and a measure of type safety being lost? Let us digress again. What if the function at hand did not operate on numbers to form a sum? What if instead it operated on stack types? Assume further that our C++ model defined a hierarchy of different Stack classes and that we have a function Foo(Stack& s) that takes stacks as arguments? The example is contrived, but contrived to illustrate a point. Assume further that Foo() calls the functions push() and pop() on its stack. In ordinary object orientation, polymorphism will guarantee that only Stack types are admissible, Stack itself or derived classes. After a push() we expect the stack size to have increased by one. After a pop() we expect the stack size to have decreased by one. Type safety ensures us we are operating only on stacks. Indeed in C++11 we may use the delete method mechanism to eliminate even implicit conversion of input arguments to functions. This assures us that nothing may be implicitly converted to a Stack that isn’t a stack. All is well in object-oriented land. The type system is working for us. But what of a templated function Foo() defining an algorithm over stack like types? What if the user provided a “button” type that also implemented “push” with an identical signature? Indeed the compiler would not complain, because the implied interface would also be satisfied. A measure of type safety has been lost that the compiler cannot compensate for. It will be up to the designer of the template function to recover this type safety manually — or live with the ensuing error conditions.
So what might typical mechanisms be to recover type safety? Principally there are four:
- Template specialization
- Type Traits
- Unit Tests
Let us examine each in turn.
We have already met template specialization. When we wrote…
…we completed the type definition of xplusone. You might say, we annotated the template with its type. xplusone is now fully specified. What we are saying here is: we know. We know that this algorithm will work with integers, much the same as declaring a function Foo() to take arguments of type Stack. Nothing is left for the compiler to infer. The problem of type has been solved at compile time. How then does Lisp compare here as a dynamically typed language? Can we similarly annotate xplusone in Lisp ? As it turns out, we can. In the spirit of Haskell Curry’s U-language, we might expect this to read something like the English sentence: “I DECLARE that the TYPE of the FUNCTION xplusone is integer -> integer.” Lisp uses the operators proclaim and declaim for the purpose.
So we have a function that takes an (integer) argument and is itself and integer. We are further saying this relation is a function type and xplusone satisfies is. One thing to note is the correspondence between the manner in which declaim in Lisp and decltype in C++ both declare the type of our expression external to the expression itself. In any event, we have specified the full signature of our algorithm. We may now ask Lisp to describe what it knows about our function.
But what of Lisp’s dynamic nature? Is this not merely an optimization hint? Let’s see. Instead of evaluating xplusone with a non integer type, let us write another function that would violate our type constraints and attempt to compile that.
We note that the type mismatch has been detected at compile type, not at run time. This is because the whole of the Lisp system is available both at run time and compile time, a key feature needed to support Lisp’s macro system. So in principle, features available at runtime can also be made available at compile time. It is up to the implementation how to handle type mismatch conditions. More importantly, it is up to the developer to elicit them.
Type traits are quintessentially predicate logic statements that say something about a type and therefore about instances of that type. For example, a container might require a random access iterator. A type trait might attest to the random access nature of an iterator type. Traits are themselves template types. Their construction proceeds like thusly:
- The unspecialized template returns the default, here false.
- The specialized template overrides that default.
Given the random access iterator example, we might construct a type trait which, unspecialized, returns false and then specialize the trait for random access iterator types to return true. Subsequently we can apply the predicate is_random_access_iterator to template arguments to check if the argument satisfies the random access property. It is clear how this mechanism is well suited to restore some of the type safety that has been lost through templates. Two things are worth noting: we must write these traits where there are not available to us already & we must employ them manually to enjoy their benefit. We are now spending our time recreating type safety that used to work by way of the compiler.
Lisp programmers will know where I am going with this and why I identified traits as being predicates. Lisp has a built-in “system” of predicates. Most are available for the programmer to use out-of-the-box and many more are generated behind the scenes as the programmer defines new types. As a matter of convention, predicates are functions ending in p. Lisp is known to be a symbol computing language, so chiefly concerns itself with symbols. Of interest, for example, might be if a symbol is bound to a definition. If the condition is “bound” our predicate might therefore be “boundp.” Let’s try this.
So in the first instance foo was not bound, so the predicate returned false (NIL). After binding foo to the value 1, foo is bound so “boundp”” returns true (T). Analogously, we can check if our function xplusone is bound. Yes. Maybe someone has already written xplustwo? No.
Simple arithmetic predicates are defined also.
User generated types are automatically equipped with suitable predicates. We create some types to demonstrate:
The macro system will have created common functions, such as constructors called “make-<type>”. We can use our predicates to check for these.
The interface we expected exists for both person and barbarian types. Let’s construct some people and apply the generated predicates to example expressions.
So guy is a person, as is Conan. “foo” is not a person. The onus is on the developer to apply these predicates where appropriate to facilitate type safety.
The onus is on the system to generate commonly used predicates. In practice predicate generation is attained by macros that abstract relevant types. In modern C++, the onus is on the developer to BOTH create type traits AND applythem to attain type safety.
Concepts might be thought of as static unit tests embedded inside a template function or template class (unit) which explore if the template argument satisfies the use cases of the template function or class. Concepts exploit the fact that in C++ compilation and linking are two entirely separate phases of constructing an executable. It is therefore possible to write code in a concept that exercises the interface of the template argument and emit meaningful error messages from the interface of a template class or function — rather than getting error messages deep from the inner workings of templated code. This essentially means we can provide meaningful interfaces once more. Because linkable object code does not reference these concepts, they are optimized out by the linker. But because concepts are only compiled, not executed, they are not suitable to verify dynamic properties of a type argument. What is unique about concepts is that they are embedded in the template code.
Lisp’s analog to concepts can be found in macros. See here for an elaboration. Likewise, unit tests may be used to verify the properties of a system. So let us explore unit testing.
In reverting to our stack example, and in restoring type safety to a function consuming a stack type parameter, what we are really interested in is not if push exists, or if pop exists. It is possible to conceive of a button type that implements both. What we are interested in is if the invariants of a stack hold for the type in question. A push increases element count by one, a pop decrements it by one, and for all popped elements these equate to the last elements pushed. If these properties hold, we need not care what the actual type is. “For all” -> “exhibit property” is a clue that what we really want is a system of universal quantification over some domain of discourse [universal]. We have once more arrived at predicate logic [predicate] and Lisp’s predicate system creates a natural basis for extending proof of correctness into predicate logic. Our approach therefore becomes:
1) Write a unit test in terms of predicate logic that shows required properties hold for type T
2) Statically bind, by way of ftype annotations etc., the function, whose correctness we are trying to prove, to the type T
The above combination in effect subsumes both traits and concepts as type safety recovery mechanism. Notably, we can now roll use cases for traits, concept & unit tests into one using just predicate logic.
This type of check is particularly well suited to the use case for exhaustive pattern matching and algebraic types. In essence, whether or not a particular function has exhaustive coverage for handling an algebraic type is easily explored with a universal quantification based predicate check. We may define generator functions to span the domain of algebraic types like thus:
Interestingly, the library used in the above examples hails from the world of Haskell, so it’s applicability to algebraic types ought to come as no surprise [quickcheck]. What is noteworthy is how Lisp has the ability to absorb distinct paradigms conservatively, by which I mean without forcing them onto the language or the user, but without producing an apparent disjoin. Lisp tends to produce a blend, rather than a disjoin.
Of course, the C++ programmer still has to write unit tests as neither traits nor concepts cover dynamic behaviour, but few designs manage to unify the three. I have seen but one such design during the course of my career.
Performance was not a focus of this discussion, but somehow it would just not be complete without it. Surely the effort spent by the C++ programmer must confer its benefits. The assumption is that the machine code of the C++ compiler is significantly faster than that emitted by a comparable dynamic language compiler. While this may hold true in general, it does not necessarily hold true with Lisp. Lisp is a programmable programming languages. If we are inclined to program it for speed, we can. Let us start with the disassembly of the xplusone() C++.
Even if you are not accustomed to reading assembly, it is fairly straightforward to interpret what is happening here. Firstly “push” sets up the call stack, register operations to move data into place, followed by a call to a native “add” instruction, and concluded by a pop unwinding the stack and finally a “retq” returning from the function call. Now let’s look at the equivalent in Lisp. Brace yourself for significantly more boiler plate.
The first “that’s NICE” moment I had when looking at Lisp disassembly was that I noted it was commented. That surely makes it easy to interpret what one is looking at even if one is not an assembler guru. I am not an assembler guru. Ok, observations: we are moving the address of a generic plus function into the R11 register. Then we call that, not a native CPU instruction. Presumably, the generic add function will know to call through to the same native integer arithmetic instruction for integers and defer to floating point arithmetic otherwise. Presumably, it will also have error checking to handle being passed a non number type. The error trap after the return statement implies as much. The C++ version had an advantage here, in order to compile and disassemble, I had to provide type information by invoking with an integer argument. The C++ variant of xplusone simply won’t compile without it. Let us level the playing field.
Unix C++ programmers will be familiar with the gcc optimization options -O1 through -O3. These correspond to optimization levels enabling progressively more optimizations. Lisp too has three such levels. However, it defines these across two dimensions: speed & type safety.
The former is clearly appropriate when using Lisp in a manner akin to Perl or Python, i.e. rapid prototyping & runtime type checks. The latter is appropriate for performance. What is more, we can vary these settings throughout what C++ programmers know as compilation units [attribute]. So rather than these options being passed as command line arguments to the compiler, they are associated with individual forms, at our discretion. Dough Hoyte, in his book “Let over Lambda” gives us some macros to assist with this [efficiency]. These are “slow-progn” & “fast-progn”. This enables us to build abstractions such as the one shown below:
This macro allows us to abstract away the boiler plate shown below which otherwise starts to look nearly as verbose as C++.
And arrive at something like this:
fast-progn may also be represented by the shorthand notation #f [efficiency].
However, in order to appreciate what type of assembly will be generated in calling a particular function, we also need to furnish the system with the function arguments. Looking at the disassembly of xplusone in isolation is not very representative. Being a dynamic compiler, Lisp will do its best to compile xplusone in absence of input type. In short, it will insist on GENERIC-+, at least if we are disassembling from within the REPL. What we really want is a disassembly construct that allows us to judge what machine code will be emitted given a certain input and given the raw expression under consideration. Paraphrased, if I give you and x of type fixnum (an integer) and evaluate the form (+ 1 x), in absence of all other boilerplate, what will you generate ?
Once more, Dough Hoyte provides a macro, simply called “dis”, in Let-Over-Lambda [efficiency]. The below illustrates:
Now we try to improve on this with our fast macro #f.
Voila. One add, two mov, clear carry, pop & return. This is as good as it gets in C. Aside from one type annotation and one optimization annotation, the expression itself has not changed.
Modern C++ has shifted focus from an emphasis on type ( objects ) which accommodate algorithm to an emphasis on algorithms parametrized over types. This mirrors the paradigm of other algorithmic languages, such as Lisp or Scheme. The vehicle C++ has chosen is templates or type parameterization, yielding a loose coupling of algorithm and type. In doing so, C++ is incurring some of the same problems revolving around latent type errors normally encountered in dynamically typed, algorithmic languages, and is adopting similar styled remedies. We call these remedies Type Safety Recovery Mechanisms.
As a general rule, the amount of boilerplate code the C++ programmer will generate to employ these mechanisms will by far outweigh the effort spent by his/her Lisp counterpart. More importantly, the mechanisms the C++ programmer will use are disjoint rather than building naturally on each other, which has a tendency to yield gaps in the design. Verifying that these gaps don’t exist should be the focus of any design effort in this context.
We have observed that the flexibility of loose coupling between algorithm and type in C++, while it confers benefits of a flexible design, also creates work surrounding the type system. Might this paradigm be appropriate in some situations then and less appropriate in others? Or should we use it everywhere and at all times? The answer is likely to revolve around the dynamics of the Expression Problem [expression] in that if :
a) Your type system evolves rapidly but your algorithms are largely constant, costs will outweight benefits
b) Your algorithms evolve rapidly but your type system is largely constant, your benefits will outweigh costs
In other words, the answer is not to be found in the system specification, but rather in the problem domain and successfully predicting if the type aspects or the algorithm aspects of the problem domain will change quicker and more substantially.
Knowledge of advanced Lisp will help you foresee where cracks in mainstream C++ designs will emerge before these cracks become apparent to C++ programmers.
[attribute] gcc allows function level optimization options by way of
[efficiency] http://letoverlambda.com/index.cl/toc “Macro Efficiency Topics”