Challenging Clojure’s Integration with Java in Lisp with C++
Preamble – An uncommonly common language
Lisp may be said to be simultaneously the most common and very nearly the most uncommon programming language in the world. We can quantify this. Head over to the TIOBE Index of Programming Languages at http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html.
As of August 2014, the lead of the pack is populated by the usual suspects: C, Java, Objective-C & C++.
Lisp is barely number 19 on the top-20 index.
This of course proves nothing besides popularity. A single-lens reflex camera, while far more capable for photography, is still less common than the cameras built into smart phones.
Lingua Franca of your Compiler
When I say Lisp is more common than any other programming language, I mean this: whether you are programming in Python or C++, Lisp is invariably what holds up the scaffolding behind the scenes. This is true because Lisp is the lingua franca of your compiler. Your compiler does not work with the syntax that you see, such as semicolons in C++ or significant white space in Python. Rather, it discards this at the earliest possible moment and converts your code into an abstract syntax tree (AST). That AST is composed of lists of lists containing statements and expressions. Your compiler prefers this format because it needs a representation of code as data, one of the core tenets of Lisp. It needs this because it must be able to both transform and optimise your code. For example, it may want to elide, re-arrange, or parallelize. Yet it must reason about the equivalence of the transformations it makes. In other words, your compiler needs to “calculate code,” precisely what Lisp is great at by way of its homoiconic syntax. So your compiler borrows this concept to build one or more intermediate representations (IR) using abstract syntax trees. Being a list processor working with abstract syntax trees, your compiler may essentially be regarded as a Lisp engine. We can extract this intermediate representation from compilers like GCC and Clang.
Let’s take the very trivial example of a C++ function square() being called from a function main.
Clang generates this AST for the function square() using the command “clang++ -cc1 -ast-dump hellofun.cpp”.
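Immediately clear is the canonically correct Lisp indentation and code layout. This format is actually very illuminating because it makes it obvious, for example, where type casts have been inferred by the compiler. Similarly, we may elicit an intermediate representation from GCC as shown below. We use “g++ -fdump-rtl-dfinish hellofun.cpp”. Strictly speaking this dumps GCC’s register transfer language (RTL) rather than its AST, but RTL too is written as Lisp-style S-expressions.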
Returning to our analogy from photography, less capable cameras built into smart phones are more common than professional single-lens reflex cameras. But for the average professional photographer this is of no consequence. A professional wedding photographer capturing life’s shiny moments in glamorous portraits will not find himself compelled to reach for a smart phone camera. But how would this be different if wedding photography took months and could avail itself of pre-existing work done by multitudes of smart phone users the world over? Assume further that re-use depended on compatibility of the photographic material. And here lies the problem of programming languages like Haskell, OCaml & Lisp. They are extremely expressive. But they require a certain mathematical acumen that eludes mainstream IT. Consequently the majority of problems solved in IT are expressed in less expressive languages. So while Haskell, OCaml & Lisp are more expressive, what is the use of being more expressive if you have to express most everything yourself? Being pragmatic means realising that re-using the wealth of very mature Java, C++ & Python libraries can be just as useful as, or more useful than, writing such library support yourself in a more expressive language. Of course this consideration is subject to other factors, such as whether you require proof of correctness, or the ability to evolve third-party libraries yourself. Understanding your requirements will go a long way here.
Expressiveness vs Mainstream Re-Use Cost Benefit Analysis
What trend can be expected in the future? We have highlighted a convergence of C++ on Lisp as well as the emergence of functional programming in C++ in previous articles. For C++ this is new. In Java this trend is old and open by admission. Guy Steele, co-author of the Java language specification at Sun Microsystems, is quoted as saying “We were after the C++ programmers. We managed to drag a lot of them about halfway to Lisp.” The original quote can be read in context here: http://people.csail.mit.edu/gregs/ll1-discuss-archive-html/msg04045.html. Guy Steele is also known as co-designer, with Gerald Jay Sussman, of the Lisp dialect Scheme. Yet the irony is this: each time another programming language adopts yet more features from Lisp (the same is true for functional programming and Haskell, OCaml etc.), this detracts from Lisp itself, or Haskell or OCaml. Why? Because the cost-benefit analysis tips in favour of the less expressive, more mainstream language. This is true because the mainstream language will always have superior library support, yet the list of benefits bestowed exclusively by Lisp (or Haskell, OCaml, etc.) has just diminished. So rather than being a case of advocacy, the trend becomes self-defeating. Programming languages are most often thought of as a man-machine interface. They are just as much a medium of communication between programmers and software engineers. It stands to reason then that the trend will always be towards mainstream. As university curricula target a wider audience, and IT becomes less a matter for computer scientists and mathematicians, so too mainstream IT will trend towards programming languages that require less mathematical acumen than is demanded by languages such as OCaml, Haskell or Lisp.
Rich Hickey understood this when he devised Clojure. Because there is only one thing more powerful than having extreme expressiveness OR mature library re-use at your fingertips: and that is to have them both. Yet again, there is precedent. Common Lisp has been embeddable in C for some time by way of ECL, Embeddable Common Lisp. ECL has traditionally focused on embedding in C but, less well known, it also works with C++, including the more recent C++11. Coincidentally, as C++ tends more towards modelling state in closures and functions, the lack of emphasis on object orientation in ECL will become less of an issue.
The recipe we will present here will support the features shown below:
1) The extreme expressiveness of Lisp embedded in C++, not just C
2) “Live programming” via a Python style REPL directly in a C++ process
3) Support for bidirectional calls from Lisp to C++ and C++ to Lisp
4) Variable support for interpreted, byte-compiled and natively compiled operation
5) The ability to re-use not only C++ libraries from Lisp but also all of Lisp’s libraries. People have been writing the latter since around 1958. Why not use them?
6) A means of configuration management via Lisp to replace INI files or XML
If you are building ECL from source, make sure you are building with C++ support. See screenshot below for an example.
The Source Code
Shown below is the C++ source code for our recipe. Assuming your ECL installation is to be found under the /usr/local prefix and further assuming you have saved the source code in the single file main.cpp, you might compile this example as follows:
g++ -std=c++11 main.cpp -I/usr/local/include -L/usr/local/lib -lecl -stdlib=libstdc++
Note that this example assumes OS X and g++ with an LLVM backend that requires the -stdlib=libstdc++ flag. On Linux, this would not be required. Please refer to comments in the code for explanations.
Putting It All Together
If you ran the above g++ command line, you will have a binary called a.out. Let’s start this.
So what happened? We initialised our Lisp engine within C++, then performed any relevant initialization via initrc.lisp. This will prove incredibly useful later. We then evaluated a Lisp function (makeanumber) and streamed its output to cout. Noteworthy here is that C++11 was happy to infer the type from ECL’s ecl_to_float() function, eliminating any redundancy in type declarations. Incidentally (makeanumber) has been byte-compiled. Subsequently we entered our program’s main loop.
Now to make this slightly more interesting. We hit CTRL-C.
We now have a REPL inside our C++ process using Lisp’s excellent exception handling and restarts system. Restarts are one of the finer points of Lisp, one yet to find its way into C++. Having a REPL means we can go and poke around. Whatever we do here will be interpreted. One of the functions we defined was (runtime); it denotes our loop variable. Let’s try that.
Ok, so our C++ loop variable has the value 6. But really we can run anything that Lisp has scope to… arithmetic, anything. This is really useful, because we might, for example, interactively redefine a Lisp function subsequently called as part of the regular execution of our C++ program. This gives rise to an entire style of programming otherwise alien to C++: Live Programming.
In fact, let’s do this right now. The Lisp function (makeanumber) we called from C++ evaluates to the constant 3.2. We can verify this by re-evaluating it in the REPL. Let’s change it. We’ll redefine the function to return something else: 6.4.
There is nothing inherent about using constants here. This could be an arbitrarily complex operation. Indeed we might find other ways to inject the operation into our program, apart from hitting CTRL-C and getting a REPL. We might, for instance, inject this logic via something like ZeroMQ, a popular messaging library that abstracts a range of architectural patterns. Of course, our C++ program does not call (makeanumber) again, but if it did, you get the idea: immediate feedback without the edit-compile-debug cycle. Hence the name Live Programming.
Now let’s confuse our C++ runtime a bit. Say we want our loop variable to assume the value 60 instead and proceed from there. Remember those restarts? Exceptions such as CTRL-C are “restartable.” Just tell Lisp to (continue).
Iterations 7..59 were skipped and C++ continued with iteration 60.
Beyond Live Programming and the REPL, it is easy to see how this paradigm might be extended to provide configuration management. If we can set application parameters and script this in a file without having to compile and link a new C++ binary, then we can provide a means of configuration management. But don’t we have XML for this today? We do. And a one-to-one comparison of XML vs Lisp based configuration could fill pages and start several flame wars. Yet this is not the goal. We do observe that we have included but one single header file, ecl.h, and in turn ended up with a REPL in C++, Live Programming and Configuration Management, all in one. A key aim of software engineering is to manage and reduce complexity. The astute reader will observe that all our boilerplate code so far fits in about 50 lines of code, excluding comments. A paradigm that solves a problem in 500 lines of code is inferior to a paradigm that solves the same problem in 50 lines of code. A paradigm that solves 3 or more problems in 50 lines of code…
More than just a REPL, this so-called BREAK-LOOP hides a full-featured symbolic debugger.
Just to recap, so far we have seen C++ calling in-line Lisp; Lisp calling C++; a Lisp REPL inside of a C++ process; a full symbolic Lisp debugger inside of C++; byte-compiled and interpreted modes of execution; as well as trivial Live Programming. We are yet to see full integration with Lisp’s package management system and fully compiled Lisp code inside of C++.
For more information about package management, you might wish to read up on ASDF and Quicklisp. There are some 1000+ libraries available under Quicklisp. We will skip the detail, but think CMake and Python’s pip combined. Imagine I wanted to use sqlite: how would I make this available to my application? Like so:
This achieves the equivalent of Python’s pip install. How do we make this available within a Lisp application? We “require” it.
Pythonistas know this as “import.” But this is fully compiled code. No interpreter, no GIL (Python’s global interpreter lock, which limits concurrency). Just the same convenience as Python.
The real question is: how do I make this available inside of C++? Well, essentially the same way we demonstrated above in the REPL. What works in the REPL works the same if byte-compiled or fully compiled. When ECL starts, it loads a bootstrap file called .eclrc from the user’s home directory. My .eclrc file has three lines. The first two are:
The first imports ASDF, the second imports Quicklisp. An embedded ECL instance does not load .eclrc by default since there is an expectation that the application might be deployed outside of the context of the developer’s home directory. But our recipe already envisages its own bootstrap file called initrc.lisp, associated specifically with our C++ application. Loading sqlite support from within our embedded Lisp C++ application is thus essentially reduced to:
But we call this from the application initrc.lisp rather than the default bootstrap file.
This brings us to our final point: fully compiling our Lisp code for better performance inside of our C++ application. What we are after is the expressiveness of Python without its lacklustre performance. To make this a little more interesting, we will inline C++ directly inside Lisp. Matthias Benkard’s journal has a great post on how to inline C++ in ECL. The (c-inline) macro can be persuaded to inline C++ as well as C. What is not obvious from the posting is that the code presented is not immediately usable. Rather, inlining C++ presumes static compilation. Matthias gives the following example:
To use Matthias’ code we must first compile it — as we might rather expect with any C++ source.
We do this simply via (compile-file) and (load) directly from within Lisp. Now executing the function (c++hex) works as expected.
Here again, if we want to avail ourselves of this technique in our C++ recipe, we require but one small modification to our initialize() function – two lines of code. We replace (load) with (compile-file) and a subsequent (load) with the latter eliding the file name extension.
This produces an interesting JIT-style behaviour when running our C++ application. We can even observe the system Clang compiler doing its magic because Clang warnings are finding their way to standard out.
To conclude, we have changed but one line of Lisp code and added another to our C++ recipe and have in effect added both Lisp JIT and C++ JIT capabilities, prototyped and demonstrated working all in under one minute of coding. Solving complexity with the smallest number of moving parts: this is what software engineering is all about!
Like Clojure to Java, ECL can be used to host Lisp within C++. Head over to Meta-Circular Adventures in Functional Abstraction on how to leverage this capability for full-featured functional programming. One key difference with ECL and C++ is that we have simple yet effective control over the AOT of our JIT process. We may choose to interpret, byte-compile or fully AOT/JIT compile at a point of our choosing.