Implementing interpreters in Clojurescript

We will build an interpreter for toy languages (arithmetic expressions and assignments) that will allow us generate clojurescript functions from character string at run time. In order to decompose the interpreter building process as much as possible, we will decompose our task into a number of "languages" with increasing capabilities :

const : only one numerical constant (of integral type).
addsub : additions and subtraction.
addmult : additions, subtractions, multiplications and division, with priorities
arith : same as addmult, with parenthesis.
lang0 : any number of expressions that can be either arithmetic expressions for the arith language or assignment of the result of such an expression into a variable. Expressions can also use variables. The code returns the value of the last expression.
lang1 : same as lang0 with arguments written as %0, %1,…

The only required dependency is Instaparse. It works with both Clojure and Clojurescript, however, for the live coding with KLIPSE, I have to use a self-hosting fork (Instaparse-lumo).

Const language

If we only ever wanted to parse integers, of course we would just use js/parseInt on Clojurescript (Long.parseLong() on Clojure). However, as this is just a stepping stone to more interesting languages, we will use the same infrastructure of parsing and then interpreting.

Parser

When represented as a string of characters, integers are a string of one or more digits possibly prefixed by a minus sign for negative numbers. One could write the following parsing rule using only literal terminals :

number= '-'? ('0'|'1'|'2'|'3'|'4'|'5'|'6'|'7'|'8'|'9')+

Note that this will accept leading zeros while we could also want to forbid them. Parsing the string "-123" would give the following result :

[:number "-" "1" "2" "3"]

So it would be up to the AST processing step to collate those strings before turning them into a number. In order to speed both the parsing and the processing of our numbers, we can /tokenize/ them with a regular expression instead :

NUMBER= #'-?[0-9]+'

In Clojure[script], we can build the parser using the instaparse library :


(ns interpreters.core
(:require [instaparse.core :as insta]))

(def const-parser
  (insta/parser
    "PROG= NUMBER
    NUMBER= #'-?[0-9]+'"))

(const-parser "-1243")

But this minimal parser fails to parse strings that we would want to consider valid :


(const-parser "    -123456    ")

As we want to handle spaces before and after the number, we have to modify the grammar accordingly. For efficiency reasons, we also handle the plurality of consecutive spaces at the lexing stage with a regular expression :

Exercice : What if we remove the angle brackets in the grammar rules ?

Interpreter

The simplest interpreter would be something like :


(defn const-interpreter [ast]
  (instaparse.core/transform {:prog identity
                              :number (fn[s](js/parseInt s))}
                             ast))
(-> " -123" const-parser const-interpreter)

But we don't want to immediatly interprete the code, because we will want to be able to implement functions taking arguments, so we will actually return a Clojurescript function. As all the is language specific is the transform map, we will write once and for all (languages) the function taht takes a transform map and returns a function taking the ast and returning the function interpreting the ast :


(defn dynamic-eval [interpreter]
  (fn[ast]
    (fn[]
      (instaparse.core/transform interpreter ast))))

We can use this function for our const language to create a const-eval function that will turn an AST into a clojure function:


(def const-interpreting
  {:prog identity
   :number (fn[s](js/parseInt s))})
(def const-eval (dynamic-eval const-interpreting))

This function can be called on the result of parsing a string :


(def const-eval-test (-> "-123 " const-parser const-eval))

The result can then be called like any other clojurescript function :


(const-eval-test)

AddSub language

Parser

We define an add-sub rule with the foresight that it will be useful to represent operations of a given priority for the parsing step. However, we do not need such a node in our AST so we use angle brackets to remove them for the resulting AST :

The parser can be used on a sample string :


(addsub-parser " 1 +2-3-1")

Interpreter

We can define our interpreting transform map by adding the functions to process the new :add and :sub nodes in the AST :


(def addsub-interpreting
  (assoc const-interpreting :add + :sub -))

We can then reuse our previous helper function dynamic-eval:


(def addsub-eval (dynamic-eval addsub-interpreting))
(def addsub-eval-test (-> "1+2-3-1" addsub-parser addsub-eval))
(addsub-eval-test)

AddMult language

Parser

As we define a top down parser, we first try to match the lowest priority expressions :

We can use this parser on a sample program :


(addmult-parser "1 + 3 * -2 -1")

Interpreter

We can add the new AST node types to the transform map :


(def addmult-interpreting (assoc addsub-interpreting :mult * :div /))

We can then reuse our previous helper function dynamic-eval:


(def addmult-eval (dynamic-eval addmult-interpreting ))

The interpreter can be used as usual :


(def addmult-eval-test (-> "1 + 3 * -2 -1" addmult-parser addmult-eval))
(addmult-eval-test)

Arith language

Parser

We can update our grammar to allow grouping by matching parentheses. This is done by introducing a high priority factor that can be either a number or an expresssion (lowest level : add-sub) between parentheses :

We can use this grammar on a sample program :


(arith-parser "(1 + 3) * -2 -1")

Interpreter

The AST containts exactly the same kind of nodes as for the addmult language (neither the new ( and ) symbols nor the new term rule produce any node), so the interpreter transform map is exactly the same and we can reuse admult-eval:


(def arith-eval-test (-> "(1 + 3) * -2 -1" arith-parser addmult-eval))
(arith-eval-test)

Lang0

Parser

We update the parser to define :

a program as a sequence of expressions
an expression that can be either an assignment or an arithmetic expression
variables that can either be used on the left hand side of an assignment or as a read (varget) on the right hand side
factor that replaces term and also allows for varget or assignments (the value of the assignment expression is the assigned value)

The parser can be used on a sample program :


(lang0-parser "a=1+1*3;b=a-2; a+b;")

Interpreter

Now, our interpreter must be able to access an environment, so that assignments can have an effect. This environment is a map, mapping from variable names (as keywords) to values, with a special key :_ret for the current value of an expression. When evaluating the nodes of an AST with a transform map, each node will evaluate to the new environment. The first transform map just reduces the actual transform map over the environment :


(defn make-interpreting [make-instr-interpreting init-env]
  {:prog (fn [& instrs] (:_ret (reduce
                                       (fn[env instr]
                                         (instaparse.core/transform (make-instr-interpreting env) instr))
                                       init-env
                                       instrs)))})
(defn make-lang0-instr-interpreting [env]
  { :assig (fn[{varname :_ret :as env1} {value :_ret :as env2}]
             (assoc (merge env1 env2) varname value :_ret value))
   :add (fn[{v1 :_ret :as env1} {v2 :_ret :as env2}]
          (assoc (merge env1 env2) :_ret (+ v1 v2)))
   :sub (fn[{v1 :_ret :as env1} {v2 :_ret :as env2}]
          (assoc (merge env1 env2) :_ret (- v1 v2)))
   :mult (fn[{v1 :_ret :as env1} {v2 :_ret :as env2}]
           (assoc (merge env1 env2) :_ret (* v1 v2)))
   :div (fn[{v1 :_ret :as env1} {v2 :_ret :as env2}]
          (assoc (merge env1 env2) :_ret (quot v1 v2)))
   :number #(assoc env :_ret (js/parseInt %))
   :varname #(assoc env :_ret (keyword %))
   :varget (fn [{varname :_ret :as env1}]
             (assoc env1 :_ret (varname env1)))})

We use theses functions to build the interpreter with an initial environment :


(def lang0-interpret (dynamic-eval (make-interpreting make-lang0-instr-interpreting {:_ret 0})))

And test our interpreter as usual:


(def lang0-interpret-test (->> "a=1+(2-1)*3;b=a-2; a+b;" lang0-parser lang0-interpret))
(lang0-interpret-test)

Lang1

Parser

We just add arguments as a special kind of variables in the varget rule, their names being of the form "%0", "%1",…

We can test this parser on a sample program :


(lang1-parser "a=%0;a + %1 *3;")

Interpreter

Instead of starting with an empty environment, we fill the initial environment with arguments :


(defn args-to-env[args]
  (into {} (map-indexed #(vector (keyword (str "%" %1)) %2) args)))

(defn dynamic-eval-args [make-interpreter]
  (fn[ast]
    (fn[& args]
      (instaparse.core/transform (make-interpreting make-interpreter
                                          (assoc (args-to-env args)
                                                 :_ret 0))
                       ast))))

We can easily update the transform map from lang0 :


(defn make-lang1-instr-interpreting [env]
  (assoc (make-lang0-instr-interpreting env)
         :argument #(assoc env :_ret (keyword (str "%" %)))))

And define our interpreter in the usual way:


(def lang1-interpret (dynamic-eval-args make-lang1-instr-interpreting))

The interpreter can be called as usual, except that now (finally!) we can pass arguments to our function when we call it :


(def lang1-interpret-test (->> "a=%0;a + %1 *3;" lang1-parser lang1-interpret))
(lang1-interpret-test 2 3)