Rhine – A typed Elixir-inspired language on LLVM

Permalink Failed to load latest commit information.

rhine is designed to be a fast language utilizing the LLVM JIT featuring N-d tensors, first-class functions, and type inference; specifying argument types is enough. It has a full blown AST into which it embeds a UseDef graph.

rhine started off as rhine-ml, and rhine-ml was called rhine earlier.

Effort put into rhine-ml: 2 monthsEffort put into rhine: 1 year, 1 monthdef bar(arithFn ~Function(Int -> Int -> Int)) do println $ arithFn 2 4enddef addCandidate(A ~Int, B ~Int) do ret $ A + Benddef subCandidate(C ~Int, D ~Int) do ret $ C – Denddef main() do if false do bar addCandidate else bar subCandidate end A = {{2}, {3}} println A[1][0]end

rhine-ml, in contrast, has arrays, first-class functions, closures, variadic arguments, macros. It’s also much less buggy.

rhine uses a handwritten recursive-descent parser, which is faster and reports better errors, than the former Bison one. You will need to use a one-token lookahead atleast, if you want to keep the code simple. This gives you one level of:

parseSymbol(); // Oops, the lexed token indicates that we’re not in the right // functionparseInstruction(); // Ask it to use an existing token, not lex a new one

Another minor consideration is that newlines must be handled explicitly if you want to substitute ; with a newline in the language.

void Parser::getTok() { LastTok = CurTok; CurTok = Driver->Lexx->lex(&CurSema, &CurLoc); LastTokWasNewlineTerminated = false; while (CurTok == NEWLINE) { LastTokWasNewlineTerminated = true; CurTok = Driver->Lexx->lex(&CurSema, &CurLoc); }}

The AST is heavily inspired by LLVM IR, although it has some higher-level concepts like Tensor. It’s an SSA and has a UseDef graph embedded in it, making analysis and transformation easy.

The main classes are Type and Value. All types like IntType, FloatType inherit from Type, most of the others inherit from Value. A BasicBlock is a Value, and so is ConstantInt.

A BasicBlock is a vector of Instruction, and this is how the AST is an SSA: assignments are handled as a StoreInst; there is no real LHS, just RHS references.

StoreInst::StoreInst(Value *MallocedValue, Value *NewValue);

Value is uniquified using LLVM’s FoldingSet, and Use wraps it, so we can replace one Value with another.

/// A Use is basically a linked list of Value wrappersclass Use { Value *Val; Use *Prev; Use *Next; // Laid out in memory as [User] – [Use1] – [Use2]. Use2 has DistToUser 2 unsigned DistToUser;};

An Instruction is a User. User and its Use values are laid out sequentially in memory, so it’s possible to reach all the Use values from the User. It’s also possible to reach the User from any Use, using DistToUser.

class User : public Value {protected: unsigned NumOperands;};class Instruction : User;

The User has a custom new to allocate memory for the Use instances as well.

void *User::operator new(size_t Size, unsigned Us) { void *Storage = ::operator new (Us * sizeof(Use) + Size); auto Start = static_cast(Storage); auto End = Start + Us; for (unsigned Iter = 0; Iter < Us; Iter++) { new (Start + Iter) Use(Us – Iter); } auto Obj = reinterpret_cast(End); return Obj; }};

The Context is a somewhat large object that keeps the uniqified Type and Value instances. It also keeps track of Externals, the external C functions that are provided as part of a “standard library”. Unique llvm::Builder and llvm::Context objects, as well as the DiagnosticPrinter are exposed member variables. Finally, it is necessary for symbol resolution, and keeps the ResolutionMap.

src/Transform/Resolve is an example of something that utilizes the UseDef embedded in the AST.

creates one UnresolvedValue, A, an AddInst, and a MallocInst, which takes the string “B” and AddInst as operands.

The transform basically goes over all the Instruction in the BasicBlock, resolves UnresolvedValue instances, and sets the Use to the resolved value. It hence replaces the Value underneath the Use, and since the Instruction is referencing Use instances, there are no dangling references.

if (auto S = K->Map.get(V, Block)) { /// %S = 2; /// ^ /// Came from here (MallocInst, Argument, or Prototype) /// /// Foo(%S); /// ^ /// UnresolvedValue; replace with %Replacement if (auto M = dyn_cast(S)) { if (dyn_cast(U->getUser())) U.set(M); }}

Type Inference is too simple. One visit function is overloaded for all possible Value classes.

Type *TypeInfer::visit(MallocInst *V) { V->setType(visit(V->getVal())); assert(!V->isUnTyped() && “unable to type infer MallocInst”); return VoidType::get(K);}$ git submodule update –init$ mkdir llvm-build$ cd llvm-build$ cmake -GNinja ../rhine/llvm$ ninja$ mkdir ../rhine-build$ cd ../rhine-build$ cmake -GNinja ../rhine$ ninja

An inefficient untyped language is easy to implement. println taking 23 and “twenty three” as arguments is a simple matter of switching on type-when-unboxed. There’s no need to rewrite the value in IR, and certainly no need to come up with an overloading scheme.

Crystal made a good decision to start with Ruby. If your idea is to self-host, then the original language’s efficiency does not matter. All you need is good generated assembly (which LLVM makes easy).

View the original article here