C++ calling convention

When interfacing with a C++ library the first thing you'll need to learn is the differences in calling convention. We'll take as an example a trivial C++ class and explore writing an FFI interface for it.

The full example can be found on GitHub in the example.cpp and example.ts files.

namespace lib {
    class Example {
        public:
        Example(int data);
        ~Example();
        void method() const;
        static Example create(int data);

        private:
        int data_;
    };
}

The first issue we'll run into is name mangling.

Name mangling

The method() of our Example class is not found in the resulting library with the name method or even lib__Example__method. The exact name depends on the compiler and/or target architecture, but on Linux the resulting name is probably going to be _ZNK3lib7Example6methodEv. The Internet holds many sources for C++ name mangling, but a basic "demangling" of this mangled name is:

_Z: All mangled names start with this prefix.
N: This is a nested name.
3lib, 7Example, 6method: The parts of the nested name, prefixed by their lengths.
E: This ends the nested name.
v: This is a function that takes no arguments (a void function).

For Deno FFI specifically, name mangling means that when defining the symbol to load you'll generally want to use the name property to define the mangled name:

const lib = Deno.dlopen(
  "./libexample.so",
  {
    lib__Example__method: {
      name: "_ZNK3lib7Example6methodEv",
      parameters: ["buffer"],
      result: "void",
    },
  },
);
lib.symbols.lib__Example__method(new Uint8Array());

Now you can call your method with a "plain text" accessor name while giving Deno the mangled name by which to find the symbol in the dynamic library.

Note that this method() is a method of our class, so even though it takes no parameters from a C++ point of view, it does have a single parameter in our FFI bindings which corresponds to the this argument for the call.

Finding mangled names

There are probably many good ways to find the mangled names of your C++ methods. The libclang library provides an API to get the different mangled names (there can be many in some cases) of a C++ method, so that could be used to automate finding the mangled names and mapping them to "plain text" accessor names.

A more manual way (on Linux) is to use the output of the nm command. Searching through the output is tedious but it's definitely possible to write even complex C++ FFI interfaces using this method. Just hope that the API doesn't change too often.

Constructors

Creating C++ objects requires first reserving memory for them, and then calling their appropriate constructor on said memory. Here is how we would construct an instance of our Example class:

const lib = Deno.dlopen(
  "./libexample.so",
    {
        lib__Example__Constructor {
            name: "_ZN3lib7ExampleC1Ei",
            parameters: ["buffer"],
            result: "void",
        },
    },
);
const example = new Uint8Array(4);
lib.symbols.lib__Example__Constructor(example, 313);

For this class we only need 4 bytes worth of memory, since the class only has the single int data inside it. This information is not directly available anywhere and often needs to be either calculated or figured out through trial and error. The size needed is determined by the C struct size calculations. This means that for instance a struct with a pointer and a boolean will be two pointers in size even though data wise it only requires 9 bytes (8 + 1).

Note also the C1 in our constructors' mangled name: C++ has three types of constructors (and destructors):

The complete object constructor. (C1)

This constructor creates the object itself, all data members, and all base classes.
The base object constructor. (C2)

This constructor creates the object itself, all data members and all non-virtual base classes.
The allocating constructor. (C0?)

This constructor does everything the complete object constructor does and allocates the memory for the object. It is not usually seen / used.

If a class has no virtual base classes, then the first two constructors are the same and will often end up being deduplicated from the library / binary, but the names remain, at least in GCC compiled libraries. As such, the nm output of the Example class will be something like this:

0000000000001110 T _ZN3lib7ExampleC1Ei
0000000000001110 T _ZN3lib7ExampleC2Ei
0000000000001120 T _ZN3lib7ExampleD1Ev
0000000000001120 T _ZN3lib7ExampleD2Ev

Note the four names, but only two distinct addresses. For Deno FFI specifically, if you're only creating objects using the library's C++ API then you should always be calling the complete object constructor (C1) [citation needed]. The base object constructor is only called from derived classes' constructors.

Creator functions: Passing C++ objects by-value

Sometimes classes might also have static creator methods, like in our case the create() method. Here C++ starts to show it's weird side. C++ is a curious language that has a special mention in at least the System V ABI.

If a C++ object has either a non-trivial copy constructor or a non-trivial destructor, it is passed by invisible reference (the object is replaced in the parameter list by a pointer that has class INTEGER).

What this means is that any C++ class instance with a copy constructor and/or destructor in its interface is never passed-in or returned by-value in a register even if it could fit in one.

Instead for parameters the instance is passed in by reference (ie. as a "pointer" or "buffer" in Deno FFI terms). For return values an extra zero'th parameter (preceding even a possible this argument for non-static class methods) is added to the function which must be a pointer to a memory buffer to write the return value object into, and the function changes to return the pointer number to said memory buffer.

For the create() static method this means that our FFI interface needs to look as follows:

const lib = Deno.dlopen(
  "./libexample.so",
  {
    lib__Example__create: {
      name: "_ZN3lib7Example6createEi",
      parameters: ["buffer", "i32"],
      result: "pointer",
    },
  } as const,
);

const exampleBuffer = new Uint8Array(4);
const pointer = lib.symbols.lib__Example__create(exampleBuffer, 16);
// The returned pointer is the address of our passed-in buffer.
assertEquals(
  Deno.UnsafePointer.value(pointer),
  Deno.UnsafePointer.value(Deno.UnsafePointer.of(exampleBuffer)),
);

Note: If you do not care for the returned pointer address, it is safe to set the result as "void" to improve performance marginally.

Destructors

As we saw above with constructors, C++ also has multiple destructors per class.

Complete object destructor. (D1)

This destructor destroys the object itself, as well as data members and all base classes.
Base object destructor. (D2)

This destructor destroys the object itself, as well as data members and non-virtual base classes.
Deleting object destructor. (D0)

This destructor does everything the complete object destructor does and deallocates the object.

As with constructors, for a C++ class with no virtual base classes the first two are equivalent. D1 and D2 destructors do not call free() on the memory of the object, meaning that calling a D1 or D2 C++ destructor from Deno on a Uint8Array is safe: C++ will not try to deallocate the underlying ArrayBuffer's memory. As with constructors, you should always be calling the complete object destructor (D1) from Deno FFI.

If a class has a virtual destructor then things can get interesting. The reason for a virtual destructor to exist is the following: Imagine you have a C++ base class Base and inherited variant Derived. Now, imagine you get a pointer to an instance of the base class and want to deallocate said instance:

void removeInstance(Base* instance) {
    delete instance;
}

If the Base class does not have a virtual destructor, then this delete call will only release memory associated with the actual base class. If the instance here happens to be an instance of Derived, then any memory associated with the inherited variant class will be left allocated, causing a memory leak.

So, a virtual destructor is needed. With a virtual destructor, the call to the destructor is done through the instance's vtable, and the vtable will contain pointers to all of the classes' destructors. The deleting object destructor is called on delete instance whereas the complete object destructor is called on instance->~Base(). The base object destructor is only called from derived classes' destructor.

void objectDestructor(Base* instance) {
    instance->~Base();
}

void deletingObjectDestructor(Base* instance) {
    delete instance;
}

As the name implies, the deleting object destructor will actually call free() on the memory associated with instance. Thus, building an FFI interface to deletingObjectDestructor here and calling it with a Uint8Array is not safe and will almost certainly lead to the program crashing.