Memory model
How genericjs objects are represented
In the genericjs section, C++ objects are compiled to native JavaScript objects. This allows them to be garbage collected and to interface directly with external JavaScript code.
Here is an in-depth explanation of the genericjs memory model.
JavaScript Objects
A JavaScript object is a map from strings (property names) to references. Objects do not have a definite type and the set of properties is allowed to change over time, although this is highly discouraged for performance reasons.
Cheerp uses JavaScript objects to represent struct and class instances, or, more technically, any instance of an LLVM StructType. Cheerp creates the objects and sets all the properties at creation time, since C++ does not allow the set of members to change anyway. This is also the best approach from the JavaScript engine’s point of view, since it lets the engine generate efficient code.
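As an illustration of the object shapes described above, the sketch below models a C++ struct in plain JavaScript. The struct Point and the helper makePoint are hypothetical names chosen for this example, and the code is not actual Cheerp output; it only contrasts creating every property at construction time with adding properties afterwards.

```javascript
// C++ being modeled (illustrative only):
//   struct Point { int x; int y; };
//   Point* p = new Point{1, 2};
// Hypothetical JavaScript shape: every field exists from the start, in a
// fixed order, so all Point instances share one engine-level shape
// (a "hidden class" in V8 terminology).
function makePoint(x, y) {
  return { x: x, y: y };
}
const p = makePoint(1, 2);

// Discouraged pattern: properties added after creation change the
// object's shape over time, which engines optimize less well.
const q = {};
q.x = 1;
q.y = 2;
```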
Primitive values: Numbers
In JavaScript there is a single numerical type: Number, which is an IEEE 754 64-bit floating point value (aka double). Since a double has 52 bits of mantissa, it’s possible to store all valid 32-bit integers in a double. On the other hand, it’s not possible to represent the whole numeric range of 64-bit integers, which Cheerp supports by using two 32-bit integers.
Number instances are immutable: it’s never possible to change the internal value of a Number, it’s only possible to overwrite the reference (e.g. a variable such as var num) with a new Number value.
Primitive values: String
JavaScript Strings are not used directly by Cheerp, but they are accessible to the user and are necessary to pass data to many DOM/HTML APIs. JavaScript Strings are encoded using UTF-16 (2 bytes per code unit) and are immutable: it’s never possible to change a specific character of an existing string. ASCII strings used by C++ need to be converted to String when necessary, and this conversion is relatively slow since the two have a different memory layout (1 byte per character versus 2 bytes per character).
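To show why the conversion costs a full copy, here is a minimal sketch, assuming a NUL-terminated ASCII buffer modeled as a Uint8Array. The helpers cStringToJs and jsToCString are hypothetical and are not part of Cheerp’s API; every byte has to be visited and widened to a UTF-16 code unit, and narrowed back in the other direction.

```javascript
// Hypothetical: convert a NUL-terminated 1-byte-per-character buffer
// into a JavaScript String (2 bytes per code unit).
function cStringToJs(bytes) {
  let out = "";
  for (let i = 0; i < bytes.length && bytes[i] !== 0; i++) {
    out += String.fromCharCode(bytes[i]); // each byte becomes one UTF-16 code unit
  }
  return out;
}

// Hypothetical: convert an ASCII-only String back into a NUL-terminated buffer.
function jsToCString(str) {
  const bytes = new Uint8Array(str.length + 1); // zero-filled, so the last byte is NUL
  for (let i = 0; i < str.length; i++) {
    const code = str.charCodeAt(i);
    if (code > 0x7f) throw new RangeError("non-ASCII character");
    bytes[i] = code;
  }
  return bytes;
}
```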
JavaScript Array and TypedArrays
JavaScript arrays are used for various purposes by Cheerp, mostly to represent C++ arrays. They are mutable, as it is possible to change the contained values. They are also dynamic, since it’s possible to add and remove elements and even create holes. This dynamic behaviour is not used internally by Cheerp, but users have access to the Array type and are free to use it.
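The snippet below demonstrates these plain JavaScript semantics and, for contrast, the fixed-length TypedArrays named in the heading. It shows language behaviour only, not code generated by Cheerp.

```javascript
// Plain Array: mutable and dynamic.
const arr = [10, 20, 30];
arr[1] = 25;   // mutable: contained values can change
arr.push(40);  // dynamic: elements can be appended...
arr.pop();     // ...and removed
arr[5] = 60;   // writing past the end leaves holes at indices 3 and 4
const arrLength = arr.length;  // 6
const hasIndex3 = 3 in arr;    // false: a hole, not a stored undefined

// TypedArray: fixed length, and stored values are coerced to the element type.
const ints = new Int32Array(3);
ints[0] = 3.7;      // truncated to 3
ints[1] = 2 ** 31;  // wraps to -2147483648, like a 32-bit int
```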
Passing by reference
In JavaScript all parameters are passed by reference (strictly speaking, the reference itself is copied, so assigning a new value to a parameter is never visible to the caller). So inside weirdFunc it’s possible to change the content of the passed objects, like obj. The parameters num and str are also passed by reference, but since they are immutable they behave more like const references, or even copies. The main effect is that a called function has no way to change the value of a number in a way that is visible to the caller. This means that a trick is needed to support C++ pointers to integer and floating-point values.
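A hypothetical reconstruction of the kind of function the text describes follows. The names weirdFunc, obj, num and str come from the prose, but the body is an assumption. The final lines preview one possible trick, boxing the number in an array so that the callee’s write is visible to the caller, which is the idea the next section builds on; increment and counter are illustrative names.

```javascript
// Hypothetical body for weirdFunc: only the object mutation escapes.
function weirdFunc(obj, num, str) {
  obj.a = 42;       // mutates the caller's object: visible outside
  num = num + 1;    // rebinds the local parameter only
  str = str + "!";  // strings are immutable: this creates a new string
}

const obj = { a: 1 };
const num = 1;
const str = "hi";
weirdFunc(obj, num, str);
// obj.a is now 42, while num is still 1 and str is still "hi".

// Boxing a number in an array lets the callee update it visibly.
function increment(box) {
  box[0] += 1;
}
const counter = [1];
increment(counter); // counter[0] is now 2
```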
C++ pointers in Cheerp
C++ pointers need to support the following operations:
- Loading or storing a value through them
- Doing pointer arithmetic, which is fully defined by the standard only for C++ arrays
- Comparing for equality
- Ordering, which is fully defined for C++ arrays and left unspecified for unrelated objects
JavaScript references support equality comparison, but they have no ordering and do not support any arithmetic. Moreover, although Numbers are passed by reference as well, they are immutable and therefore behave as if passed by copy.
The solution is to conceptually represent pointers as a pair of a container object and an offset into that object. If the container object is not an array, an additional wrapper array must be created around the member. This solution provides:
- Pointer arithmetic, by operating on the offset field
- Pointer ordering in arrays, by comparing the offset field
- Read/Write access to numerical values, by accessing the container object at the right offset
This solution is, however, inefficient if used for every pointer. Moreover, wrapper arrays are required for pointers to members that escape an object, and for top-level objects created on the stack or using new.
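A minimal sketch of this representation follows, assuming a hypothetical Ptr class that holds the container and the offset explicitly. It models the C++ pointer operations on an Int32Array and on a one-element wrapper array; it illustrates the technique and is not the code Cheerp generates.

```javascript
// Hypothetical model of a container/offset pointer (not Cheerp output).
class Ptr {
  constructor(container, offset) {
    this.container = container; // an Array or TypedArray
    this.offset = offset;       // index into the container
  }
  load() { return this.container[this.offset]; }
  store(value) { this.container[this.offset] = value; }
  // Pointer arithmetic only touches the offset.
  add(n) { return new Ptr(this.container, this.offset + n); }
  equals(other) {
    return this.container === other.container && this.offset === other.offset;
  }
  // Ordering is only meaningful within one container (one C++ array);
  // for unrelated containers the result is arbitrary, as in C++.
  lessThan(other) { return this.offset < other.offset; }
}

// C++: int arr[3] = {1, 2, 3}; int* p = arr; *(p + 2) = 7;
const arr = new Int32Array([1, 2, 3]);
const p = new Ptr(arr, 0);
p.add(2).store(7);

// C++: int x = 5; int* q = &x; *q += 1;
// x is not an array element, so it lives in a one-element wrapper array.
const xWrapper = [5];
const q = new Ptr(xWrapper, 0);
q.store(q.load() + 1);
```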
It’s possible to remove this overhead in most cases by analyzing code at compile time. The idea is: if we observe that a pointer is never used for pointer arithmetic, but only to load or store data or to access members, it can be used directly as a JavaScript object.
Object kinds
Pointers which are not used for pointer arithmetic can be represented as pure JavaScript references and are as cheap as regular JavaScript objects. This style of handling pointers is marked as COMPLETE_OBJECT in the Cheerp compiler.
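The difference can be sketched as follows, assuming a hypothetical struct Point and the illustrative functions moveRight and moveRightRegular; neither is actual Cheerp output. The COMPLETE_OBJECT version receives the JavaScript object itself, while the unoptimized version must go through a container/offset pair.

```javascript
// C++ (illustrative):
//   struct Point { int x; int y; };
//   void moveRight(Point* p) { p->x += 1; }  // p only used for member access
// COMPLETE_OBJECT: the pointer is simply the JavaScript object.
function moveRight(p) {
  p.x += 1;
}

// Unoptimized: the same pointer as a container/offset pair object.
function moveRightRegular(ptr) {
  ptr.container[ptr.offset].x += 1;
}

const pt = { x: 1, y: 2 };
moveRight(pt);

const pts = [{ x: 0, y: 0 }, { x: 5, y: 5 }];
moveRightRegular({ container: pts, offset: 1 });
```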
If a pointer does not fall in the COMPLETE_OBJECT category, then it is represented as a pair of an object and an offset. Those pointers are called REGULAR. The Cheerp compiler tries very hard to avoid creating these objects, in two ways:
- If a given pointer can be proven to always have the same offset, only the container is stored
- In most cases, instead of creating a JavaScript object with two members, we can store the container and offset as separate variables.
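Both optimizations can be sketched in plain JavaScript. The functions sumBoxed, incConstOffset and sumSplit are hypothetical and only illustrate the shape of the code; the C++ signatures in the comments are assumed examples, not taken from Cheerp.

```javascript
// Baseline: a REGULAR pointer materialized as a {container, offset} object.
function sumBoxed(p, n) {
  let total = 0;
  for (let i = 0; i < n; i++) total += p.container[p.offset + i];
  return total;
}

// (1) Offset proven constant (here always 0): only the container is passed.
// C++: void inc(int* p) { *p += 1; }
function incConstOffset(container) {
  container[0] += 1;
}

// (2) Container and offset kept as two separate parameters,
// so no temporary pointer object is allocated at the call site.
// C++: int sum(const int* p, int n);
function sumSplit(pContainer, pOffset, n) {
  let total = 0;
  for (let i = 0; i < n; i++) total += pContainer[pOffset + i];
  return total;
}

const data = new Int32Array([1, 2, 3, 4]);
const viaObject = sumBoxed({ container: data, offset: 1 }, 2); // allocates a pair
const viaSplit = sumSplit(data, 1, 2);                          // no allocation
const cell = [41];
incConstOffset(cell);
```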
Optimizations
The goal is to inspect code at compile time in order to avoid REGULAR pointers as much as possible. The idea is that we can avoid creating a temporary pointer object if a pointer is:
- Not used for arithmetic
- Only directly loaded
- Only used to access members
This property needs to be computed for the pointers passed to every method, so that it can be propagated correctly through the call graph. Special care is also needed for virtual methods and indirect method calls. Still, since Cheerp works as a whole-program optimizer and has full knowledge of the call graph, this optimization can be applied in most contexts.
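The analysis can be sketched as a fixpoint over the call graph. This is a deliberately simplified, hypothetical model rather than Cheerp’s implementation: each pointer (named here as function.variable) carries a list of uses, every pointer starts as COMPLETE_OBJECT, pointer arithmetic and indirect calls demote it to REGULAR, and, as an assumption, an argument and the parameter it binds to must share the same kind, so demotions propagate across call edges in both directions.

```javascript
const COMPLETE_OBJECT = "COMPLETE_OBJECT";
const REGULAR = "REGULAR";

// Hypothetical use kinds: "load", "store", "member", "arith",
// "call" (with callee = "fn.param"), "indirectCall" (unknown target).
function classifyPointers(uses) {
  // Optimistic start: every pointer is COMPLETE_OBJECT until proven otherwise.
  const kinds = new Map(Object.keys(uses).map((name) => [name, COMPLETE_OBJECT]));
  let changed = true;
  const demote = (name) => {
    if (kinds.get(name) !== REGULAR) {
      kinds.set(name, REGULAR);
      changed = true;
    }
  };
  while (changed) {
    changed = false;
    for (const [name, list] of Object.entries(uses)) {
      for (const use of list) {
        if (use.kind === "arith" || use.kind === "indirectCall") {
          demote(name); // arithmetic, or a callee we cannot see: be conservative
        } else if (use.kind === "call") {
          // Assumption: argument and parameter share one kind.
          if (kinds.get(name) === REGULAR || kinds.get(use.callee) === REGULAR) {
            demote(name);
            demote(use.callee);
          }
        }
        // "load", "store" and "member" uses keep the pointer COMPLETE_OBJECT.
      }
    }
  }
  return kinds;
}

const example = {
  "main.p": [{ kind: "call", callee: "walk.q" }],     // passed to walk(int* q)
  "walk.q": [{ kind: "arith" }, { kind: "load" }],    // q++ inside walk
  "main.s": [{ kind: "member" }, { kind: "call", callee: "show.t" }],
  "show.t": [{ kind: "member" }],                     // only t->field accesses
  "main.v": [{ kind: "indirectCall" }],               // passed via a function pointer
};
const exampleKinds = classifyPointers(example);
```

Starting optimistically and only ever demoting guarantees termination, since each pointer can change kind at most once.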