Code generation: first frontier
12 Aug 2020Here is a digest of things that have taken place in Herschel since last update:
In the front-end:
* Semantic graph of the subject module is externalized (know as symbol file :)
* Modules can now be imported; their semantich graphs (symbol tables) are read in and merged with the semantic graph of the subject module
In the back-end:
* Well, the backend modules of CP2 have been added into Herschel and renamed accordingly
* Just as with front-end, I have decided to preserve the modular and most of procedural decomposition of the code-generator (GC). This strategy has yielded quick results in the front-end, and I am adopting it in the back-end as well. The authors of OP2/CP2 are fantastic engineers, their design seems lean and decomposition decisions sane to me. Besides, one of our values is preserving maximal compatibility and transparency, so not reinventing the wheel wherever possible should serve that value.
* In order to ‘plunge’ into the CG (code generator) of CP2, I have set an almost educational goal: generate machine text for the simple program x := 1, assuming VAR x: LONGINT. This goal would require to make a ‘vertical slice’ of amendments: amend only those procedures (or their sections) that would be required to compile just that program.
* As of today, the educational goal has been reached. Here is the result:
As you can see, the target program has been somewhat extended; it assumes VAR x: INTEGER; y: LONGINT.
x := ... translates into one move, MOV imm32 -> mem32; because INTEGER is 32-bit, and the constant is encoded into the instruction. This is simple and no different from CP2.
y := FFEEDDCCBBAA9988 translates into two 32-bit moves MOV imm32 -> mem32. This is because the constant doesn’t fit into 32 bits, and thus cannot be encoded directly into the instruction - this is a limitation of amd64. The resulting code is similar to CP2’s output; however, it is produced with a modified flow of command, because the handling of LONGINTs changes in the instruction set and in the code generator.
y := 1 translates into one move, MOV imm32 -> mem64; because LONGINT is 64-bit, 1 is a constant that fits in 32 bits and thus can be encoded into the instruction. The processor performs sign-extension of the imminent constant before sending it to memory over the bus, thus all 64 bits of the target memory operand are properly initialized, despite that the imminent constant is shorter.