Testing infrastructure

As I have mentioned before, I make amendments to the compiler ‘vertically’, not ‘horizontally’. By this I mean the following: I set a task - for instance, to compile an assignment of the form y(LONGINT) := x(INTEGER) - and then I amend all the procedures, in all modules of the compiler, that relate to this task. Even within individual procedures (which are sometimes sizeable) I amend only the branches that relate to my task, commenting the other branches out or, better yet, guarding them with a HALT(126) ‘not yet implemented’ trap.
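To make the pattern concrete, here is a minimal sketch of such a guarded procedure in Component Pascal; every name in it (GenSketch, Item, GenWiden, GenStore, Assign) is made up for illustration and is not taken from the actual compiler:

    MODULE GenSketch;
        (* sketch of a 'vertically' amended generator procedure; all names are hypothetical *)

        CONST Integer = 1; Longint = 2;   (* type forms *)

        TYPE Item = RECORD form: INTEGER END;

        PROCEDURE GenWiden (VAR x: Item);
        BEGIN (* emit a sign extension *) END GenWiden;

        PROCEDURE GenStore (VAR y, x: Item);
        BEGIN (* emit the store *) END GenStore;

        PROCEDURE Assign* (VAR y, x: Item);
        BEGIN
            IF (y.form = Longint) & (x.form = Integer) THEN
                (* the one branch the current task needs *)
                GenWiden(x); GenStore(y, x)
            ELSE
                HALT(126)   (* not yet implemented *)
            END
        END Assign;

    END GenSketch.

The HALT(126) traps are cheap to write, impossible to overlook at run time, and easy to find later with a text search.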

Doing it ‘horizontally’ would mean taking, say, the procedure responsible for type conversions and amending it all at once - that is, covering all possible type combinations. Since I’m a bear of little brain, and still just learning the CP2 compiler, that would be a difficult and overwhelming job. It would also be harder to test.

Besides, the vertical approach is a shorter path to intermediate results - and to making myself happy. It also lets me check hypotheses quickly (hypotheses like: will such-and-such an opcode do the job I need done?) and identify pitfalls early (for instance, it turned out that opcode A2 - MOV mem, AL - can no longer be used, or is at least inconvenient, and had to be replaced).

But the vertical approach also has a price: I cannot ‘complete’ a procedure - make all the amendments it needs, debug it, test it sufficiently, and forget about it. As I take on new tasks, I go back to procedures I have amended before and amend them again - I unfold or uncomment branches, and sometimes even change the branching logic. This carries the risk of breaking what had been done before.

I faced the same issue - and solved it efficiently - when I was working on the Herschel front-end. The solution was a testing infrastructure and a set of tests.

For each task that I set and solve, I also record a standard - the output expected from the compiler for the test relating to that task. The task itself is formulated as a module. This module is passed to the compiler as input, and the compiler’s output is compared with the standard.
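At its core the check can be a straight byte-for-byte comparison of the compiler’s output file with the stored standard. A minimal sketch, assuming the BlackBox Files interface (the module and the procedure are my own invention, not part of the actual infrastructure):

    MODULE TestCompare;   (* hypothetical helper module *)
        IMPORT Files;

        PROCEDURE SameFiles* (f, g: Files.File): BOOLEAN;
            (* TRUE iff the two files have identical contents *)
            VAR rf, rg: Files.Reader; bf, bg: BYTE; i, len: INTEGER;
        BEGIN
            len := f.Length();
            IF len # g.Length() THEN RETURN FALSE END;
            rf := f.NewReader(NIL); rg := g.NewReader(NIL);
            i := 0;
            WHILE i < len DO
                rf.ReadByte(bf); rg.ReadByte(bg);
                IF bf # bg THEN RETURN FALSE END;
                INC(i)
            END;
            RETURN TRUE
        END SameFiles;

    END TestCompare.

Comparing raw bytes keeps the infrastructure indifferent to what the compiler actually emits - object code, a listing, or an intermediate dump.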

Over time the standard base grows.

After I get the compiler to produce the result for a new task - but before celebrating - I run the compiler over the whole standard base. If something got broken while I was working on the most recent task, the run fails in one or more tests. So, before I celebrate, I have to go back and fix things until the run succeeds on the entire standard base.

And, of course, the whole test run over the standard base is automated, and the test results are aggregated - all I have to do is click a commander button.
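The ‘commander button’ is a BlackBox commander embedded in a tool document: clicking it executes the command written next to it. A sketch of what such an aggregating command might look like - the module name, the procedure, and the elided per-test loop are all my assumptions:

    MODULE TestRunner;   (* hypothetical driver for the standard base *)
        IMPORT StdLog;

        PROCEDURE RunAll*;
            VAR total, failed: INTEGER;
        BEGIN
            total := 0; failed := 0;
            (* for each test module in the standard base:
               run the compiler on it, compare its output with the stored
               standard (e.g. via TestCompare.SameFiles), update the counters *)
            StdLog.String("tests run: "); StdLog.Int(total);
            StdLog.String(", failed: "); StdLog.Int(failed); StdLog.Ln
        END RunAll;

    END TestRunner.

A commander placed before the text TestRunner.RunAll in a test document then gives exactly the one-click run described above.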

I will now implement a similar testing infrastructure for the Herschel back-end - the code generator.