Interpreted Glyphs: Really Simple

So I find out my assessment, potentially, on Wednsday and see what path the remainder of my life will be like. It's a bit unsettling to think that - finally - I may have ended it. Its nice to think I can reapply but if I'm working fulltime or 'full'time, then I won't really have enough to spare. Oh well.

I had thought of not reporting about my progress on Simpletron and the Simple compiler. That's silly, of course, because I've linked to my github account where you can see all the commits and delays and current state. I actually gain by explaining what's going on (outside of the commits no one else will read).

Anyway, I was thinking on the nanopass compiler article and find it appealing for changing the Simple Compiler. Rather than cart around useless rem commands, I can just delete them. Unless, of course, a goto refers to it. And that goes for all the other line numbers too. But this is forward, I'll be more explicit for those who haven't read the article.

Very quickly, in case you don't recall or know that I've been working on Deitel's Simpletron/Simple Compiler and what those are: Deitel is a publisher with some computer texts. There are a few exercises that involve simulating a cpu and then a compiler with a subset of the computer language BASIC. I've completed the canonical implementation of the exercises (albiet in Python rather than Java or one of the C's). Working on the extension problems currently.

These, potentially, involve changing the structure of the compiler quite a bit so I've been dragging my feet. A tiny part of that is anticipating breaking the compiler, but barely. Reverting is effortless and there isn't any real danger anyway since these thoughts are clear. The more ambiguous aspect comes from when to split wildly divergent extensions and when to freeze a version to document. (Not necessarily the same point.)

One extension involves changing the compiler to an interpreter. Rather than transform 'Simple' code to simpletron instructions, the interpreter puts the instructions into memory. Then it runs just that command (or block if multiple) This restricts, for example, goto commands to pointing at already declared instructions.

Another extension (my own), involves introducing an assembler in between the compiler and simpletron. It isn't necessary since the language is barely above the assembler level anyway. However, introducing the interface lets me fantasize about either retargeting the compiler to another computer simulator. (Say, Notch's Dcpu10) Or, I could write a compiler for another simple language (a pascal? a scheme even?) to target the simpletron.

But, those are changes I will pursue later. For the moment, I am more concerned about incorporating the eerie basic subroutine. Normally, it would have a name, associated commands, and be callable from anywhere. Because 'simple' requires line numbers, a client can call any part as a subroutine. I can already goto any line (and depend on another pass to resolve lines I haven't enountered yet). But, a gosub must return to the point after the gosub. Consider

print 4
return
gosub 1
gosub 2

One of the problems of writing programs at this level is unstructured programming techniques like this. The print is fine, but then I move to the return. If the return were below the gosub, then - upon reaching the gosub - I insert a goto in the 'return's spot that points at the instruction right after the gosub. Here, I hit it early, so what is supposed to be there is ambiguous. I hadn't thought deeply and appear not to be able as I'm about to argue myself into a hole.

Suffice that it is late and I am tired. Returns need to initialize with a goto [next instruction] in case of this situation. But then what about after I've returned? Do I reset it to next instruction or leave it pointing at the first gosub? If I'm to protect the client, then the former is arguable. But, basic is so far down, that I cant' really justify it. If you want to make an infinite loop, who am I to stop you? If you want spagetti code, I have no right to stop you. So why would I make subroutines better than the spec calls for?

Mainly, because I know that it can be better and implementing a hexagonal wheel because it matches my presumed experience feels less useful than implementing the octagonal wheel I learned about earlier. In this case, that means named functions/regions of code. Those suggest some scope and I even heard how compilers (generally) graduate to that level: an abstract syntax tree for each function.

This is why I want an intermediate layer I can manipulate, unacknowleged by the spec. If I examine the code and find forward or backward references (a certainty), then I can resolve the reference before producing code. Consider a for loop

for x = 3 to 5
print x
next

Without making an intermediate change to the simple code, I have to save the information about the for somewhere to use when I hit the next command that loops back to the for and increments x.

However, there is another paradigm. I encountered it in the Nanopass compiler article. Rather than make ad hoc changes to code in between compiler passes, the professor who wrote it demanded his students provide a grammer for the intermediate language that a particular pass expects and emits. So, in the above case, I would append the line number of the for to the next and the next to the for. The gosubs recieve the line number where the return's goto will be stored. For clarity:

for x = 3 to 5 3
print x
next 1

print 4
return
gosub 1 2
gosub 2 2

This requires, though, that I have the whole source code available to me, as a list of strings. Otherwise, the separate passes would be opening the same file multiple times and resolving their adjustments inefficiently. Were I to do so though, then I can make a number of nice transformations. Like, after all the forward references are taken care of, I can cut all the line numbers and comments that aren't referenced. (Or even increase the line number past a comment and then cut them all.)

While I am quite excited to make that change, it flies in the face of both the interpreter version (which can't import the whole file) and documenting the code. The documentation, to my mind, should reflect Deitel's description of how to make the simple compiler. As I improve the compiler with functions or strings, some basic elements move farther from the original working version.

On one hand, it is unlikely that anyone will look at my implementation, much less the wiki documentation for it. And yet, I've read a fair amount of the only implementation uploaded before mine (that I could find) of both the simpletron and simple compiler. (Just the simpletron is more common.) If someone were to reference mine as a guide, then a version with multiple passes that defy Deitel's spec (to satisfy one or more of the embellishments) would likely disincline that person to continue. S/he either has to investigate where the old pass remains in the new version, construct the undocumented, canonical version from an old commit, or ignore Deitel and work from my template. Obviously, none of these are likely. So, I would have just failed that person as guide.

On the other hand, If I were to regard the literateprogramming wiki as my primary audience, I could justify completing the exercise and then writing the documentation. Of course, they could also see that I had gone beyond the canonical spec and mutter. They may not be invested in adherence, but if someone were to submit another implementation in a different language but only satisfy the canonical spec, there would be a significant disparity.

In the end, I see that I should redact my for loop changes and refactor scompiler and simpletron to their very best selves for documentation. I'll need to move the 1.0 tag up so the wiki references the right version. Only then can I branch out with extensions and interpreter. Honestly, I don't need the extensions for the interpreter so I won't feel bad about not rejiggering them for the interpreter.

I'm also working on some java projects and learning clojure, but that's another night's work.

Interpreted Glyphs

Wednesday, September 5, 2012

Really Simple

No comments:

Post a Comment