mixasm, the MIXAL assemblermixvm, the MIX computer simulatorgmixvm, the GTK virtual machinemixguile, the Scheme virtual machinemixasm, the MIXAL assembler
mixvm, the MIX computer simulator
gmixvm, the GTK virtual machine
mixguile, the Scheme virtual machine
This manual is for GNU MDK (version 1.2.4, March, 2008), a set of utilities for developing programs using Donald Knuth's MIX mythical computer and MIXAL, its assembly language.
Copyright © 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2008 Free Software Foundation, Inc.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with the Invariant Sections being “GNU General Public License”, with the Front-Cover Texts being “A GNU Manual,” and with the Back-Cover Texts as in (a) below. A copy of the license is included in the section entitled “GNU Free Documentation License”.(a) The FSF's Back-Cover Text is: “You have the freedom to copy and modify this GNU manual. Buying copies from the FSF supports it in developing GNU and promoting software freedom.”
GNU MDK was written and designed by Jose Antonio Ortega Ruiz.
Pieter E. J. Pareit is the author of the Emacs MIXAL mode
(see MIXAL mode), and has also contributed many bug fixes.
Philip E. King has contributed to this package development with many helpful discussions, as well as actual code (see GUD integration).
Michael Scholz is the author of the German translation of mdk's user interface.
--- The Detailed Node Listing ---
Installing mdk
MIX and MIXAL tutorial
The MIX computer
MIX instruction set
MIXAL
Getting started
Running the program
Using mixguile
Hook functions
Emacs tools
MIXAL mode
mixasm, the MIXAL assembler
mixvm, the MIX computer simulator
Interactive commands
gmixvm, the GTK virtual machine
mixguile, the Scheme virtual machine
Scheme functions reference
Copying
In his book series The Art of Computer Programming (published by Addison Wesley), D. Knuth uses an imaginary computer, the MIX, and its associated machine-code and assembly languages to ilustrate the concepts and algorithms as they are presented.
The MIX's architecture is a simplified version of those found in real CISC CPUs, and the MIX assembly language (MIXAL) provides a set of primitives that will be very familiar to any person with a minimum experience in assembly programming. The MIX/MIXAL definition is powerful and complete enough to provide a virtual development platform for writing quite complex programs, and close enough to real computers to be worth using when learning programming techniques. At any rate, if you want to learn or improve your programming skills, a MIX development environment would come in handy.
The mdk package aims at providing such virtual development environment on a GNU box. Thus, mdk offers you a set of utilities to simulate the MIX computer and to write, compile, run and debug MIXAL programs. As of version 1.2.4, mdk includes the following programs:
mixasmmixvm, mixguile
or gmixvm.
mixvmgmixvmmixvm
functionality accessible through a graphical interface.
mixguilemixal-mode.elmixvm
within Emacs.
mixvm.elmixvm inside an Emacs GUD
buffer, providing concurrent edition and debugging of MIXAL programs.
mixvm and gmixvm implement a simulator of the MIX
computer, giving you a virtual machine for executing and debugging MIX
programs. These binary programs could be written by hand, but it is
easier to produce them compiling MIXAL source files, using the MIXAL
assembler mixasm. On the other hand, mixguile offers you
the possibility of manipulating a MIX virtual machine through a set of
Scheme functions, so that you can use this programming language to
interact with the virtual machine. In addition, mixvm and
gmixvm are also able to interpret Scheme scripts (using an
embedded Guile interpreter), that is, you can use Scheme as an extension
language to add new functionalities to these programs.
This manual gives you a tutorial of MIX and MIXAL, and a thorough description of the use of the mdk utilities.
Many people have further contributed to mdk by reporting problems, suggesting various improvements, or submitting actual code. Here is a list of these people. Help me keep it complete and exempt of errors.
mixvm and influenced the gmixvm GUI
design with insightful comments and prototypes.
GNU mdk is distributed as a source tarball available for download in the following URLs:
The above sites contain the latest stable releases of mdk. The development branch is available as a GNU Arch archive located at1
After you have downloaded the source tarball, unpack it in a directory of your choice using the command:
tar xfvz mdk-X.Y.tar.gz
where X.Y stands for the downloaded version (the current stable release being version 1.2.4).
In order to build and install mdk, you will need the following libraries installed in your system:
If present, readline and history are used to provide command completion
and history management to the command line MIX virtual machine, mixvm.
GTK+ and libglade are needed if you want to build the graphical
interface to the MIX virtual machine, gmixvm. Finally, if
libguile is found, the mdk utilities will be compiled with Guile
support and will be extensible using Scheme.
Please note: you need both the libraries and the headers; this means both the library package and the -dev package if you do not compile your libraries yourself (ex: installing libgtk2.0-0 and libgtk2.0-0-dev on Debian).
mdk uses GNU Autoconf and Automake tools, and, therefore, should be built and installed without hassle using the following commands inside the source directory:
./configure
make
make install
where the last one must be run as root.
The first command, configure, will setup the makefiles for your
system. In particular, configure will look for GTK+ and libglade,
and, if they are present, will generate the appropiate makefiles for
building the gmixvm graphical user interface. Upon completion,
you should see a message with the configuration results like the
following:
*** GNU MDK 1.2 has been successfully configured. ***
Type 'make' to build the following utilities:
- mixasm (MIX assembler)
- mixvm (MIX virtual machine, with readline support,
with guile support)
- gmixvm (mixvm GTK+ GUI, with guile support)
- mixguile (the mixvm guile shell)
where the last lines may be missing if you lack the above mentioned libraries.
The next command, make, will actually build the mdk programs
in the following locations:
You can run these programs from within their directories, but I
recommend you to install them in proper locations using make
install from a root shell.
mdk includes extensive support for Emacs. Upon installation, all the elisp code is installed in PREFIX/share/mdk, where PREFIX stands for your installation root directory (e.g. /usr/local). You can copy the elisp files to a directory that is in your load-path, or you can add the above directory to it. Assuming that the installing prefix is /usr/local, you can do it by adding to your .emacs file the following line:
(setq load-path (cons "/usr/local/share/mdk" load-path))
MIXAL programs can be written using Emacs and the elisp program
share/mdk/mixal-mode.el, contributed by Pieter E. J. Pareit. It
provides font locking, interactive help, compiling assistance and
invocation of the MIX virtual machine via a new major mode
called mixal-mode. To start mixal-mode automatically
whenever you edit a MIXAL source file, add the following lines
to your .emacs file:
(autoload 'mixal-mode "mixal-mode" t)
(add-to-list 'auto-mode-alist '("\\.mixal\\'" . mixal-mode))
In addition, mixvm can be run within an Emacs GUD
buffer using the elisp program share/mdk/mixvm.el, contributed
by Philip E. King. mixvm.el provides an interface between
mdk's mixvm and Emacs, via GUD. Place this file
in your load-path, optionally adding the following line to your
.emacs file:
(autoload 'mixvm "mixvm" "mixvm/gud interaction" t)
You can fine-tune the configuration process using the following switches with configure:
Enables/disables the build of the MIX virtual machine GUI (
gmixvm). If the required libraries are missing (see Requirements) the configure script with automatically disable this feature.
Enables/disables the Guile support for
mixvmandgmixvm, and the build ofmixguile. If the required libraries are missing (see Requirements) the configure script with automatically disable this feature.
Enables/disables the GNU Readline support for
mixvm. If the required libraries are missing (see Requirements) the configure script with automatically disable this feature.
For additional, boilerplate configure options, see the INSTALL file, or run
configure --help
GNU MDK has been tested in the following platforms:
mdk will probably work on any GNU/Linux or BSD platform. If you try it in a platform not listed above, please send a mail to the author.
In the book series The Art of Computer Programming, by D. Knuth, a virtual computer, the MIX, is used by the author (together with the set of binary instructions that the virtual CPU accepts) to illustrate the algorithms and skills that every serious programmer should master. Like any other real computer, there is a symbolic assembler language that can be used to program the MIX: the MIX assembly language, or MIXAL for short. In the following subsections you will find a tutorial on these topics, which will teach you the basics of the MIX architecture and how to program a MIX computer using MIXAL.
In this section, you will find a description of the MIX computer, its components and instruction set.
The basic information storage unit in the MIX computer is the byte, which stores positive values in the range 0-63 . Note that a MIX byte can be then represented as 6 bits, instead of the common 8 bits for a regular byte. Unless otherwise stated, we shall use the word byte to refer to a MIX 6-bit byte.
A MIX word is defined as a set of 5 bytes plus a sign. The bytes within a word are numbered from 1 to 5, being byte number one the most significant one. The sign is denoted by index 0. Graphically,
-----------------------------------------------
| 0 | 1 | 2 | 3 | 4 | 5 |
-----------------------------------------------
| +/- | byte | byte | byte | byte | byte |
-----------------------------------------------
Sample MIX words are `- 12 00 11 01 63' and `+ 12 11 34 43 00'.
You can refer to subfields within a word using a field
specification or fspec of the form “(L:R)”, where
L denotes the first byte, and R the last byte of the
subfield.
When L is zero, the subfield includes the word's
sign. An fspec can also be represented as a single value F, given
by F = 8*L + R (thus the fspec `(1:3)', denoting the first
three bytes of a word, is represented by the integer 11).
The MIX computer stores information in registers, that can store either a word or two bytes and sign (see below), and memory cells, each one containing a word. Specifically, the MIX computer has 4000 memory cells with addresses 0 to 3999 (i.e., two bytes are enough to address a memory cell) and the following registers:
rArXrJrI1, rI2, rI3, rI4, rI5, rI6In addition, the MIX computer contains:
un, where
n runs from 0 to 20. In Knuth's definition, u0 through
u7 are magnetic tape units, u8 through 15 are disks
and drums, u16 is a card reader, u17 is a card writer,
u18 is
a line printer and, u19 is a typewriter terminal, and u20,
a paper tape. Our implementation maps these devices to disk files,
except for u19, which represents the standard output.
As noted above, the MIX computer communicates with the external world by a set of input-output devices which can be “connected” to it. The computer interchanges information using blocks of words whose length depends on the device at hand (see Devices). These words are interpreted by the device either as binary information (for devices 0-16), or as representing printable characters (devices 17-20). In the last case, each MIX byte is mapped onto a character according to the following table:
| 00 | 01 | A | 02 | B | 03 | C
| |
| 04 | D | 05 | E | 06 | F | 07 | G
|
| 08 | H | 09 | I | 10 | ~ | 11 | J
|
| 12 | K | 13 | L | 14 | M | 15 | N
|
| 16 | O | 17 | P | 18 | Q | 19 | R
|
| 20 | [ | 21 | # | 22 | S | 23 | T
|
| 24 | U | 25 | V | 26 | W | 27 | X
|
| 28 | Y | 29 | Z | 30 | 0 | 31 | 1
|
| 32 | 2 | 33 | 3 | 34 | 4 | 35 | 5
|
| 36 | 6 | 37 | 7 | 38 | 8 | 39 | 9
|
| 40 | . | 41 | , | 42 | ( | 43 | )
|
| 44 | + | 45 | - | 46 | * | 47 | /
|
| 48 | = | 49 | $ | 50 | < | 51 | >
|
| 52 | @ | 53 | ; | 54 | : | 55 | '
|
~, [ and
# correspond to symbols not representable as ASCII characters
(uppercase delta, sigma and gamma, respectively), and byte values 56-63
have no associated character.
Finally, the MIX computer features a virtual CPU which controls the above components, and which is able to execute a rich set of instructions (constituting its machine language, similar to those commonly found in real CPUs), including arithmetic, logical, storing, comparison and jump instructions. Being a typical von Neumann computer, the MIX CPU fetchs binary instructions from memory sequentially (unless a jump instruction is found), and stores the address of the next instruction to be executed in an internal register called location counter (also known as program counter in other architectures).
The next section, See MIX instruction set, gives a complete description of the available MIX binary instructions.
The following subsections fully describe the instruction set of the MIX computer. We begin with a description of the structure of binary instructions and the notation used to refer to their subfields. The remaininig subsections are devoted to describing the actual instructions available to the MIX programmer.
MIX instructions are codified as words with the following subfield structure:
| Subfield | fspec | Description
|
| ADDRESS | (0:2) | The first two bytes plus sign are the address field. Combined
with the INDEX field, denotes the memory address to be used by the
instruction.
|
| INDEX | (3:3) | The third byte is the index, normally used for indexing the
address3.
|
| MOD | (4:4) | Byte four is used either as an operation code modifier or as a field
specification.
|
| OPCODE | (5:5) | The last (least significant) byte in the word denotes the operation
code.
|
or, graphically,
------------------------------------------------
| 0 | 1 | 2 | 3 | 4 | 5 |
------------------------------------------------
| ADDRESS | INDEX | MOD | OPCODE |
------------------------------------------------
For a given instruction, `M' stands for the memory address obtained after indexing the ADDRESS subfield (using its INDEX byte), and `V' is the contents of the subfield indicated by MOD of the memory cell with address `M'. For instance, suppose that we have the following contents of MIX registers and memory cells:
[rI2] = + 00 63
[31] = - 10 11 00 11 22
where `[n]' denotes the contents of the nth memory cell and `[rI2]' the contents of register `rI2'4. Let us consider the binary instruction `I = - 00 32 02 11 10'. For this instruction we have:
ADDRESS = - 00 32 = -32
INDEX = 02 = 2
MOD = 11 = (1:3)
OPCODE = 10
M = ADDRESS + [rI2] = -32 + 63 = 31
V = [M](MOD) = (- 10 11 00 11 22)(1:3) = + 00 00 10 11 00
Note that, when computing `V' using a word and an fspec, we apply a left padding to the bytes selected by `MOD' to obtain a complete word as the result.
In the following subsections, we will assign to each MIX instruction a mnemonic, or symbolic name. For instance, the mnemonic of `OPCODE' 10 is `LD2'. Thus we can rewrite the above instruction as
LD2 -32,2(1:3)
or, for a generic instruction:
MNEMONIC ADDRESS,INDEX(MOD)
Some instructions are identified by both the OPCODE and the MOD fields. In these cases, the MOD will not appear in the above symbolic representation. Also when ADDRESS or INDEX are zero, they can be omitted. Finally, MOD defaults to (0:5) (meaning the whole word).
The following instructions are used to load memory contents into a register.
LDArA <- V.
LDXrX <- V.
LDirIi <- V.
LDANrA <- -V.
LDXNrX <- -V.
LDiNrIi <- -V.
In all the above load instructions the `MOD' field selects the bytes of the memory cell with address `M' which are loaded into the requisite register (indicated by the `OPCODE'). For instance, the word `+ 00 13 01 27 11' represents the instruction
LD3 13,1(3:3)
^ ^ ^ ^
| | | |
| | | --- MOD = 27 = 3*8 + 3
| | --- INDEX = 1
| --- ADDRESS = 00 13
--- OPCODE = 11
Let us suppose that, prior to this instruction execution, the state of the MIX computer is the following:
[rI1] = - 00 01
[rI3] = + 24 12
[12] = - 01 02 03 04 05
As, in this case, `M = 13 + [rI1] = 12', we have
V = [M](3:3) = (- 01 02 03 04 05)(3:3)
= + 00 00 00 00 03
(note that the specified subfield is left-padded with null bytes to complete a word). Hence, the MIX state, after the instruction execution, will be
[rI1] = - 00 01
[rI3] = + 00 03
[12] = - 01 02 03 04 05
To further illustrate loading operators, the following table shows the contents of `rX' after different `LDX' instructions:
The following instructions are the inverse of the load operations: they are used to store a subfield of a register into a memory location. Here, MOD represents the subfield of the memory cell that is to be overwritten with bytes from a register. These bytes are taken beginning by the rightmost side of the register.
STAV <- rA.
STXV <- rX.
STiV <- rIi.
STJV <- rJ.
STZV <- 0.
By way of example, consider the instruction `STA 1200(2:3)'. It causes the MIX to fetch bytes no. 4 and 5 of register A and copy them to bytes 2 and 3 of memory cell no. 1200 (remember that, for these instructions, MOD specifies a subfield of the memory address). The other bytes of the memory cell retain their values. Thus, if prior to the instruction execution we have
[1200] = - 20 21 22 23 24
[rA] = + 01 02 03 04 05
we will end up with
[1200] = - 20 04 05 23 24
[rA] = + 01 02 03 04 05
As a second example, `ST2 1000(0)' will set the sign of `[1000]' to that of `[rI2]'.
The following instructions perform arithmetic operations between rA and rX register and memory contents.
ADDrA <- rA +V.
SUBrA <- rA - V.
MULrAX <- rA x V.
DIVrA <- rAX / V, rX <- reminder.
In all the above instructions, `[rA]' is one of the operands of the binary arithmetic operation, the other being `V' (that is, the specified subfield of the memory cell with address `M'), padded with zero bytes on its left-side to complete a word. In multiplication and division, the register `X' comes into play as a right-extension of the register `A', so that we are able to handle 10-byte numbers whose more significant bytes are those of `rA' (the sign of this 10-byte number is that of `rA': `rX''s sign is ignored).
Addition and substraction of MIX words can give rise to overflows, since the result is stored in a register with room to only 5 bytes (plus sign). When this occurs, the operation result modulo 1,073,741,823 (the maximum value storable in a MIX word) is stored in `rA', and the overflow toggle is set to TRUE.
In these instructions, `M' (the address of the instruction after indexing) is used as a number instead of as the address of a memory cell. Consequently, `M' can have any valid word value (i.e., it's not limited to the 0-3999 range of a memory address).
ENTArA <- M.
ENTXrX <- M.
ENTirIi <- M.
ENNArA <- -M.
ENNXrX <- -M.
ENNirIi <- -M.
INCArA <- rA + M.
INCXrX <- rX + M.
INCirIi <- rIi + M.
DECArA <- rA - M.
DECXrX <- rX - M.
DECirIi <- rIi - M.
In the above instructions, the subfield `ADDRESS' acts as an immediate (indexed) operand, and allow us to set directly the contents of the MIX registers without an indirection to the memory cells (in a real CPU this would mean that they are faster that the previously discussed instructions, whose operands are fetched from memory). So, if you want to store in `rA' the value -2000 (- 00 00 00 31 16), you can use the binary instruction + 31 16 00 03 48, or, symbolically,
ENNA 2000
Used in conjuction with the store operations (`STA', `STX', etc.), these instructions also allow you to set memory cells contents to concrete values.
Note that in these address transfer operators, the `MOD' field is not a subfield specificator, but serves to define (together with `OPCODE') the concrete operation to be performed.
So far, we have learned how to move values around between the MIX registers and its memory cells, and also how to perform arithmetic operations using these values. But, in order to write non-trivial programs, other functionalities are needed. One of the most common is the ability to compare two values, which, combined with jumps, will allow the execution of conditional statements. The following instructions compare the value of a register with `V', and set the cm indicator to the result of the comparison (i.e. to `E', `G' or `L', equal, greater or lesser respectively).
CMPACMPXCMPiAs explained above, these instructions modify the value of the MIX comparison indicator; but maybe you are asking yourself how do you use this value: enter jump operators, in the next subsection.
The MIX computer has an internal register, called the location
counter, which stores the address of the next instruction to be fetched
and executed by the virtual CPU. You cannot directly modify the contents
of this internal register with a load instruction: after fetching the
current instruction from memory, it is automatically increased in one
unit by the MIX. However, there is a set of instructions (which we call
jump instructions) which can alter the contents of the location counter
provided some condition is met. When this occurs, the value of the next
instruction address that would have been fetched in the absence of the
jump is stored in `rJ' (except for JSJ), and the location
counter is set to the value of `M' (so that the next instruction is
fetched from this new address). Later on, you can return to the point
when the jump occurred reading the address stored in `rJ'.
The MIX computer provides the following jump instructions: With these instructions you force a jump to the specified address. Use `JSJ' if you do not care about the return address.
JMPJSJThese instructions check the overflow toggle to decide whether to jump or not.
JOVJNOVIn the following instructions, the jump is conditioned to the contents of the comparison flag:
JL[CM] = L. OPCODE = 39, MOD = 4.
JE[CM] = E. OPCODE = 39, MOD = 5.
JG[CM] = G. OPCODE = 39, MOD = 6.
JGE[CM] does not equal L. OPCODE = 39, MOD = 7.
JNE[CM] does not equal E. OPCODE = 39, MOD = 8.
JLE[CM] does not equal G. OPCODE = 39, MOD = 9.
You can also jump conditioned to the value stored in the MIX registers, using the following instructions:
JANJAZJAPJANNJANZJANPJXNJXZJXPJXNNJXNZJXNPJiNJiZJiPJiNNJiNZJiNPAs explained in previous sections (see MIX architecture), the MIX computer can interact with a series of block devices. To that end, you have at your disposal the following instructions:
INOUTIOCJREDJBUSThe following instructions convert between numerical values and their character representations.
NUMCHAR [rA] = + 30 30 31 32 33
[rX] = + 31 35 39 30 34
the represented number is 0012315904, and `NUM' will store this value in `rA' (i.e., we end up with `[rA]' = + 0 46 62 52 0 = 12315904).
If any byte in `rA' or `rB' does not belong to the range 30-39, it is interpreted by `NUM' as the digit obtained by taking its value modulo 10. E.g. values 0, 10, 20, 30, 40, 50, 60 all represent the digit 0; 2, 12, 22, etc. represent the digit 2, and so on. For instance, the number 0012315904 mentioned above could also be represented as
[rA] = + 10 40 31 52 23
[rX] = + 11 35 49 20 54
`CHAR' performs the inverse operation, using only the values 30 to 39 for representing digits 0-9.
The following instructions perform byte-wise shifts of the contents of `rA' and `rX'.
SLASRASLAXSRAXSLCSRC| SLA 2 | [rA] = - 03 04 05 00 00
|
| SLA 6 | [rA] = - 00 00 00 00 00
|
| SRA 1 | [rA] = - 00 01 02 03 04
|
| SLC 3 | [rA] = + 04 05 06 07 08 | [rX] = - 09 10 01 02 03
|
| SLAX 3 | [rA] = + 04 05 06 07 08 | [rX] = - 09 10 00 00 00
|
| SRC 4 | [rA] = + 07 08 09 10 01 | [rX] = - 02 03 04 05 06
|
| SRAX 4 | [rA] = + 00 00 00 00 01 | [rX] = - 02 03 04 05 06
|
Finally, we list in the following table three miscellaneous MIX instructions which do not fit in any of the previous subsections:
MOVENOPHLTWhen writing MIXAL programs (or any kind of programs, for that matter), whe shall often be interested in their execution time. Loosely speaking, we will interested in the answer to the question: how long takes a program to execute? Of course, this execution time will be a function of the input size, and the answer to our question is commonly given as the asymptotic behaviour as a function of the input size. At any rate, to compute this asymptotic behaviour, we need a measure of how long execution of a single instruction takes in our (virtual) CPU. Therefore, each MIX instruction will have an associated execution time, given in arbitrary units (in a real computer, the value of this unit will depend on the hardware configuration). When our MIX virtual machine executes programs, it will (optionally) give you the value of their execution time based upon the execution time of each single instruction.
In the following table, the execution times (in the above mentioned arbitrary units) of the MIX instructions are given.
NOP | 1 | ADD | 2 | SUB
| 2 | MUL | 10
|
DIV | 12 | NUM | 10 | CHAR
| 10 | HLT | 10
|
SLx | 2 | SRx | 2 | LDx
| 2 | STx | 2
|
JBUS | 1 | IOC | 1 | IN
| 1 | OUT | 1
|
JRED | 1 | Jx | 1 | INCx
| 1 | DECx | 1
|
ENTx | 1 | ENNx | 1 | CMPx
| 1 | MOVE | 1+2F
|
In the above table, 'F' stands for the number of blocks to be moved
(given by the FSPEC subfield of the instruction); SLx and
SRx are a short cut for the byte-shifting operations; LDx
denote all the loading operations; STx are the storing
operations; Jx stands for all the jump operations, and so on with
the rest of abbreviations.
In the previous sections we have listed all the available MIX binary instructions. As we have shown, each instruction is represented by a word which is fetched from memory and executed by the MIX virtual CPU. As is the case with real computers, the MIX knows how to decode instructions in binary format (the so–called machine language), but a human programmer would have a tough time if she were to write her programs in machine language. Fortunately, the MIX computer can be programmed using an assembly language, MIXAL, which provides a symbolic way of writing the binary instructions understood by the imaginary MIX computer. If you have used assembler languages before, you will find MIXAL a very familiar language. MIXAL source files are translated to machine language by a MIX assembler, which produces a binary file (the actual MIX program) which can be directly loaded into the MIX memory and subsequently executed.
In this section, we describe MIXAL, the MIX assembly language. The implementation of the MIX assembler program and MIX computer simulator provided by mdk are described later on (see Getting started).
The MIX assembler reads MIXAL files line by line, producing, when required, a binary instruction, which is associated to a predefined memory address. To keep track of the current address, the assembler maintains an internal location counter which is incremented each time an instruction is compiled. In addition to MIX instructions, you can include in MIXAL file assembly directives (or pseudoinstructions) addressed at the assembler itself (for instance, telling it where the program starts and ends, or to reposition the location counter; see below).
MIX instructions and assembler directives6 are written in MIXAL (one per source file line) according to the following pattern:
[LABEL] MNEMONIC [OPERAND] [COMMENT]
where `OPERAND' is of the form
[ADDRESS][,INDEX][(MOD)]
Items between square brackets are optional, and
LABELMNEMONICLDA, STA; see see MIX instruction set) or an
assembly pseudoinstruction (e.g. ORG, EQU),
ADDRESSINDEXADDRESS is present,
MODOPCODE,
COMMENTNote that spaces are not allowed between the ADDRESS,
INDEX and MOD fields if they are present. White space is
used to separate the label, operation code and operand parts of the
instruction7.
We have already listed the mnemonics associated will each MIX instructions; sample MIXAL instructions representing MIX instructions are:
HERE LDA 2000 HERE represents the current location counter
LDX HERE,2(1:3) this is a comment
JMP 1234
MIXAL instructions can be either one of the MIX machine instructions (see MIX instruction set) or one of the following assembly pseudoinstructions:
ORIGEQUSYM EQU 2*200/3.
CONALFENDThe operand of ORIG, EQU, CON and END can be
any expression evaluating to a constant MIX word, i.e., either a simple
MIXAL expression (composed of numbers, symbols and binary operators,
see Expressions) or a w-expression (see W-expressions).
All MIXAL programs must contain an END directive, with a twofold
end: first, it marks the end of the assembler job, and, in the second
place, its (mandatory) operand indicates the start address for the
compiled program (that is, the address at which the virtual MIX machine
must begin fetching instructions after loading the program). It is also
very common (although not mandatory) to include at least an ORIG
directive to mark the initial value of the assembler's location counter
(remember that it stores the address associated with each compiled MIX
instruction). Thus, a minimal MIXAL program would be
ORIG 2000 set the initial compilation adress
NOP this instruction will be loaded at adress 2000
HLT and this one at address 2001
END 2000 end of program; start at address 2000
this line is not parsed by the assembler
The assembler will generate two binary instructions (NOP (+ 00 00 00 00 00) and HLT (+ 00 00 02 05)), which will be loaded at
addresses 2000 and 2001. Execution of the program will begin at address
2000. Every MIXAL program should also include a HLT instruction,
which will mark the end of program execution (but not of program
compilation).
The EQU directive allows the definition of symbolic names for
specific values. For instance, we could rewrite the above program as
follows:
START EQU 2000
ORIG START
NOP
HLT
END START
which would give rise to the same compiled code. Symbolic constants (or
symbols, for short) can also be implicitly defined placing them in the
LABEL field of a MIXAL instruction: in this case, the assembler
assigns to the symbol the value of the location counter before compiling
the line. Hence, a third way of writing our trivial program is
ORIG 2000
START NOP
HLT
END START
The CON directive allows you to directly specify the contents of
the memory address pointed by the location counter. For instance, when
the assembler encounters the following code snippet
ORIG 1150
CON -1823473
it will assign to the memory cell number 1150 the contents - 00 06 61 11 49 (which corresponds to the decimal value -1823473).
Finally, the ALF directive let's you specify the memory contents
as a set of five (optionally quoted) characters, which are translated by
the assembler to their byte values, conforming in that way the binary
word that is to be stored in the corresponding memory cell. This
directive comes in handy when you need to store printable messages in a
memory address, as in the following example 8:
OUT MSG MSG is not yet defined here (future reference)
MSG ALF "THIS " MSG gets defined here
ALF "IS A "
ALF "MESSA"
ALF "GE. "
The above snippet also shows the use of a future reference, that
is, the usage of a symbol (MSG in the example) prior of its actual
definition. The MIXAL assembler is able to handle future references
subject to some limitations which are described in the following section
(see Expressions).
Any line starting with an asterisk is treated as a comment and ignored by the assembler.
* This is a comment: this line is ignored.
* This line is an error: * must be in column 1.
As noted in the previous section, comments can also be located after the
OPERAND field of an instruction, separated from it by white
space, as in
LABEL LDA 100 This is also a comment
The ADDRESS, INDEX and MOD fields of a MIXAL
instruction can be expressions, formed by numbers, identifiers and
binary operators (+ - * / // :). + and - can also
be used as unary operators. Operator precedence is from left to right:
there is no other operator precedence rule, and parentheses cannot be
used for grouping. A stand-alone asterisk denotes the current memory
location; thus, for instance,
4+2**
evaluates to 6 (4 plus 2) times the current memory location. White space is not allowed within expressions.
The special binary operator : has the same meaning as in fspecs,
i.e.,
A:B = 8*A + B
while A//B stands for the quotient of the ten-byte number A 00 00 00 00 00 (that is, A right-padded with 5 null bytes or, what amounts
to the same, multiplied by 64 to the fifth power) divided by
B. Sample expressions are:
18-8*3 = 30
14/3 = 4
1+3:11 = 4:11 = 43
1//64 = (01 00 00 00 00 00)/(00 00 00 01 00) = (01 00 00 00 00)
Note that all MIXAL expressions evaluate to a MIX word (by definition).
All symbols appearing within an expression must be previously defined. Future
references are only allowed when appearing standalone (or modified by
an unary operator) in the ADDRESS part of a MIXAL instruction,
e.g.
* OK: stand alone future reference
STA -S1(1:5)
* ERROR: future reference in expression
LDX 2-S1
S1 LD1 2000
Besides expressions, as described above (see Expressions), the MIXAL
assembler is able to handle the so called w-expressions as the
operands of the directives ORIG, EQU, CON and
END (see MIXAL directives). The general form of a
w-expression is the following:
WEXP = EXP[(EXP)][,WEXP]
where EXP stands for an expression and square brackets denote
optional items. Thus, a w-expression is made by an expression, followed
by an optional expression between parenthesis, followed by any number
of similar constructs separated by commas. Sample w-expressions are:
2000
235(3)
S1+3(S2),3000
S1,S2(3:5),23
W-expressions are evaluated from left to right as follows:
As a second example, in the w-expression
1(1:2),66(4:5)
we first take two bytes from 1 (00 and 01) and store them as bytes 1 and 2 of the result (obtaining `+ 00 01 00 00 00') and, afterwards, take two bytes from 66 (01 and 02) and store them as bytes 4 and 5 of the result, obtaining `+ 00 01 00 01 02' (262210). The process is repeated for each new comma-separated example. For instance:
1(1:1),2(2:2),3(3:3),4(4:4) = 01 02 03 04 00
As stated before, w-expressions can only appear as the operands of MIXAL
directives taking a constant value (ORIG, EQU, CON
and END). Future references are not allowed within
w-expressions (i.e., all symbols appearing in a w-expression must be
defined before it is used).
Besides user defined symbols, MIXAL programmers can use the so called
local symbols, which are symbols of the form [1-9][HBF]. A
local symbol nB refers to the address of the last previous
occurrence of nH as a label, while nF refers to the next
nH occurrence. Unlike user defined symbols, nH can appear
multiple times in the LABEL part of different MIXAL
instructions. The following code shows an instance of local symbols'
usage:
* line 1
1H LDA 100
* line 2: 1B refers to address of line 1, 3F refers to address of line 4
STA 3F,2(1B//2)
* line 3: redefinition of 1H
1H STZ
* line 4: 1B refers to address of line 3
3H JMP 1B
Note that a B local symbol never refers to a definition in its
own line, that is, in the following program:
ORIG 1999
ST NOP
3H EQU 69
3H ENTA 3B local symbol 3B refers to 3H in previous line
HLT
END ST
the contents of `rA' is set to 69 and not to 2001. An
specially tricky case occurs when using local symbols in conjunction
with ORIG pseudoinstructions. To wit9,
ORIG 1999
ST NOP
3H CON 10
ENT1 *
LDA 3B
** rI1 is 2001, rA is 10. So far so good!
3H ORIG 3B+1000
** at this point 3H equals 2003
** and the location counter equals 3000.
ENT2 *
LDX 3B
** rI2 contains 3000, rX contains 2003.
HLT
END ST
MIXAL allows the introduction of literal constants, which are
automatically stored in memory addresses after the end of the program by
the assembler. Literal constants are denoted as =wexp=, where
wexp is a w-expression (see W-expressions). For instance, the
code
L EQU 5
LDA =20-L=
causes the assembler to add after the program's end an instruction
with contents 15 (`20-L'), and to assemble the above code as the
instruction LDA a, where a stands for the address
in which the value 15 is stored. In other words, the compiled code is
equivalent to the following:
L EQU 5
LDA a
...
a CON 20-L
END start
In this chapter, you will find a sample code-compile-run-debug session using the mdk utilities. Familiarity with the MIX mythical computer and its assembly language MIXAL (as described in Knuth's TAOCP) is assumed; for a compact reminder, see MIX and MIXAL tutorial.
MIXAL programs can be written as ASCII files with your editor of choice. Here you have the mandatory hello world as written in the MIXAL assembly language:
* (1)
* hello.mixal: say 'hello world' in MIXAL (2)
* (3)
* label ins operand comment (4)
TERM EQU 19 the MIX console device number (5)
ORIG 1000 start address (6)
START OUT MSG(TERM) output data at address MSG (7)
HLT halt execution (8)
MSG ALF "MIXAL" (9)
ALF " HELL" (10)
ALF "O WOR" (11)
ALF "LD " (12)
END START end of the program (13)
MIXAL source files should have the extension .mixal when used with the mdk utilities. As you can see in the above sample, each line in a MIXAL file can be divided into four fields separated by an arbitrary amount of whitespace characters (blanks and or tabs). While in Knuth's definition of MIXAL each field must start at a fixed pre-defined column number, the mdk assembler loosens this requirement and lets you format the file as you see fit. The only restrictions retained are for comment lines (like 1-4) which must begin with an asterisk (*) placed at column 1, and for the label field (see below) which, if present, must also start at column 1. The four fields in each non-comment line are:
START and MSG in lines 7 and 9) or a defined symbol
(TERM) (if present, the label must always start at the first
column in its line, for the first whitespace in the line maks the
beginning of the second field),
OUT and HLT in lines 7 and 8 above), or an assembly
pseudoinstruction (e.g., the ORIG pseudoinstruction in line
610.
Lines 9-12 of the hello.mixal file above also show the
second (and last) difference between Knuth's MIXAL definition and ours:
the operand of the ALF pseudoinstruction (a word of five
characters) must be quoted using ""11.
The workings of this sample program should be straightforward if you are familiar with MIXAL. See TAOCP vol. 1 for a thorough definition or MIX and MIXAL tutorial, for a tutorial.
Three simulators of the MIX computer, called mixvm, gmixvm
and mixguile, are included in the mdk tools. They are able to
run binary files containing MIX instructions written in their binary
representation. You can translate MIXAL source files into this binary
form using mixasm, the MIXAL assembler. So, in order to compile
the hello.mixal file, you can type the following command at your
shell prompt:
mixasm hello <RET>
If the source file contains no errors, this will produce a binary file
called hello.mix which can be loaded and run by the MIX virtual
machine. Unless the mixasm option -O is provided, the
assembler will include debug information in the executable file (for a
complete description of all the compilation options, see
mixasm). Now, your are ready to run your first MIX program, as
described in the following section.
MIX is a mythical computer, so it is no use ordering it from your favorite hardware provider. mdk provides three software simulators of the computer, though. They are
mixvm, a command line oriented simulator,
gmixvm, a GTK based graphical interface to mixvm, and
mixguile, a Guile shell with a built-in MIX simulator.
All three simulators accept the same set of user commands, but offer a
different user interface, as noted above. In this section we shall
describe some of these commands, and show you how to use them from
mixvm's command line. You can use them as well at gmixvm's
command prompt (see gmixvm), or using the built-in Scheme primitives
of mixguile (see Using mixguile).
Using the MIX simulators, you can run your MIXAL programs, after
compiling them with mixasm into binary .mix
files. mixvm can be used either in interactive or
non-interactive mode. In the second case, mixvm will load
your program into memory, execute it (producing any output due to
MIXAL OUT instructions present in the program), and exit when
it encounters a HLT instruction. In interactive mode, you will
enter a shell prompt which allows you issuing commands to the running
virtual machine. This commands will permit you to load, run and debug
programs, as well as to inspect the MIX computer state (register
contents, memory cells contents and so on).
To make mixvm work in non-interactive mode, use the -r
flag. Thus, to run our hello.mix program, simply type
mixvm -r hello <RET>
at your command prompt, and you will get the following output:
MIXAL HELLO WORLD
Since our hello world program uses MIX's device number 19 as
its output device (see Writing a source file), the output is
redirected to the shell's standard output. Had you used any other MIX
output devices (disks, drums, line printer, etc.), mixvm would
have created a file named after the device used (e.g. disk4.dev)
and written its output there12.
The virtual machine can also report the execution time of the program,
according to the (virtual) time spent in each of the binary instructions
(see Execution times). Printing of execution time statistics is
activated with the -t flag; running
mixvm -t -r hello <RET>
produces the following output:
MIXAL HELLO WORLD
** Execution time: 11
Sometimes, you will prefer to store the results of your program in MIX
registers rather than writing them to a device. In such cases,
mixvm's -d flag is your friend: it makes mixvm to
dump the contents of its registers and flags after executing the loaded
program. For instance, typing the following command at your shell's
prompt
mixvm -d -r hello
you will obtain the following output:
MIXAL HELLO WORLD
rA: + 00 00 00 00 00 (0000000000)
rX: + 00 00 00 00 00 (0000000000)
rJ: + 00 00 (0000)
rI1: + 00 00 (0000) rI2: + 00 00 (0000)
rI3: + 00 00 (0000) rI4: + 00 00 (0000)
rI5: + 00 00 (0000) rI6: + 00 00 (0000)
Overflow: F
Cmp: E
which, in addition to the program's outputs and execution time, gives you the contents of the MIX registers and the values of the overflow toggle and comparison flag (admittedly, rather uninteresting in our sample).
As you can see, running programs non-interactively has many limitations. You cannot peek the virtual machine's memory contents, not to mention stepping through your program's instructions or setting breakpoints13. Enter interactive mode.
To enter the MIX virtual machine interactive mode, simply type
mixvm <RET>
at your shell command prompt. This command enters the
mixvm command shell. You will be presented the following command
prompt:
MIX >
The virtual machine is initialised and ready to accept your
commands. The mixvm command shell uses GNU's readline, so that
you have at your disposal command completion (using <TAB>) and
history functionality, as well as other line editing shortcuts common to
all utilities using this library (for a complete description of
readline's line editing usage, see Command Line Editing.)
Usually, the first thing you will want to do is loading a compiled MIX
program into memory. This is acomplished by the load command,
which takes as an argument the name of the .mix file to be
loaded. Thus, typing
MIX > load hello <RET>
Program loaded. Start address: 3000
MIX >
will load hello.mix into the virtual machine's memory
and set the program counter to the address of the first instruction. You
can obtain the contents of the program counter using the command
pc:
MIX > pc
Current address: 3000
MIX >
After loading it, you are ready to run the program, using, as you surely
have guessed, the run command:
MIX > run
Running ...
MIXAL HELLO WORLD
... done
Elapsed time: 11 /Total program time: 11 (Total uptime: 11)
MIX >
Note that now the timing statistics are richer. You obtain the
elapsed execution time (i.e., the time spent executing instructions
since the last breakpoint), the total execution time for the program up
to now (which in our case coincides with the elapsed time, since there
were no breakpoints), and the total uptime for the virtual machine (you
can load and run more than one program in the same
session)14. After
running the program, the program counter will point to the address after
the one containing the HLT instruction. In our case, asking the
value of the program counter after executing the program will give us
MIX > pc
Current address: 3002
MIX >
You can check the contents of a memory cell giving its address
as an argument of the command pmem, like this
MIX > pmem 3001
3001: + 00 00 00 02 05 (0000000133)
MIX >
and convince yourself that address 3001 contains the binary
representation of the instruction HLT. An address range of the
form FROM-TO can also be used as the argument of pmem:
MIX > pmem 3000-3006
3000: + 46 58 00 19 37 (0786957541)
3001: + 00 00 00 02 05 (0000000133)
3002: + 14 09 27 01 13 (0237350989)
3003: + 00 08 05 13 13 (0002118477)
3004: + 16 00 26 16 19 (0268542995)
3005: + 13 04 00 00 00 (0219152384)
3006: + 00 00 00 00 00 (0000000000)
MIX >
In a similar manner, you can look at the contents of the MIX registers and flags. For instance, to ask for the contents of the A register you can type
MIX > preg A
rA: + 00 00 00 00 00 (0000000000)
MIX >
Use the comand help to obtain a list of all available commands,
and help COMMAND for help on a specific command, e.g.
MIX > help run
run Run loaded or given MIX code file. Usage: run [FILENAME]
MIX >
For a complete list of commands available at the MIX propmt, See mixvm. In the following subsection, you will find a quick tour over commands useful for debugging your programs.
The interactive mode of mixvm lets you step by step execution of
programs as well as breakpoint setting. Use next to step through
the program, running its instructions one by one. To run our
two-instruction hello.mix sample you can do the following:
MIX > load hello
Program loaded. Start address: 3000
MIX > pc
Current address: 3000
MIX > next
MIXAL HELLO WORLD
Elapsed time: 1 /Total program time: 1 (Total uptime: 1)
MIX > pc
Current address: 3001
MIX > next
End of program reached at address 3002
Elapsed time: 10 /Total program time: 11 (Total uptime: 11)
MIX > pc
Current address: 3002
MIX > next
MIXAL HELLO WORLD
Elapsed time: 1 /Total program time: 1 (Total uptime: 12)
MIX >
MIX > run
Running ...
... done
Elapsed time: 10 /Total program time: 11 (Total uptime: 22)
MIX >
(As an aside, the above sample also shows how the virtual machine handles cummulative time statistics and automatic program restart).
You can set a breakpoint at a given address using the command
sbpa (set breakpoint at address). When a breakpoint is set,
run will stop before executing the instruction at the given
address. Typing run again will resume program execution. Coming
back to our hello world example, we would have:
MIX > sbpa 3001
Breakpoint set at address 3001
MIX > run
Running ...
MIXAL HELLO WORLD
... stopped: breakpoint at line 8 (address 3001)
Elapsed time: 1 /Total program time: 1 (Total uptime: 23)
MIX > run
Running ...
... done
Elapsed time: 10 /Total program time: 11 (Total uptime: 33)
MIX >
Note that, since we compiled hello.mixal with debug info
enabled, the virtual machine is able to tell us the line in the
source file corresponding to the breakpoint we are setting. As a
matter of fact, you can directly set breakpoints at source code lines
using the command sbp LINE_NO, e.g.
MIX > sbp 4
Breakpoint set at line 7
MIX >
sbp sets the breakpoint at the first meaningful source code line;
thus, in the above example we have requested a breakpoint at a line
which does not correspond to a MIX instruction and the breakpoint is set
at the first line containing a real instruction after the given one. To
unset breakpoi