Basics of the "Unix v0" 'as' assembler syntax and PDP-7 coding:

ASSEMBLER SYNTAX:
=================

Anything from " to end of line is a comment.

. is the location counter, where the next code or constant will be placed
.. is the relocation counter (not currently handled)

Symbol names start with a letter, and can contain letters and "."
The dump of label values in "scans/sysmap" appears to show truncation
after 8 characters (not currently enforced by the "as7" Perl script).

symbol = expression

	Sets the symbol to the value of the expression without
	generating any machine code.

	as7 enters the machine instruction names and the indirect bit (i)
	into the variable table in the same way.

label:	is a label.

	Single-digit decimal numbers can be used as "local" labels,
	which are referenced with Nf (forward) and Nb (backward):

	jmp 1b		Jump back to the closest 1: label
	jmp 1f		Jump forward to the closest 1: label
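
One way the Nf/Nb lookup could work is sketched below in Python. This is
not taken from as7: the function name, the table layout, and the choice
that a label at the current location counts as "back" are all assumptions
made for illustration.

```python
# Hypothetical sketch of local-label resolution (not the as7 code).
from bisect import bisect_right

def resolve_local(ref, here, definitions):
    """ref is e.g. "1b" or "1f"; here is the address of the referencing
    word; definitions maps a digit to the sorted list of addresses at
    which "N:" was defined."""
    digit, direction = ref[0], ref[1]
    addrs = definitions[digit]
    pos = bisect_right(addrs, here)      # count of definitions at or before here
    if direction == "b":
        if pos == 0:
            raise ValueError("no earlier label " + digit + ":")
        return addrs[pos - 1]            # closest definition looking back
    if direction == "f":
        if pos == len(addrs):
            raise ValueError("no later label " + digit + ":")
        return addrs[pos]                # closest definition looking forward
    raise ValueError("expected Nf or Nb, got " + ref)

defs = {"1": [0o100, 0o120, 0o140]}      # three separate "1:" labels
print(oct(resolve_local("1b", 0o125, defs)))
print(oct(resolve_local("1f", 0o125, defs)))
```

The same digit can be reused any number of times, because each reference
only ever sees its nearest neighbour in one direction.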

Multiple words can be entered on one line, separated by semicolons.

String literals seem to be two characters enclosed in < > characters, e.g.
"ab" is <ab>. I'm guessing these are placed as a pair in one word. The syntax
is confusing, because I've also seen <no> ; 040 ; <fi> ; <le> ; <s 012 which
appears to mean "no files\n".
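
The guess above can be checked with a small Python sketch. It assumes two
9-bit characters per 18-bit word with the first character in the high
half, and that "<s 012" means 's' in the high half OR-ed with 012; the
helper names are made up for this example.

```python
# Sketch under the assumption of two 9-bit characters per 18-bit word.
def pack_pair(high, low):
    """Pack two characters into one word, first character in the high half."""
    return ((ord(high) & 0o777) << 9) | (ord(low) & 0o777)

def unpack_words(words):
    """Turn packed words back into text, ignoring NUL (zero) halves."""
    chars = []
    for word in words:
        for code in ((word >> 9) & 0o777, word & 0o777):
            if code:
                chars.append(chr(code))
    return "".join(chars)

# <no> ; 040 ; <fi> ; <le> ; <s 012
words = [
    pack_pair("n", "o"),
    0o40,                       # space in the low half, high half empty
    pack_pair("f", "i"),
    pack_pair("l", "e"),
    (ord("s") << 9) | 0o12,     # 's' in the high half, newline in the low half
]
print([format(w, "06o") for w in words])
print(repr(unpack_words(words)))
```

Under these assumptions the five words do decode to "no files\n", which
supports the reading given above.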

A line can contain multiple labels, e.g. o12: d10: 10

Numeric literals: 0xx are octal values, [1-9]xx are decimal values.
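
The literal rule can be written as a few lines of Python. This is an
illustrative sketch, not the as7 code, and parse_number is a made-up name.

```python
def parse_number(token):
    """A leading 0 selects octal; any other leading digit selects decimal."""
    if not token.isdigit():
        raise ValueError("not a numeric literal: " + token)
    base = 8 if token.startswith("0") else 10
    return int(token, base)     # int() rejects 8 and 9 in an octal literal

for token in ("012", "10", "017777", "0"):
    print(token, parse_number(token))
```

This is why "o12: d10: 10" above is consistent: octal 012 and decimal 10
are the same value.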

CONVENTIONS:
============

Because there is no immediate mode (and "as" lacks the literal syntax
found in DEC assemblers), there is a convention for literal (manifest)
constants:

	dNNN indicates the location of a constant for decimal NNN.
	oOOO indicates the location of a constant for octal OOO.
	dmN (decimal minus) indicates the location of a constant for decimal -N.

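The five high-order bits of the "law" (load AC with word) instruction are
all ones, and "law" loads the whole instruction word into AC, so a small
negative number such as "-1" written as an instruction loads -1 into AC.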
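
A short Python check of the "law" trick. It assumes negative numbers are
assembled as 18-bit two's complement, which is an assumption of this
sketch, and the helper names are invented for illustration.

```python
WORD_MASK = 0o777777
LAW = 0o760000                  # top five bits set

def to_word(value):
    """18-bit two's-complement encoding of a Python int (assumed)."""
    return value & WORD_MASK

def is_law(word):
    """True when the word would decode as a law instruction."""
    return (word & LAW) == LAW

def to_signed(word):
    """Interpret an 18-bit word as a signed two's-complement value."""
    return word - (1 << 18) if word & 0o400000 else word

for n in (-1, -8, -8192, -8193):
    w = to_word(n)
    print(n, format(w, "06o"), is_law(w), to_signed(w))
```

With this encoding every constant from -1 down to -8192 has the five top
bits set, so it can be written inline; anything more negative needs a
constant word instead.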

Subroutine arguments are often located in the words following the
call. When an argument is a variable, sometimes a "0:" local label is
used to tag the location:

	jms namei; 0:0

Indentation is (sometimes) used to indicate an instruction
or subroutine that may skip:

	jms betwen; o10000; o17762
	   jms error
	dac .+1

NOTE! The "betwen" (between) routine (which appears in multiple
places) takes ADDRESSES of values; these can be literal constants (as
above) or symbol names for variables.

I haven't seen any cases where "skip chains" (sequences of skip
instructions) are indicated by multiple indents.

There is no hardware stack, so the only form of subroutine call (jms)
is "impure" and leaves the (updated) program counter in the first word
of the subroutine, and execution starts on the second word.

Routines which fetch arguments from after the jms instruction may do:

	 routine: 0
	    lac routine i	" pick up argument after jms
	    dac temp		" store in temp location
	    isz routine		" increment return PC to skip argument
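
The calling convention above can be modelled in a few lines of Python.
This is a simplified sketch: it ignores the link and any status bits that
the hardware may store alongside the return address, it ignores the skip
side of isz, and the function names are made up.

```python
def jms(mem, pc, target):
    """Model 'jms target' executed at address pc: save the return address
    in the first word of the routine, continue at the second word."""
    mem[target] = pc + 1        # simplification: only the PC is stored
    return target + 1

def fetch_arg(mem, routine):
    """Model 'lac routine i' followed by 'isz routine'."""
    arg = mem[mem[routine]]     # indirect through the saved return address
    mem[routine] += 1           # step the return address past the argument
    return arg

def ret(mem, routine):
    """Model 'jmp routine i': resume at the saved return address."""
    return mem[routine]

mem = [0] * 0o200
CALL, ROUTINE = 0o100, 0o150    # arbitrary addresses for the example
mem[CALL + 1] = 0o1234          # argument word following the jms
pc = jms(mem, CALL, ROUTINE)
arg = fetch_arg(mem, ROUTINE)
resume = ret(mem, ROUTINE)
print(oct(pc), oct(arg), oct(resume))
```

Because the return address lives in the routine's own first word, a
routine written this way cannot be re-entered before it returns.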

Routines which (optionally) skip (e.g. on success) may use "isz
routine" to (conditionally) increment the return PC.

In the system code, t is used as a (current) offset into a block of
temporary location(s) for a routine or group of routines, at label
9f, so you'll see:

    this:
	0
	lac 9f+t+N
    ....
    t=t+M
    ....
    that:
	0
	lac 9f+t+I
    ....
    t=t+J
    .....
    .....
    .....
    9:
       .=.+t
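
The bookkeeping done with t can be sketched in Python as a running total.
The routine names and sizes are just the placeholders from the example
above; allocate_temps is a made-up helper.

```python
def allocate_temps(routines):
    """routines is a list of (name, words_needed) in source order.
    Returns each routine's base offset and the total block size."""
    t = 0
    bases = {}
    for name, words in routines:
        bases[name] = t         # routine refers to 9f+t+0, 9f+t+1, ...
        t += words              # the "t=t+M" line after the routine
    return bases, t             # t words are reserved by ".=.+t"

bases, total = allocate_temps([("this", 2), ("that", 3)])
print(bases, total)
```

Each routine gets its own slice of one shared block, and the block's size
is only known once the last "t=t+M" has been assembled.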


HARDWARE:
=========

References:
http://simh.trailing-edge.com/docs/architecture18b.pdf
http://www.soemtron.org/pdp7.html
http://www.soemtron.org/pdp7history.html

NOTE!  All opcodes are defined in "sop.s" (and in "as7") and are
commented as below.

The PDP-4/7/9/15 family started as a simplified version of the PDP-1
(itself an evolution of the TX-0, designed at MIT Lincoln Labs). The
PDP-4 wasn't very successful (offering 5/8 PDP-1 performance at 1/2
the price).

The PDP-7 was originally conceived of as a repackaging of the PDP-1,
but DEC had built up more system software for the PDP-4 than for the
PDP-1 (including a FORTRAN II compiler!), so they continued with the
new architecture.   See Bob's paper (first link above) for more detail.

The Living Computing Museum in Seattle has a running PDP-7, and
intends to build simulated disk hardware to enable running "Unix v0".

Words are 18 bits, typically represented as six octal digits.

There is one 18 bit accumulator, called "AC".

The "LINK" register is a 1-bit register that is included in shifts.

The Extended Arithmetic Element (EAE) option adds an "MQ" register which
has limited uses.

Bit numbering is "big endian": bit zero is 400000.

There are no "addressing" modes: memory-referencing instructions
(opcodes 0 through 060) decode the low 14 bits as:
    I (020000) "indirect" bit
    Y (017777) 13-bit address field.

When the "I" bit is clear, the "Y" field is the address of the operand.
When the "I" bit is set, the word addressed by the "Y" field holds the
address of the operand.
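
The field layout can be sketched as a Python decoder. The opcode values
are the ones listed in these notes; the function names are invented, and
the special behaviour of locations 010 through 017 is left out here.

```python
MEM_OPS = {
    0o00: "cal", 0o04: "dac", 0o10: "jms", 0o14: "dzm", 0o20: "lac",
    0o24: "xor", 0o30: "add", 0o34: "tad", 0o40: "xct", 0o44: "isz",
    0o50: "and", 0o54: "sad", 0o60: "jmp",
}

def decode(word):
    """Split an 18-bit memory reference word into (name, indirect, y)."""
    opcode = (word >> 12) & 0o74        # top four bits as two octal digits
    if opcode not in MEM_OPS:
        raise ValueError("not a memory reference instruction")
    indirect = bool(word & 0o020000)    # the I bit
    y = word & 0o017777                 # 13-bit address field
    return MEM_OPS[opcode], indirect, y

def operand_address(word, mem):
    """Address of the operand: Y itself, or one indirect fetch through Y."""
    _, indirect, y = decode(word)
    return (mem[y] & 0o017777) if indirect else y

mem = [0] * 0o200
mem[0o150] = 0o1000
print(decode(0o220150), oct(operand_address(0o220150, mem)))
print(decode(0o200150), oct(operand_address(0o200150, mem)))
```

The first word is "lac 0150 i" and the second is "lac 0150"; only the
first goes through the pointer stored at 0150.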

Unlike the PDP-1 (and PDP-6/10) indirect references are not
multi-level, and end after the first indirect fetch.

The PDP-7 found by Ken Thompson apparently did not have extended
memory or memory protection options and could directly reference all
8K words of memory (at all times).

Unix system code appears to have resided in the low 4K of memory, and
a single user program in the high 4K.  The system had a "fast"
fixed-head disk, but indirect memory access locks out DMA by the disk
controller, so indirect access cannot be used while disk transfers
are active (the interrupt service routines for clock and TTY are
coded without using indirect!).  Because of this, disk access cannot
be "overlapped" with user code, but user programs can use "indirect"
freely.

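When locations 010 through 017 of memory are referenced *INDIRECTLY*,
their contents are incremented before being used as the address, so
they can be used as "index registers" (the pointer is initialized to
one less than the first address wanted).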
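
A rough Python model of these locations follows. It assumes the stored
pointer is bumped first and the new value is then used as the operand
address, which is why the pointer starts one below the buffer; that
timing is an assumption of this sketch, and the names are made up.

```python
AUTO_INDEX = range(0o10, 0o20)          # locations 010 through 017

def indirect_address(mem, y):
    """Address reached by an indirect reference through location y."""
    if y in AUTO_INDEX:
        mem[y] = (mem[y] + 1) & 0o777777    # assumed: increment comes first
    return mem[y] & 0o17777

mem = [0] * 0o400
buf = 0o300                             # arbitrary buffer address
mem[buf:buf + 3] = [0o11, 0o22, 0o33]
mem[0o10] = buf - 1                     # pointer starts one below the buffer
values = [mem[indirect_address(mem, 0o10)] for _ in range(3)]
print([oct(v) for v in values], oct(mem[0o10]))
```

Three indirect references through location 010 walk the three buffer
words in order, and the pointer ends up at the last word read.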

System calls are made using the "CAL" (call) instruction (octal 00).
The Y field indicates the system call number.  "CAL" behaves like "jms
020".  "CAL" with the "I" bit set behaves like "jms 020 i".  The system
handler appears to deal with both (although the non-indirect form
destroys the contents of location 020), and has a longer code path, so
it seems likely that "sys=cal i" became the preferred system call at
some point?

The kernel preserves the contents of the user's AC and MQ registers
and locations 8-15 in the "userdata" block (symbols u.ac, u.mq and
u.rq).  Location u.rq+8 is the saved PC of the last system call.

The "save" system call (and any undefined system call) writes the high
4K of memory and the "userdata" block (to fd 1?)

The system did not have advanced interrupt processing hardware, so all
"priority interrupts" (also known in the past as "address break"
processing) are dispatched as if a "jms 0" was executed, and are
processed in the "pibreak" routine.

Summary of memory instructions (from annotated sop.s):

    dac = 0040000		" MEM: deposit AC
    jms = 0100000		" MEM: jump to subroutine
    dzm = 0140000		" MEM: deposit zero to memory
    lac = 0200000		" MEM: load AC
    xor = 0240000		" MEM: XOR with AC
    add = 0300000		" MEM: one's complement add
    tad = 0340000		" MEM: two's complement add
    xct = 0400000		" MEM: execute
    isz = 0440000		" MEM: increment and skip if zero
    and = 0500000		" MEM: AND with memory
    sad = 0540000		" MEM: skip if AC different
    jmp = 0600000		" MEM: jump

Other instruction groups do not interpret the "I" bit.  The IOT and
OPR groups begin with '7' in the high three bits; the EAE group
(listed last) begins with 064.

I/O Transfer (IOT) instructions have 111000 (070) in the high six bits.
The next six bits (two octal digits) indicate the device number.

Operate (OPR) instructions, which are "microcoded" with low order bits
indicating "micro operations" to be performed, have 11110 in the high
five bits (074 or 075 as the leading octal digits):

    cma = 0740001		" OPR: complement AC
    ral = 0740010		" OPR: rotate AC left
    rar = 0740020		" OPR: rotate AC right
    hlt = 0740040		" OPR: halt
    sma = 0740100		" OPR: skip on minus AC
    sza = 0740200		" OPR: skip on zero AC
    snl = 0740400		" OPR: skip on non-zero link
    skp = 0741000		" OPR: skip unconditionally
    sna = 0741200		" OPR: skip on non-zero AC
    szl = 0741400		" OPR: skip on zero link
    rtl = 0742010		" OPR: rotate two left
    rtr = 0742020		" OPR: rotate two right
    cll = 0744000		" OPR: clear link
    rcl = 0744010		" OPR: clear link, rotate left
    rcr = 0744020		" OPR: clear link, rotate right
    cla = 0750000		" OPR: clear AC
    las = 0750004		" OPR: load AC from switches

With some limitations, OPR instructions can be OR-ed together.
(Ordering of operations is determined by the hardware, not by their
order in the source!!):

  sna cla		" skip on non-zero AC, clear AC
  sna spa		" skip on non-zero and positive AC, i.e. AC > 0 (??)
  sna ral		" skip on non-zero AC, rotate AC left
  cla cll sza		" skip on AC zero, clear AC, clear LINK
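
The OR-ing can be tried out in Python using the values from the table
above. This only shows how the bits combine; it says nothing about the
order in which the hardware performs the micro operations, and combine
is a made-up helper.

```python
OPR = {
    "cma": 0o740001, "ral": 0o740010, "rar": 0o740020, "hlt": 0o740040,
    "sma": 0o740100, "sza": 0o740200, "snl": 0o740400, "skp": 0o741000,
    "sna": 0o741200, "szl": 0o741400, "rtl": 0o742010, "rtr": 0o742020,
    "cll": 0o744000, "rcl": 0o744010, "rcr": 0o744020, "cla": 0o750000,
    "las": 0o750004,
}

def combine(*names):
    """OR the named operate instructions into a single word."""
    word = 0
    for name in names:
        word |= OPR[name]
    return word

for group in (("sna", "cla"), ("cla", "cll", "sza"), ("cll", "ral")):
    print(" ".join(group), format(combine(*group), "06o"))
```

Note that "cll ral" comes out as 0744010, the same word the table lists
as rcl, so some of the named instructions are already combinations.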

The last "operate" instruction is not microcoded:
    law = 0760000		" OPR: load accumulator with (instr)

and as noted above, is often coded directly as an immediate negative constant.

Unix v0 depends on the EAE option, whose instructions have 1101 in
the top four bits (opcode 064) and provide multiply, divide, long
shifts, and access to the 18-bit MQ register:

    lrs = 0640500		" EAE: long right shift
    lrss = 0660500		" EAE: long right shift, signed

    lls = 0640600		" EAE: long left shift
    llss = 0660600		" EAE: long left shift, signed

    als = 0640700		" EAE: AC left shift
    alss = 0660700		" EAE: AC left shift, signed

    mul = 0653122		" EAE: multiply
    idiv = 0653323		" EAE: integer divide

    lacq = 0641002		" EAE: load AC with MQ
    clq = 0650000		" EAE: clear MQ
    omq = 0650002		" EAE: OR MQ into AC
    cmq = 0650004		" EAE: complement MQ
    lmq = 0652000		" EAE: load MQ from AC

