Reality Signal Processor/Interface

The VR4300 accesses the RSP interface through memory-mapped registers in the physical address range 0x040x xxxx.

DMEM and IMEM
Both RSP memory banks are fully memory mapped into the VR4300 address space: DMEM (4 KiB) starts at physical address 0x0400 0000 and IMEM (4 KiB) follows at 0x0400 1000. Accesses are usually performed by VR4300 using 32-bit reads and writes. 16-bit and 8-bit reads work correctly. 16-bit and 8-bit writes behave in a non-standard way, as they affect the whole 32-bit word they are written to: higher bits are sign-extended, while lower bits are reset to 0. For instance, writing the 8-bit value  at offset 2 in DMEM has the same effect as writing the 32-bit word   to offset 0 in DMEM.

Since the memory is single-ported, it can be accessed by only one of the VR4300 or the RSP (including its internal DMA engine) at any given time. Notice that there is no bus arbiter, so simultaneous accesses by both processors cause problems. Typically, the VR4300 wins the race: the RSP write is lost, or the RSP read returns the same data read by the VR4300 (even if the address was different). Also, if a DMA was in progress, the address of the memory access performed by VR4300 becomes the current address of the DMA transfer, corrupting it. So, in general, VR4300 should access DMEM/IMEM only when RSP is halted.

DMA transfers
DMA transfers can be initiated by either VR4300 or RSP. They move data between RDRAM and IMEM/DMEM, in either direction, much faster than VR4300 could copy it word by word through the memory mapped addresses of the memory banks. The transfer speed is about 3.7 bytes per VR4300 (PClock) cycle, plus a small fixed overhead. It is the fastest DMA engine in the N64.

The DMA engine can transfer multiple "rows" of data in RDRAM, separated by a "skip" value. This makes it possible, for instance, to transfer a rectangular portion of a larger image by specifying the size of each row of the selected portion, the number of rows, and a "skip" value that corresponds to the number of bytes between the end of a row and the beginning of the following one. Notice that this applies only to RDRAM: accesses in IMEM/DMEM are always linear.
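The row/skip addressing described above can be sketched as plain address arithmetic. This is only an illustrative model: the helper names (`dma_skip_for_pitch`, `dma_rdram_row_start`, `dma_spmem_row_start`) are hypothetical and not part of any SDK, and it assumes that "skip" is counted from the end of one row to the start of the next, as stated in the text.

```c
#include <stdint.h>

/* Hypothetical helper: the "skip" needed to select rows of row_bytes bytes
 * out of an image whose rows start image_pitch bytes apart in RDRAM. */
static inline uint32_t dma_skip_for_pitch(uint32_t image_pitch, uint32_t row_bytes)
{
    return image_pitch - row_bytes;
}

/* RDRAM address at which row `row` (0-based) of the transfer starts:
 * each row is followed by `skip` bytes that are not transferred. */
static inline uint32_t dma_rdram_row_start(uint32_t base, uint32_t row_bytes,
                                           uint32_t skip, uint32_t row)
{
    return base + row * (row_bytes + skip);
}

/* DMEM/IMEM address at which the same row lands: rows are packed linearly. */
static inline uint32_t dma_spmem_row_start(uint32_t base, uint32_t row_bytes,
                                           uint32_t row)
{
    return base + row * row_bytes;
}
```

For example, selecting 64-byte rows out of an image with a 320-byte pitch gives a skip of 256 bytes, while the rows are stored back to back in DMEM/IMEM.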

All DMA registers are double-buffered: this means that it is possible to program a DMA transfer while another one is in progress. As soon as the first transfer finishes, the second one will start. The RSP status register reports in separate bits whether there is a transfer ongoing, and whether there is a transfer pending.
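The double-buffering behavior can be pictured as a two-slot queue. The following is a software model only, not a description of the hardware implementation; the type and function names are hypothetical, and rejecting a third request is a choice of this model (the text does not say what the hardware does in that case).

```c
#include <stdbool.h>

/* Model of the two status bits described above. */
typedef struct {
    bool busy;  /* a transfer is in progress */
    bool full;  /* a second transfer is programmed and pending */
} dma_queue_model;

/* Program a transfer. Returns false when both slots are already taken
 * (model choice: the request is simply rejected). */
static bool dma_model_program(dma_queue_model *q)
{
    if (!q->busy) { q->busy = true; return true; }  /* starts immediately */
    if (!q->full) { q->full = true; return true; }  /* enqueued as pending */
    return false;
}

/* The transfer in progress finishes: a pending one, if any, starts at once. */
static void dma_model_finish(dma_queue_model *q)
{
    if (q->full)
        q->full = false;  /* pending transfer starts, busy stays set */
    else
        q->busy = false;  /* engine becomes idle */
}
```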

DMA transfers only happen between 8-byte aligned addresses (in both RDRAM and IMEM/DMEM). DMA registers do not allow misaligned addresses to be written, as the lowest 3 bits are ignored and fixed to 0. The same applies to the length registers, so that the transfer size is always a multiple of 8.
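As a small illustration of the alignment rules, the sketch below computes the address and size that a transfer would effectively use. The helper names are hypothetical; the only behavior assumed is the one stated above (lowest 3 bits of addresses forced to 0, sizes rounded up to a multiple of 8).

```c
#include <stdint.h>

/* Address actually used by the DMA engine: the lowest 3 bits are dropped. */
static inline uint32_t dma_effective_addr(uint32_t addr)
{
    return addr & ~UINT32_C(7);
}

/* Number of bytes actually moved for a request of `bytes` bytes (bytes >= 1):
 * the size is rounded up to the next multiple of 8. */
static inline uint32_t dma_effective_bytes(uint32_t bytes)
{
    return (bytes + 7u) & ~UINT32_C(7);
}
```

A caller that needs to transfer a misaligned buffer therefore has to align the start address down itself and account for the extra leading and trailing bytes.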

RSP Internal Registers
The internal RSP registers are memory mapped into the VR4300 physical address space starting from 0x0404 0000. Normally, accesses are performed through the virtual uncached segment, so at 0xA404 0000.
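The mapping from the physical address to the uncached virtual address can be sketched as follows. The macro and function names are hypothetical; the computation relies on the standard MIPS KSEG1 segment, which maps the low 512 MiB of physical memory uncached at 0xA000 0000.

```c
#include <stdint.h>

/* Physical base of the RSP internal registers, as given in the text. */
#define SP_REGS_PHYS_BASE UINT32_C(0x04040000)

/* Uncached (KSEG1) virtual address for a physical address. */
static inline uint32_t kseg1_from_phys(uint32_t phys)
{
    return (phys & UINT32_C(0x1FFFFFFF)) | UINT32_C(0xA0000000);
}
```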

The exact same physical registers are also exposed as COP0 registers to RSP itself, and can thus be accessed using the MTC0 / MFC0 opcodes. Since access to all registers is shared by VR4300 and RSP, special care must be taken while writing software to decide who is in charge of each different resource / feature. For instance, normally DMA operations are performed by either the CPU or the RSP only; if the software architecture requires both to issue DMA transfers, some kind of mutex protocol must be established (for instance, using either the SIG bits in the SP_STATUS register, or the SP_SEMAPHORE register).

SP_DMA_SPADDR

 * Bits 31:24, access: U-0 || U-0 || U-0 || U-0 || U-0 || U-0 || U-0 || U-0
 * Bits 23:16, access: U-0 || U-0 || U-0 || U-0 || U-0 || U-0 || U-0 || U-0
 * Bits 15:8, access: U-0 || U-0 || U-0 || RW-0 || RW-0 || RW-0 || RW-0 || RW-0
 * Bits 15:8, fields: — (bits 15:13) || MEM_BANK (bit 12) || MEM_ADDR[11:8] (bits 11:8)
 * Bits 7:0, access: RW-0 || RW-0 || RW-0 || RW-0 || RW-0 || U-0 || U-0 || U-0
 * Bits 7:0, fields: MEM_ADDR[7:3] (bits 7:3) || 0 || 0 || 0

Extra Details:
 * MEM_BANK
 * This bit selects the memory bank that will be accessed by the DMA transfer. Notice that, even though the memory banks appear to be contiguous in VR4300 address space, it is not possible to perform a single DMA transfer that spans across two banks. Each transfer will only access a single bank. For instance, to load a microcode, it is normally necessary to do two separate transfers: one for IMEM and one for DMEM.
 * MEM_ADDR
 * This field contains the address in SP memory where the DMA transfer begins. The address is always aligned to 8 bytes, as the lowest 3 bits cannot be written. Notice that after writing to this register, the value is latched by SP but it is kept "pending" until the transfer is initiated via a write to SP_DMA_RDLEN or SP_DMA_WRLEN. Reads will continue returning the current (non-pending) value, which refers to either an ongoing DMA transfer or the last finished one. After a DMA transfer is finished, reading this register returns the address immediately following the last one that was transferred.
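The register layout above can be turned into a small encoding helper. This is a sketch under stated assumptions: the names are hypothetical, and the bank encoding (0 for DMEM, 1 for IMEM) is an assumption derived from the two banks being contiguous in the VR4300 address space with DMEM first. The 4 KiB bank size follows from MEM_ADDR being 12 bits wide.

```c
#include <stdint.h>
#include <stdbool.h>

/* Assumed bank encoding for MEM_BANK (bit 12). */
enum { SP_BANK_DMEM = 0, SP_BANK_IMEM = 1 };

/* Value to write to SP_DMA_SPADDR: MEM_BANK in bit 12, MEM_ADDR in bits 11:3.
 * The lowest 3 bits are masked here just as the hardware would ignore them. */
static inline uint32_t sp_dma_spaddr(unsigned bank, uint32_t addr)
{
    return ((uint32_t)(bank & 1u) << 12) | (addr & UINT32_C(0xFF8));
}

/* A single transfer cannot span two banks: check that `bytes` bytes starting
 * at bank offset `addr` stay within the 4 KiB bank. */
static inline bool sp_dma_fits_in_bank(uint32_t addr, uint32_t bytes)
{
    return (addr & UINT32_C(0xFFF)) + bytes <= UINT32_C(0x1000);
}
```

A loader that needs to fill both banks would call such a helper once per bank and issue two separate transfers.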

SP_DMA_RAMADDR

 * Bits 31:24, access: U-0 || U-0 || U-0 || U-0 || U-0 || U-0 || U-0 || U-0
 * Bits 23:16, access: RW-0 || RW-0 || RW-0 || RW-0 || RW-0 || RW-0 || RW-0 || RW-0
 * Bits 23:16, fields: DRAM_ADDR[23:16] (bits 23:16)
 * Bits 15:8, access: RW-0 || RW-0 || RW-0 || RW-0 || RW-0 || RW-0 || RW-0 || RW-0
 * Bits 15:8, fields: DRAM_ADDR[15:8] (bits 15:8)
 * Bits 7:0, access: RW-0 || RW-0 || RW-0 || RW-0 || RW-0 || U-0 || U-0 || U-0
 * Bits 7:0, fields: DRAM_ADDR[7:3] (bits 7:3) || 0 || 0 || 0

Extra Details:
 * DRAM_ADDR
 * This field contains the address in RDRAM where the DMA transfer begins. The address is always aligned to 8 bytes, as the lowest 3 bits cannot be written. Notice that after writing to this register, the value is latched by SP but it is kept "pending" until the transfer is initiated via a write to SP_DMA_RDLEN or SP_DMA_WRLEN. Reads will continue returning the current (non-pending) value, which refers to either an ongoing DMA transfer or the last finished one. After a DMA transfer is finished, reading this register returns the address immediately following the last one that was transferred.

SP_DMA_RDLEN
This register is used to initiate a DMA transfer from RDRAM to DMEM/IMEM. It must be written third, after programming SP_DMA_SPADDR and SP_DMA_RAMADDR. As soon as it is written, if the DMA engine was idle, a DMA transfer is started. Otherwise, the DMA transfer is enqueued (double-buffered), waiting for the previous one to finish.


 * Bits 31:24, access: RW-0 || RW-0 || RW-0 || RW-0 || RW-0 || RW-0 || RW-0 || RW-0
 * Bits 31:24, fields: SKIP[11:4] (bits 31:24)
 * Bits 23:16, access: RW-0 || U-0 || U-0 || U-0 || RW-0 || RW-0 || RW-0 || RW-0
 * Bits 23:16, fields: SKIP[3] (bit 23) || 0 || 0 || 0 || COUNT[7:4] (bits 19:16)
 * Bits 15:8, access: RW-0 || RW-0 || RW-0 || RW-0 || RW-0 || RW-0 || RW-0 || RW-0
 * Bits 15:8, fields: COUNT[3:0] (bits 15:12) || RDLEN[11:8] (bits 11:8)
 * Bits 7:0, access: RW-0 || RW-0 || RW-0 || RW-0 || RW-0 || U-0 || U-0 || U-0
 * Bits 7:0, fields: RDLEN[7:3] (bits 7:3) || 0 || 0 || 0

Extra Details:
 * RDLEN
 * Like other DMA transfers in N64, this field holds the number of bytes to transfer minus 1. Since the DMA engine works in 64-bit words and the lowest 3 bits of the field are fixed to 0, writing 0 (or any value up to and including 7) starts a transfer of exactly 8 bytes. After the DMA transfer is finished, this field contains the value 0xFF8; the reason is that the field is internally decremented by 8 for each transferred word, so the final value will be -8 (in hex, 0xFF8, as the field is 12 bits wide).
 * COUNT and SKIP
 * Setting COUNT to 0 initiates a linear transfer of RDLEN plus 1 bytes (rounded up to 8 bytes); in this case, the value of SKIP is effectively ignored as only one row is transferred. With any other value, COUNT indicates the number of rows, to transfer a portion of a rectangular image, and SKIP indicates the so-called row stride, that is the number of bytes to add to jump from the end of a row to the beginning of the next one. After a DMA transfer is finished, COUNT is reset to 0, and SKIP is unchanged.
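The layout and the length semantics above can be sketched in a few helpers. All names are hypothetical. The packing function takes raw field values and places them according to the table; the last function is a software model of the "decrement by 8 per word" behavior, assuming a 12-bit field that wraps around.

```c
#include <stdint.h>

/* Pack raw field values into the SP_DMA_RDLEN / SP_DMA_WRLEN layout:
 * SKIP in bits 31:20, COUNT in bits 19:12, length in bits 11:0.
 * The lowest 3 bits of SKIP and of the length are forced to 0. */
static inline uint32_t sp_dma_len_pack(uint32_t len, uint32_t count, uint32_t skip)
{
    return ((skip & UINT32_C(0xFF8)) << 20)
         | ((count & UINT32_C(0xFF)) << 12)
         | (len & UINT32_C(0xFF8));
}

/* Bytes transferred per row for a given length field value
 * ("bytes minus 1", rounded up to a multiple of 8). */
static inline uint32_t sp_dma_row_bytes(uint32_t len)
{
    return (len & UINT32_C(0xFF8)) + 8u;
}

/* Model: value left in the 12-bit length field after one row has been
 * transferred, decrementing by 8 for each 64-bit word moved. */
static inline uint32_t sp_dma_len_after_transfer(uint32_t len)
{
    uint32_t field = len & UINT32_C(0xFF8);
    uint32_t words = field / 8u + 1u;
    while (words--)
        field = (field - 8u) & UINT32_C(0xFFF);
    return field;
}
```

In this model the field ends at the same value regardless of the starting length, because it is always decremented one step past zero.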

SP_DMA_WRLEN
This register is used to initiate a DMA transfer from DMEM/IMEM to RDRAM. It must be written third, after programming SP_DMA_SPADDR and SP_DMA_RAMADDR. As soon as it is written, if the DMA engine was idle, a DMA transfer is started. Otherwise, the DMA transfer is enqueued (double-buffered), waiting for the previous one to finish.


 * Bits 31:24, access: RW-0 || RW-0 || RW-0 || RW-0 || RW-0 || RW-0 || RW-0 || RW-0
 * Bits 31:24, fields: SKIP[11:4] (bits 31:24)
 * Bits 23:16, access: RW-0 || U-0 || U-0 || U-0 || RW-0 || RW-0 || RW-0 || RW-0
 * Bits 23:16, fields: SKIP[3] (bit 23) || 0 || 0 || 0 || COUNT[7:4] (bits 19:16)
 * Bits 15:8, access: RW-0 || RW-0 || RW-0 || RW-0 || RW-0 || RW-0 || RW-0 || RW-0
 * Bits 15:8, fields: COUNT[3:0] (bits 15:12) || WRLEN[11:8] (bits 11:8)
 * Bits 7:0, access: RW-0 || RW-0 || RW-0 || RW-0 || RW-0 || U-0 || U-0 || U-0
 * Bits 7:0, fields: WRLEN[7:3] (bits 7:3) || 0 || 0 || 0

Extra Details: Please refer to SP_DMA_RDLEN for details.

SP_STATUS
The SP_STATUS register is the main status register for the RSP. Like many other flag registers in N64, it has two different layouts for reading and for writing: this makes it possible to atomically set or clear each flag with a single memory write, avoiding the race conditions that would be frequent if the processor had to issue a read-modify-write sequence.


Read layout:
 * Bits 31:24, access: U-0 || U-0 || U-0 || U-0 || U-0 || U-0 || U-0 || U-0
 * Bits 23:16, access: U-0 || U-0 || U-0 || U-0 || U-0 || U-0 || U-0 || U-0
 * Bits 15:8, access: U-0 || R-0 || R-0 || R-0 || R-0 || R-0 || R-0 || R-0
 * Bits 15:8, fields: — || SIG7 || SIG6 || SIG5 || SIG4 || SIG3 || SIG2 || SIG1
 * Bits 7:0, access: R-0 || R-0 || R-0 || R-0 || R-0 || R-0 || R-0 || R-0
 * Bits 7:0, fields: SIG0 || INTBREAK || SSTEP || IO_BUSY || DMA_FULL || DMA_BUSY || BROKE || HALTED


Write layout:
 * Bits 31:24, access: U-0 || U-0 || U-0 || U-0 || U-0 || U-0 || U-0 || W-0
 * Bits 31:24, fields: — || — || — || — || — || — || — || SET_SIG7
 * Bits 23:16, access: W-0 || W-0 || W-0 || W-0 || W-0 || W-0 || W-0 || W-0
 * Bits 23:16, fields: CLR_SIG7 || SET_SIG6 || CLR_SIG6 || SET_SIG5 || CLR_SIG5 || SET_SIG4 || CLR_SIG4 || SET_SIG3
 * Bits 15:8, access: W-0 || W-0 || W-0 || W-0 || W-0 || W-0 || W-0 || W-0
 * Bits 15:8, fields: CLR_SIG3 || SET_SIG2 || CLR_SIG2 || SET_SIG1 || CLR_SIG1 || SET_SIG0 || CLR_SIG0 || SET_INTBREAK
 * Bits 7:0, access: W-0 || W-0 || W-0 || W-0 || W-0 || W-0 || W-0 || W-0
 * Bits 7:0, fields: CLR_INTBREAK || SET_SSTEP || CLR_SSTEP || SET_INTR || CLR_INTR || CLR_BROKE || SET_HALT || CLR_HALT

Extra Details:
 * HALT
 * The HALT flag can be thought of as a "pause" flag. When the RSP is halted by writing the SET_HALT bit, the RSP core pauses the pipeline without flushing it, preserving not only the current PC but also intermediate state such as pending writebacks and delay slots, so that clearing the flag via CLR_HALT resumes execution exactly where it stopped. If a new ucode is loaded instead, make sure to also write the new entry point to SP_PC (writing to SP_PC also fully discards the RSP core pipeline). Given the "pause" behavior, it would seem that the VR4300 could pause and unpause the RSP at any time during its execution without side effects. Unfortunately, this only works "most" of the time: there is at least one hardware bug that can cause corruption when a halt is triggered within a specific sequence of opcodes. This bug has been observed during libdragon development, but the developers could not manage to isolate it or reduce it to a small snippet of code. In general, it is thus very risky to design a communication protocol that involves VR4300 pausing/unpausing the RSP at arbitrary times while it is running.
 * HALT and DMA
 * Setting the HALT bit does not pause the DMA transfers in progress. The DMAs will continue running until they finish. This is reflected by the status register, so it is possible that both HALTED and DMA_BUSY are set. This is especially important if VR4300 halts the RSP and wants to access IMEM/DMEM immediately: it must wait for DMA_BUSY to be cleared before accessing the memory banks, or corruption can happen (see above for more information on what happens when both VR4300 and RSP access the memory banks at the same time).
 * SSTEP
 * The single step mode allows the RSP to execute a single instruction and pause itself. In particular, whenever SSTEP is set while the RSP is running, RSP will pause itself before the next instruction by setting the HALTED flag. To single-step through RSP code, VR4300 should set the SSTEP flag and then clear the HALT flag to execute exactly one instruction. Unfortunately, this hardware mode is very buggy. At least two specific bugs have been isolated, listed below. The presence of these two bugs is enough to consider the feature broken beyond any expectation of being useful.
 * Conditional branch instructions are sometimes broken: a branch is taken when it should not be, or vice versa.
 * /  are broken: the instructions have a 2-cycle latency but it looks like the pending writeback is lost in single step mode;   actually writes   into the register (where   is the address of the   instruction itself).   simply doesn't work, and the target register is not written.
 * SIG
 * Signal bits are software-controlled bits with no hardware meaning. They can be set or reset by writing to the status register. Since both VR4300 and RSP can access the status register, they can be used to implement a simple communication / handshaking protocol between the two CPUs. For instance, the RSP might set a signal bit when some data has been processed and sent back to RDRAM via DMA, so that VR4300 can access the results after it sees that bit being set. Because the separate write layout of the register allows bits to be modified atomically, VR4300 and RSP can set/reset different signal bits at the same time without risking race conditions.
 * DMA_FULL
 * This bit is set whenever a DMA transfer is pending, that is, it has been programmed via the DMA registers but has not started yet because another transfer is in progress. This is possible because of the double-buffering of the DMA registers (explained above). Notice that this bit goes to 0 a few clock cycles *before* the previous DMA transfer is finished (probably the RSP internally has some preparation work for DMA that can run in parallel with the last memory writes of another transfer). In any case, as soon as the bit goes to zero, it is possible to enqueue a new DMA transfer.
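The two layouts of SP_STATUS can be written down as bit masks taken directly from the tables above. The macro and function names below are hypothetical (they are not taken from any particular SDK); the bit positions are the ones listed in the read and write layouts. The last helper encodes the rule from the HALT and DMA note: VR4300 should touch IMEM/DMEM only when the RSP is halted and no DMA is running.

```c
#include <stdint.h>
#include <stdbool.h>

/* Read layout. */
#define SP_STATUS_HALTED    (UINT32_C(1) << 0)
#define SP_STATUS_BROKE     (UINT32_C(1) << 1)
#define SP_STATUS_DMA_BUSY  (UINT32_C(1) << 2)
#define SP_STATUS_DMA_FULL  (UINT32_C(1) << 3)
#define SP_STATUS_IO_BUSY   (UINT32_C(1) << 4)
#define SP_STATUS_SSTEP     (UINT32_C(1) << 5)
#define SP_STATUS_INTBREAK  (UINT32_C(1) << 6)
#define SP_STATUS_SIG(n)    (UINT32_C(1) << (7 + (n)))        /* n = 0..7 */

/* Write layout: one "clear" and one "set" bit per flag. */
#define SP_WSTATUS_CLR_HALT      (UINT32_C(1) << 0)
#define SP_WSTATUS_SET_HALT      (UINT32_C(1) << 1)
#define SP_WSTATUS_CLR_BROKE     (UINT32_C(1) << 2)
#define SP_WSTATUS_CLR_INTR      (UINT32_C(1) << 3)
#define SP_WSTATUS_SET_INTR      (UINT32_C(1) << 4)
#define SP_WSTATUS_CLR_SSTEP     (UINT32_C(1) << 5)
#define SP_WSTATUS_SET_SSTEP     (UINT32_C(1) << 6)
#define SP_WSTATUS_CLR_INTBREAK  (UINT32_C(1) << 7)
#define SP_WSTATUS_SET_INTBREAK  (UINT32_C(1) << 8)
#define SP_WSTATUS_CLR_SIG(n)    (UINT32_C(1) << (9 + 2 * (n)))   /* n = 0..7 */
#define SP_WSTATUS_SET_SIG(n)    (UINT32_C(1) << (10 + 2 * (n)))  /* n = 0..7 */

/* True when a value read from SP_STATUS says that VR4300 can access
 * IMEM/DMEM: the RSP is halted and no DMA transfer is in progress. */
static inline bool sp_mem_access_safe(uint32_t status)
{
    return (status & SP_STATUS_HALTED) != 0
        && (status & SP_STATUS_DMA_BUSY) == 0;
}
```

Because each write bit affects only its own flag, a single store of, say, `SP_WSTATUS_SET_SIG(0)` leaves every other flag untouched, which is what makes the set/clear operations atomic.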

SP_DMA_FULL

 * Bits 31:24, access: U-0 || U-0 || U-0 || U-0 || U-0 || U-0 || U-0 || U-0
 * Bits 23:16, access: U-0 || U-0 || U-0 || U-0 || U-0 || U-0 || U-0 || U-0
 * Bits 15:8, access: U-0 || U-0 || U-0 || U-0 || U-0 || U-0 || U-0 || U-0
 * Bits 7:0, access: U-0 || U-0 || U-0 || U-0 || U-0 || U-0 || U-0 || R-0
 * Bits 7:0, fields: — || — || — || — || — || — || — || DMA_FULL

SP_DMA_BUSY

 * Bits 31:24, access: U-0 || U-0 || U-0 || U-0 || U-0 || U-0 || U-0 || U-0
 * Bits 23:16, access: U-0 || U-0 || U-0 || U-0 || U-0 || U-0 || U-0 || U-0
 * Bits 15:8, access: U-0 || U-0 || U-0 || U-0 || U-0 || U-0 || U-0 || U-0
 * Bits 7:0, access: U-0 || U-0 || U-0 || U-0 || U-0 || U-0 || U-0 || R-0
 * Bits 7:0, fields: — || — || — || — || — || — || — || DMA_BUSY

SP_SEMAPHORE

 * Bits 31:24, access: U-0 || U-0 || U-0 || U-0 || U-0 || U-0 || U-0 || U-0
 * Bits 23:16, access: U-0 || U-0 || U-0 || U-0 || U-0 || U-0 || U-0 || U-0
 * Bits 15:8, access: U-0 || U-0 || U-0 || U-0 || U-0 || U-0 || U-0 || U-0
 * Bits 7:0, access: U-0 || U-0 || U-0 || U-0 || U-0 || U-0 || U-0 || RW-0
 * Bits 7:0, fields: — || — || — || — || — || — || — || SEMAPHORE


 * SEMAPHORE
 * The goal of this bit is to help implement a mutex between VR4300 and RSP. The mutex can be used to guard access to any shared hardware resource, a typical example being the DMA engine. To acquire the mutex, the CPU (either VR4300 or RSP) should spin reading the SEMAPHORE bit until it reads 0. At that point, the bit is automatically flipped to 1 by the hardware, so reading 0 means "the semaphore was free, and you have just acquired it". After the CPU is done using the shared resource, it can simply write 0 to SP_SEMAPHORE to release it.
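The read-to-acquire protocol can be illustrated with a small software model of the register. The type and function names are hypothetical, and the model stands in for the real memory-mapped register: on hardware, the read and the write would be accesses to SP_SEMAPHORE rather than to a struct field.

```c
#include <stdint.h>
#include <stdbool.h>

/* Software stand-in for the SEMAPHORE bit. */
typedef struct {
    uint32_t bit;
} sp_semaphore_model;

/* Reading returns the current value and leaves the bit set to 1,
 * mimicking the automatic flip performed by the hardware. */
static uint32_t sem_model_read(sp_semaphore_model *s)
{
    uint32_t old = s->bit;
    s->bit = 1;
    return old;
}

/* One acquisition attempt: succeeds only if the read returned 0. */
static bool sem_try_acquire(sp_semaphore_model *s)
{
    return sem_model_read(s) == 0;
}

/* Spin until acquired, as described above. */
static void sem_acquire(sp_semaphore_model *s)
{
    while (!sem_try_acquire(s)) {
        /* busy-wait: another owner holds the semaphore */
    }
}

/* Release by writing 0. */
static void sem_release(sp_semaphore_model *s)
{
    s->bit = 0;
}
```

Note that a failed attempt is harmless in this model: the bit was already 1, so the read does not change its state.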

RSP PC register
RSP has an internal PC (program counter) register that cannot be explicitly accessed via RSP opcodes. Instead, a memory mapped register is available to VR4300 to control the RSP PC while RSP is halted. The register is called SP_PC.

Notice that VR4300 is allowed to access SP_PC only while RSP is halted. Reading from SP_PC while RSP is running returns garbage data, and writing to it causes RSP to misbehave.

SP_PC

 * Bits 31:24, access: U-0 || U-0 || U-0 || U-0 || U-0 || U-0 || U-0 || U-0
 * Bits 23:16, access: U-0 || U-0 || U-0 || U-0 || U-0 || U-0 || U-0 || U-0
 * Bits 15:8, access: U-0 || U-0 || U-0 || U-0 || RW-0 || RW-0 || RW-0 || RW-0
 * Bits 15:8, fields: — (bits 15:12) || PC[11:8] (bits 11:8)
 * Bits 7:0, access: RW-0 || RW-0 || RW-0 || RW-0 || RW-0 || RW-0 || RW-0 || RW-0
 * Bits 7:0, fields: PC[7:0] (bits 7:0)

Extra Details:
 * PC
 * Reads while RSP is running return random bits. Reads while RSP is halted return the address of the instruction that the RSP will execute when it is unhalted. Writes also reset the RSP CPU core pipeline, so any pending writeback or branch is discarded.