SysAD Interface
The SysAD interface is the bidirectional bus between the CPU and the RCP on the N64. It carries 32 bits of multiplexed address and data, alongside a 5-bit command bus and 3 control signals. (Another 3 control signals exist for multi-CPU setups but are not used in the N64.) The bus works almost like a packet system: the command bus selects whether data is being read or written, and how much, by either the CPU or the RCP.
Masterclock: This clock is fed from the RCP to the CPU. All signals change on the rising edge of the master clock. The CPU multiplies this clock internally to 93.75 MHz, but all accesses between the RCP and CPU run at the master clock rate (62.5 MHz).
EoK: This is an inverted (active-low) ready signal from the RCP. When low, it signals that the RCP is ready to accept a command from the CPU. When EoK is high, the CPU is placed in a wait state before it sends the next set of data. This is mostly used during CPU writes, so that the RCP can get ready for the next command.
Evalid: This is an inverted (active-low) signal from the RCP saying that there is valid data and a valid command on the SYSAD and SYSCMD buses.
Pvalid: This is an inverted (active-low) signal saying that the CPU is driving valid SYSAD and SYSCMD values on the buses. The CPU can hold it low (valid data) while it waits to send data to the RCP if EoK is still high.
SYSAD[31:0]: This is a 32-bit bidirectional address and data bus. It can carry single or burst transfers between the CPU and RCP when requested. (Bursts are used for cache accesses and double-word reads/writes.)
SYSCMD[4:0]: This is a bidirectional bus that tells the CPU or RCP whether a transfer is a read or a write, how much data is to be sent, or that the data has ended.
SYSCMD Cheat sheet
SYSCMD type | Bit 4 – Command or data | Bit 3 | Bit 2 | Bit 1 – Size | Bit 0 – Size |
Command – Data on bus | 1 – Command | 0 – Data flag | 0 | 0 | 0 |
Command – End Data | 1 – Command | 1 – Last data flag | 0 | 0 | 0 |
Command – Response data | 1 – Command | 0 – Data flag | 1 – Response data | 0 | 0 |
Read – 32 bits | 0 – Data req | 0 – Read | 0 – Single read | 1 | 1 – 32 bits |
Read – 64 bits | 0 – Data req | 0 – Read | 1 – Block read | 0 | 0 – 64 bits |
Read – 128 bits | 0 – Data req | 0 – Read | 1 – Block read | 0 | 1 – 128 bits |
Read – 256 bits | 0 – Data req | 0 – Read | 1 – Block read | 1 | 0 – 256 bits |
Write – 8 bits | 0 – Data req | 1 – Write | 0 – Single write | 0 | 0 – 8 bits |
Write – 16 bits | 0 – Data req | 1 – Write | 0 – Single write | 0 | 1 – 16 bits |
Write – 24 bits | 0 – Data req | 1 – Write | 0 – Single write | 1 | 0 – 24 bits |
Write – 32 bits | 0 – Data req | 1 – Write | 0 – Single write | 1 | 1 – 32 bits |
Write – 64 bits | 0 – Data req | 1 – Write | 1 – Block write | 0 | 0 – 64 bits |
Write – 128 bits | 0 – Data req | 1 – Write | 1 – Block write | 0 | 1 – 128 bits |
Write – 256 bits (only used when testing the I-cache) | 0 – Data req | 1 – Write | 1 – Block write | 1 | 0 – 256 bits |
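The cheat sheet can be turned into a small lookup table. The following is only a sketch: the bit patterns are transcribed row by row from the table above, while the names SYSCMD_TABLE, decode_syscmd and encode_request are made up for this example and do not come from any official source.

```python
# Lookup table transcribed from the SYSCMD cheat sheet (bit 4 .. bit 0).
# All names in this block are illustrative only.
SYSCMD_TABLE = {
    0b10000: "Command - Data on bus",
    0b11000: "Command - End Data",
    0b10100: "Command - Response data",
    0b00011: "Read - 32 bits",
    0b00100: "Read - 64 bits",
    0b00101: "Read - 128 bits",
    0b00110: "Read - 256 bits",
    0b01000: "Write - 8 bits",
    0b01001: "Write - 16 bits",
    0b01010: "Write - 24 bits",
    0b01011: "Write - 32 bits",
    0b01100: "Write - 64 bits",
    0b01101: "Write - 128 bits",
    0b01110: "Write - 256 bits",
}


def decode_syscmd(value: int) -> str:
    """Describe a 5-bit SYSCMD value, or flag it as not listed in the table."""
    if not 0 <= value <= 0b11111:
        raise ValueError("SYSCMD is only 5 bits wide")
    return SYSCMD_TABLE.get(value, "not listed in the cheat sheet")


def encode_request(write: bool, bits: int) -> int:
    """Find the SYSCMD value for a read or write request of the given size."""
    wanted = f"{'Write' if write else 'Read'} - {bits} bits"
    for value, name in SYSCMD_TABLE.items():
        if name == wanted:
            return value
    raise ValueError(f"no table row for: {wanted}")
```

For example, encode_request(True, 128) returns 0b01101, and decode_syscmd(0b00011) returns "Read - 32 bits". Sizes the table does not list, such as an 8-bit read, raise a ValueError rather than guessing an encoding.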
Instruction and Data: non-cached reads
Both instruction reads and non-cached data word reads run the same way.
1. First the CPU checks that EoK is low, meaning that the RCP is ready to accept data.
2. Then it does the following 3 things on the next master clock cycle:
- The CPU places the requested address on SYSAD.
- SYSCMD is set to Read – 32 bits. (All 8, 16, 24 and 32-bit reads are done as 32-bit reads, and the CPU does the shifting internally.)
- Pvalid goes low to state that there is valid data on the two buses.
3. On the next clock cycle Pvalid goes high, and the CPU keeps the address and command on the SYSAD and SYSCMD buses while waiting for Evalid to go low.
4. Once Evalid goes low, the CPU goes into high-Z mode and listens to the two buses. At the same time the RCP does the following:
- The RCP drives the data that the CPU requested onto SYSAD.
- SYSCMD outputs the End Data command.
5. On the next master clock cycle the RCP puts Evalid back high and puts both buses back in high-Z. The CPU can then issue its next command.
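The read sequence above can be sketched as a per-cycle trace. This is a simplified model, not a timing-accurate one: the number of cycles the RCP takes to answer (wait_cycles) is an assumed parameter, the Cycle type and function names are invented for the example, and extract_byte assumes big-endian byte lanes when showing the kind of shifting the CPU would do internally for a sub-word read.

```python
from typing import List, NamedTuple, Optional

READ_32 = 0b00011    # "Read - 32 bits" row of the cheat sheet
END_DATA = 0b11000   # "Command - End Data" row of the cheat sheet


class Cycle(NamedTuple):
    pvalid: int             # 0 = CPU is driving valid values (active low)
    evalid: int             # 0 = RCP is driving valid values (active low)
    driver: str             # "CPU", "RCP" or "Z" (both buses in high-Z)
    sysad: Optional[int]
    syscmd: Optional[int]


def uncached_read_trace(address: int, data: int, wait_cycles: int = 2) -> List[Cycle]:
    """One Cycle per master clock; assumes EoK was already low (step 1)."""
    if wait_cycles < 1:
        raise ValueError("step 3 takes at least one cycle")
    trace = [Cycle(0, 1, "CPU", address, READ_32)]                 # step 2
    trace += [Cycle(1, 1, "CPU", address, READ_32)] * wait_cycles  # step 3
    trace.append(Cycle(1, 0, "RCP", data, END_DATA))               # step 4
    trace.append(Cycle(1, 1, "Z", None, None))                     # step 5
    return trace


def extract_byte(word: int, address: int) -> int:
    """Pick one byte out of the returned 32-bit word (assumes big-endian lanes)."""
    shift = 8 * (3 - (address & 3))
    return (word >> shift) & 0xFF
```

Calling uncached_read_trace with an arbitrary address and data word yields five cycles with the default wait of two: one request cycle, two wait cycles, one data cycle and one release cycle.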
Data: Word non-cached write
All 8, 16, 24 and 32-bit writes use the same single '32-bit' write structure, but the last 2 bits of the SYSCMD bus say which type of write is to happen.
1. First the CPU checks that EoK is low, meaning that the RCP is ready to accept data.
2. Then it does the following 3 things on the next master clock cycle:
- The CPU places the target address on SYSAD.
- SYSCMD is set to Write – 8/16/24/32 bits.
- Pvalid goes low to state that there is valid data on the two buses.
3. On the next clock cycle the CPU outputs the data to write, and EoK from the RCP goes high to say it is accepting the data. The following happens:
- The CPU places the data to write on SYSAD.
- SYSCMD is set to Command – Response data.
- Pvalid stays low to state that there is valid data on the two buses.
4. On the next master clock cycle the CPU sets Pvalid high to complete the write. The RCP holds EoK high to say that it is processing the write, and keeps it high until the write has completed internally. During this time the CPU can place the next command and address on the buses, and it stays in a hold state until EoK goes back low. On the next master clock after that (so EoK will have been low for a full cycle) the RCP processes that command.
For 8, 16 and 24-bit writes, the data is address-aligned on the data bus and repeated, so that the RCP needs no processing to align the data.
40, 48 and 56-bit writes are processed as two separate 32-bit write commands, where the LSB part is written first and the MSB part next.
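The single-word write sequence can be sketched the same way. Again this is only a rough model: busy_cycles (how long the RCP keeps EoK high while it processes the write) is an assumed parameter, the names are invented for the example, and the data word is passed through as-is without modelling how sub-word data is laid out on the bus.

```python
from typing import List, NamedTuple, Optional

# SYSCMD values taken from the "Write" rows of the cheat sheet.
WRITE_CMD = {8: 0b01000, 16: 0b01001, 24: 0b01010, 32: 0b01011}
RESPONSE_DATA = 0b10100   # "Command - Response data" row


class Cycle(NamedTuple):
    pvalid: int             # 0 = CPU is driving valid values (active low)
    eok: int                # 0 = RCP ready for a command (active low)
    sysad: Optional[int]
    syscmd: Optional[int]


def uncached_write_trace(address: int, data: int, bits: int = 32,
                         busy_cycles: int = 3) -> List[Cycle]:
    """One Cycle per master clock; assumes EoK was already low (step 1)."""
    if bits not in WRITE_CMD:
        raise ValueError("single writes are 8, 16, 24 or 32 bits")
    trace = [
        Cycle(0, 0, address, WRITE_CMD[bits]),          # step 2: address + command
        Cycle(0, 1, data & 0xFFFFFFFF, RESPONSE_DATA),  # step 3: data, EoK goes high
    ]
    trace += [Cycle(1, 1, None, None)] * busy_cycles    # step 4: RCP busy, EoK high
    trace.append(Cycle(1, 0, None, None))               # EoK low again: RCP ready
    return trace
```

Only the SYSCMD value in the first cycle changes with the write size, which matches the point that all four sizes share one write structure.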
Data: Double word non-cached write 64 bits
1. First the CPU checks that EoK is low, meaning that the RCP is ready to accept data.
2. Then it does the following 3 things on the next master clock cycle:
- The CPU places the target address on SYSAD.
- SYSCMD is set to Write – 64 bits.
- Pvalid goes low to state that there is valid data on the two buses.
3. On the next clock cycle the CPU outputs the data to write, and EoK from the RCP goes high to say it is accepting the data. The following happens:
- The CPU places the LSB data to write on SYSAD.
- SYSCMD is set to Command – Response data.
- Pvalid stays low to state that there is valid data on the two buses.
4. On the next clock cycle the CPU outputs more data to write and the following happens:
- The CPU places the MSB data to write on SYSAD.
- SYSCMD is set to Command – Response data.
- Pvalid stays low to state that there is valid data on the two buses.
5. On the next master clock cycle the CPU sets Pvalid high to complete the write. The RCP holds EoK high to say that it is processing the write, and keeps it high until the write has completed internally. During this time the CPU can place the next command and address on the buses, and it stays in a hold state until EoK goes back low. On the next master clock after that (so EoK will have been low for a full cycle) the RCP processes that command.
Data: Cached write 128 bits
The address for cache writes is always 128-bit aligned (so the last 4 bits are 0000). This is because each D-cache entry in the CPU is 128 bits, and when a dirty write-back happens the full 128-bit entry is written back to RAM. (Cache dump opcodes run the same way, but only write back entries that are marked as dirty.)
1. First the CPU checks that EoK is low, meaning that the RCP is ready to accept data.
2. Then it does the following 3 things on the next master clock cycle:
- The CPU places the target address on SYSAD.
- SYSCMD is set to Write – 128 bits.
- Pvalid goes low to state that there is valid data on the two buses.
3. On the next clock cycle the CPU outputs the data to write, and EoK from the RCP goes high to say it is accepting the data. The following happens:
- The CPU places the first 32-bit data word on SYSAD.
- SYSCMD is set to Command – Response data.
- Pvalid stays low to state that there is valid data on the two buses.
4. On the next clock cycle the CPU outputs more data to write and the following happens:
- The CPU places the second 32-bit data word on SYSAD.
- SYSCMD is set to Command – Response data.
- Pvalid stays low to state that there is valid data on the two buses.
5. On the next clock cycle the CPU outputs more data to write and the following happens:
- The CPU places the third 32-bit data word on SYSAD.
- SYSCMD is set to Command – Response data.
- Pvalid stays low to state that there is valid data on the two buses.
6. On the next clock cycle the CPU outputs more data to write and the following happens:
- The CPU places the fourth 32-bit data word on SYSAD.
- SYSCMD is set to Command – Response data.
- Pvalid stays low to state that there is valid data on the two buses.
7. On the next master clock cycle the CPU sets Pvalid high to complete the write. The RCP holds EoK high to say that it is processing the write, and keeps it high until the write has completed internally. During this time the CPU can place the next command and address on the buses, and it stays in a hold state until EoK goes back low. On the next master clock after that (so EoK will have been low for a full cycle) the RCP processes that command.
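The double-word and cached writes follow one pattern: an address cycle, then one data cycle per 32-bit word, then a cycle where Pvalid is released. A sketch of that pattern follows. The names are invented for the example; the block command is chosen from the cheat sheet by word count (so a four-word transfer uses the Write – 128 bits row), and the check that the address is aligned to the block size is an assumption generalised from the 128-bit alignment described above.

```python
from typing import List, Optional, Tuple

# Number of 32-bit words -> block write SYSCMD value from the cheat sheet.
BLOCK_WRITE_CMD = {2: 0b01100, 4: 0b01101, 8: 0b01110}
RESPONSE_DATA = 0b10100   # "Command - Response data" row

Row = Tuple[int, Optional[int], Optional[int]]   # (pvalid, sysad, syscmd)


def dcache_line_address(address: int) -> int:
    """Clear the low 4 bits to get the 128-bit aligned entry address."""
    return address & ~0xF


def block_write_trace(address: int, words: List[int]) -> List[Row]:
    """One row per master clock; assumes EoK was already low (step 1)."""
    if len(words) not in BLOCK_WRITE_CMD:
        raise ValueError("block writes carry 2, 4 or 8 words")
    if address % (4 * len(words)):
        raise ValueError("address must be aligned to the block size")
    trace: List[Row] = [(0, address, BLOCK_WRITE_CMD[len(words)])]  # address cycle
    trace += [(0, w & 0xFFFFFFFF, RESPONSE_DATA) for w in words]    # data cycles
    trace.append((1, None, None))                                   # Pvalid released
    return trace
```

A four-word write-back therefore occupies six master clock cycles on the bus before the RCP's own processing time, which is not modelled here.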
Data: Cached read 128 bits
The address for cache reads is always 64-bit aligned (so the last 3 bits are 000). Why 64-bit aligned? To keep the 4300i CPU fast: a D-cache entry is 128 bits long, but the CPU can read at most 64 bits of data at once, so we want to get the requested 64 bits into the 128-bit aligned entry as quickly as possible. Two things can happen, depending on whether bit 3 of the requested address (the 4th bit, worth 0x8) is low or high. If it is low, the data is sent to the CPU normally, from the LSB to the MSB words, and the CPU carries on its merry way because it has received the 64 bits it needed first. This is called sequential ordering.
But if that bit is high, the RCP does something that may surprise you. The CPU sends the 64-bit aligned address (say address 8000_0018 is sent to the RCP). First, the 64 bits at the requested address are sent to the CPU. What data is sent next: the data from the address above it (8000_0020) or below it (8000_0010)?
To answer that, look at how the D-cache is set up in the CPU. Each entry is 128-bit aligned, so the 64 bits just received are placed in the x8 (MSB) half of the D-cache entry. What happens to the low 64 bits? The RCP sends the data from the previous address (8000_0010) to the CPU. This is called subblock ordering (see page 339 of the VR4300 64-bit UM PDF). In this case steps 4 and 5 are swapped with steps 6 and 7.
1. First the CPU checks that EoK is low, meaning that the RCP is ready to accept data.
2. Then it does the following 3 things on the next master clock cycle:
- The CPU places the requested address on SYSAD.
- SYSCMD is set to Read – 128 bits.
- Pvalid goes low to state that there is valid data on the two buses.
3. On the next cycle the CPU puts Pvalid high and waits for the RCP to respond. During this time the CPU keeps the address and command on the SYSAD and SYSCMD buses.
4. Once the RCP has the data, Evalid goes low, and the CPU goes into high-Z mode and listens to the two buses. At the same time the RCP does the following:
- SYSAD carries the first 32-bit data word.
- SYSCMD outputs the End Data command.
- Evalid stays low to state that there is valid data on the two buses.
5. On the next clock cycle the RCP outputs the next data word and the following happens:
- SYSAD carries the second 32-bit data word.
- SYSCMD outputs the End Data command.
- Evalid stays low to state that there is valid data on the two buses.
6. On the next clock cycle the RCP outputs the next data word and the following happens:
- SYSAD carries the third 32-bit data word.
- SYSCMD outputs the End Data command.
- Evalid stays low to state that there is valid data on the two buses.
7. On the next clock cycle the RCP outputs the last data word and the following happens:
- SYSAD carries the fourth 32-bit data word.
- SYSCMD is set to Command – Response data.
- Evalid stays low to state that there is valid data on the two buses.
8. On the next master clock cycle the RCP places Evalid high to complete the transfer.
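The two orderings described for cached reads can be expressed as a small address calculation. This sketch only works out which 32-bit word addresses arrive in which order, following the 8000_0018 example in the text; the function name is made up for illustration.

```python
from typing import List


def dcache_fill_order(address: int) -> List[int]:
    """Return the four 32-bit word addresses in the order they are transferred.

    The requested 64 bits always come first, followed by the other
    64-bit half of the same 128-bit aligned cache entry.
    """
    if address & 0x7:
        raise ValueError("cached read addresses are 64-bit aligned")
    line = address & ~0xF                        # start of the 128-bit entry
    first = address                              # requested half
    second = line + ((address & 0x8) ^ 0x8)      # the other half of the entry
    return [first, first + 4, second, second + 4]
```

For 0x80000010 this gives the sequential order 0x80000010, 0x80000014, 0x80000018, 0x8000001C. For 0x80000018 it gives the subblock order 0x80000018, 0x8000001C, 0x80000010, 0x80000014, so the transfer never leaves the 128-bit entry.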
Instruction: Cached read 256 bits
The address for I-cache reads is always 256-bit aligned (so the last 5 bits of the address are always 00000). This is because the I-cache is organised in 256-bit aligned entries, and the 4300i CPU has no logic to track whether a cache entry is only partly filled. The CPU must therefore have a full entry in the I-cache before it can load instructions from it. 256-bit reads run the same way as D-cache reads (and use sequential ordering), but the command sent to the RCP is Read – 256 bits, and 8 x 32-bit data words are sent over the SYSAD bus.
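As a short illustration of the I-cache alignment, the sketch below takes an instruction address, clears its low 5 bits, and lists the eight word addresses that a fill would bring in, in sequential order. The function name is hypothetical; the SYSCMD value is the Read – 256 bits row of the cheat sheet.

```python
from typing import List, Tuple

READ_256 = 0b00110   # "Read - 256 bits" row of the cheat sheet


def icache_fill(pc: int) -> Tuple[int, int, List[int]]:
    """Return (SYSCMD value, aligned entry address, the 8 word addresses fetched)."""
    line = pc & ~0x1F                            # clear the last 5 address bits
    words = [line + 4 * i for i in range(8)]     # 8 x 32-bit words, in order
    return READ_256, line, words
```

An instruction at 0x80000434, for example, falls in the entry starting at 0x80000420, and all eight words up to 0x8000043C are read before the CPU can execute from that entry.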
CPU Throughput tips, tricks and ideas
These ideas are subjective and come purely from a hardware perspective.
- Cache, cache and cache all RAM accesses. RAM is slow to reach through the RCP, and the CPU can wait a long time before data is passed on from the RDRAM.
- RCP register accesses should not go through cached memory locations in the CPU, as the write-backs will not work correctly and will cause the RCP to crash. Also keep them as 32-bit reads and writes: they only work that way, and nothing is wasted on the 32-bit bus.
- The SYSAD bus is advertised as a 250 MB/s interface. But because it is bidirectional and has wait states, I believe the maximum throughput is closer to the 200 MB/s mark, or less.
- When caching memory, try to keep to 16 KB instruction blocks (thus keeping your cache hits higher) and 8 KB data blocks. Cache opcodes can also help with filling and dumping memory locations. Program locations can be “pre-cached/filled” by setting things up through the CP0 co-processor and then activating the fill with the cache opcode. These opcodes are very helpful if used correctly, and some keep the CPU running.
- And something I would love to see done: use the DMEM and IMEM in the RSP core as fast RAM (just remember these take 32-bit writes only, so no caching). You can DMA from RAM to them and then use this as cache ‘bootcode’ for the cache opcode process, then run from cache memory. This RAM is very fast, with about a 4-5 clock cycle wait time, whereas the RDRAM has about a 10-20+ clock cycle wait time for a data access to happen.
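The throughput figures in the tips above can be checked with some quick arithmetic. The peak figure follows directly from the 62.5 MHz master clock and the 32-bit bus. The effective figure is only a rough model: the count of non-data cycles per transfer (overhead_cycles) is an assumed parameter, not a measured value, and the function names are invented for the example.

```python
MASTER_CLOCK_HZ = 62_500_000   # SysAD master clock rate
BUS_BYTES = 4                  # SYSAD is 32 bits wide


def peak_bandwidth() -> int:
    """Bytes per second if every master clock cycle carried data."""
    return MASTER_CLOCK_HZ * BUS_BYTES


def effective_bandwidth(data_words: int, overhead_cycles: int) -> float:
    """Bytes per second when each burst also spends cycles on address,
    waiting and bus turnaround (overhead_cycles is an assumption)."""
    total_cycles = data_words + overhead_cycles
    return peak_bandwidth() * data_words / total_cycles
```

The peak works out to 250,000,000 bytes per second, matching the advertised 250 MB/s. With an assumed 2 overhead cycles, an 8-word I-cache fill reaches 200 MB/s and a 4-word D-cache fill about 167 MB/s, and longer RDRAM waits push both lower.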