SysAD Interface

The MIPS interface is a bidirectional interface on the N64 that carries 32 bits of address and data, together with a 5-bit control bus and 3 control signals (another 3 control signals exist for multi-CPU setups but are not used in the N64). The bus works almost like a packet system: the control bus selects what is being read or written, and how much data is transferred, between the CPU and the [[RCP]].
 
'''Masterclock''': This is a clock fed from the RCP to the CPU. All signals change on the rising edge of the master clock. The CPU multiplies this clock internally to 93.75 MHz, but all accesses between the RCP and CPU run at the master clock rate (62.5 MHz).
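
As a quick check of the two clock figures above, the following sketch derives the multiplier between them and converts bus cycles into CPU cycles. The function name and the use of exact fractions are illustrative choices, not part of any N64 SDK.

```python
from fractions import Fraction

# Clock figures quoted in the text above, in Hz.
MASTER_CLOCK_HZ = Fraction(62_500_000)
CPU_CLOCK_HZ = Fraction(93_750_000)

# Ratio between the internal CPU clock and the master clock.
CLOCK_MULTIPLIER = CPU_CLOCK_HZ / MASTER_CLOCK_HZ


def bus_cycles_to_cpu_cycles(bus_cycles):
    """Convert master clock (bus) cycles into internal CPU cycles."""
    return bus_cycles * CLOCK_MULTIPLIER


print(CLOCK_MULTIPLIER)              # 3/2
print(bus_cycles_to_cpu_cycles(10))  # 15
```

So every 2 cycles the CPU spends waiting on the bus cost it 3 internal cycles.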
{| class="wikitable"
|SYS Cmd Type
|Bit 4 - Command or Data
|Bit 3
|Bit 2
|Bit 1 - Size
|Bit 0 - Size
|-
|Command – Data On bus
|}
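
The bit layout in the table header can be sketched as a small field splitter. This is only a structural illustration: the function and key names are made up here, and the sketch deliberately does not interpret what each bit value means.

```python
def split_syscmd(value):
    """Split a 5-bit SYSCMD value into the fields named in the table header."""
    if not 0 <= value <= 0b11111:
        raise ValueError("SYSCMD is only 5 bits wide")
    return {
        "bit4_command_or_data": (value >> 4) & 1,
        "bit3": (value >> 3) & 1,
        "bit2": (value >> 2) & 1,
        "size": value & 0b11,  # bits 1 and 0 taken together
    }


fields = split_syscmd(0b01011)
print(fields["bit4_command_or_data"], fields["bit3"], fields["bit2"], fields["size"])
# prints: 0 1 0 3
```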
 
= Instruction non-cached reads and Data Word non-cached reads =
Both instruction reads and data word non-cached reads run the same way.
[[File:CPU Read 32bit.png|thumb|711x711px|32 bit read from the CPU]]
5.      On the next master clock cycle the RCP drives Evalid back high and returns both buses to high-Z. The CPU can then issue its next command.
 
= Data: Word non-cached write: =
All 8-, 16-, 24- and 32-bit writes use a single '32-bit' write structure; the lowest 2 bits of the SYSCMD bus indicate which size of write is to happen.
[[File:CPU Write 32bit.png|thumb|657x657px|32 bit write from the CPU]]
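
A minimal sketch of how the 2-bit size field could be derived from the write width. The mapping used here (size field = number of bytes minus one) is an assumption for illustration; check it against the SYSCMD table before relying on it. The function name is hypothetical.

```python
def write_size_bits(num_bytes):
    """Return the 2-bit size field for a 1- to 4-byte write.

    ASSUMPTION: the field encodes (number of bytes - 1), so 0b00 is an
    8-bit write and 0b11 is a full 32-bit write.
    """
    if num_bytes not in (1, 2, 3, 4):
        raise ValueError("a single write carries 1 to 4 bytes")
    return num_bytes - 1


for n in (1, 2, 3, 4):
    print(f"{n * 8:2d}-bit write -> size bits {write_size_bits(n):02b}")
```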
40-, 48- and 56-bit writes are processed as two separate 32-bit write commands: the least significant word (LSB) is written first, then the most significant word (MSB).
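
The split described above can be sketched as follows. It assumes the first command carries the full low 32 bits and the second carries the remaining 8, 16 or 24 bits; the function name and the (value, byte count) tuples are illustrative only.

```python
def split_wide_write(value, total_bits):
    """Split a 40/48/56-bit write into two 32-bit-style write commands.

    Returns a list of (data, num_bytes) tuples in the order they would be
    issued: least significant word first, then the most significant part.
    """
    if total_bits not in (40, 48, 56):
        raise ValueError("only 40-, 48- and 56-bit writes are split this way")
    if not 0 <= value < (1 << total_bits):
        raise ValueError("value does not fit in the requested width")
    low_word = value & 0xFFFFFFFF          # first command: low 32 bits
    high_part = value >> 32                # second command: what is left
    high_bytes = (total_bits - 32) // 8    # 1, 2 or 3 bytes
    return [(low_word, 4), (high_part, high_bytes)]


for data, size in split_wide_write(0x1122334455, 40):
    print(f"write {size} byte(s): 0x{data:X}")
# write 4 byte(s): 0x22334455
# write 1 byte(s): 0x11
```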
 
= Data: Double word non-cached write 64 bits: =
1.      First the CPU checks that EoK is low, which indicates that the RCP is ready to accept data.
 
5.      On the next master clock cycle the CPU drives Pvalid high to complete the write. The RCP holds EoK high to indicate that it is processing the write, and EoK stays high until the write has completed internally. During this time the CPU can place the next command and address on the buses, but it stays in a hold state until EoK goes back low. On the following master clock cycle (so EoK has been low for a full cycle) the RCP processes that command.
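
The EoK hold behaviour in the last step can be modelled with a per-cycle trace. This is a toy model, not a timing-accurate simulation: the trace format (one 0/1 sample of EoK per master clock cycle) and the function name are assumptions made for illustration.

```python
def next_command_cycle(eok_trace):
    """Return the cycle in which the RCP processes the pending command.

    eok_trace holds one EoK sample per master clock cycle (1 = high, the
    RCP is still busy; 0 = low, the RCP is ready). The command is taken
    on the cycle after the first cycle in which EoK is low. Returns None
    if EoK never goes low in the trace.
    """
    for cycle, level in enumerate(eok_trace):
        if level == 0:
            return cycle + 1
    return None


# EoK stays high for three cycles while the write completes internally.
print(next_command_cycle([1, 1, 1, 0, 0]))  # 4
```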
 
= Data: Cached write 128 bits: =
The address for cache writes is always 128-bit aligned (so the last 4 bits are 0000). This is because the D-cache RAM in the CPU is organised in 128-bit entries, and when a dirty write-back happens the full 128-bit entry is written back to RAM (cache dump opcodes run the same way, but only write back entries that are marked as dirty).
[[File:Cpu Write dcache 128bit.png|thumb|652x652px|128bit D-cache write from the CPU]]
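
The alignment rule can be sketched as a mask over the low 4 address bits. The helper names and the example address are hypothetical, and listing the four words in ascending order is an assumption of this sketch.

```python
DCACHE_LINE_BYTES = 16  # 128 bits


def dcache_line_base(address):
    """Clear the low 4 address bits to get the 128-bit aligned base."""
    return address & ~(DCACHE_LINE_BYTES - 1)


def writeback_word_addresses(address):
    """List the four 32-bit word addresses covered by one write-back.

    ASSUMPTION: the words are listed in ascending address order.
    """
    base = dcache_line_base(address)
    return [base + 4 * i for i in range(DCACHE_LINE_BYTES // 4)]


print(hex(dcache_line_base(0x80001234)))
# 0x80001230
print([hex(a) for a in writeback_word_addresses(0x80001234)])
# ['0x80001230', '0x80001234', '0x80001238', '0x8000123c']
```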
 
 
= Data: Cached read 128 bits: =
The address for cache reads is always 64-bit aligned (so the last 3 bits are 000). Why 64-bit aligned? To keep the 4300i CPU fast: a D-cache entry is 128 bits long, but the CPU can read at most 64 bits of data at a time, so we want to deliver the requested 64 bits as quickly as possible into the 128-bit aligned entry. Two things can happen, depending on whether the address bit that selects the 64-bit half of the entry (the fourth-lowest bit of the requested data address) is high or low. If it is low, the data is sent to the CPU normally, from the LSB word to the MSB word, and the CPU can carry on working because it has received its needed 64 bits first. This is what we call a sequential order transfer.
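
A minimal sketch of the ordering decision described above. The low case follows the text (sequential order). For the high case, this sketch simply assumes the requested 64-bit half is sent first and the other half afterwards, which matches the stated goal of delivering the needed 64 bits as early as possible; treat that branch as an assumption. The function and label names are hypothetical.

```python
def dcache_read_order(address):
    """Return (order_name, word_addresses) for a 128-bit cached read."""
    address &= ~0x7                    # requests are 64-bit aligned
    base = address & ~0xF              # start of the 128-bit entry
    upper_half = (address >> 3) & 1    # fourth-lowest address bit
    words = [base, base + 4, base + 8, base + 12]
    if not upper_half:
        # Low: plain LSB-to-MSB transfer, as described in the text.
        return "sequential", words
    # High: ASSUMED ordering, requested half first, then the other half.
    return "requested-half-first (assumed)", words[2:] + words[:2]


for addr in (0x80001230, 0x80001238):
    name, order = dcache_read_order(addr)
    print(hex(addr), name, [hex(a) for a in order])
```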
 
8.      On the next master clock cycle the RCP drives Evalid high to complete the transfer.
 
= Instruction: Cached read 256 bits: =
The address for I-cache reads is always 256-bit aligned (so the last 5 bits of the address are always 00000). This is because the I-cache memory is organised in 256-bit aligned entries, and the 4300i CPU has no logic to track whether a cache entry is only partly filled. Thus, the CPU must have a full entry in the I-cache before it can load instructions from it. The 256-bit reads run the same way as the D-cache reads (and are in sequential order), but the command sent to the RCP is the 256-bit read command, and 8x 32-bit data accesses are sent over the SYSAD bus.
[[File:Read icache 256bit.png|thumb|796x796px|Read from the CPU that is 256 Bits for the I-cache]]
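
The I-cache fill can be sketched the same way: clear the low 5 address bits, then step through eight 32-bit words in sequential order. The helper name and the example address are hypothetical.

```python
ICACHE_LINE_BYTES = 32  # 256 bits


def icache_fill_addresses(address):
    """List the eight 32-bit word addresses fetched for one I-cache entry."""
    base = address & ~(ICACHE_LINE_BYTES - 1)  # low 5 bits forced to 00000
    return [base + 4 * i for i in range(ICACHE_LINE_BYTES // 4)]


fill = icache_fill_addresses(0x80000454)
print(len(fill), hex(fill[0]), hex(fill[-1]))
# 8 0x80000440 0x8000045c
```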
 
 
 
 
= CPU Throughput tips, tricks and ideas =
These ideas are subjective and come only from a hardware overview perspective.
 
* The SYSAD bus is advertised as a 250 Mbyte/second interface. But because of bus turnaround (it is bidirectional) and wait states, I believe the maximum real throughput is closer to the 200 Mbyte/second mark or less.
* When caching memory, try to keep instructions in 16 Kbyte blocks (thus keeping your cache hit rate higher) and data in 8 Kbyte blocks. Cache opcodes can also help with filling and dumping memory locations. Program locations can be “pre-cached/filled” by setting up the CP0 co-processor and then activating it with the cache opcode. These opcodes are very helpful if used correctly, and some keep the CPU running.
* And something I would love to see done: use the DMEM and IMEM in the RSP core as fast RAM (just remember these accept 32-bit writes only, so no caching). You can DMA from RAM to them and then use this as cache ‘bootcode’ for the cache opcode process, then run from cache memory. This RAM is very fast, with about a 4-5 clock cycle wait time, whereas the RDRAM has about a 10-20+ clock cycle wait time for a data access.
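
The advertised 250 Mbyte/second figure follows from the 62.5 MHz master clock and the 32-bit bus width. The sketch below reproduces that number and shows how non-data cycles reduce it. The overhead value in the example (2 cycles per 8-word burst) is a made-up illustration, not a measurement; real overhead depends on the transfer type and on RCP wait states.

```python
BUS_CLOCK_HZ = 62_500_000   # master clock rate
BUS_WIDTH_BYTES = 4         # 32-bit SYSAD bus

# One 32-bit transfer on every master clock cycle.
PEAK_BYTES_PER_SECOND = BUS_CLOCK_HZ * BUS_WIDTH_BYTES


def effective_bytes_per_second(data_cycles, overhead_cycles):
    """Scale peak throughput by the fraction of cycles that carry data."""
    total_cycles = data_cycles + overhead_cycles
    return PEAK_BYTES_PER_SECOND * data_cycles / total_cycles


print(PEAK_BYTES_PER_SECOND / 1e6)              # 250.0
# Hypothetical: 8 data cycles plus 2 command/turnaround/wait cycles.
print(effective_bytes_per_second(8, 2) / 1e6)   # 200.0
```

Short transfers fare much worse under this model: a single 32-bit access with the same hypothetical 2 cycles of overhead uses only one cycle in three for data.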