Peripheral Interface: Difference between revisions

 
 
== The PI Bus ==
The PI bus is the bus where external devices can be connected, via either the cartridge port on the top of the console, or the expansion port at the bottom of the console. Notice both ports are electrically connected to the same bus, even if the connector is different.
 
| RW-0 || RW-0 || RW-0 || RW-0 || RW-0 || RW-0 || RW-0 || R-0
|-
| colspan="7" | DRAM_ADDR[7:1] || 0
{{#invoke:Register table|foot}}
{{#invoke:Register table|definitions
| 31-24 | Undefined | Initialized to <code>0</code>
| 23-1 | DRAM_ADDR[23:1] | Base address of RDRAM for PI DMAs. Bit 0 cannot be written and is fixed to 0.
}}
'''Extra Details:'''
| RW-0 || RW-0 || RW-0 || RW-0 || RW-0 || RW-0 || RW-0 || R-0
|-
| colspan="7" | CART_ADDR[7:1] || 0
{{#invoke:Register table|foot}}
{{#invoke:Register table|definitions
| 31-1 | CART_ADDR[31:1] | Base address of the PI bus (e.g. cartridge) for PI DMAs. Bit 0 cannot be written and is fixed to 0.
}}
 
}}
'''Extra Details:'''
: Writing to this register will start the DMA transfer. Reading appears to almost always return <code>0x7F</code> (see [[Peripheral Interface#PI WR LEN readbacks after a transfer|below]] for exceptions).
 
==== <span style="display:none;">0x0460 0010 - PI_STATUS</span> ====
'''Extra Details:'''
: During [[Initial_Program_Load#IPL2|IPL2]], the N64 will initialize Domain 1's PGS using data read from the cartridge [[ROM_Header|ROM header]]. All official ROMs set PGS = 7 (meaning 2^(7+2) = 512 bytes).
: The smallest possible value, 0, means 2^(0+2) = 4 bytes; the largest, 15, means 2^(15+2) = 128 KiB.
 
Page Size only matters for DMA transfers; all direct accesses via the PI are only ever 32 bits wide.
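
The PGS encoding above can be checked with a few lines of Python. This is only a sketch of the stated formula; the function name <code>pi_page_size_bytes</code> is hypothetical and not part of any SDK.

```python
def pi_page_size_bytes(pgs: int) -> int:
    """Return the PI DMA page size in bytes for a PGS field value.

    Uses the formula given in the text: 2^(PGS + 2) bytes.
    """
    if not 0 <= pgs <= 15:
        # The text gives 15 as the largest value, so PGS is treated as 4 bits.
        raise ValueError("PGS must be in the range 0..15")
    return 1 << (pgs + 2)
```

With the value used by official ROMs, <code>pi_page_size_bytes(7)</code> yields 512 bytes.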
'''Extra Details:'''
: During [[Initial_Program_Load#IPL2|IPL2]], the N64 will initialize Domain 1's RLS using data read from the cartridge [[ROM_Header|ROM header]]. All official ROMs set RLS = 3 (meaning (3+1)*16 = 64ns).
 
 
= iQue Player-specific registers =
 
'''Table Notation:'''
<pre>
R = Readable bit
W = Writable bit
U = Undefined/Unused bit
-n = Default value n at power on
[x:y] = Specifies bits x to y, inclusively</pre>
 
==== <span style="display:none;">0x0460 0040 - PI_BB_ATB_UPPER</span> ====
----
{{#invoke:Register table|head|800px|PI_BB_ATB_UPPER <code>0x0460 0040</code> (Read)}}
{{#invoke:Register table|row|31:24}}
| U-? || U-? || U-? || U-? || U-? || U-? || U-? || U-?
|-
| — || — || — || — || — || — || — || —
{{#invoke:Register table|row|23:16}}
| U-? || U-? || U-? || U-? || U-? || U-? || U-? || U-?
|-
| — || — || — || — || — || — || — || —
{{#invoke:Register table|row|15:8}}
| U-? || U-? || U-? || U-? || U-? || U-? || U-? || U-?
|-
| — || — || — || — || — || — || — || IV Source
{{#invoke:Register table|row|7:0}}
| U-? || U-? || U-? || U-? || colspan="4"| U-?
|-
| — || — || CpuEn || DmaEn || colspan="4"| log2(Num Blocks)
{{#invoke:Register table|foot}}
{{#invoke:Register table|definitions
| 8 | IV Source | Where to source the Initialization Vector from for AES decryption. See below.
| 5 | CpuEn | If set to 1, the mapping will be enabled for CPU reads
| 4 | DmaEn | If set to 1, the mapping will be enabled for DMA reads
| 3-0 | log2(Num Blocks) | log2 of the number of contiguous NAND blocks to map. This is applied to an ATB entry when '''ATB_LOWER''' registers are written.
}}
 
'''Extra Details'''
: This register supplies only half of the configuration for an ATB entry. The PI addresses and the starting NAND block number are specified in the '''PI_BB_ATB_LOWER''' array of registers.
: Mappings work with sequences of blocks whose length is a power of two. This register holds the base-2 logarithm of that length: for instance, writing 0 maps 1 block, and writing 4 maps 16 consecutive blocks.
: ATB is the N64 PI address space emulator that translates PI DMAs into NAND flash accesses. Data stored on the NAND is encrypted with AES, so ATB must transparently decrypt it when a PI DMA requests it. To decrypt AES at a 0x10-aligned position '''P''', the data at '''P-0x10''' is also required; if '''P=0''', the Initialization Vector (IV) is required instead. At the start of a DMA, ATB tries to find the entry that maps the PI address for '''P-0x10''' into the NAND in order to fetch the needed prior data; for all cases but '''P=0''' this resolves correctly with a contiguous PI address space mapping. To handle the '''P=0''' case, an additional dummy mapping with the IV Source bit set to 1 must precede the base address of the desired mapping. When the IV Source bit is 1, the IV is pulled from the memory at '''0x046104D0''' rather than read off the NAND. For example, if the mapping begins at PI address 0x10000000 (as for Cartridge ROM), a dummy mapping for PI address 0x0FFFC000 with IV Source set to 1 should be programmed.
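
As an illustration of the field layout above, the following Python sketch packs a '''PI_BB_ATB_UPPER''' value and computes the address of the dummy IV mapping. The helper names are hypothetical, and placing the dummy entry exactly one NAND block (0x4000 bytes) below the base is an assumption taken from the single example in the text.

```python
NAND_BLOCK_SIZE = 0x4000  # NAND block size used elsewhere on this page

def pack_atb_upper(num_blocks: int, cpu_en: bool, dma_en: bool,
                   iv_source: bool) -> int:
    """Pack the documented PI_BB_ATB_UPPER fields into one word."""
    if num_blocks < 1 or num_blocks & (num_blocks - 1):
        raise ValueError("num_blocks must be a power of two")
    log2_blocks = num_blocks.bit_length() - 1
    if log2_blocks > 0xF:
        raise ValueError("log2(num_blocks) must fit in 4 bits")
    return (int(iv_source) << 8) | (int(cpu_en) << 5) | (int(dma_en) << 4) | log2_blocks

def iv_dummy_mapping_addr(base_pi_addr: int) -> int:
    """PI address of the dummy IV entry (assumed: one block below the base)."""
    return base_pi_addr - NAND_BLOCK_SIZE
```

For a mapping that starts at 0x10000000, <code>iv_dummy_mapping_addr</code> returns 0x0FFFC000, matching the example above.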
 
==== <span style="display:none;">0x0460 0048 - PI_BB_NAND_CTRL</span> ====
----
{{#invoke:Register table|head|800px|PI_BB_NAND_CTRL <code>0x0460 0048</code> (Read)}}
{{#invoke:Register table|row|31:24}}
| R-0 || U-? || U-? || U-? || U-? || U-? || U-? || U-?
|-
| Busy || — || — || — || — || — || — || —
{{#invoke:Register table|row|23:16}}
| U-? || U-? || U-? || U-? || U-? || U-? || U-? || U-?
|-
| — || — || — || — || — || — || — || —
{{#invoke:Register table|row|15:8}}
| U-? || U-? || U-? || U-? || R-0 || R-0 || U-? || U-?
|-
| — || — || — || — || Single-bit Error || Double-bit Error || — || —
{{#invoke:Register table|row|7:0}}
| U-? || U-? || U-? || U-? || U-? || U-? || U-? || U-?
|-
| — || — || — || — || — || — || — || —
{{#invoke:Register table|foot}}
{{#invoke:Register table|definitions
| 31 | Busy | Indicates that a command is currently executing.
| 11 | Single-bit Error | Indicates that a single-bit error was detected by ECC. These are automatically corrected so generally no action is required.
| 10 | Double-bit Error | Indicates that a double-bit error was detected by ECC. Unlike single-bit errors, these are not automatically recoverable.
}}
 
{{#invoke:Register table|head|800px|PI_BB_NAND_CTRL <code>0x0460 0048</code> (Write)}}
{{#invoke:Register table|row|31:24}}
| W-0 || W-0 || W-0 || W-0 || W-0 || W-0 || W-0 || W-0
|-
| Execute || Interrupt || — || — || — || — || — || —
{{#invoke:Register table|row|23:16}}
| colspan="8"| W-0
|-
| colspan="8"| NAND Command
{{#invoke:Register table|row|15:8}}
| W-0 || W-0 || colspan="2"| W-0 || W-0 || W-0 || colspan="2" | W-0
|-
| — || Buffer Select || colspan="2"| Device Select || Do ECC || Multicycle || colspan="2"| Data Length [9:8]
{{#invoke:Register table|row|7:0}}
| colspan="8"| W-0
|-
| colspan="8"| Data Length [7:0]
{{#invoke:Register table|foot}}
{{#invoke:Register table|definitions
| 31 | Execute | Setting this bit when writing will cause the last written command to begin execution.
| 30 | Interrupt | Whether the FLASH interrupt should be raised when the command finishes execution.
| 29 | ? | Unknown. Set when issuing Page Program (first cycle)
| 28 | ? | Unknown. Set when issuing Read 1, Read Status and Read ID
| 27 | ? | Unknown. Set when issuing Read 1, Block Erase (first cycle) and Page Program (first cycle)
| 26 | ? | Unknown. Set when issuing Read 1, Block Erase (first cycle) and Page Program (first cycle)
| 25 | ? | Unknown. Set when issuing Read 1, Block Erase (first cycle) and Page Program (first cycle)
| 24 | ? | Unknown. Set when issuing Read 1, Read ID and Page Program (first cycle)
| 23-16 | NAND Command | NAND Command to execute. Corresponds directly to commands for the K9F1208U0M flash.
| 15 | ? | Unknown. Set when issuing Read 1, Block Erase (second cycle) and Page Program (second cycle)
| 14 | Buffer Select | Selects which half of the 0x400-byte PI Buffer mapped at 0x04610000 should be used for DMA operations. See '''iQue Player-specific Memory''' for details on this buffer
| 13-12 | Device Select | Corresponds to Chip Enable signals on the card connector. Typically 0.
| 11 | Do ECC | Whether to do ECC
| 10 | Multicycle | Set to 1 if the command issued was not the last command in a multi-cycle sequence.
| 9-0 | Data Length | Data transfer length in bytes. Unlike most other lengths, this is not length minus one; a length of 0 can be specified.
}}
 
'''Extra Details:'''
: Writing 0 to this register will clear any pending FLASH interrupt.
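
The write and read layouts above can be expressed as a small Python sketch. The function names are hypothetical. Because the meaning of bits 29-24 and 15 is unknown, they are passed through as an opaque <code>unknown_bits</code> mask rather than modeled, and nothing here claims to reproduce the exact words written by system software.

```python
UNKNOWN_BITS_MASK = 0x3F008000  # bits 29-24 and bit 15, meaning unknown

def pack_nand_ctrl(command: int, data_length: int, *, execute: bool = True,
                   interrupt: bool = False, buffer_select: int = 0,
                   device_select: int = 0, do_ecc: bool = False,
                   multicycle: bool = False, unknown_bits: int = 0) -> int:
    """Build a PI_BB_NAND_CTRL write value from the documented fields."""
    if not 0 <= command <= 0xFF:
        raise ValueError("NAND command is 8 bits")
    if not 0 <= data_length <= 0x3FF:
        raise ValueError("data length is 10 bits")
    if buffer_select not in (0, 1) or not 0 <= device_select <= 3:
        raise ValueError("bad buffer or device select")
    if unknown_bits & ~UNKNOWN_BITS_MASK:
        raise ValueError("unknown_bits may only use bits 29-24 and 15")
    return ((int(execute) << 31) | (int(interrupt) << 30) | (command << 16)
            | (buffer_select << 14) | (device_select << 12)
            | (int(do_ecc) << 11) | (int(multicycle) << 10)
            | data_length | unknown_bits)

def decode_nand_status(value: int):
    """Return (busy, single_bit_error, double_bit_error) from a read."""
    return bool(value >> 31 & 1), bool(value >> 11 & 1), bool(value >> 10 & 1)
```

A caller would typically poll <code>decode_nand_status</code> until the busy flag clears, then check the two ECC flags.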
 
==== <span style="display:none;">0x0460 004C - PI_BB_NAND_CFG</span> ====
----
{{#invoke:Register table|head|800px|PI_BB_NAND_CFG <code>0x0460 004C</code>}}
{{#invoke:Register table|row|31:24}}
| colspan="8"| U-?
|-
| colspan="8"| Configuration
{{#invoke:Register table|row|23:16}}
| colspan="8"| U-?
|-
| colspan="8"| Configuration
{{#invoke:Register table|row|15:8}}
| colspan="8"| U-?
|-
| colspan="8"| Configuration
{{#invoke:Register table|row|7:0}}
| colspan="8"| U-?
|-
| colspan="8"| Configuration
{{#invoke:Register table|foot}}
{{#invoke:Register table|definitions
| 31-0 | Configuration | Likely specifies timing configurations for different NAND flash chips. It is currently unknown how to relate values programmed into this register and timing information found in datasheets.
}}
 
'''Extra Details'''
 
System software programs <code>0x753E3EFF</code> into this register to execute a Read ID command, then selects an appropriate configuration based on the ID:
{| class="wikitable"
! ID [31:16] !! NAND Size in Blocks !! NAND Size in MiB !! Configuration Value !! Part Number
|-
| 0xEC76 || 0x1000 || 64 || 0x441F1F3F || K9F1208U0M
|-
| 0xEC79 || 0x2000 || 128 || 0x441F1F3F || K9K1G08U0A or K9K1G08U0B
|-
| 0x9876 || 0x1000 || 64 || 0x753E1F3F || TC58512FT
|-
| 0x2076 || 0x1000 || 64 || 0x441F1F3F || NAND512W3A
|}
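
The table above amounts to a lookup keyed on the upper 16 bits of the Read ID result. A minimal Python sketch follows; the names are hypothetical and the data is copied from the table.

```python
NAND_BLOCK_SIZE = 0x4000

# ID[31:16] -> (size in blocks, configuration value), copied from the table.
NAND_CONFIGS = {
    0xEC76: (0x1000, 0x441F1F3F),  # K9F1208U0M
    0xEC79: (0x2000, 0x441F1F3F),  # K9K1G08U0A or K9K1G08U0B
    0x9876: (0x1000, 0x753E1F3F),  # TC58512FT
    0x2076: (0x1000, 0x441F1F3F),  # NAND512W3A
}

def select_nand_config(read_id: int):
    """Return (blocks, config) for a 32-bit Read ID result, or None."""
    return NAND_CONFIGS.get((read_id >> 16) & 0xFFFF)

def nand_size_mib(blocks: int) -> int:
    """Convert a size in NAND blocks to MiB."""
    return (blocks * NAND_BLOCK_SIZE) >> 20
```

What system software does for an ID that is not in the table is not described here, so the sketch simply returns <code>None</code>.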
 
==== <span style="display:none;">0x0460 0058 - PI_BB_RD_LEN</span> ====
----
{{#invoke:Register table|head|800px|PI_BB_RD_LEN <code>0x0460 0058</code>}}
{{#invoke:Register table|row|31:24}}
| colspan="8"| U-?
|-
| colspan="8"| —
{{#invoke:Register table|row|23:16}}
| colspan="8"| U-?
|-
| colspan="8"| —
{{#invoke:Register table|row|15:8}}
| colspan="7"| U-? || W-?
|-
| colspan="7"| — || Length [8]
{{#invoke:Register table|row|7:0}}
| colspan="8"| W-?
|-
| colspan="8"| Length [7:0]
{{#invoke:Register table|foot}}
{{#invoke:Register table|definitions
| ?-0 | Length | DMA Transfer Length (-1). Writes initiate a DMA from SDRAM starting at '''PI_DRAM_ADDR''' to the PI Buffer at '''0x04610000 + PI_CART_ADDR'''. Exact bit width unknown, it is at least long enough to transfer 0x200 bytes.
}}
 
'''Extra Details'''
: It is currently unknown what the behavior is if a DMA extends out of the bounds of the target PI Buffer, and whether both buffers can be accessed in one transfer.
: The busy bits in '''PI_STATUS''' also apply to these transfers.
: These transfers also trigger an interrupt upon completion. It is the same interrupt used for regular PI DMAs.
 
==== <span style="display:none;">0x0460 005C - PI_BB_WR_LEN</span> ====
----
{{#invoke:Register table|head|800px|PI_BB_WR_LEN <code>0x0460 005C</code>}}
{{#invoke:Register table|row|31:24}}
| colspan="8"| U-?
|-
| colspan="8"| —
{{#invoke:Register table|row|23:16}}
| colspan="8"| U-?
|-
| colspan="8"| —
{{#invoke:Register table|row|15:8}}
| colspan="7"| U-? || W-?
|-
| colspan="7"| — || Length [8]
{{#invoke:Register table|row|7:0}}
| colspan="8"| W-?
|-
| colspan="8"| Length [7:0]
{{#invoke:Register table|foot}}
{{#invoke:Register table|definitions
| ?-0 | Length | DMA Transfer Length (-1). Writes initiate a DMA from the PI Buffer at '''0x04610000 + PI_CART_ADDR''' to SDRAM starting at '''PI_DRAM_ADDR'''. Exact bit width unknown, it is at least long enough to transfer 0x200 bytes.
}}
 
'''Extra Details'''
: It is currently unknown what the behavior is if a DMA extends out of the bounds of the target PI Buffer, and whether both buffers can be accessed in one transfer.
: The busy bits in '''PI_STATUS''' also apply to these transfers.
: These transfers also trigger an interrupt upon completion. It is the same interrupt used for regular PI DMAs.
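
For both '''PI_BB_RD_LEN''' and '''PI_BB_WR_LEN''', the value written is the byte count minus one, and '''PI_CART_ADDR''' acts as an offset into the PI Buffer area. The Python sketch below uses hypothetical helper names; the 0x200-byte upper bound is an assumption based on the 9 documented length bits, since the exact field width is unknown.

```python
PI_BUFFER_BASE = 0x04610000

def bb_dma_len_value(num_bytes: int) -> int:
    """Value to write to PI_BB_RD_LEN / PI_BB_WR_LEN for a byte count."""
    if not 1 <= num_bytes <= 0x200:
        # Assumed limit: only Length[8:0] is documented.
        raise ValueError("byte count must be in 1..0x200")
    return num_bytes - 1

def pi_buffer_addr(cart_addr: int) -> int:
    """Buffer address targeted by the DMA for a given PI_CART_ADDR."""
    return PI_BUFFER_BASE + cart_addr
```

For example, transferring one full 0x200-byte buffer means writing 0x1FF to the length register.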
 
==== <span style="display:none;">0x0460 0060 - PI_BB_GPIO</span> ====
----
{{#invoke:Register table|head|800px|PI_BB_GPIO <code>0x0460 0060</code>}}
{{#invoke:Register table|row|31:24}}
| colspan="2"| R-? || U-? || U-? || U-? || colspan="2"| R-? || R-?
|-
| colspan="2"| Box ID [15:14] || — || — || — || colspan="2"| Box ID [10:9] || Box ID [8:6] [2]
{{#invoke:Register table|row|23:16}}
| colspan="2"| R-? || U-? || U-? || U-? || U-? || U-? || U-?
|-
| colspan="2"| Box ID [8:6] [1:0] || — || — || — || — || — || —
{{#invoke:Register table|row|15:8}}
| U-? || U-? || U-? || U-? || U-? || U-? || U-? || U-?
|-
| — || — || — || — || — || — || — || —
{{#invoke:Register table|row|7:0}}
| colspan="2"| W-? || W-? || W-? || colspan="2"| W-? || W-? || W-?
|-
| colspan="2"| RTC Mask || LED Mask || Power Mask || colspan="2"| RTC Control || LED Control || Power Control
{{#invoke:Register table|foot}}
{{#invoke:Register table|definitions
| 31-16 | Box ID [15:0] | System software calls this area the "Box ID". Various sub-fields are read out of this area separately for varying purposes.
| 31-30 | Box ID [15:14] | System software reads this as some sort of model identifier. Precise meaning unknown. Whether all players have the same value here is unknown.
| 26-25 | Box ID [10:9] | System clock speed identifier? System software reads this to determine delay intervals for some operations.
| 24-22 | Box ID [8:6] | The Boot ROM checks this against bits [10:8] in a register at MI+0x10, if they don't match this value is copied there and the system is rebooted?
| 7-6 | RTC Mask | Enables the RTC bit lines. If left off, changes to RTC Control will do nothing.
| 5 | LED Mask | Enables the LED bit line. If left off, changes to LED Control will do nothing.
| 4 | Power Mask | Enables the power control bit line. If left off, changes to Power Control will do nothing.
| 3-2 | RTC Control | RTC communication happens through these bits. The communication protocol is described in the [https://www.st.com/content/ccc/resource/technical/document/datasheet/19/24/95/e2/85/6a/47/30/CD00003139.pdf/files/CD00003139.pdf/jcr:content/translations/en.CD00003139.pdf ST M41T0 Serial RTC datasheet]; the lower bit is the clock line while the upper bit is the data line.
| 1 | LED Control | If 0, the LED on the front of the player will light up. If 1, the LED will switch off.
| 0 | Power Control | If 1, the power will remain on. If 0, the device will power off.
}}
 
'''Extra Details:'''
: Whenever a GPIO control bit (with its corresponding mask bit set) is set to 1, the corresponding bit line will be set to logic high (3.3v). If set to 0 (with mask set) the bit line is set to logic low (0v).
: The LED lights up when the LED GPIO is 0 as the LED requires a voltage difference across it to light up. One side of the LED is fixed to 3.3v while the other side is connected to the LED GPIO port; when LED Control is 1 there is no voltage difference across the LED (3.3 - 3.3 = 0v) so it does not light up, while an LED Control of 0 creates a voltage difference (3.3 - 0 = 3.3v) so the LED lights up.
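
The mask and control bits can be combined as in the Python sketch below. The names are hypothetical. The sketch drives both the LED and the power line on every write and keeps power on by default; whether a line whose mask bit is left clear keeps its previous state is not covered here, so driving both is simply the conservative assumption.

```python
POWER_CTRL = 1 << 0
LED_CTRL = 1 << 1
POWER_MASK = 1 << 4
LED_MASK = 1 << 5

def gpio_led_value(led_on: bool, keep_power: bool = True) -> int:
    """Build a PI_BB_GPIO write value that sets the LED state."""
    value = LED_MASK | POWER_MASK
    if not led_on:
        value |= LED_CTRL  # the LED is active low: 1 switches it off
    if keep_power:
        value |= POWER_CTRL  # 1 keeps the console powered
    return value

def box_id(gpio_value: int) -> int:
    """Extract the 16-bit Box ID from a PI_BB_GPIO read."""
    return (gpio_value >> 16) & 0xFFFF
```

Note that calling <code>gpio_led_value(True, keep_power=False)</code> would produce a value that powers the device off, per the Power Control definition above.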
 
==== <span style="display:none;">0x0460 0070 - PI_BB_NAND_ADDR</span> ====
----
{{#invoke:Register table|head|800px|PI_BB_NAND_ADDR <code>0x0460 0070</code>}}
{{#invoke:Register table|row|31:24}}
| colspan="5"| U-? || colspan="3"| W-?
|-
| colspan="5"| — || colspan="3"| Address [26:24]
{{#invoke:Register table|row|23:16}}
| colspan="8"| W-?
|-
| colspan="8"| Address [23:16]
{{#invoke:Register table|row|15:8}}
| colspan="8"| W-?
|-
| colspan="8"| Address [15:8]
{{#invoke:Register table|row|7:0}}
| colspan="8"| W-?
|-
| colspan="8"| Address [7:0]
{{#invoke:Register table|foot}}
{{#invoke:Register table|definitions
| ?-0 | Address | Sets the NAND flash address that commands issued by '''PI_BB_NAND_CTRL''' will target. The exact bit width is unknown, but it is at least enough to address 128 MiB (27 bits).
}}
 
'''Extra Details:'''
: To convert a page number to an address, multiply it by 512.
: To convert a block number to an address, multiply it by 0x4000.
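
The two conversions above, written out as a Python sketch with hypothetical helper names:

```python
NAND_PAGE_SIZE = 512
NAND_BLOCK_SIZE = 0x4000
PAGES_PER_BLOCK = NAND_BLOCK_SIZE // NAND_PAGE_SIZE  # 32 pages per block

def page_to_nand_addr(page: int) -> int:
    """NAND address of the first byte of a page."""
    return page * NAND_PAGE_SIZE

def block_to_nand_addr(block: int) -> int:
    """NAND address of the first byte of a block."""
    return block * NAND_BLOCK_SIZE
```

The last block of a 128 MiB part (block 0x1FFF) maps to address 0x07FFC000, which fits in the 27 address bits mentioned above.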
 
==== <span style="display:none;">0x0461 0500 to 0x0461 0800 - PI_BB_ATB_LOWER</span> ====
----
{{#invoke:Register table|head|800px|PI_BB_ATB_LOWER <code>0x0461 0500 - 0x0461 0800</code>}}
{{#invoke:Register table|row|31:24}}
| colspan="8"| U-?
|-
| colspan="8"| NAND Block Number [15:8]
{{#invoke:Register table|row|23:16}}
| colspan="8"| U-?
|-
| colspan="8"| NAND Block Number [7:0]
{{#invoke:Register table|row|15:8}}
| colspan="8"| U-?
|-
| colspan="8"| PI Physical Address [29:14] [15:8]
{{#invoke:Register table|row|7:0}}
| colspan="8"| U-?
|-
| colspan="8"| PI Physical Address [29:14] [7:0]
{{#invoke:Register table|foot}}
{{#invoke:Register table|definitions
| 31-16 | NAND Block Number | Starting block number to map to the provided PI address.
| 15-0 | PI Physical Address [29:14] | PI address to begin the mapping at, divided by the NAND block size (0x4000).
}}
 
'''Extra Details'''
: There are 192 '''ATB_LOWER''' registers. Issuing a write to a particular register will program that ATB entry with a mapping, also using the current contents of '''ATB_UPPER''' to complete the entry configuration.
: The number of blocks to map comes from '''ATB_UPPER'''.
: Mappings involving non-contiguous or unsorted NAND blocks must occupy multiple ATB entries.
: These ATB entries should be sorted by PI address, from lowest to highest.
: It is not possible to write addresses that are not aligned to the NAND block size (0x4000).
: It is not possible to map more contiguous blocks than the PI address alignment allows in a single entry. For example it is not possible to map 2 contiguous blocks in the same ATB entry if the base address is 0x10004000. The maximum number of blocks you can map for given <code>(pi_addr, nblocks)</code> in a single ATB entry is <code>1 << min(ctz(pi_addr/0x4000), ceil(log2(nblocks)))</code> where <code>ctz(x)</code> counts the number of trailing zeros in the binary representation of <code>x</code>.
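
The formula above can be evaluated with the Python sketch below, which also packs an '''ATB_LOWER''' value from the documented fields. The function names are hypothetical. Since <code>ctz(0)</code> is undefined, a PI address of 0 is treated as imposing no alignment limit, which is an assumption of this sketch.

```python
NAND_BLOCK_SIZE = 0x4000

def ctz(x: int) -> int:
    """Count trailing zero bits of a positive integer."""
    return (x & -x).bit_length() - 1

def max_blocks_per_entry(pi_addr: int, nblocks: int) -> int:
    """Evaluate 1 << min(ctz(pi_addr / 0x4000), ceil(log2(nblocks)))."""
    if pi_addr % NAND_BLOCK_SIZE or nblocks < 1:
        raise ValueError("pi_addr must be block aligned and nblocks >= 1")
    ceil_log2 = (nblocks - 1).bit_length()
    block_index = pi_addr // NAND_BLOCK_SIZE
    if block_index == 0:
        return 1 << ceil_log2  # assumed: no alignment limit at address 0
    return 1 << min(ctz(block_index), ceil_log2)

def pack_atb_lower(nand_block: int, pi_addr: int) -> int:
    """Pack the starting NAND block and PI address bits [29:14]."""
    if pi_addr % NAND_BLOCK_SIZE or not 0 <= nand_block <= 0xFFFF:
        raise ValueError("bad PI address alignment or block number")
    return (nand_block << 16) | ((pi_addr >> 14) & 0xFFFF)
```

For the base address 0x10004000 from the example, <code>max_blocks_per_entry(0x10004000, 2)</code> evaluates to 1, so two entries are needed.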
 
= iQue Player-specific memory =
 
In addition to extra registers, the iQue Player maps additional memory into the PI registers address space for use in various PI operations.
{| class="wikitable"
! colspan=2 | Address Range !! Name !! Description
|-
| 0x04610000 || 0x046101FF || PI Buffer 0 || Holds intermediate data between SDRAM and the NAND. NAND commands transfer data between this buffer and the flash; transfers between this buffer and SDRAM are done via DMAs triggered by '''PI_BB_RD_LEN''' and '''PI_BB_WR_LEN'''. AES decryptions happen in this buffer.
|-
| 0x04610200 || 0x046103FF || PI Buffer 1 || Same as Buffer 0 in operation.
|-
| 0x04610400 || 0x0461040F || PI Spare Data 0 || Holds "spare data" for buffer 0 contents.
|-
| 0x04610410 || 0x0461041F || PI Spare Data 1 || Holds "spare data" for buffer 1 contents.
|-
| 0x04610420 || 0x046104CF || AES Expanded Key || Holds the AES expanded key for AES decryption operations.
|-
| 0x046104D0 || 0x046104DF || AES Initialization Vector || Holds the AES IV for AES decryption operations.
|}
 
 
= Physical Bus Pinout =
|}
 
=== PI Interface Process ===
 
====== Address output: ======
[[File:Rom address output.png|border|left|frameless|984x984px|Rom Address Output]]
 
=== Data Read ===
[[File:Rom Read Data.png|alt=Rom Read Data|Rom Read Data]]
 
=== Constant Read ===
[[File:Constant ROM Access.png|Constant ROM Access]]
 
 
 
= DMA Transfers =
PI DMA is well defined for so-called "aligned transfers", which are defined by the following constraints:
 
# RDRAM address must be 8 bytes aligned
# PI address must be 2 bytes aligned
# Length must be a multiple of 2
 
Notice that the second point might be considered redundant from a hardware point of view given that both registers holding addresses are fixed to be 2-byte aligned (LSB is fixed to 0), but from a software point of view, this has to be taken into account.
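
The three constraints can be checked in software before starting a DMA, as in this Python sketch. The function name is hypothetical, and <code>length</code> is the transfer length in bytes, not the value written to the length register.

```python
def is_aligned_transfer(rdram_addr: int, pi_addr: int, length: int) -> bool:
    """Check the three constraints for a well-defined PI DMA."""
    return (rdram_addr % 8 == 0      # RDRAM address is 8-byte aligned
            and pi_addr % 2 == 0     # PI address is 2-byte aligned
            and length % 2 == 0)     # length is a multiple of 2
```

An odd PI address fails this check even though the hardware register would silently drop bit 0, which is the software-side concern mentioned above.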
 
The behavior of PI DMA when the first and third constraints are not respected is not well designed; it seems that the designers attempted to support loosening these constraints but gave up midway, leaving the hardware in a state that can only be described as "buggy". This also leaks some internal details of how the transfers are performed.
 
To implement PI DMA, the RCP uses an internal 128-byte buffer. The following section attempts to describe the exact process (the ''actual'' process implemented in the hardware is unknown, but the description below matches the observable behavior).

NOTE: only DMA write transfers (PI -> RDRAM) have been analyzed in detail, using default PI DOM1 settings. Read transfers (RDRAM -> PI) are expected to behave in a mirrored way, though this has not been fully tested yet. The PI DOM1 page size setting is also expected to affect the transfer somehow, though this has not been explored yet.
 
==== Internal process ====
The transfer is split into blocks of at most 128 bytes each. Within each block, the PI first fills the internal buffer by fetching data from the PI bus, and then writes the buffer contents back to RDRAM. This can be observed by monitoring PI_DRAM_ADDR and PI_CART_ADDR during the transfer: PI_CART_ADDR is first seen moving forward, and then PI_DRAM_ADDR catches up with a leap (writing to RDRAM is much faster than reading from the PI bus).
 
In general for all blocks of the transfer (excluding the first one, see below), the logic appears to be as follows:
 
* Compute the block size. This is the smallest of the remaining length, the distance to the end of the current RDRAM page, and 128 bytes (the maximum size of the internal buffer). RDRAM pages are 2 KiB (0x800) long, so for instance if the current RDRAM address (at the beginning of the block) is 0x147e0, the block size will be 0x20 because the RDRAM page ends at 0x147ff.
* Fill the internal buffer using PI reads from the bus. All PI accesses are 16 bits wide, so if the block size is odd (which happens on the last block, if the remaining length is odd), one extra byte will be fetched from PI into the internal buffer.
* Write back into RDRAM. The exact format of RDRAM writes is unknown at the moment; since PI DMA transfers are well-defined for 8-byte aligned RDRAM addresses, it is assumed that 64-bit writes are used (a burst like that used for D/I cache writebacks would require 16-byte alignment or more to be performed). If an extra byte was fetched in the previous step, that byte is also written to RDRAM. So in general odd-length PI DMA transfers will transfer one byte more than requested.
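
The block-size rule in the list above can be sketched in Python as follows. The names are hypothetical, and the splitting helper only models transfers that start at an 8-byte aligned RDRAM address, where the first block needs no special handling. It describes the observed behavior as documented here, not the actual hardware implementation.

```python
RDRAM_PAGE_SIZE = 0x800    # 2 KiB RDRAM page
PI_DMA_BUFFER_SIZE = 128   # internal PI DMA buffer

def block_size(rdram_addr: int, remaining: int) -> int:
    """Smallest of remaining length, distance to page end, and 128."""
    to_page_end = RDRAM_PAGE_SIZE - (rdram_addr % RDRAM_PAGE_SIZE)
    return min(remaining, to_page_end, PI_DMA_BUFFER_SIZE)

def bytes_fetched(size: int) -> int:
    """PI reads are 16 bits wide, so odd sizes fetch one extra byte."""
    return size + (size & 1)

def split_aligned_transfer(rdram_addr: int, length: int) -> list:
    """List the block sizes of a transfer with an aligned RDRAM start."""
    blocks = []
    while length > 0:
        size = block_size(rdram_addr, length)
        blocks.append(size)
        rdram_addr += size
        length -= size
    return blocks
```

For the example address, <code>block_size(0x147e0, 0x1000)</code> gives 0x20, and a 0x100-byte transfer starting there splits into blocks of 0x20, 0x80 and 0x60 bytes.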
 
The above logic applies to all blocks of the transfer except the first one, which is treated specially by the PI. It appears that the goal of the designers was to use the first block to realign transfers to 8 bytes in RDRAM, which possibly causes the first block to use smaller, masked writes to RDRAM. So, even if the RDRAM starting address is misaligned, all blocks besides the first one will begin from an 8-byte aligned RDRAM address, and behave with the logic described above.
 
==== Internal process: first block ====
These are the differences in logic while processing the first block, which mostly concern how the initial RDRAM misalignment is handled. In this description, ''RDRAM misalignment'' is the number of bytes between the RDRAM address and the previous 8-byte aligned word (that is, the value of the last 3 bits of the RDRAM address). Notice that the RDRAM address hardware register has its LSB fixed to 0, so the misalignment can be 2, 4, or 6.
 
* The internal 128 byte buffer is filled starting from the index matching the misalignment. This might affect the maximum size of the first block: for instance, if misalignment is 6, the maximum size is not 128 but 122, because the first 6 bytes are skipped.
* Writes to RDRAM seem to use some kind of masking, so they are correctly done at byte granularity. This means that odd-length transfers in the first block appear to work correctly. Notice that this applies only to the first block, whatever its size is; the size (as described above) might be limited by the end of the RDRAM page, in which case only odd transfers up to there work correctly.
* As an exception to the above exception, if the first block reaches the end of the 128 byte buffer, the last 2 bytes of the buffer are always written back in full to RDRAM, even though one less byte was requested.
** Example: PI DMA transfer with misalignment 0 and RDRAM page end far away. Odd lengths up to 125 (included) work correctly; odd transfers of exactly 127 bytes are rounded up to 128 (since they reach the last 16-bit word of the buffer). Also odd transfers of 129 or more, since they need two blocks to be performed, fall back into the general rule where one more byte is transferred.
** Example: PI DMA transfer with misalignment 6 and RDRAM page end far away. Odd lengths up to 119 (included) work correctly; odd transfers of exactly 121 bytes are rounded up to 122 (since they reach the last 16-bit word of the buffer). Also odd transfers of 123 or more, since they need two blocks to be performed, fall back into the general rule where one more byte is transferred.
* There seems to be a hardware bug related to how RDRAM writes are performed in case of misaligned addresses. It seems that the hardware counts the block length starting from index 0 of the buffer, even though the first byte was actually placed at the index matching the misalignment, and even though masking is performed correctly. For instance, if the misalignment is 6 and the length is 8, the following happens:
** First, 8 bytes are fetched from the PI bus and put at index 6..13 in the internal buffer.
** Then, RDRAM writes are performed but the hardware believes the block ends at index 8, so only bytes 6..8 are written back to RDRAM.
* Symmetrically, if the buffer is full (128 bytes), the last 6 bytes will not be transferred because of the same bug (even if those bytes were fetched by the PI bus). So there will be a "hole" of 6 bytes in the RDRAM output buffer. For instance, if misalignment is 6 and the length is 1024, and the RDRAM page end is far away, the following happens on the first block:
** Block size is computed as 122 bytes.
** 122 bytes are fetched from the PI bus, and put at index 6..127 in the internal buffer.
** RDRAM writes are performed but the hardware believes that the block ends at index 121, so only bytes 6..121 are written back to RDRAM.
** Notice that, this notwithstanding, the RDRAM address is correctly rounded up to 8 bytes at the end of the block (see below), so the second block will behave correctly. There will be a hole in RDRAM, as bytes 122..127 of the first block are never written back, so the content of RDRAM for those bytes is not affected by the DMA.
* The RDRAM address register is always rounded up to the next 8-byte alignment at the end of the first block. In most normal cases, the logic above already ensures that the address ends up aligned at the end of the block, but the rounding up happens even in cases such as short transfers that consist of only the first block and end at an arbitrary byte.
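
The first-block rules above can be modeled roughly as in the Python sketch below. It is a simplified model with hypothetical names, not a hardware-accurate implementation: it assumes the hardware's notion of the block end is an exclusive index equal to the block size (which reproduces the 6..121 range of the 122-byte example), and it does not model the odd-length exception where the last 2 bytes of the buffer are written in full.

```python
RDRAM_PAGE_SIZE = 0x800
PI_DMA_BUFFER_SIZE = 128

def first_block_layout(rdram_addr: int, length: int):
    """Return (size, fetched, written, next_rdram_addr) for block one.

    fetched and written are ranges of internal buffer indices.
    """
    mis = rdram_addr & 7  # RDRAM misalignment
    to_page_end = RDRAM_PAGE_SIZE - (rdram_addr % RDRAM_PAGE_SIZE)
    size = min(length, PI_DMA_BUFFER_SIZE - mis, to_page_end)
    # Data from the PI bus lands at the index matching the misalignment.
    fetched = range(mis, mis + size)
    # Assumed model of the bug: the write-back stops at index `size`,
    # counted from 0 instead of from the misalignment.
    written = range(mis, max(mis, size))
    # The RDRAM address is rounded up to the next 8-byte boundary.
    next_rdram_addr = (rdram_addr + size + 7) & ~7
    return size, fetched, written, next_rdram_addr
```

With RDRAM address 0x106 and length 1024, the model yields a 122-byte block fetched into indices 6..127 and written back from indices 6..121, leaving the 6-byte hole described above.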
 
The following examples are based on test ROMs created by Krom, Mazamars312 and Lemmy ([https://github.com/PeterLemon/N64/tree/master/CPUTest/DMAAlignment-PI-cart DMAAlignment-PI-cart]). They illustrate the first-block behavior described above. In this table, "Read" means that data is read from the ROM into RDRAM.
{| class="wikitable"
|+ Example DMA transfers
! RDRAM Address
! ROM Address
! Read or Write
! Length
! What happens
|-
| 0000_0100
| 1000_1000
| Read
| 0x7F (128 bytes)
| This is a normal aligned transfer.
|-
| 0000_0102
| 1000_1000
| Read
| 0x7F (128 bytes)
| The start of the ROM data is transferred to the RDRAM offset as expected (so the first 2 bytes of RDRAM are not affected by this write).
However, the last 2 bytes are dropped from the transfer, effectively making it a 0x7D length transfer (126 bytes - 1).
|-
| 0000_0106
| 1000_1000
| Read
| 0x7F (128 bytes)
| The start of the ROM data is transferred to the RDRAM offset as expected (so the first 6 bytes of RDRAM are not affected by this write).
However, the last 6 bytes are dropped from the transfer, effectively making it a 0x79 length transfer (122 bytes - 1).
|-
| 0000_0106
| 1000_1000
| Read
| 0x17 (24 bytes)
| The start of the ROM data is transferred to the RDRAM offset as expected (so the first 6 bytes of RDRAM are not affected by this write).
However, the last 6 bytes are dropped from the transfer, effectively making it a 0x11 length transfer (18 bytes - 1).
|-
| 0000_0106
| 1000_1000
| Read
| 0xFF (256 bytes)
| This shows that internally the N64 can only DMA blocks of 128 bytes at a time to and from RDRAM as a burst to the PI controller.
The first 128 bytes:

The start of the ROM data is transferred to the RDRAM offset as expected (so the first 6 bytes of RDRAM are not affected by this write).

However, the last 6 bytes are dropped from the transfer, effectively making it a 0x79 length transfer (122 bytes - 1).

The second 128 bytes:

This is a normal aligned DMA transfer covering RDRAM offsets 128 to 255. From this we believe the first DMA transfer is corrupted due to some internal issue with the PI controller and the RDRAM controller.

The image below shows this example (look at addresses 112 to 127): the last 6 bytes are not transferred (confirmed by Krom).[[File:DMA UnAligned 6byte Offset.png|center|thumb|300x300px]]
|-
| 0000_0106
| 1000_1000
| Write
| 0x7F (128 bytes)
| *** Writes to Flash and SRAM to be tested ***
|}

==== Followup transfers ====
After a DMA transfer is finished, it is possible to trigger a "followup transfer", that is, a transfer that sequentially continues the previous one, by simply writing a new length to the PI_WR_LEN register. In this case, the current values of PI_DRAM_ADDR and PI_CART_ADDR are used at the beginning of the transfer. Those values will match the last addresses as updated by the first transfer.

The sections above describe in detail how PI reads and RDRAM writes are done and how registers are updated, so they also implicitly describe how a followup transfer behaves in various edge cases (short transfers, misaligned transfers, etc.).

==== PI_WR_LEN readbacks after a transfer ====
Reading back PI_WR_LEN after a transfer has completed appears to always return 0x7F. The only exception that has been noticed is when the transfer was smaller than 8 bytes: in that case, the value is 0x7F minus the initial RDRAM misalignment. For instance, if the RDRAM misalignment was 4, the value found in the register at the end of the transfer will be 0x7B.

==== DMA data dumps ====
To further investigate and understand how PI DMA is performed, the repo [https://github.com/rasky/n64_pi_dma_test n64_pi_dma_test] can be used. The repo contains data dumps acquired on real hardware of DMA transfers with all possible misalignments (0, 2, 4, 6), all lengths from 1 to 384 bytes, and all distances from the RDRAM page end from 0 to 128 bytes. It also contains timing information on all those transfers. The repo can be used as a test suite for emulators, but also to further investigate other edge cases.
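
When checking an emulator against hardware behavior, the PI_WR_LEN readback rule can be expressed as a one-line Python helper. This is only a sketch of the rule as observed so far, and the function name is hypothetical.

```python
def pi_wr_len_readback(rdram_addr: int, length: int) -> int:
    """Expected PI_WR_LEN readback after a finished transfer.

    Observed rule: 0x7F, except for transfers smaller than 8 bytes,
    where the initial RDRAM misalignment is subtracted.
    """
    misalignment = rdram_addr & 7
    return 0x7F - misalignment if length < 8 else 0x7F
```

For a 4-byte transfer to an RDRAM address with misalignment 4, the helper returns 0x7B, as in the example above.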