RDRAM
Rambus DRAM (or RDRAM) is a type of synchronous dynamic random-access memory (SDRAM) designed by Rambus. The N64 motherboard came with either one or two chips, totaling 4 MiB (4,194,304 bytes) of general-purpose storage which can be accessed by the CPU. The optional Expansion Pak adds a further 4 MiB and is required for some games to run.
Each byte of RDRAM actually has an extra bit, which can only be used by the RDP and VI core. This 9th bit is used to store things like anti-aliasing coverage in the color buffer. On systems other than the N64, the 9th bit would likely be used for parity checks.
RDRAM system overview
A typical RDRAM system is composed of 3 main elements :
- a controller, which acts as the channel master. This role is fulfilled by the RI with the help of the RAC (Rambus ASIC Cell).
- the channel, which is a synchronous bus connecting the RDRAM devices together.
- RDRAM modules, each containing memory banks, and some registers. RDRAM devices are daisy-chained via serial signals SIn/SOut.
The N64 system implements the "Base RDRAM" protocol, which is the earliest version of the RDRAM protocol. As a historical note, later versions of the protocol are "Concurrent RDRAM" and "Direct RDRAM".
All known retail units use 2 MiB RDRAM modules, both within the console and within the Expansion Pak. There is no proof that 1 MiB RDRAM modules were ever produced and sold: preliminary datasheets can be found online, but nothing shows that they ever went to production. Moreover, a bug has been found in the official Nintendo IPL3 code (in charge of RDRAM initialization) that would prevent 1 MiB modules from working. The code seems designed to handle them, but it was probably never tested with them and would not work. This is further evidence that 1 MiB modules were never used in official Nintendo parts: if they had been, existing games would not boot on them because of this bug.
Interface Pinouts
Pin | Signal | RSL | I/O | Description |
---|---|---|---|---|
1 | VDD | | - | +3.3V power supply. |
2 | GND | | - | Circuit ground. |
3 | DQ8 | Y | I/O | Signal line (bit 8) for REQ, DIN, and DOUT packets. |
4 | GND | | - | Circuit ground. |
5 | DQ7 | Y | I/O | Signal line (bit 7) for REQ, DIN, and DOUT packets. |
6 | NC* | | - | Not connected. |
7 | ADDRESS | Y | I | Signal line for COL packets with column addresses. |
8 | VDD | | - | +3.3V power supply. |
9 | DQ6 | Y | I/O | Signal line (bit 6) for REQ, DIN, and DOUT packets. |
10 | GND | | - | Circuit ground. |
11 | DQ5 | Y | I/O | Signal line (bit 5) for REQ, DIN, and DOUT packets. |
12 | VDDA | | - | Separate analog power supply for clock generation in the RDRAM. |
13 | RXCLK | Y | I | Receive clock. All input packets are aligned to this clock. |
14 | GNDA | | - | Separate analog ground for clock generation in the RDRAM. |
15 | TXCLK | Y | I | Transmit clock. DOUT packets are aligned with this clock. |
16 | VDD | | - | +3.3V power supply. |
17 | DQ4 | Y | I/O | Signal line (bit 4) for REQ, DIN, and DOUT packets. |
18 | GND | | - | Circuit ground. |
19 | COMMAND | Y | I | Signal line for REQ, RSTRB, RTERM, WSTRB, WTERM, RESET, and CKE packets. |
20 | SIN | | I | Initialization daisy chain input. CMOS levels. See section on initialization for more details. |
21 | VREF | | I | Logic threshold reference voltage for RSL signals. |
22 | SOUT | | O | Initialization daisy chain output. CMOS levels. See section on initialization for more details. |
23 | DQ3 | Y | I/O | Signal line (bit 3) for REQ, DIN, and DOUT packets. |
24 | GND | | - | Circuit ground. |
25 | DQ2 | Y | I/O | Signal line (bit 2) for REQ, DIN, and DOUT packets. |
26 | NC | | - | Not connected. |
27 | DQ1 | Y | I/O | Signal line (bit 1) for REQ, DIN, and DOUT packets. |
28 | GND | | - | Circuit ground. |
29 | DQ0 | Y | I/O | Signal line (bit 0) for REQ, DIN, and DOUT packets. |
30 | NC | | - | Not connected. |
31 | GND | | - | Circuit ground. |
32 | VDD | | - | +3.3V power supply. |
RSL stands for Rambus Signaling Levels, a low-voltage-swing, active-low signaling technology.
Source: Rambus concurrent RDRAM datasheet [1]
RDRAM registers
Number | CPU Addr[9:0] | Name | Description |
---|---|---|---|
0 | 0x000 | DeviceType | Read-only register which describes RDRAM configuration |
1 | 0x004 | DeviceId | Specifies base address of RDRAM |
2 | 0x008 | Delay | Specifies CAS timing parameters |
3 | 0x00C | Mode | Control operating mode and IOL output current |
4 | 0x010 | RefInterval | Specifies refresh interval for devices that require refresh |
5 | 0x014 | RefRow | Next row and bank to be refreshed |
6 | 0x018 | RasInterval | Specifies RAS access interval |
7 | 0x01C | MinInterval | Provides minimum delay information and some special control |
8 | 0x020 | AddressSelect | Specifies Adr field subfield swapping to maximize hit rate |
9 | 0x024 | DeviceManufacturer | Read-only register providing manufacturer and device information |
128 | 0x200 | Row | Address of currently sensed row in each bank |
RDRAM registers mirror every 0x40 bytes (16 words). Register numbers 10 to 15 all produce 0 when read. Above 0x200 (register number 128) the Device Type mirrors are replaced with the Row register.
See RI page for more details about how RDRAM registers are mapped into CPU address space.
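As a rough illustration of the mirroring rule above, the following C sketch maps a register offset (CPU Addr[9:0]) to the register it would alias. The function name and the -1 "reads as zero" marker are hypothetical, and the sketch assumes that only the DeviceType mirrors change from 0x200 upward, which is how the description above reads.

```c
#include <stdint.h>

/* Hypothetical marker for register numbers 10..15, which read as 0. */
#define RDRAM_REG_READS_ZERO (-1)
#define RDRAM_REG_ROW        128

/* Map CPU Addr[9:0] to the register number it aliases.
 * Assumption: mirrors repeat every 16 words, and from 0x200 upward
 * only the DeviceType slot (word 0) is replaced by the Row register. */
static int rdram_reg_from_offset(uint32_t offset)
{
    uint32_t word = (offset >> 2) & 0xF;   /* position within a 0x40-byte mirror */
    int upper = (offset & 0x200) != 0;     /* at or above 0x200 */

    if (word >= 10)
        return RDRAM_REG_READS_ZERO;
    if (word == 0 && upper)
        return RDRAM_REG_ROW;
    return (int)word;
}
```

Under these assumptions, offsets 0x008 and 0x048 both resolve to Delay (register 2), while 0x200 and 0x240 both resolve to Row.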
Reset and initialization:
- After RDRAM device reset, Delay needs to be set correctly before anything will work, see Reset Complications
- After reset, all devices will respond to broadcast writes. Only the closest device in the SIn/SOut chain will respond to a non-broadcast write.
- Before the device will respond to Register reads, each device needs to be assigned a DeviceId and enabled by setting Mode's DeviceEnable bit.
- After a device has been enabled, the next device in the SIn/SOut chain will respond to non-broadcast writes.
- Register reads will not return the correct value until the device's current control calibration has finished.
Endianness:
Rambus devices are little endian. This doesn't matter for regular memory, but it does mean that registers are all byte swapped. The register descriptions below have been remapped to show big endian bit offsets.
Address alignment issues:
The Rambus register commands (Wreg, WregB and Rreg) are defined to be "Quadbyte" (32bit) transfers, and so expect the data to be in the first 4 bytes. But RI only supports transfers of 1 to 16 "Octbytes" (64 bits each). This works fine for accessing registers with "even" addresses (when Addr[2] == 0), as the value is already mapped to the first 4 bytes of the read/write data packet. But when accessing "odd" registers (when Addr[2] == 1), the value ends up in bytes 4-7 of the Octbyte transfer, which get ignored.
Note: Because RCP is big endian, it puts 32bit values at even addresses in the upper half of the 64bit word, and 32bit values at odd addresses in the lower half. RI outputs the MSB first, so the upper half ends up starting at byte 0 of the Rambus data packet.
MI_MODE's Upper mode is used to work around this problem: it forces MI to always map the value into the upper half of DBus, so the value always lands in bytes 0-3 of the "Octbyte" transfer, while the original address is still passed through. Nintendo's IPL3 compulsively wraps all accesses to RDRAM registers with writes to MI_MODE's SetUpper/ClearUpper, but Upper mode is only actually needed for accessing the "odd" registers.
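The alignment rule can be summarized with a small host-side model. This is only a sketch of the behavior described above, not hardware access code; both function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Byte offset within the Octbyte where the 32-bit value lands.
 * Without Upper mode, Addr[2] selects the half; with Upper mode
 * the value is always forced into bytes 0-3. */
static unsigned rdram_reg_data_offset(uint32_t addr, bool upper_mode)
{
    if (upper_mode)
        return 0;
    return (addr & 0x4) ? 4 : 0;
}

/* The Rambus device only looks at bytes 0-3 of a register transfer. */
static bool rdram_reg_access_ok(uint32_t addr, bool upper_mode)
{
    return rdram_reg_data_offset(addr, upper_mode) == 0;
}
```

In this model, Delay (0x008) is reachable in either mode, while DeviceId (0x004) is only reachable with Upper mode set.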
NOTE: In the following register description we will omit the ninth bit which is unused when accessing RDRAM registers, and describe them as a 32bit word instead of 4x{8,9}bit.
DeviceType 0x00
| ||||||||
---|---|---|---|---|---|---|---|---|
31:24 | R-? | R-? | R-? | R-? | U-0 | R-1 | U-0 | R-? |
ColumnBits | — | Bn | — | En | ||||
23:16 | R-? | R-? | R-? | R-? | R-? | R-? | R-? | R-? |
BankBits | RowBits | |||||||
15:8 | U-0 | U-0 | U-0 | U-0 | U-0 | U-0 | U-0 | U-0 |
— | — | — | — | — | — | — | — | |
7:0 | R-0 | R-0 | R-0 | R-? | R-0 | R-0 | R-0 | R-0 |
Version | Type |
bit 31-28 | ColumnBits: Number of column address bits, or said differently, declares that this RDRAM device has 2^ColumnBits bytes per row. |
bit 26 | Bn: Bonus, number of bits per byte. 0 = 8bit byte 1 = 9bit byte |
bit 24 | En: Enhanced speed grade. 0 = Normal 1 = Low Latency |
bit 23-20 | BankBits: Number of bank address bits, or said differently, declares that this RDRAM device has 2^BankBits banks. |
bit 19-16 | RowBits: Number of row address bits, or said differently, declares that this RDRAM device has 2^RowBits rows per bank. |
bit 7-4 | Version: RDRAM version. 0001 = Extended architecture (Base RDRAM protocol) 0010 = Concurrent RDRAM device (not used on N64, as far as we know). |
bit 3-0 | Type: Device type. 0000 = RDRAM device |
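A minimal sketch of decoding these fields in C, using the big endian bit offsets listed above. The struct and function names are hypothetical. The device size is derived as 2^(ColumnBits + BankBits + RowBits) bytes, which follows from the field descriptions.

```c
#include <stdint.h>

/* Hypothetical container for the DeviceType fields. */
typedef struct {
    unsigned column_bits;    /* bits 31-28 */
    unsigned nine_bit_bytes; /* bit 26 (Bn) */
    unsigned low_latency;    /* bit 24 (En) */
    unsigned bank_bits;      /* bits 23-20 */
    unsigned row_bits;       /* bits 19-16 */
    unsigned version;        /* bits 7-4 */
    unsigned type;           /* bits 3-0 */
} rdram_device_type;

static rdram_device_type rdram_decode_device_type(uint32_t v)
{
    rdram_device_type d;
    d.column_bits    = (v >> 28) & 0xF;
    d.nine_bit_bytes = (v >> 26) & 1;
    d.low_latency    = (v >> 24) & 1;
    d.bank_bits      = (v >> 20) & 0xF;
    d.row_bits       = (v >> 16) & 0xF;
    d.version        = (v >> 4) & 0xF;
    d.type           = v & 0xF;
    return d;
}

/* Bytes per device: 2^ColumnBits bytes/row * 2^RowBits rows * 2^BankBits banks. */
static uint64_t rdram_device_bytes(const rdram_device_type *d)
{
    return (uint64_t)1 << (d->column_bits + d->bank_bits + d->row_bits);
}
```

For instance, a value assembled by hand from the layout with ColumnBits=11, Bn=1, BankBits=1, RowBits=9 and Version=1 (0xB4190010) decodes to a 2 MiB device.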
DeviceId 0x01
| ||||||||
---|---|---|---|---|---|---|---|---|
31:24 | RW-0 | RW-0 | RW-0 | RW-0 | RW-0 | RW-0 | U-0 | U-0 |
IdField[25:20] | — | — | ||||||
23:16 | RW-0 | U-0 | U-0 | U-0 | U-0 | U-0 | U-0 | U-0 |
IdField[26] | — | — | — | — | — | — | — | |
15:8 | RW-0 | RW-0 | RW-0 | RW-0 | RW-0 | RW-0 | RW-0 | RW-0 |
IdField[34:27] | ||||||||
7:0 | RW-0 | U-0 | U-0 | U-0 | U-0 | U-0 | U-0 | U-0 |
IdField[35] | — | — | — | — | — | — | — |
bit 7,15-8,23,31-26 | IdField[35:k]: Compared to AdrS[35:k] to select RDRAM. k = 21 for 16/18Mbit RDRAM. k = 20 for 8/9Mbit RDRAM. That is, bit 20 is ignored for 2 MiB RDRAM modules (which means that they can only be mapped to even device IDs and thus aligned to their size). |
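Because IdField is scattered across the register, a helper that packs it can be handy. The sketch below takes a 36-bit RDRAM base address and places IdField[35:20] into the bit positions shown above; the function name is hypothetical.

```c
#include <stdint.h>

/* Pack IdField[35:20] of a 36-bit base address into DeviceId register layout. */
static uint32_t rdram_encode_device_id(uint64_t adr)
{
    uint32_t id = (uint32_t)(adr >> 20) & 0xFFFF; /* id bit n = IdField[20+n] */
    uint32_t reg = 0;

    reg |= (id & 0x3F) << 26;        /* IdField[25:20] -> bits 31:26 */
    reg |= ((id >> 6) & 0x1) << 23;  /* IdField[26]    -> bit 23 */
    reg |= ((id >> 7) & 0xFF) << 8;  /* IdField[34:27] -> bits 15:8 */
    reg |= ((id >> 15) & 0x1) << 7;  /* IdField[35]    -> bit 7 */
    return reg;
}
```

With this packing, a base address of i × 2 MiB gives i × 0x08000000 for small i, and a base of 32 MiB gives 0x80000000, which matches the DeviceId values quoted in the initialization sequence on this page.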
Delay 0x02
| ||||||||
---|---|---|---|---|---|---|---|---|
31:24 | U-0 | U-0 | RW-1 | RW-0 | RW-0 | R-0 | R-1 | R-1 |
— | — | AckWinDelay | AckWinBits | |||||
23:16 | U-0 | U-0 | RW-0 | RW-0 | RW-1 | R-0 | R-1 | R-1 |
— | — | ReadDelay | ReadBits | |||||
15:8 | U-0 | U-0 | U-0 | RW-0 | RW-0 | R-0 | R-1 | R-0 |
— | — | — | AckDelay | AckBits | ||||
7:0 | U-0 | U-0 | RW-1 | RW-0 | RW-0 | R-0 | R-1 | R-1 |
— | — | WriteDelay | WriteBits |
bit 29-27 | AckWinDelay[2:0]: Adjusts the size of the acknowledge window. Normally set to 5 = 101b. 101b = 5 tcycles 110b = 6 tcycles 111b = 7 tcycles 000b = 8 tcycles 001b = 9 tcycles 010b = 10 tcycles 011b = 11 tcycles 100b = 12 tcycles |
bit 26-24 | AckWinBits[2:0]: Read-only. Number of bits of AckWinDelay (3). |
bit 21-19 | ReadDelay[2:0]: Delay between end of request and start of Read data packet. Normally set to 7 = 111b. Defaults to 8 = 001b after reset. 111b = 7 tcycles 000b = 8 tcycles 001b = 9 tcycles 010b = 10 tcycles 011b = 11 tcycles 100b = 12 tcycles 101b = 13 tcycles 110b = 14 tcycles |
bit 18-16 | ReadBits[2:0]: Read-only. Number of bits of ReadDelay (3). |
bit 12-11 | AckDelay[1:0]: Delay between end of request and start of Ack data packet. Normally set to 3 = 11b. 11b = 3 tcycles 00b = 4 tcycles 01b = 5 tcycles 10b = 6 tcycles |
bit 10-8 | AckBits[2:0]: Read-only. Number of bits of AckDelay (2). |
bit 5-3 | WriteDelay[2:0]: Delay between end of request and start of Write data packet. Normally set to 1 = 001b. Defaults to 4 = 100b after reset. 001b = 1 tcycles 010b = 2 tcycles 011b = 3 tcycles 100b = 4 tcycles 101b = 5 tcycles 110b = 6 tcycles 111b = 7 tcycles 000b = 8 tcycles |
bit 2-0 | WriteBits[2:0]: Read-only. Number of bits of WriteDelay (3). |
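The encodings in the tables above are simply the tcycle count modulo the field width (for example, 8 tcycles is stored as 000b in a 3-bit field). A small sketch of a packing helper, with a hypothetical function name, leaving the read-only "Bits" fields at zero:

```c
#include <stdint.h>

/* Pack the four writable Delay fields. Each argument is a tcycle count;
 * it is stored modulo the field width, as in the tables above. */
static uint32_t rdram_encode_delay(unsigned ack_win, unsigned read,
                                   unsigned ack, unsigned write)
{
    return ((uint32_t)(ack_win & 0x7) << 27)  /* AckWinDelay, bits 29-27 */
         | ((uint32_t)(read    & 0x7) << 19)  /* ReadDelay,   bits 21-19 */
         | ((uint32_t)(ack     & 0x3) << 11)  /* AckDelay,    bits 12-11 */
         | ((uint32_t)(write   & 0x7) << 3);  /* WriteDelay,  bits 5-3 */
}
```

With the "normally set" values (AckWin=5, Read=7, Ack=3, Write=1) this yields 0x28381808.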
Reset Complications
RI is hardwired to use a write delay of 1 TCycle. This means the Write request packet is sent starting from (TCycle 0, RCP Cycle 0), finishing after 3 TCycles. 64 bits of data are sent starting at (TCycle 4, RCP Cycle 1), finishing at (TCycle 7, RCP Cycle 1.75).
But WriteDelay defaults to 4 TCycles after reset (probably[1]). Attempting to write a register will result in the Rambus device sampling 32 bits of data during TCycles 7 and 8 (which is RCP Cycle 1.75 and 2.0), which is too late: it is going to read zeros or some other garbage.
Before we can do anything else at all, we need to somehow set the WriteDelay to the correct value of 001b, despite the fact the WregB command results in the Rambus device reading garbage.
IPL3 gets around this problem by using MI's Repeat mode (called "Init mode" in some documentation), see MI_MODE for more details about MI's various modes.
By configuring a 16 byte repeat ([MI_MODE_REG] = 0x10f), the next word written to a Rambus device register (or memory) will be repeated over 16 bytes. This causes RI to emit a 128bit burst request, with the data on the bus during TCycles 4-11. As the Rambus device is sampling the bus during TCycles 7 and 8, it will get valid data we control.
But it's reading data halfway between two words, so IPL3 has to rotate the value by 16 bits to get the correct result. It wants to write 0x2838_1808, but it actually writes [0xA3F80000 + RDRAM_DELAY_REG] = 0x1808_2838.
After updating WriteDelay with a broadcast write to all devices, all future operations will have the correct timings.
[1] The Toshiba 8Mbit datasheet confirms this as the default. It defaults to 4 because some devices (like the Toshiba 18Mbit) only have 2 bits, so can only support a maximum write delay of 4. Devices with more bits default to 100b for compatibility.
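The 16-bit rotation can be checked with a simplified host-side model of the burst. It assumes, as the timings above suggest, that the bus carries 2 bytes per TCycle, that the repeated word occupies TCycles 4-11, and that the device samples TCycles 7 and 8. Function names are hypothetical.

```c
#include <stdint.h>

/* Rotate a 32-bit word by 16 bits (swap its two halves). */
static uint32_t rotate16(uint32_t v)
{
    return (v << 16) | (v >> 16);
}

/* Model of what a freshly reset device samples when `written` is
 * repeated over a 16-byte burst (big endian view, MSB first). */
static uint32_t sampled_after_reset(uint32_t written)
{
    uint8_t burst[16];
    for (int i = 0; i < 16; i++)
        burst[i] = (uint8_t)(written >> (24 - 8 * (i % 4)));

    /* Burst starts at TCycle 4 with 2 bytes per TCycle, so TCycles 7
     * and 8 cover burst bytes 6..9. */
    int first = (7 - 4) * 2;
    return ((uint32_t)burst[first] << 24)
         | ((uint32_t)burst[first + 1] << 16)
         | ((uint32_t)burst[first + 2] << 8)
         | (uint32_t)burst[first + 3];
}
```

In this model, writing rotate16(0x28381808) = 0x18082838 makes the device sample 0x28381808, the intended Delay value.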
Mode 0x03
| ||||||||
---|---|---|---|---|---|---|---|---|
31:24 | RW-1 | RW-1 | RW-0 | R-0 | RW-0 | RW-1 | RW-0 | RW-0 |
CE | X2 | PL | SV | SK | AS | DE | LE | |
23:16 | RW-1 | RW-1 | U-0 | U-0 | RW-0 | U-0 | U-0 | U-0 |
C5 | C2 | — | — | AD | — | — | — | |
15:8 | RW-1 | RW-1 | U-0 | U-0 | U-0 | U-0 | U-0 | U-0 |
C4 | C1 | — | — | — | — | — | — | |
7:0 | RW-1 | RW-1 | U-0 | U-0 | U-0 | U-0 | U-0 | U-0 |
C3 | C0 | — | — | — | — | — | — |
bit 31 | CE / CCEnable: Current Control Enable. 0 = manual 1 = auto |
bit 30 | X2 / CCMult: Should be 1. Inverted when read. (The Toshiba datasheet states that it selects whether the X1 or X2 register is used for the current control register). |
bit 29 | PL: Select PowerDown Latency |
bit 28 | SV / SkipValue: For tests. 0 |
bit 27 | SK / Skip: For tests. 0 |
bit 26 | AS / AutoSkip: For tests. 1 |
bit 25 | DE / DeviceEnable: Enable RDRAM device. When disabled, only broadcast register requests can be executed. 0 = disabled 1 = enabled |
bit 24 | LE: Enable PowerDown mode for RDRAM that supports it to reduce power consumption. |
bit 19 | AD / AckDis: For low latency RDRAM only. Suppresses the acknowledge response when set to 1. |
bit 23,15,7,22,14,6 | C[5:0] / CCValue: Current Control value, which ultimately controls the output current IOL. In manual mode (CE=0), IOL is proportional to (63-CC) with IOL ~ (0.95±0.3)×(63-CC) mA, for CC = 0..63. (These coefficients derive from Imax±△/63 and vary between models.) This field is inverted when read. In auto mode (CE=1), IOL is proportional to (63-CC) with IOL ~ (1.25±0.1)×(63-CC) mA, for CC = 31..63. (These coefficients derive from I40±△/(63-31) and vary between models.) An internally generated value is returned when read. |
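Since the six CC bits are interleaved across three bytes, packing and unpacking them by hand is error-prone. A minimal sketch using the bit positions listed above follows; function names are hypothetical, and the inversion on readback in manual mode is not modeled.

```c
#include <stdint.h>

/* Register bit position of C0..C5, indexed by CC bit number. */
static const unsigned cc_bit_pos[6] = { 6, 14, 22, 7, 15, 23 };

/* Scatter a 6-bit CC value into Mode register bit positions. */
static uint32_t rdram_mode_pack_cc(unsigned cc)
{
    uint32_t v = 0;
    for (unsigned i = 0; i < 6; i++)
        if (cc & (1u << i))
            v |= (uint32_t)1 << cc_bit_pos[i];
    return v;
}

/* Gather the 6-bit CC value back out of a Mode register value. */
static unsigned rdram_mode_unpack_cc(uint32_t mode)
{
    unsigned cc = 0;
    for (unsigned i = 0; i < 6; i++)
        if (mode & ((uint32_t)1 << cc_bit_pos[i]))
            cc |= 1u << i;
    return cc;
}
```

A full Mode value would OR the packed CC bits with the control bits in the top byte (CE, X2, AS, DE and so on).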
RefInterval 0x04
| ||||||||
---|---|---|---|---|---|---|---|---|
31:24 | ?-? | ?-? | ?-? | ?-? | ?-? | ?-? | ?-? | ?-? |
? | ||||||||
23:16 | ?-? | ?-? | ?-? | ?-? | ?-? | ?-? | ?-? | ?-? |
? | ||||||||
15:8 | ?-? | ?-? | ?-? | ?-? | ?-? | ?-? | ?-? | ?-? |
? | ||||||||
7:0 | ?-? | ?-? | ?-? | ?-? | ?-? | ?-? | ?-? | ?-? |
? |
bit 31-0 | ?: Unknown format |
RefRow 0x05
| ||||||||
---|---|---|---|---|---|---|---|---|
31:24 | RW-? | RW-? | RW-? | RW-? | RW-? | RW-? | RW-? | U-0 |
RowField[7:1] | — | |||||||
23:16 | U-0 | U-0 | U-0 | U-0 | RW-? | U-0 | U-0 | U-0 |
— | — | — | — | BankField | — | — | — | |
15:8 | U-0 | U-0 | U-0 | U-0 | U-0 | U-0 | RW-? | RW-? |
— | — | — | — | — | — | RowField[9:8] | ||
7:0 | U-0 | U-0 | U-0 | U-0 | U-0 | U-0 | U-0 | U-0 |
— | — | — | — | — | — | — | — |
bit 9-8,31-25 | RowField: Current row being refreshed. |
bit 19 | BankField: Current bank being refreshed. |
This register is normally read or written only for testing purposes.
RasInterval 0x06
| ||||||||
---|---|---|---|---|---|---|---|---|
31:24 | U-0 | U-0 | U-0 | RW-? | RW-? | RW-? | RW-? | RW-? |
— | — | — | RowPrecharge[0:4] | |||||
23:16 | U-0 | U-0 | U-0 | RW-? | RW-? | RW-? | RW-? | RW-? |
— | — | — | RowSense[0:4] | |||||
15:8 | U-0 | U-0 | U-0 | RW-? | RW-? | RW-? | RW-? | RW-? |
— | — | — | RowImpRestore[0:4] | |||||
7:0 | U-0 | U-0 | U-0 | RW-? | RW-? | RW-? | RW-? | RW-? |
— | — | — | RowExpRestore[0:4] |
bit 24-28 | RowPrecharge: Specify RowPrecharge timing. |
bit 16-20 | RowSense: Specify RowSense timing. |
bit 8-12 | RowImpRestore: Specify RowImpRestore timing. |
bit 0-4 | RowExpRestore: Specify RowExpRestore timing. |
NOTE: all fields are in bit reversed order (bit 4 is LSB, bit 0 is MSB).
MinInterval 0x07
| ||||||||
---|---|---|---|---|---|---|---|---|
31:24 | R-0 | R-0 | R-0 | U-0 | U-0 | U-0 | U-0 | U-0 |
MAD[3] | MRD[3] | MWD[3] | — | — | — | — | — | |
23:16 | R-0 | R-1 | R-0 | U-0 | U-0 | U-0 | U-0 | U-0 |
MAD[2] | MRD[2] | MWD[2] | — | — | — | — | — | |
15:8 | R-1 | R-1 | R-0 | U-0 | U-0 | U-0 | U-0 | U-0 |
MAD[1] | MRD[1] | MWD[1] | — | — | — | — | — | |
7:0 | R-1 | R-1 | R-1 | W-0 | W-0 | W-0 | W-0 | W-0 |
MAD[0] | MRD[0] | MWD[0] | SpecFunc[4:0] |
bit 31,23,15,7 | MinAckDelay: Minimum of AckDelay of RDRAM. |
bit 30,22,14,6 | MinReadDelay: Minimum of ReadDelay of RDRAM. |
bit 29,21,13,5 | MinWriteDelay: Minimum of WriteDelay of RDRAM. |
bit 4-0 | SpecFunc: Performs various commands when written, see table below. |
As SpecFunc is the only writable field in the register, you can just write the command.
SpecFunc Commands:
Pattern | Command | Description |
---|---|---|
00000 | Nop | Do nothing |
xxxx1 | SetRR | Manual refresh. The device immediately performs a single burst refresh of two rows per bank |
x0x10 | ClrRE | Disable automatic refresh |
x01xx | SetPD | Enter the powerdown state |
x1x0x | SetRE | Enable automatic refresh |
1xxxx | Reserved | |
The N64 implements refresh by broadcasting one SetRR command whenever VI emits a horizontal sync pulse.
AddressSelect 0x08
| ||||||||
---|---|---|---|---|---|---|---|---|
31:24 | RW-0 | RW-0 | RW-0 | RW-0 | RW-0 | RW-0 | RW-0 | U-? |
SwapField[6:0] | - | |||||||
23:16 | U-? | U-? | U-? | U-? | U-? | U-? | RW-0 | RW-0 |
- | - | - | - | - | - | SwapField[8:7] | ||
15:8 | U-? | U-? | U-? | U-? | U-? | U-? | U-? | U-? |
- | - | - | - | - | - | - | - | |
7:0 | U-? | U-? | U-? | U-? | U-? | U-? | U-? | U-? |
- | - | - | - | - | - | - | - |
bit 31-25,16-15 | SwapField: Each bit swaps two bits of the address. When all bits are 0, there is no swapping. |
Extra Details:
- The address swapping feature allows banks to be interleaved. For example, Bank 0 row 0 can be followed by Bank 1 row 0 and so on.
This can improve performance for many memory access patterns.
- However, RI doesn't appear to support this feature. It expects Bank zero to be in the first megabyte of address space, Bank one in the second megabyte, and so on.
DeviceManufacturer 0x09
| ||||||||
---|---|---|---|---|---|---|---|---|
31:24 | R-? | R-? | R-? | R-? | R-? | R-? | R-? | R-? |
ManufactureCode[7:0] | ||||||||
23:16 | R-? | R-? | R-? | R-? | R-? | R-? | R-? | R-? |
ManufactureCode[15:8] | ||||||||
15:8 | R-? | R-? | R-? | R-? | R-? | R-? | R-? | R-? |
Manufacture[7:0] | ||||||||
7:0 | R-? | R-? | R-? | R-? | R-? | R-? | R-? | R-? |
Manufacture[15:8] |
bit 31-24,23-16 | ManufactureCode: Presumably the manufacturer is allowed to put whatever they want here. |
bit 15-8,7-0 | Manufacture: Code specifying the manufacturing company, see table below. |
Manufacturer | ID Code |
---|---|
Toshiba | 0x0002 |
Fujitsu | 0x0003 |
NEC | 0x0005 |
Hitachi | 0x0007 |
Oki | 0x0009 |
LG Semicon | 0x000a |
Samsung | 0x0010 |
Hyundai | 0x0013 |
RDRAM addressing
Warning : In this paragraph, we describe RDRAM addressing within the RDRAM protocol. This is not to be confused with RDRAM addresses "as seen" by the CPU or RCP. See RI memory addressing paragraph for details about how the RI converts addresses between the two address spaces.
RDRAM protocol addresses RDRAM memory and registers using a 36bit address and a variety of commands :
- many types of memory read
- many types of memory write
- register read
- register write
- broadcast register write (all connected RDRAM will write the same value to the specified register)
The higher part of the address identifies an RDRAM device; the lower part is an offset within the device (in register space for register commands, and memory space for memory commands).
The procedure of identifying which RDRAM device is addressed by a given command + address is called Id matching.
It works as follows:
Given a 36 bit address Adr[35:0], we compute a "partially bit-swapped" AdrS[35:0] such that bits [28:20] and bits [19:11] are swapped on a bit-by-bit basis according to the value of SwapField (from the AddressSelect register). Bits [35:29] and [10:0] are left untouched. This swapping of bits provides a flexible way of remapping addresses across banks of a given device and across devices to benefit from internal row caching. This can help increase DRAM hit rate in several applications.
The upper 16 bits (or 15 bits for 2x{8,9}Mbit devices) of AdrS are then compared to the IdField contained in the DeviceId register. If both are equal, the RDRAM device has an Id Match.
More formally this can be written as follows:
AdrS[35:29] = Adr[35:29]
AdrS[28:20] = ( SwapField[8:0] & Adr[19:11]) | (~SwapField[8:0] & Adr[28:20])
AdrS[19:11] = (~SwapField[8:0] & Adr[19:11]) | ( SwapField[8:0] & Adr[28:20])
AdrS[10:0] = Adr[10:0]
IdMatch = AdrS[35:M] == IdField[35:M]
with M = 21 for 2x{8,9}Mbit modules, M = 20 for 1x{8,9}Mbit modules.
Remark: An IdMatch doesn't necessarily mean that the RDRAM device will act on the request, and conversely a non-matching RDRAM device can still act on a request. Other factors such as the DeviceEnable bit of the Mode register or the SIn pin can inhibit a request, and a broadcast register write can force a request even on a non-matching RDRAM device.
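The formulas above translate almost directly into C. The sketch below is a host-side model with hypothetical function names; `id_field` is assumed to be given as a 36-bit value with IdField[35:20] in bits 35:20, rather than in the scattered DeviceId register layout.

```c
#include <stdbool.h>
#include <stdint.h>

#define ADR_MASK_36 0xFFFFFFFFFULL

/* Compute AdrS from Adr: each set bit n of SwapField[8:0] swaps
 * Adr[20+n] with Adr[11+n]; all other bits are left untouched. */
static uint64_t rdram_swap_address(uint64_t adr, uint32_t swap_field)
{
    uint64_t swap = swap_field & 0x1FF;
    uint64_t hi = (adr >> 20) & 0x1FF;   /* Adr[28:20] */
    uint64_t lo = (adr >> 11) & 0x1FF;   /* Adr[19:11] */
    uint64_t new_hi = (swap & lo) | (~swap & hi & 0x1FF);
    uint64_t new_lo = (~swap & lo & 0x1FF) | (swap & hi);
    uint64_t keep = adr & ~(((uint64_t)0x1FF << 20) | ((uint64_t)0x1FF << 11));
    return (keep | (new_hi << 20) | (new_lo << 11)) & ADR_MASK_36;
}

/* IdMatch = AdrS[35:M] == IdField[35:M], with M = 21 or 20. */
static bool rdram_id_match(uint64_t adr, uint32_t swap_field,
                           uint64_t id_field, unsigned m)
{
    uint64_t adrs = rdram_swap_address(adr, swap_field);
    return (adrs >> m) == ((id_field & ADR_MASK_36) >> m);
}
```

With M = 21, a device whose IdField corresponds to 0x200000 matches both 0x200000 and 0x300000, since bit 20 is ignored; with M = 20 only the first matches.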
Current Control calibration
Any RDRAM device (module and controller) wishing to "talk" on the RDRAM channel must configure its output current IOL controlled by the current control (CC for short) register.
2 modes are possible to configure the current control register :
- Manual mode. In this mode, the value of the current control register is linearly correlated to IOL, such that IOL @ CC=63 -> 0mA, and IOL @ CC=0 -> Imax (Imax will vary between RDRAM devices due to process differences). In this mode, fluctuations due to temperature and changes over time are not compensated, so periodic manual readjustment may be required. Note also that, in this mode, the CC register value read will be inverted.
- Automatic mode. In this mode, small fluctuations are automatically corrected, so no further readjustment should be required. The relation between CC value and IOL is still mostly linear but with a different slope. Note also that, in this mode, the CC register value read will be an internally generated one, not the one used to program the CC register.
The purpose of the CC calibration procedure is to find the CC value in Automatic mode that maximizes the signal margin.
One possible approach to do so is described below :
We define the quantity CCi = 63-CC (= CC^63 = CC "inverted") which is more natural to use because IOL is proportional to CCi.
We define a memtest80 function which writes an octbyte with all bits set to '1' (eg. UINT64_C(0xffffffffffffffff)) at the start of the RDRAM device to test, and read back the 6th byte of the previously written octbyte. It then counts how many bits were set. This write / readback is done 10 times, and the cumulated number of '1' bits read is returned. For a non calibrated RDRAM device, the number is less than 80 (eg. the device can't always transfer back '1' because of inadequate VOL). Basically, the returned value gives a score (over 80) of the quality of the RDRAM device transmission with the current CC value.
1. Estimate the value of CCi in manual mode which gives a VOL almost equal to VREF. This can be done by writing increasing CCi values in manual mode and accumulating the weighted difference CCi * (memtest80 - previous_memtest80) for CCi = 0..N (N being the first value of CCi which allows to read all 80 bits during memtest80; N <= 63). This weighted sum of CCi*(memtest80-previous_memtest80) for CCi = 0..N, divided by 80 (minus 0.5 to account for accumulation/rounding errors) is an estimate of CCi which gives VOL ~ VREF.
2. Multiply this value by 2.2: doubling the CCi value, with 10 percent margin, should give a reasonable estimate of CCi such that VOL is symmetric to VOH with respect to VREF (eg. it maximizes signal margin).
3. Convert the obtained manual CCi value to auto CCi. Here the procedure is again iterative and tries to find the value CCi to write in Auto mode which minimizes the absolute difference between the CCi value read in auto mode (remember this is an internally generated value different from the CCi value written) and the target manual CCi.
4. Repeat this whole procedure 4 times and average the obtained auto CCi value.
In practice steps 1., 2. and 3. avoid usage of floating points and rescale some values with an appropriate scaling factor (here 80x10) to avoid loss of precision due to integer computations.
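Steps 1 and 2 can be illustrated with integer arithmetic on a table of memtest80 scores. This is only a sketch of the weighted-sum idea, not IPL3's actual code: the function names are hypothetical, results are scaled by 10 rather than by IPL3's 80x10, and the score before CCi = 0 is assumed to be 0.

```c
/* scores[cci] is the memtest80 result measured with manual CCi = cci,
 * for cci = 0..n-1, where scores[n-1] is the first score of 80.
 * Returns the estimated CCi giving VOL ~ VREF, multiplied by 10. */
static int estimate_vref_cci_x10(const int scores[], int n)
{
    int weighted = 0;
    int prev = 0; /* assumed score before CCi = 0 */

    for (int cci = 0; cci < n; cci++) {
        weighted += cci * (scores[cci] - prev);
        prev = scores[cci];
    }
    /* Divide by 80 and subtract 0.5, all in tenths. */
    return weighted * 10 / 80 - 5;
}

/* Step 2: multiply by 2.2 (double, plus a 10 percent margin), still in tenths. */
static int target_manual_cci_x10(int vref_cci_x10)
{
    return vref_cci_x10 * 22 / 10;
}
```

For a device whose score jumps straight from 0 to 80 at CCi = 20, the estimate is 19.5 and the target manual CCi is about 42.9.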
Known RDRAM Console Chip Configurations
N64 Version | Board revision | Region | Number of RDRAM Chips | Size per Chip(Mbytes) | Size per Chip(Mbits) |
---|---|---|---|---|---|
NUS-001 | (P) - 01 | PAL | 2 | 2.25Megabytes | 18Mbit |
NUS-002 | (P) - 02 | PAL | 1 | 4.5Megabytes | 36Mbits |
Known RDRAM Expansion Pak Configurations
Expansion Pak Type | Number of RDRAM Chips | Size per Chip(Mbytes) | Size per Chip(Mbits) |
---|---|---|---|
Jumper Pak (Nintendo Official) | 0 | 0 | 0 |
Expansion Pak (Nintendo Official) | 1 | 4.5Megabytes | 36Mbits |
There are 3rd party Expansion Paks that have 2 chips which are both 2.25Megabytes each. Please provide images and makers here.
Initialization Sequence
This Initialization sequence is based on the 6102 CIC boot code
RDRAM Initialization procedure as implemented in IPL3:
1. a. Enable RI Auto Current
   b. Let it settle by waiting using countdown(8800)
   c. Load RI CC value
2. Enable RI T/R select
3. a. Force RI_MODE reset, disable R/T stop
   b. Wait using countdown(4)
4. a. Force RI_MODE standby, enable R/T stop
   b. Wait using countdown(32)
5. a. Set MI INIT mode + length=15 by writing 0x10f to [MI_MODE_REG]
   b. Set up all RDRAM delays (AckWin=5, Read=7, Ack=3, Write=1) by writing 0x18082838 to [0xa3f80008] [bcast]
6. a. Set all RDRAM refresh rows to 0 [bcast]
   b. Move all RDRAM modules to top of address space (deviceid = 0x80000000) [bcast]
7. a. Compute RDRAM register space size (reg_step) based on RCP version (RCPv1: 128, RCPv2: 256)
   b. Init top RDRAM register pointer (RDRAM_REGS_BASE + 32 * reg_step)
8. First pass, which walks through at most 8 RDRAMs and, for valid ones:
   a. Places them at the next 2MB boundary (eg. rdram_deviceid = i * 0x08000000)
   b. Computes the optimal (auto) current calibration value for the RDRAM module and applies it
   c. Exits the first pass loop if the CC value is zero, eg. no RDRAM module is present
   d. Reads the device description registers (device_type + manufacturer). These reads must be surrounded by MI_MODE = SET_DRAM_REG and CLR_DRAM_REG because individual RDRAM registers are accessed.
   e. Based on the device description, sets up optimal RAS timing
   f. Stores RDRAM parameters (CC, geometry {eg. col, bank, row fields from device_type}) for the second pass
   g. Updates the values which track how to reorder all 2MB RDRAM modules before the 1MB ones, how many modules are effectively present, and the 2MB_bitfield (= 2^(number of 2MB modules) - 1, because all 2MB banks will be placed first)
9. a. Disable all RDRAM modules (rdram_mode = 0xc4000000) [bcast]
   b. Move them all back to top of address space (rdram_deviceid = 0x80000000) [bcast]
10. Second pass, which iterates through all modules discovered during the first pass and:
   a. Reorders them so that all 2MB modules are placed before 1MB modules
   b. Writes the previously computed optimal CC for each module
   c. Touches RDRAM modules to settle their timing circuits:
      - 1MB modules undergo 4 consecutive reads (ptr+k*0x00080000, k=0..1) x 2
      - 2MB modules undergo 8 consecutive reads (ptr+k*0x00080000, k=0..3) x 2
11. a. Set up RI refresh register = 0x63634 | 2MB_bitfield << 19
   b. Do a dummy read of RI refresh reg
12. Return amount of detected RDRAM.
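As a small worked example of step 11, the value written to the RI refresh register can be computed from the number of 2MB modules found. The function name is hypothetical, and the expression is read with C operator precedence, i.e. 0x63634 | (2MB_bitfield << 19).

```c
#include <stdint.h>

/* 2MB_bitfield = 2^(number of 2MB modules) - 1, placed at bit 19 and up. */
static uint32_t ri_refresh_value(unsigned num_2mb_modules)
{
    uint32_t bitfield = ((uint32_t)1 << num_2mb_modules) - 1;
    return 0x63634u | (bitfield << 19);
}
```

For a console with two 2MB modules this gives 0x1E3634, and with four (Expansion Pak fitted) 0x7E3634.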
Trivia: there is very likely a copy-paste error in the original IPL3 code when incrementing t6 in the second pass. It should have been t8 so we can place next 1MB module at next slot. But I guess it went unnoticed because retail models don't use 1MB modules.
Expansion Pak Detection
The typical way to detect how much memory is installed is to probe it.
LibUltra provides a function called osGetMemSize() which does this. The function writes different values at addresses in the uncached KSEG1 direct map, starting at 0xa0300000, and then reads the values back. It tries successively higher addresses, jumping by 1 MB each time through the loop. It returns the amount of RAM which it successfully wrote and read back, rounded up to a number of megabytes.
    // C-like pseudocode...
    u32 osGetMemSize(void) {
        // Base address of RAM in KSEG1.
        uintptr_t base_addr = 0xa0000000;
        uintptr_t megabyte = 1024 * 1024;
        // Address where we will probe, starting at 0xa0300000.
        uintptr_t cur_addr = base_addr + 3 * megabyte;
        while (true) {
            write a test value to cur_addr;
            read the value back from cur_addr;
            if (value read != value written) {
                break;
            }
            cur_addr += megabyte;
        }
        // Number of bytes that were successfully probed.
        return cur_addr - base_addr;
    }
During boot, IPL3 will also write the amount of RAM available, in bytes, to a 32-bit value at address 0x80000318 (or 0x800003f0, for CIC 6105). On retail hardware, this should always have the value 0x400000 (no expansion pak) or 0x800000 (expansion pak). When using LibUltra, this variable can be accessed with the name osMemSize, which is defined like this:
extern u32 osMemSize;
LibDragon provides the amount of memory installed with the get_memory_size() function.
Drawbacks and Limitations
Opinion
RDRAM has excellent data transfer speed for the era (bytes per second) but due to the protocol used and serial interface, memory transactions were somewhat slower (how much time it took from starting a read/write operation to finishing it). In practice, you may find that the available memory bandwidth is a limiting factor for the performance of your game. See: How fast was Rambus compared to regular EDO RAM?
— Vanadium
Datasheets
Several manufacturers produced compatible "Base RDRAM" modules such as :
- LG GM73V1892AH16L
- NEC uPD488170L
- OKI MSM5718B70
- Toshiba TC59R1809VK TC59R1809HK (18Mbit chip)
- Toshiba TC59R0808HK (8Mbit chip)
Reference : [2]