Initial Program Load
Notice! This page is intended solely for research, archival, and preservation purposes. Disassembled code may be subject to copyright law.
The Initial Program Load (commonly called IPL, or the boot sequence) is a set of instructions that the console performs every time it starts. There are multiple stages to this process, referred to as IPL1, IPL2 and IPL3. When a 64DD is in use, there is also an IPL4 stage. Upon power-on, NMI, or "soft" reset, the program counter is set by hardware to 0xBFC00000, which is the beginning of the PIF ROM address space. This ROM is baked into the PIF-NUS chip.
When reading disassemblies, always remember that branch instructions have delay slots. The "likely" branch variants (such as BEQL and BNEL) discard the delay slot instruction when the branch is not taken. The operation performed in a delay slot may not be related to the branch itself.
Each line is formatted like so: [address][opcode][assembly instruction] # comments
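As a reading aid, the line format above can be parsed mechanically, and the branch targets quoted in the comments can be recomputed from the opcode. The following is a minimal sketch, not a full disassembler: the names `parse_line` and `branch_target` are made up for this example, and it assumes the standard MIPS rule that a PC-relative branch target is the address of the delay slot plus the sign-extended 16-bit immediate shifted left by two.

```python
import re

# Matches: [address][opcode][assembly instruction] # comments
LINE_RE = re.compile(
    r"^\[(0x[0-9A-Fa-f]{8})\]\[(0x[0-9A-Fa-f]{8})\]\[([^\]]+)\]\s*#\s*(.*)$"
)


def parse_line(line):
    """Split one listing line into (address, opcode, instruction, comment)."""
    m = LINE_RE.match(line.strip())
    if m is None:
        raise ValueError(f"not a listing line: {line!r}")
    address, opcode, instruction, comment = m.groups()
    return int(address, 16), int(opcode, 16), instruction, comment


def branch_target(address, opcode):
    """Target of a PC-relative branch located at `address`."""
    imm = opcode & 0xFFFF
    if imm & 0x8000:          # sign-extend the 16-bit immediate
        imm -= 0x10000
    # The offset is relative to the delay slot (address + 4), in words.
    return (address + 4 + (imm << 2)) & 0xFFFFFFFF


addr, op, insn, comment = parse_line(
    "[0xBFC00020][0x5100FFFD][BEQL t0, zr, 0xFFFD] # if t0 == 0, branch to 0xBFC00018"
)
print(insn, "->", hex(branch_target(addr, op)))  # BEQL t0, zr, 0xFFFD -> 0xbfc00018
```

The helper only applies to conditional branches; jumps such as JR take their target from a register instead.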
IPL1
This stage spans only the first 0xD4 bytes (0xBFC00000 - 0xBFC000D3), as executing instructions directly out of the PIF is relatively slow. Only what is absolutely necessary is executed here. IPL1 resets the console to a consistent state, and copies the remaining bytes of the PIF ROM, that is IPL2, into the RSP's IMEM (0xA4001000).
The goal of IPL1 is to move execution out of the PIF as soon as possible, for two reasons. First, running code directly from the PIF requires a very slow serial transfer for each fetched word, and since the CPU cache does not work in this context, execution is very slow. Second, as soon as the CPU is out of this context, the PIF ROM can be locked again for security reasons, minimizing the window in which the PIF returns the ROM on the bus and can thus be easily dumped.
### IPL1 ### 0xBFC00000 - 0xBFC000D3 (0xD4 bytes long) ###
# SEGMENT 1 # Initialize CP0 Status & Config registers
[0xBFC00000][0x3C093400][LUI t1, 0x3400] # t1 = 0x34000000
[0xBFC00004][0x40896000][MTC0 t1, SR] # SR = t1 (sets the CU1, CU0, and FR bits: enables CP0 and CP1 access, and all 32 FPU registers)
[0xBFC00008][0x3C090006][LUI t1, 0x0006] # t1 = 0x00060000
[0xBFC0000C][0x3529E463][ORI t1, t1, 0xE463] # t1 = 0x0006E463
[0xBFC00010][0x40898000][MTC0 t1, Config] # Config = t1 (sets SysAD port writeback pattern to "D", sets Big-Endian mode, and sets KSEG0 as a cached region)
# SEGMENT 2 #
# 2a: Wait for RSP halt
[0xBFC00014][0x3C08A404][LUI t0, 0xA404] # t0 = 0xA4040000
[0xBFC00018][0x8D080010][LW t0, t0, 0x0010] # t0 = value stored at 0xA4040010 (RSP_STATUS register)
[0xBFC0001C][0x31080001][ANDI t0, t0, 0x0001] # t0 = t0 & 0x0001 (isolates the 'halt' bit)
[0xBFC00020][0x5100FFFD][BEQL t0, zr, 0xFFFD] # if t0 == 0, branch to 0xBFC00018 (this is a spin loop, waiting for the RSP to halt)
[0xBFC00024][0x3C08A404][LUI t0, 0xA404] # t0 = 0xA4040000
[0xBFC00028][0x2408000A][ADDIU t0, zr, 0x000A] # t0 = 0x0000000A
[0xBFC0002C][0x3C01A404][LUI at, 0xA404] # at = 0xA4040000
[0xBFC00030][0xAC280010][SW t0, at, 0x0010] # write t0 (0x0000000A) into 0xA4040010 (RSP_STATUS register: sets 'halt' and clears 'rsp interrupt' bits)
[0xBFC00034][0x3C08A404][LUI t0, 0xA404] # t0 = 0xA4040000
[0xBFC00038][0x8D080018][LW t0, t0, 0x0018] # t0 = value stored at 0xA4040018 (RSP_DMA_BUSY register)
[0xBFC0003C][0x31080001][ANDI t0, t0, 0x0001] # t0 = t0 & 0x0001 (isolates the 'busy' bit)
[0xBFC00040][0x5500FFFD][BNEL t0, zr, 0xFFFD] # if t0 != 0, branch to 0xBFC00038 (this is a spin loop, waiting for the RSP DMA to no longer be busy)
[0xBFC00044][0x3C08A404][LUI t0, 0xA404] # t0 = 0xA4040000
# 2b: Reset PI
[0xBFC00048][0x24080003][ADDIU t0, zr, 0x0003] # t0 = 0x00000003
[0xBFC0004C][0x3C01A460][LUI at, 0xA460] # at = 0xA4600000
[0xBFC00050][0xAC280010][SW t0, at, 0x0010] # write t0 (0x00000003) into 0xA4600010 (PI_STATUS register: clears PI interrupt and resets PI controller)
# 2c: Clear video output
[0xBFC00054][0x240803FF][ADDIU t0, zr, 0x03FF] # t0 = 0x000003FF
[0xBFC00058][0x3C01A440][LUI at, 0xA440] # at = 0xA4400000
[0xBFC0005C][0xAC28000C][SW t0, at, 0x000C] # write t0 (0x000003FF) into 0xA440000C (VI_V_INTR register: sets vertical interrupt trigger to half-line 0x3FF)
[0xBFC00060][0x3C01A440][LUI at, 0xA440] # at = 0xA4400000
[0xBFC00064][0xAC200024][SW zr, at, 0x0024] # write zr (0x00000000) into 0xA4400024 (VI_H_VIDEO register: sets the start and end of active video image to zero)
[0xBFC00068][0x3C01A440][LUI at, 0xA440] # at = 0xA4400000
[0xBFC0006C][0xAC200010][SW zr, at, 0x0010] # write zr (0x00000000) into 0xA4400010 (VI_V_CURRENT register: clears the VI interrupt)
# 2d: Stop audio
[0xBFC00070][0x3C01A450][LUI at, 0xA450] # at = 0xA4500000
[0xBFC00074][0xAC200000][SW zr, at, 0x0000] # write zr (0x00000000) into 0xA4500000 (AI_DRAM_ADDR register: sets DMA RDRAM address to zero)
[0xBFC00078][0x3C01A450][LUI at, 0xA450] # at = 0xA4500000
[0xBFC0007C][0xAC200004][SW zr, at, 0x0004] # write zr (0x00000000) into 0xA4500004 (AI_LENGTH register: sets transfer length to zero)
# 2e: Wait for any RSP DMAs to complete
[0xBFC00080][0x3C08A404][LUI t0, 0xA404] # t0 = 0xA4040000
[0xBFC00084][0x8D080010][LW t0, t0, 0x0010] # t0 = value stored at 0xA4040010 (RSP_STATUS register)
[0xBFC00088][0x31080004][ANDI t0, t0, 0x0004] # t0 = t0 & 0x0004 (isolates the 'dma busy' bit)
[0xBFC0008C][0x5500FFFD][BNEL t0, zr, 0xFFFD] # if t0 != 0, branch to 0xBFC00084 (this is a spin loop, waiting for any RSP DMAs to complete)
[0xBFC00090][0x3C08A404][LUI t0, 0xA404] # t0 = 0xA4040000
# SEGMENT 3 # Copy IPL2 from PIF to RSP IMEM
# 3a: Initialize start/end addresses
[0xBFC00094][0x3C0BA400][LUI t3, 0xA400] # t3 = 0xA4000000
[0xBFC00098][0x3C0CBFC0][LUI t4, 0xBFC0] # t4 = 0xBFC00000
[0xBFC0009C][0x3C0DBFC0][LUI t5, 0xBFC0] # t5 = 0xBFC00000
[0xBFC000A0][0x256B1000][ADDIU t3, t3, 0x1000] # t3 = 0xA4001000 (start of RSP IMEM)
[0xBFC000A4][0x258C00D4][ADDIU t4, t4, 0x00D4] # t4 = 0xBFC000D4 (start of IPL2 in PIF ROM)
[0xBFC000A8][0x25AD071C][ADDIU t5, t5, 0x071C] # t5 = 0xBFC0071C (end of IPL2 in PIF ROM)
# 3b: Load/Store 1 word (4 bytes) at a time, moving instruction data from [0xBFC000D4 - 0xBFC0071B], into [0xA4001000 - 0xA4001647]
[0xBFC000AC][0x8D890000][LW t1, t4, 0x0000] # t1 = value stored at address t4
[0xBFC000B0][0x258C0004][ADDIU t4, t4, 0x0004] # increment t4 (+4)
[0xBFC000B4][0x256B0004][ADDIU t3, t3, 0x0004] # increment t3 (+4)
[0xBFC000B8][0x158DFFFC][BNE t4, t5, 0xFFFC] # if t4 != t5, branch to 0xBFC000AC (repeat the loop)
[0xBFC000BC][0xAD69FFFC][SW t1, t3, 0xFFFC] # write t1 into address (t3 - 4)
# SEGMENT 4 # Jump to IPL2 in IMEM
[0xBFC000C0][0x3C0BA400][LUI t3, 0xA400] # t3 = 0xA4000000
[0xBFC000C4][0x3C1DA400][LUI sp, 0xA400] # sp = 0xA4000000
[0xBFC000C8][0x256B1000][ADDIU t3, t3, 0x1000] # t3 = 0xA4001000 (beginning of IMEM and IPL2)
[0xBFC000CC][0x01600008][JR t3] # jump to IMEM/IPL2 (and executes delay slot)
[0xBFC000D0][0x37BD1FF0][ORI sp, sp, 0x1FF0] # sp = 0xA4001FF0 (this prepares sp for use in IPL2)
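The copy loop in segment 3 can be modelled in a few lines to check the address arithmetic in the comments. This is only a sketch of the loop's behavior, not an emulator: `copy_ipl2`, `read_word` and `write_word` are names invented for this example, and memory is represented by plain dictionaries. Note how the store sits in the delay slot of the BNE, so it runs on every iteration, including the last one.

```python
IPL2_ROM_START = 0xBFC000D4   # t4: first word of IPL2 in PIF ROM
IPL2_ROM_END = 0xBFC0071C     # t5: one past the last byte of IPL2
IMEM_START = 0xA4001000       # t3: start of RSP IMEM


def copy_ipl2(read_word, write_word):
    """Mirror segment 3b: copy one word per iteration from PIF ROM to IMEM."""
    t3, t4, t5 = IMEM_START, IPL2_ROM_START, IPL2_ROM_END
    while True:
        t1 = read_word(t4)        # LW    t1, 0(t4)
        t4 += 4                   # ADDIU t4, t4, 4
        t3 += 4                   # ADDIU t3, t3, 4
        again = t4 != t5          # BNE   t4, t5 (condition evaluated here)
        write_word(t3 - 4, t1)    # SW    t1, -4(t3) (delay slot, always runs)
        if not again:
            break
    return t3                     # one past the last IMEM byte written


# Fake PIF ROM contents, used only to exercise the loop.
rom = {a: a & 0xFFFF for a in range(IPL2_ROM_START, IPL2_ROM_END, 4)}
imem = {}
end = copy_ipl2(rom.__getitem__, imem.__setitem__)
print(hex(len(imem) * 4), hex(max(imem)), hex(end))  # 0x648 0xa4001644 0xa4001648
```

The 0x648 bytes copied match the destination range 0xA4001000 - 0xA4001647 given in the listing.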
IPL2
After IPL1 finishes moving the instructions of this stage to RSP IMEM, it jumps to the start of IMEM (physical address 0x04001000, accessed as 0xA4001000 in the listing above) to start this second stage.
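The addresses 0xA4001000 (used by the IPL1 listing) and 0x04001000 refer to the same location: the first is a KSEG1 virtual address, the second is the physical address behind it. A minimal sketch of the conversion follows, assuming the standard MIPS direct-mapped segments (KSEG0 at 0x80000000, cached, and KSEG1 at 0xA0000000, uncached). The function names are made up for this example.

```python
KSEG0_BASE = 0x80000000   # cached, directly mapped
KSEG1_BASE = 0xA0000000   # uncached, directly mapped
KSEG_SIZE = 0x20000000    # each segment covers 512 MiB of physical space


def to_physical(vaddr):
    """Translate a KSEG0 or KSEG1 virtual address to its physical address."""
    if not KSEG0_BASE <= vaddr < KSEG1_BASE + KSEG_SIZE:
        raise ValueError(f"{vaddr:#010x} is not in KSEG0/KSEG1")
    return vaddr & (KSEG_SIZE - 1)


def to_kseg1(paddr):
    """Return the uncached KSEG1 alias of a physical address."""
    if not 0 <= paddr < KSEG_SIZE:
        raise ValueError(f"{paddr:#010x} is outside the directly mapped range")
    return KSEG1_BASE | paddr


print(hex(to_physical(0xA4001000)))  # 0x4001000  (start of IMEM)
print(hex(to_physical(0xBFC00000)))  # 0x1fc00000 (start of PIF ROM)
print(hex(to_kseg1(0x04000040)))     # 0xa4000040 (IPL3 entry point in DMEM)
```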
IPL2 has a very simple task: it loads IPL3 from the ROM (offsets 0x40-0x1000) and computes its checksum. It then sends the checksum to the PIF, which verifies it by separately fetching the expected value from the CIC. If the checksum doesn't match, the PIF halts execution of the CPU (by asserting the NMI pin). IPL2 itself does not wait for a verdict: it simply proceeds to call IPL3, hoping for the best.
To access the ROM and read IPL3, IPL2 needs to configure the PI bus access timings. To do so, it first configures the PI to its minimum speed and reads the first 4 bytes of the header. Bytes 1-3 are expected to contain the (fastest) PI timings that the ROM supports. It then configures those timings and proceeds to read IPL3 into DMEM, one word at a time (DMA is not possible here, because PI DMA only reads from and writes to RDRAM, not DMEM).
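To make the header step concrete, here is a hedged sketch of how the timing fields could be pulled out of the first header word. The split into four fields (latency, pulse width, page size, release) and their bit positions are an assumption based on the widths of the PI domain 1 timing registers; this page does not spell them out. The function name and the sample word 0x80371240 are illustrative only.

```python
import struct


def pi_timings_from_header(header):
    """Extract the PI timing fields from the first 4 bytes of a ROM header.

    ASSUMPTION: the field layout below (8-bit latency, 8-bit pulse width,
    4-bit page size, 2-bit release) is not taken from this page.
    """
    (word,) = struct.unpack(">I", header[:4])   # the header is big-endian
    return {
        "latency": word & 0xFF,                 # byte 3
        "pulse_width": (word >> 8) & 0xFF,      # byte 2
        "page_size": (word >> 16) & 0x0F,       # low nibble of byte 1
        "release": (word >> 20) & 0x03,         # low 2 bits of byte 1's high nibble
    }


# Sample header word, chosen for illustration.
timings = pi_timings_from_header(bytes.fromhex("80371240"))
for name, value in timings.items():
    print(f"{name}: {value:#x}")
```

Byte 0 of the word is not part of the timings and is ignored by this sketch.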
IPL3
This stage is executed out of RSP DMEM, which was loaded with the first 0x1000 bytes of the game cartridge (or alternatively, whatever may be inserted in the cartridge or expansion slot on the bottom of the console). IPL2 will jump to physical address 0x04000040 (0x40 bytes after the start of DMEM, to account for the ROM header).
The goal of IPL3 is to initialize RDRAM, which requires a complex initialization process that includes current calibration. During this stage, IPL3 also discovers whether there is an Expansion Pak installed and thus whether the additional 4 MiB of RDRAM are available. After RDRAM is initialized, it proceeds to booting the game.
Nintendo proprietary IPL3
Over the course of the commercial life of the console, Nintendo released six known variants of the IPL3 stage, each one with its own matching CIC chip. As discussed previously for IPL2, each IPL3 bootcode has its checksum verified against the hardcoded value stored in the CIC chip. This means that you can't easily change the physical CIC chip on a cartridge without also replacing the IPL3 code.
A seventh variant can be seen when analyzing the dumps of the GameBooster 64, Action Replay Pro 64, and GameShark Pro (v3.3), where the 0x0FC0 bytes after the header are all zeros and are dynamically loaded by the cartridge.
For each of the six variants, there is a matching CIC chip found in each cartridge. There are 10 official CIC chips, listed here as NTSC/PAL pairs:
NTSC | PAL | % of ROMs (Qty) |
---|---|---|
CIC-NUS-6102 | CIC-NUS-7101 | 87.6% (826) |
CIC-NUS-6103 | CIC-NUS-7103 | 6.5% (61) |
CIC-NUS-6105 | CIC-NUS-7105 | 4.5% (43) |
CIC-NUS-6106 | CIC-NUS-7106 | 0.8% (8) |
CIC-NUS-6101 | CIC-NUS-7102 | 0.5% (5) |
Each of these NTSC/PAL pairs shares an IPL3 variant, except 6101 and 7102.
After the Nintendo IPL3 has initialized the RDRAM, it loads 1 MiB of game code (ROM addresses 0x10001000 - 0x10101000) into RDRAM, starting from the boot address specified in the ROM header (offset 0x8), and then jumps to it. This finishes the boot sequence and hands off execution to the actual game.
The amount of code being loaded (1 MiB) is hardcoded and cannot be changed. Games are expected to cope with it. In most cases, this is more than needed, and the extra loaded data is simply ignored (and maybe reloaded to different addresses as part of the game's asset loading code). If the game requires more than 1 MiB of code, it is expected to load it manually (e.g. via dynamically loadable code segments, sometimes called "overlays").
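The load described above boils down to a little address arithmetic, sketched here. The function name `boot_plan` and the sample boot address are made up for this example, and the sketch deliberately does not model any per-variant differences between the IPL3 versions: it only reads the big-endian word at header offset 0x8 and derives the source and destination ranges from it.

```python
import struct

ROM_BASE = 0x10000000      # cartridge ROM as seen on the PI bus
CODE_ROM_OFFSET = 0x1000   # game code starts right after header + IPL3
CODE_SIZE = 0x100000       # hardcoded 1 MiB


def boot_plan(header):
    """Derive the ROM source range, RDRAM destination range and entry point."""
    (boot_address,) = struct.unpack_from(">I", header, 0x8)
    src_start = ROM_BASE + CODE_ROM_OFFSET
    return {
        "rom_range": (src_start, src_start + CODE_SIZE),          # end exclusive
        "rdram_range": (boot_address, boot_address + CODE_SIZE),  # end exclusive
        "entry_point": boot_address,
    }


# Hypothetical header: only the boot address field at offset 0x8 is filled in.
header = bytearray(0x40)
struct.pack_into(">I", header, 0x8, 0x80000400)
plan = boot_plan(bytes(header))
print([hex(a) for a in plan["rom_range"]])    # ['0x10001000', '0x10101000']
print([hex(a) for a in plan["rdram_range"]])  # ['0x80000400', '0x80100400']
```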
Libdragon open-source IPL3
The open-source homebrew SDK Libdragon ships with an open-source IPL3 that is embedded by default in all ROMs made with libdragon. Compared to the proprietary one, it is much faster (boots in ~100ms vs ~500ms for an average ROM) and expects the game in ELF format, so that it is able to properly read text and data segments of any size (not just a single hardcoded one) and clear BSS. It is unencumbered and was implemented with a clean room approach, so it is supposedly free of licensing concerns.
To allow booting on a real console in a true hardware boot scenario (e.g. replica cartridges), each release binary is bruteforced (with GPUs) so that its checksum matches the expected one for CIC 6102. IPL2 will then successfully recognize it as a valid IPL3 for that CIC, and allow boot to proceed.
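The idea of bruteforcing a binary until its checksum hits a fixed target can be illustrated with a toy example. To be clear about the assumptions: this does NOT implement the real IPL3 checksum, which is not described on this page. A truncated SHA-256 (16 bits) stands in for it, and a 4-byte trailing field is varied until the stand-in checksum matches. All names here are made up for the example.

```python
import hashlib


def toy_checksum(data):
    """Stand-in checksum (NOT the IPL3 one): first 16 bits of SHA-256."""
    return int.from_bytes(hashlib.sha256(data).digest()[:2], "big")


def bruteforce_padding(payload, target, max_tries=1 << 20):
    """Append a 4-byte field to `payload` so that toy_checksum equals `target`."""
    for nonce in range(max_tries):
        candidate = payload + nonce.to_bytes(4, "big")
        if toy_checksum(candidate) == target:
            return candidate
    raise RuntimeError("no match found within max_tries")


patched = bruteforce_padding(b"example bootcode", target=0x6102)
print(toy_checksum(patched) == 0x6102)  # True
```

With a 16-bit target, a match takes about 65,000 attempts on average, which is why a plain loop suffices here; a much wider checksum makes the same search vastly more expensive.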
IPL4
TODO