Reality Display Processor/Interface: Difference between revisions

(Some DPS_TEST_MODE and related)
(More consistent terminology)
Line 1: Line 1:
The RDP interface is the set of registers that allow software to control the RDP and make it perform the required rasterization jobs.
The RDP interface is the set of registers that allow software to control the RDP and make it perform the required rasterization jobs.


RDP executes a stream of commands called ''primitives'' that are sent to it via DMA. The RDP interface allows software to initiate and monitor the DMA transfers to RDP, and to query the current status of RDP.
RDP executes a stream of commands that are sent to it via DMA. The RDP interface allows software to initiate and monitor the DMA transfers to RDP, and to query the current status of RDP.


== DMA transfers ==
== DMA transfers ==


DMA transfers send a sequence of primitives from RDRAM or DMEM to the RDP. When a DMA is triggered, the RDP will start fetching the primitives in small batches within an internal command buffer, from which they will get run; the DMA will then wait for space to become available in this internal command buffer before more data can be transferred.
DMA transfers send a sequence of commands from RDRAM or DMEM to the RDP. When a DMA is triggered, the RDP will start fetching commands in small batches into an internal FIFO queue, from which they will get run; the DMA will then wait for space to become available in this internal command FIFO before more data can be transferred.


Primitives can be stored in either RDRAM or DMEM. Bit 0 of DP_STATUS is used to select whether RDRAM or DMEM is used. When reading data from DMEM, the RDP uses an internal bus in the RCP called XBUS, so normally "using XBUS" is a shorthand expression for "programming the RDP to fetch primitives from DMEM". Notice that both VR4300 and RSP can program the RDP to either use XBUS or not; there is no correlation between the CPU programming the RDP and the data source being used.
Commands can be stored in either RDRAM or DMEM. Bit 0 of DPC_STATUS is used to select whether RDRAM or DMEM is used. When reading data from DMEM, the RDP uses an internal bus in the RCP called XBUS, so normally "using XBUS" is a shorthand expression for "programming the RDP to fetch commands from DMEM". Notice that both VR4300 and RSP can program the RDP to either use XBUS or not; there is no correlation between the CPU programming the RDP and the data source being used.


RDP is a highly parallel unit. There is thus no correlation between a DMA being finished and the respective primitives being finished (that is, pixels drawn into the framebuffer). Instead, the end of DMA can just be used as a signal that the RDRAM/DMEM buffer that stores the primitives can be recycled, but no further information can be deducted about the actual execution of the primitives. A few syncing primitives can be used to create syncing points in the various RDP internal parallel units; see the SYNC primitives of more information.
The RDP is a deeply pipelined unit. There is thus no correlation between a DMA being finished and the respective commands being finished (e.g. pixels drawn into the framebuffer). Instead, the end of DMA can just be used as a signal that the RDRAM/DMEM buffer that stores the commands can be recycled, but no further information can be deduced regarding the actual execution of the commands. A few synchronization commands can be used to guarantee that work will have been completed by the RDP before the next command begins execution; see the page on RDP commands for more details.


RDP primitives are made of one or multiple 64-bit (8 bytes) words. For this reason, RDP DMA must fetch data from an address that is 64-bit aligned: in fact, the lowest 3 bits of the DMA address register are ignored. There is no destination register: the destination is the RDP itself and its internal command buffer is not addressable in any way.
RDP commands are composed of one or more 64-bit (8 byte) words. For this reason, RDP DMA must fetch data from an address that is 64-bit aligned: in fact, the lowest 3 bits of the DMA address register are ignored. There is no destination register: the destination is the RDP itself and its internal command FIFO is not addressable in any way.
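
The alignment rule can be illustrated with a small C sketch. The helper names <code>dpc_effective_address</code> and <code>dpc_word_count</code> are hypothetical, and the 24-bit mask is an assumption based on the START[23:0] and END[23:0] field definitions further down this page; this is not code taken from any SDK.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the address the RDP DMA effectively uses for a
 * value written to the start/end address registers. The lowest 3 bits are
 * ignored (64-bit alignment). ASSUMPTION: only bits 23:0 are significant,
 * as suggested by the START[23:0] / END[23:0] field definitions. */
static uint32_t dpc_effective_address(uint32_t written)
{
    return written & 0x00FFFFF8u;
}

/* Hypothetical helper: number of 64-bit words between the two bounds.
 * The end address is an exclusive bound, so start == end means 0 words. */
static uint32_t dpc_word_count(uint32_t start, uint32_t end)
{
    uint32_t s = dpc_effective_address(start);
    uint32_t e = dpc_effective_address(end);
    assert(e >= s);
    return (e - s) / 8u;
}
```

For example, an address of <code>0x1237</code> would be treated as <code>0x1230</code> under this rule, so buffers should simply be allocated with 8-byte alignment.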


=== Incremental transfers ===
=== Incremental transfers ===


To allow the RDP to begin processing primitives as soon as they are available (that is, while the VR4300 and RSP are generating them), the RDP DMA allows for incremental transfers: in fact, the DP_END register can be updated while a DMA is in progress (or after it has finished) and the effect is that the DMA will continue running until the new end pointer is reached. This operation is totally safe and free of race conditions. The intended purpose is that the VR4300 or the RSP can continue updating the DP_END register while they add more data to the primitive buffer in RDRAM/DMEM, until it is full. At that point, they can start another transfer to switch to another buffer.
To allow the RDP to begin processing commands as soon as they are available (that is, while the VR4300 and RSP are generating them), the RDP DMA allows for incremental transfers; the DPC_END register can be updated while a DMA is in progress (or after it has finished) and the effect is that the DMA will continue running until the new end pointer is reached. This operation is totally safe and free of race conditions. The intended purpose is that the VR4300 or the RSP can continue updating the DPC_END register while they add more data to the command buffer in RDRAM/DMEM until it is full, at which point they can start another transfer to switch to another buffer.


=== Double buffering ===
=== Double buffering ===
Line 21: Line 21:
DMA registers are double-buffered: this means that it is possible to program a new DMA transfer while another one is in progress. A new DMA transfer in this context means starting again from another buffer: we do not consider incremental transfers described above as "new transfers".
DMA registers are double-buffered: this means that it is possible to program a new DMA transfer while another one is in progress. A new DMA transfer in this context means starting again from another buffer: we do not consider incremental transfers described above as "new transfers".


To program a pending DMA transfer, just write to <code>DP_START</code>/<code>DP_END</code> a new buffer start/end address. The <code>START_PENDING</code> / <code>END_PENDING</code> bits in <code>DP_STATUS</code> will be set to 1, signaling that a transfer is indeed pending. New writes to <code>DP_END</code> will now update the pending transfer; in other words, after a new DMA transfer is pending, it is not possible to incrementally add more primitives to the currently-running transfer.
To program a pending DMA transfer, just write to <code>DPC_START</code>/<code>DPC_END</code> a new buffer start/end address. The <code>START_PENDING</code> / <code>END_PENDING</code> bits in <code>DPC_STATUS</code> will be set to 1, signaling that a transfer is indeed pending. New writes to <code>DPC_END</code> will now update the pending transfer; in other words, after a new DMA transfer is pending, it is not possible to incrementally add more commands to the currently-running transfer.


=== Programming considerations ===
=== Programming considerations ===
The choice between using XBUS or not is an open debate. There is no clear cut answer and it should be carefully considered depending on the expected performance implications:
The choice between using XBUS or not is an open debate. There is no clear cut answer and it should be carefully considered depending on the expected performance implications:


* If primitives are already in RDRAM (e.g. a static display list of RDP commands, read from ROM), then it is obviously more efficient to send them directly from there, without copying them first to DMEM. Libultra does not support this (in libultra, all RDP primitives are always passed through RSP as if they were RSP commands first, causing a double memory bandwidth impact if they are then sent back to RDRAM for RDP DMA); in libdragon, this is supported via [https://github.com/DragonMinded/libdragon/blob/caf684a06096afa3e9fc8ec8e70dd6643dab419e/include/rdpq.h#L1346-L1364 rdpq_exec].
* If commands are already in RDRAM (e.g. a static display list of RDP commands, read from ROM), then it is obviously more efficient to send them directly from there, without copying them first to DMEM. Libultra does not support this (in libultra, all RDP commands are always passed through RSP as if they were RSP commands first, causing a double memory bandwidth impact if they are then sent back to RDRAM for RDP DMA); in libdragon, this is supported via [https://github.com/DragonMinded/libdragon/blob/caf684a06096afa3e9fc8ec8e70dd6643dab419e/include/rdpq.h#L1346-L1364 rdpq_exec].
* Symmetrically, short display lists of RDP commands may already be available in RSP DMEM (as part of the data segment of an RSP microcode). In this case, pushing them directly to RDP via XBUS is surely the fastest option.
* Symmetrically, short display lists of RDP commands may already be available in RSP DMEM (as part of the data segment of an RSP microcode). In this case, pushing them directly to RDP via XBUS is surely the fastest option.
* If primitives are generated by the RSP (e.g. triangles at the end of a T&L pipeline), consider the following aspects:
* If commands are generated by the RSP (e.g. triangles at the end of a T&L pipeline), consider the following aspects:
** sending back all the primitives to RDRAM will have an impact on memory bandwidth (first, to transfer them from DMEM to RDRAM, and later from RDRAM to RDP). Memory bandwidth is often a bottleneck on N64.
** sending back all the commands to RDRAM will have an impact on memory bandwidth (first, to transfer them from DMEM to RDRAM, and later from RDRAM to RDP). Memory bandwidth is often a bottleneck on N64.
** on the other hand, RDRAM allows for much larger buffers. When the buffers are small (like they typically are in DMEM), it means that the RSP could be forced to wait for the RDP to process the primitives before producing new ones (basically this is back-pressure from RDP to RSP), and in turn it could cause back-pressure on the VR4300. Often, RDP is the slowest among the three, so a larger buffer allows for better pacing.
** on the other hand, RDRAM allows for much larger buffers. When the buffers are small (like they typically are in DMEM), it means that the RSP could be forced to wait for the RDP to process the commands before producing new ones (basically this is back-pressure from RDP to RSP), and in turn it could cause back-pressure on the VR4300. Often, RDP is the slowest among the three, so a larger buffer allows for better pacing.
While preparing buffers of RDP primitives, it is useful to take advantage of incremental transfers. This is a possible algorithm:
While preparing buffers of RDP commands, it is useful to take advantage of incremental transfers. This is a possible algorithm:


# Prepare two buffers (in either DMEM or RDRAM).
# Prepare two buffers (in either DMEM or RDRAM).
# Get ready to send the first buffer by setting <code>DP_START</code> = <code>DP_END</code> = pointer to the start of the first buffer. This will not actually transfer any byte (remember DP_END is an ''exclusive'' bound, so if you set <code>DP_START</code> = <code>DP_END</code>, this means "0 byte buffer"), but will set up the DMA engine as such.
# Get ready to send the first buffer by setting <code>DPC_START</code> = <code>DPC_END</code> = pointer to the start of the first buffer. This will not actually transfer any byte (remember DPC_END is an ''exclusive'' bound, so if you set <code>DPC_START</code> = <code>DPC_END</code>, this means "0 byte buffer"), but will set up the DMA engine as such.
# Generate RDP primitives into the first buffer (assuming this is RSP, depending whether you are using XBUS or not, either just write them to DMEM, or also DMA them to RDRAM into the first buffer). Any time a new primitive is added to the buffer, write <code>DP_END</code> to point past it. This basically tells the RDP that there are more primitives to run, as soon as it is ready.
# Generate RDP commands into the first buffer (assuming this is RSP, depending whether you are using XBUS or not, either just write them to DMEM, or also DMA them to RDRAM into the first buffer). Any time a new command is added to the buffer, write <code>DPC_END</code> to point past it. This basically tells the RDP that there are more commands to run, as soon as it is ready.
# When the buffer is full, go back to point 2, switching to the next buffer. Notice that the RDP DMA on the first buffer will continue running until all primitives have been fetched, so the new buffer will be effectively pending at this point. Anyway, you can continue working on the new buffer and keep writing <code>DP_END</code>: this is totally race-free, whether the new transfer is still pending, is ongoing, or even if it is finished.
# When the buffer is full, go back to point 2, switching to the next buffer. Notice that the RDP DMA on the first buffer will continue running until all commands have been fetched, so the new buffer will be effectively pending at this point. Anyway, you can continue working on the new buffer and keep writing <code>DPC_END</code>: this is totally race-free, whether the new transfer is still pending, is ongoing, or even if it is finished.
# Consider that the RDP can only have one transfer pending. So anytime you write <code>DP_START</code> to switch to a new buffer, first check if another transfer is already pending (by checking if the <code>START_PENDING</code> bit is set in <code>DP_STATUS</code>). If it is pending, then you will need to wait for it. This also makes sure you don't start pushing new primitives into the first buffer again, before the previous contents have been fully consumed.
# Consider that the RDP can only have one transfer pending. So anytime you write <code>DPC_START</code> to switch to a new buffer, first check if another transfer is already pending (by checking if the <code>START_PENDING</code> bit is set in <code>DPC_STATUS</code>). If it is pending, then you will need to wait for it. This also makes sure you don't start pushing new commands into the first buffer again, before the previous contents have been fully consumed.
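
The two-buffer algorithm above can be sketched in C as follows. This is only a minimal sketch under stated assumptions: the type and function names (<code>dpc_regs_t</code>, <code>rdp_queue_t</code>, <code>rdp_queue_begin</code>, <code>rdp_queue_reserve</code>, <code>rdp_queue_commit</code>) are hypothetical, and the bit position used for <code>START_PENDING</code> is an assumption that should be checked against the status register table. The register block is passed in as a pointer so the same logic can run against real hardware or against a plain struct in a test.

```c
#include <assert.h>
#include <stdint.h>

/* First four registers, laid out as in the register table (0x0 .. 0xC). */
typedef struct {
    volatile uint32_t start;    /* start address register   */
    volatile uint32_t end;      /* end address register     */
    volatile uint32_t current;  /* current address register */
    volatile uint32_t status;   /* status register          */
} dpc_regs_t;

/* ASSUMPTION: START_PENDING is bit 10 when reading the status register. */
#define START_PENDING_BIT (1u << 10)

typedef struct {
    dpc_regs_t *regs;
    uint32_t base[2];  /* 8-byte aligned physical address of each buffer */
    uint32_t size;     /* bytes per buffer, multiple of 8 */
    int active;        /* index of the buffer currently being filled */
    uint32_t used;     /* bytes already committed in the active buffer */
} rdp_queue_t;

/* Step 2 (and step 5): point the DMA at an empty buffer. Only one transfer
 * can be pending, so wait first if one already is. */
static void rdp_queue_begin(rdp_queue_t *q, int which)
{
    while (q->regs->status & START_PENDING_BIT) { /* throttle */ }
    q->active = which;
    q->used = 0;
    q->regs->start = q->base[which];
    q->regs->end = q->base[which];   /* zero-byte transfer */
}

/* Returns the physical address where the next nbytes must be written.
 * Step 4: switches to the other buffer when the active one is full. */
static uint32_t rdp_queue_reserve(rdp_queue_t *q, uint32_t nbytes)
{
    assert(nbytes % 8u == 0u && nbytes <= q->size);
    if (q->used + nbytes > q->size)
        rdp_queue_begin(q, q->active ^ 1);
    return q->base[q->active] + q->used;
}

/* Step 3: after the data is visible in memory (the caller is responsible
 * for any cache writeback), extend the transfer incrementally. */
static void rdp_queue_commit(rdp_queue_t *q, uint32_t nbytes)
{
    q->used += nbytes;
    q->regs->end = q->base[q->active] + q->used;
}
```

On the VR4300, <code>regs</code> would typically point at the uncached mapping of physical address <code>0x0410 0000</code>. Waiting inside <code>rdp_queue_begin</code> is also what prevents the producer from refilling a buffer that the DMA has not finished fetching.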


Another possible approach to push primitives into RDP is using a single buffer, and checking <code>DP_CURRENT</code> to race against the DMA. The idea is using the buffer as a circular one, and have the DMA constantly trailing behind our write pointer.
Another possible approach to push commands into the RDP is using a single buffer, and checking <code>DPC_CURRENT</code> to race against the DMA. The idea is using the buffer as a circular one, and have the DMA constantly trailing behind our write pointer.


# Prepare a single buffer (in either DMEM or RDRAM). Write <code>DP_START</code> = <code>DP_END</code> = pointer to the start of the buffer. Notice that, as soon as the RDP accepts these register writes, <code>DP_CURRENT</code> will also point there when read.
# Prepare a single buffer (in either DMEM or RDRAM). Write <code>DPC_START</code> = <code>DPC_END</code> = pointer to the start of the buffer. Notice that, as soon as the RDP accepts these register writes, <code>DPC_CURRENT</code> will also point there when read.
# Generate RDP primitives and write them into the buffer. Any time a new primitive is written, update <code>DP_END</code>. At this point, there are no pending DMAs (<code>START_PENDING</code> = 0), and in general we will have that <code>DP_START</code> <= <code>DP_CURRENT</code> <= <code>DP_END</code>. To visualize this, remember that <code>DP_CURRENT</code> is basically the "read pointer", while <code>DP_END</code> is our "write pointer", within the same circular buffer.
# Generate RDP commands and write them into the buffer. Any time a new command is written, update <code>DPC_END</code>. At this point, there are no pending DMAs (<code>START_PENDING</code> = 0), and in general we will have that <code>DPC_START</code> <= <code>DPC_CURRENT</code> <= <code>DPC_END</code>. To visualize this, remember that <code>DPC_CURRENT</code> is basically the "read pointer", while <code>DPC_END</code> is our "write pointer", within the same circular buffer.
# When we reach the end of the buffer, schedule a new DMA transfer on the same buffer from the beginning (so again <code>DP_START</code> = <code>DP_END</code> = pointer to the start of the buffer). At this point, this second transfer will be pending (<code>START_PENDING</code> = 1), but the RDP DMA will probably be still going through the buffer on the first time. So at this point we have <code>DP_START</code> <= <code>DP_END</code> < <code>DP_CURRENT</code>. Notice in fact that reading <code>DP_START</code> and <code>DP_END</code> will return the ''pending'' values (the new run on the buffer), while <code>DP_CURRENT</code> will still report the currently running transfer, and will keep going until the end of the buffer.
# When we reach the end of the buffer, schedule a new DMA transfer on the same buffer from the beginning (so again <code>DPC_START</code> = <code>DPC_END</code> = pointer to the start of the buffer). At this point, this second transfer will be pending (<code>START_PENDING</code> = 1), but the RDP DMA will probably be still going through the buffer on the first time. So at this point we have <code>DPC_START</code> <= <code>DPC_END</code> < <code>DPC_CURRENT</code>. Notice in fact that reading <code>DPC_START</code> and <code>DPC_END</code> will return the ''pending'' values (the new run on the buffer), while <code>DPC_CURRENT</code> will still report the currently running transfer, and will keep going until the end of the buffer.
# Keep writing primitives from the start of the buffer. This time, though, make sure that you never write past the current value of <code>DP_CURRENT</code>. If you need to write a primitive but you have reached the current value of <code>DP_CURRENT</code>, it means that you risk overwriting primitives that have not been sent to RDP yet. So in this case, you will need to throttle (wait) for a bit.
# Keep writing commands from the start of the buffer. This time, though, make sure that you never write past the current value of <code>DPC_CURRENT</code>. If you need to write a command but you have reached the current value of <code>DPC_CURRENT</code>, it means that you risk overwriting commands that have not been sent to RDP yet. So in this case, you will need to throttle (wait) for a bit.
# As soon as the RDP has finished going through the buffer, it will run the pending transfer and thus start from the beginning of the buffer again. After this happens (you can check it with <code>START_PENDING</code> becoming 0), you can freely go through the buffer writing primitives, without checking <code>DP_CURRENT</code> anymore. In fact, at this point we are back to the initial situation in which <code>DP_START</code> <= <code>DP_CURRENT</code> <= <code>DP_END</code> so it is possible to keep writing until the end of the buffer.
# As soon as the RDP has finished going through the buffer, it will run the pending transfer and thus start from the beginning of the buffer again. After this happens (you can check it with <code>START_PENDING</code> becoming 0), you can freely go through the buffer writing commands, without checking <code>DPC_CURRENT</code> anymore. In fact, at this point we are back to the initial situation in which <code>DPC_START</code> <= <code>DPC_CURRENT</code> <= <code>DPC_END</code> so it is possible to keep writing until the end of the buffer.
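
The throttling check at the heart of the single-buffer scheme can be written as a pure function. The sketch below is illustrative only: <code>ring_can_write</code> is a hypothetical name, and the caller is assumed to sample <code>START_PENDING</code> first and the current address second, so that a restart happening between the two reads can only make the answer more conservative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: may nbytes be written at write_ptr without
 * overwriting data that the DMA has not fetched yet?
 *
 *   write_ptr      our write pointer (the value last written as end address)
 *   dpc_current    value read from the current address register
 *   buf_end        exclusive end address of the circular buffer
 *   start_pending  START_PENDING bit, read BEFORE dpc_current
 */
static bool ring_can_write(uint32_t write_ptr, uint32_t nbytes,
                           uint32_t dpc_current, uint32_t buf_end,
                           bool start_pending)
{
    assert(nbytes % 8u == 0u);

    /* Step 3: no room before the end of the buffer, the caller must
     * schedule a restart from the beginning of the buffer first. */
    if (write_ptr + nbytes > buf_end)
        return false;

    /* Steps 2 and 5: no restart pending, the DMA is trailing behind the
     * write pointer, so everything up to buf_end is free. */
    if (!start_pending)
        return true;

    /* Step 4: the DMA is still finishing the previous pass, so only the
     * region it has already fetched (below dpc_current) may be reused. */
    return write_ptr + nbytes <= dpc_current;
}
```

A producer would poll this function in a loop and write the command only once it returns true, which is the "throttle (wait) for a bit" case described in step 4.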


In general, the second algorithm is more complex and requires a bit more code to be implemented, but it allows for less throttling and more efficient use of the memory. In fact, in the first scenario, whenever we have filled the available memory (two buffers) and we throttle, we will need to wait until the RDP finishes processing the whole first buffer. In the second scenario, instead, throttling is much reduced because as soon as the RDP processes one primitive, we get room for one more primitive to write.
In general, the second algorithm is more complex and requires a bit more code to be implemented, but it allows for less throttling and more efficient use of the memory. In fact, in the first scenario, whenever we have filled the available memory (two buffers) and we throttle, we will need to wait until the RDP finishes processing the whole first buffer. In the second scenario, instead, throttling is much reduced because as soon as the RDP processes one command, we get room for one more command to write.


==RDP Interface Registers ==
==RDP Interface Registers ==
Line 63: Line 63:
| 0x0410 0000
| 0x0410 0000
| style="text-align: center;" |c8
| style="text-align: center;" |c8
|[[Reality Display Processor/Interface#DP START|DP_START]]
|[[Reality Display Processor/Interface#DPC START|DPC_START]]
| Start address in RDRAM / DMEM for a DMA transfer of RDP primitives
| Start address in RDRAM / DMEM for a DMA transfer of RDP commands
|-
|-
| 0x0410 0004
| 0x0410 0004
| style="text-align: center;" |c9
| style="text-align: center;" |c9
|[[Reality Display Processor/Interface#DP END|DP_END]]
|[[Reality Display Processor/Interface#DPC END|DPC_END]]
|End address in RDRAM / DMEM for a DMA transfer of RDP primitives (exclusive bound)
|End address in RDRAM / DMEM for a DMA transfer of RDP commands (exclusive bound)
|-
|-
|0x0410 0008
|0x0410 0008
| style="text-align: center;" |c10
| style="text-align: center;" |c10
|[[Reality Display Processor/Interface#DP CURRENT|DP_CURRENT]]
|[[Reality Display Processor/Interface#DPC CURRENT|DPC_CURRENT]]
|Current address in RDRAM / DMEM being transferred by the DMA engine
|Current address in RDRAM / DMEM being transferred by the DMA engine
|-
|-
|0x0410 000C
|0x0410 000C
| style="text-align: center;" |c11
| style="text-align: center;" |c11
|[[Reality Display Processor/Interface#DP STATUS|DP_STATUS]]
|[[Reality Display Processor/Interface#DPC STATUS|DPC_STATUS]]
|Status register
|Status register
|-
|-
|0x0410 0010
|0x0410 0010
| style="text-align: center;" |c12
| style="text-align: center;" |c12
|[[Reality Display Processor/Interface#DPC CLOCK|DPC_CLOCK]]
|
|
|
|-
|-
|0x0410 0014
|0x0410 0014
| style="text-align: center;" |c13
| style="text-align: center;" |c13
|[[Reality Display Processor/Interface#DPC BUF BUSY|DPC_BUF_BUSY]]
|
|
|
|-
|-
|0x0410 0018
|0x0410 0018
| style="text-align: center;" |c14
| style="text-align: center;" |c14
|[[Reality Display Processor/Interface#DPC PIPE BUSY|DPC_PIPE_BUSY]]
|
|
|
|-
|-
|0x0410 001C
|0x0410 001C
| style="text-align: center;" |c15
| style="text-align: center;" |c15
|[[Reality Display Processor/Interface#DPC TMEM BUSY|DPC_TMEM_BUSY]]
|
|
|
|}
|}
The registers mirror every 0x20 bytes across the whole range <code>0x0410'0000</code> - <code>0x041F'FFFF</code>.
The registers mirror every 0x20 bytes across the whole range <code>0x0410'0000</code> - <code>0x041F'FFFF</code>.
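
For emulator authors, the mirroring rule amounts to masking the low 5 bits of the address. A minimal sketch, where <code>dpc_canonical_address</code> is a hypothetical helper name:

```c
#include <assert.h>
#include <stdint.h>

#define DPC_RANGE_FIRST 0x04100000u
#define DPC_RANGE_LAST  0x041FFFFFu

/* Hypothetical helper: map any physical address in the mirrored range to
 * the canonical register address (8 registers of 4 bytes = 0x20 bytes). */
static uint32_t dpc_canonical_address(uint32_t phys)
{
    assert(phys >= DPC_RANGE_FIRST && phys <= DPC_RANGE_LAST);
    return DPC_RANGE_FIRST | (phys & 0x1Fu);
}
```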


====<span style="display:none;">0x0410 0000 (c8) - DP_START ====
====<span style="display:none;">0x0410 0000 (c8) - DPC_START ====
----{{#invoke:Register table|head|800px|DP_START <code>0x0410 0000</code> (<code>c8</code>) }}
----{{#invoke:Register table|head|800px|DPC_START <code>0x0410 0000</code> (<code>c8</code>) }}
{{#invoke:Register table|row|31:24}}
{{#invoke:Register table|row|31:24}}
| U-? || U-? || U-? || U-? || U-? || U-? || U-? || U-?
| U-? || U-? || U-? || U-? || U-? || U-? || U-? || U-?
Line 123: Line 123:
{{#invoke:Register table|foot}}
{{#invoke:Register table|foot}}
{{#invoke:Register table|definitions
{{#invoke:Register table|definitions
| 23-0 | START[23:0] | Physical address of the start of the primitive buffer in RDRAM or DMEM. When reading, it always returns the last written value.
| 23-0 | START[23:0] | Physical address of the start of the command list in RDRAM or DMEM. When reading, it always returns the last written value.
}}
}}
'''Extra Details:'''
'''Extra Details:'''
:'''START''' This address points to the beginning of the primitive buffer from which primitives will be fetched by the DMA. After writing this register, the address is latched into the RDP interface, and the <code>START_PENDING</code> bit in <code>DP_STATUS</code> becomes 1, but no transfer is started. Writing <code>DP_END</code> will actually initiate the transfer. Selection of the data source (RDRAM or DMEM) is controlled by bit 0 of <code>DP_STATUS</code>. Writing <code>DP_START</code> while another value is pending (<code>START_PENDING</code> is 1) will update the pending value. Notice though that this is a risky operation because of races: the pending transfer could in fact start at any point, and if you write a new pending value just before or just after the transfer starts, the behavior will be totally different; in general, it is better to avoid writing <code>DP_START</code> if <code>START_PENDING</code> is 1.
:'''START''' This address points to the beginning of the command list from which RDP commands will be fetched by the DMA. After writing this register, the address is latched into the RDP interface, and the <code>START_PENDING</code> bit in <code>DPC_STATUS</code> becomes 1, but no transfer is started. Writing <code>DPC_END</code> will actually initiate the transfer. Selection of the data source (RDRAM or DMEM) is controlled by bit 0 of <code>DPC_STATUS</code>. Writing <code>DPC_START</code> while another value is pending (<code>START_PENDING</code> is 1) will update the pending value. Notice though that this is a risky operation because of races: the pending transfer could in fact start at any point, and if you write a new pending value just before or just after the transfer starts, the behavior will be totally different; in general, it is better to avoid writing <code>DPC_START</code> if <code>START_PENDING</code> is 1.


====<span style="display:none;">0x0410 0004 (c9) - DP_END ====
====<span style="display:none;">0x0410 0004 (c9) - DPC_END ====
----{{#invoke:Register table|head|800px|DP_END <code>0x0410 0004</code> (<code>c9</code>)}}
----{{#invoke:Register table|head|800px|DPC_END <code>0x0410 0004</code> (<code>c9</code>)}}
{{#invoke:Register table|row|31:24}}
{{#invoke:Register table|row|31:24}}
| U-? || U-? || U-? || U-? || U-? || U-? || U-? || U-?
| U-? || U-? || U-? || U-? || U-? || U-? || U-? || U-?
Line 148: Line 148:
{{#invoke:Register table|foot}}
{{#invoke:Register table|foot}}
{{#invoke:Register table|definitions
{{#invoke:Register table|definitions
| 23-0 | END[23:0] | Physical address of the end of the primitive buffer (in RDRAM or DMEM). When reading, it always returns the last written value.
| 23-0 | END[23:0] | Physical address of the end of the command list (in RDRAM or DMEM). When reading, it always returns the last written value.
}}
}}
'''Extra Details:'''
'''Extra Details:'''
:'''END''' This address points to the end of the primitive buffer. The address is interpreted as an exclusive bound, so it must point ''after'' the last primitive to transfer. Notice that writing <code>DP_START</code>=<code>DP_END</code> is well formed, and will run a perfectly valid zero byte transfer (which can later be extended via an incremental transfer).
:'''END''' This address points to the end of the command list. The address is interpreted as an exclusive bound, so it must point ''after'' the last command to transfer. Notice that writing <code>DPC_START</code>=<code>DPC_END</code> is well formed, and will run a perfectly valid zero byte transfer (which can later be extended via an incremental transfer).


When <code>DP_END</code> is written, the RDP does the following:
When <code>DPC_END</code> is written, the RDP does the following:
* if <code>START_PENDING</code> (in <code>DP_STATUS</code>) is 0, the write is considered an "incremental transfer", so the RDP DMA is programmed to continue the last transfer up to the new value of <code>DP_END</code>. This works whether the previous transfer is still running or was already finished; in both cases, the transfer is continued/restored until the new <code>DP_END</code> is reached;
* If <code>START_PENDING</code> (in <code>DPC_STATUS</code>) is 0, the write is considered an "incremental transfer", so the RDP DMA is programmed to continue the last transfer up to the new value of <code>DPC_END</code>. This works whether the previous transfer is still running or was already finished; in both cases, the transfer is continued/restored until the new <code>DPC_END</code> is reached;
* if <code>START_PENDING</code> (in <code>DP_STATUS</code>) is 1, the behavior depends on whether a transfer is running or not:
* If <code>START_PENDING</code> (in <code>DPC_STATUS</code>) is 1, the behavior depends on whether a transfer is running or not:
** if no transfer is running, the new transfer is started (from <code>DP_START</code> to <code>DP_END</code>), and <code>START_PENDING</code> goes back to 0.
** if no transfer is running, the new transfer is started (from <code>DPC_START</code> to <code>DPC_END</code>), and <code>START_PENDING</code> goes back to 0.
** if a transfer is in progress, <code>END_PENDING</code> is set to 1 and the new transfer remains pending and will start as soon as the current transfer is finished. Further writes to <code>DP_END</code> in this state will simply update the pending transfer's end address.
** if a transfer is in progress, <code>END_PENDING</code> is set to 1 and the new transfer remains pending and will start as soon as the current transfer is finished. Further writes to <code>DPC_END</code> in this state will simply update the pending transfer's end address.
'''WARNING''': do not start a DMA transfer or even process any primitive if you have previously enqueued a <code>SYNC_FULL</code> primitive. There is an RDP hardware bug that makes RDP sometimes crash if any other primitive is processed while <code>SYNC_FULL</code> is run. Thus, when you schedule a <code>SYNC_FULL</code> (usually at the end of the frame), it must be the last scheduled primitive (<code>DP_END</code> must point immediately after it), and you must wait until the RDP has processed it and got back to fully idle status (<code>BUSY</code> bit goes to 0 in <code>DP_STATUS</code>), before starting a new DMA transfer, even just an incremental one. It is fine to just write <code>DP_START</code> though, as that doesn't start a transfer.
'''WARNING''': Do not start a DMA transfer or even process any command if you have previously enqueued a <code>SYNC_FULL</code> command. There is an RDP hardware bug that makes RDP sometimes crash if any other command is processed while <code>SYNC_FULL</code> is run. Thus, when you schedule a <code>SYNC_FULL</code> (usually at the end of the frame), it must be the last scheduled command (<code>DPC_END</code> must point immediately after it), and you must wait until the RDP has processed it and got back to fully idle status (<code>BUSY</code> bit goes to 0 in <code>DPC_STATUS</code>), before starting a new DMA transfer, even just an incremental one. It is fine to just write <code>DPC_START</code> though, as that doesn't start a transfer.
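
The rules for writes to the start and end address registers described above can be summarized as a small software model, for instance as a starting point for an emulator. This is a simplified sketch, not a hardware-accurate implementation: the names (<code>dpc_model_t</code>, <code>dpc_write_start</code>, <code>dpc_write_end</code>, <code>dpc_drain</code>) are hypothetical, the address mask is an assumption based on the 24-bit fields and the ignored low 3 bits, and timing, the internal command queue and the <code>SYNC_FULL</code> bug are not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t start, end;   /* last written values (what reads return) */
    uint32_t current;      /* DMA read pointer of the running transfer */
    uint32_t run_end;      /* exclusive end bound of the running transfer */
    bool start_pending;
    bool end_pending;
} dpc_model_t;

/* A transfer is running while data remains to be fetched. */
static bool dpc_running(const dpc_model_t *m)
{
    return m->current < m->run_end;
}

/* Writing the start address only latches it; no transfer is started. */
static void dpc_write_start(dpc_model_t *m, uint32_t value)
{
    m->start = value & 0x00FFFFF8u;
    m->start_pending = true;
}

static void dpc_write_end(dpc_model_t *m, uint32_t value)
{
    m->end = value & 0x00FFFFF8u;
    if (!m->start_pending) {
        m->run_end = m->end;          /* incremental transfer */
    } else if (!dpc_running(m)) {
        m->current = m->start;        /* new transfer starts right away */
        m->run_end = m->end;
        m->start_pending = false;
    } else {
        m->end_pending = true;        /* new transfer stays pending */
    }
}

/* Model the DMA fetching all remaining data, then picking up a pending
 * transfer if one was fully programmed. */
static void dpc_drain(dpc_model_t *m)
{
    m->current = m->run_end;
    if (m->start_pending && m->end_pending) {
        m->current = m->start;
        m->run_end = m->end;
        m->start_pending = false;
        m->end_pending = false;
    }
}
```

In this model, reading <code>start</code> and <code>end</code> returns the pending values while <code>current</code> keeps following the running transfer, matching the behavior described for the current address register.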


====<span style="display:none;">0x0410 0008 (c10) - DP_CURRENT ====
====<span style="display:none;">0x0410 0008 (c10) - DPC_CURRENT ====
----{{#invoke:Register table|head|800px|DP_CURRENT <code>0x0410 0008</code> (<code>c10</code>)}}
----{{#invoke:Register table|head|800px|DPC_CURRENT <code>0x0410 0008</code> (<code>c10</code>)}}
{{#invoke:Register table|row|31:24}}
{{#invoke:Register table|row|31:24}}
| U-? || U-? || U-? || U-? || U-? || U-? || U-? || U-?
| U-? || U-? || U-? || U-? || U-? || U-? || U-? || U-?
Line 183: Line 183:
}}
}}
'''Extra Details:'''
'''Extra Details:'''
:'''CURRENT''' This address points after the last primitive that was transferred by DMA. It is possible to monitor this register to know how far the transfer has gone. In general, it is expected that <code>DP_START</code> <= <code>DP_CURRENT</code> <= <code>DP_END</code>, and thus the portion of the buffer between <code>DP_START</code> and <code>DP_CURRENT</code> is free to be recycled for other uses. When a transfer is finished, <code>DP_CURRENT</code> will always be equal to <code>DP_END</code>. Notice that when <code>START_PENDING</code> or <code>END_PENDING</code> are 1, reading <code>DP_START</code> and <code>DP_END</code> will return the pending values, while reading <code>DP_CURRENT</code> will always refer to the currently running (or last finished) transfer.
:'''CURRENT''' This address points just past the last command that was transferred by DMA. Monitoring this register shows how far the transfer has progressed. In general, it is expected that <code>DPC_START</code> <= <code>DPC_CURRENT</code> <= <code>DPC_END</code>, and thus the portion of the buffer between <code>DPC_START</code> and <code>DPC_CURRENT</code> is free to be recycled for other uses. When a transfer is finished, <code>DPC_CURRENT</code> will always be equal to <code>DPC_END</code>. Note that when <code>START_PENDING</code> or <code>END_PENDING</code> is 1, reading <code>DPC_START</code> and <code>DPC_END</code> will return the pending values, while reading <code>DPC_CURRENT</code> will always refer to the currently running (or last finished) transfer.
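
As a rough illustration of the relation between the three addresses, the following sketch computes how much of a command buffer has been consumed. The function names are hypothetical, and the values are passed in as plain integers; because <code>DPC_START</code> and <code>DPC_END</code> may read back pending values, the caller is assumed to pass the start and end addresses it originally wrote for the running transfer.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bytes in [start, current): already fetched, so free to be recycled. */
static uint32_t rdp_recyclable_bytes(uint32_t start, uint32_t current)
{
    return current - start;
}

/* Bytes in [current, end): not yet fetched by the DMA engine. */
static uint32_t rdp_pending_bytes(uint32_t current, uint32_t end)
{
    return end - current;
}

/* A finished transfer always leaves DPC_CURRENT equal to DPC_END. */
static bool rdp_transfer_done(uint32_t current, uint32_t end)
{
    return current == end;
}
```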


====<span style="display:none;">0x0410 000C (c11) - DP_STATUS ====
====<span style="display:none;">0x0410 000C (c11) - DPC_STATUS ====
----{{#invoke:Register table|head|800px|DP_STATUS <code>0x0410 000C</code> (<code>c11</code>) - Read access}}
----{{#invoke:Register table|head|1400px|DPC_STATUS <code>0x0410 000C</code> (<code>c11</code>) - Read access}}
{{#invoke:Register table|row|31:24}}
{{#invoke:Register table|row|31:24}}
| U-? || U-? || U-? || U-? || U-? || U-? || U-? || U-?
| U-? || U-? || U-? || U-? || U-? || U-? || U-? || U-?
Line 202: Line 202:
| R-? || R-? || R-? || R-? || R-? || R-? || R-? || R-?
| R-? || R-? || R-? || R-? || R-? || R-? || R-? || R-?
|-
|-
| READY || BUSY || PIPE_BUSY || TMEM_BUSY || START_GCLK || FLUSH || FREEZE || XBUS
| CBUF_READY || BUSY || PIPE_BUSY || TMEM_BUSY || START_GCLK || FLUSH || FREEZE || XBUS
{{#invoke:Register table|foot}}
{{#invoke:Register table|foot}}
{{#invoke:Register table|definitions
{{#invoke:Register table|definitions
| 10 | START_PENDING | Set if DP_START was written and the value is still pending because DP_END was not written yet, or another transfer is in progress (see DP_START and DP_END).
| 10 | START_PENDING | Set if DPC_START was written and the value is still pending because DPC_END was not written yet, or another transfer is in progress (see DPC_START and DPC_END).
| 9 | END_PENDING | Set if DP_END was written and the value is still pending because another transfer is in progress (see DP_END)
| 9 | END_PENDING | Set if DPC_END was written and the value is still pending because another transfer is in progress (see DPC_END)
| 8 | DMA_BUSY | ?
| 8 | DMA_BUSY | ?
| 7 | READY | ?
| 7 | CBUF_READY | ?
| 6 | BUSY | Becomes 1 as soon as a DMA transfer starts, and stays to 1 until a <code>SYNC_FULL</code> primitive is run.
| 6 | BUSY | Becomes 1 as soon as a DMA transfer starts, and stays at 1 until a <code>SYNC_FULL</code> command is run.
| 5 | PIPE_BUSY | ?
| 5 | PIPE_BUSY | ?
| 4 | TMEM_BUSY | ?
| 4 | TMEM_BUSY | ?
| 3 | START_GCLK | ?
| 3 | START_GCLK | ?
| 2 | FLUSH | While set, all RDP transfers in progress or started are immediately terminated
| 2 | FLUSH | While set, all RDP transfers in progress or started are immediately terminated
| 1 | FREEZE | While set, RDP will stop processing primitives
| 1 | FREEZE | While set, RDP will stop processing commands
| 0 | XBUS | 0: DMA transfer source is RDRAM; 1: DMA transfer source is DMEM (via XBUS)
| 0 | XBUS | 0: DMA transfer source is RDRAM; 1: DMA transfer source is DMEM (via XBUS)
}}
}}
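
The read-side bit layout listed above can be unpacked as in the sketch below. The struct and function names are hypothetical; only the bit positions (10 down to 0) are taken from the definitions table.

```c
#include <stdbool.h>
#include <stdint.h>

/* One flag per documented bit of DPC_STATUS (read access). */
typedef struct {
    bool start_pending; /* bit 10 */
    bool end_pending;   /* bit 9  */
    bool dma_busy;      /* bit 8  */
    bool cbuf_ready;    /* bit 7  */
    bool busy;          /* bit 6  */
    bool pipe_busy;     /* bit 5  */
    bool tmem_busy;     /* bit 4  */
    bool start_gclk;    /* bit 3  */
    bool flush;         /* bit 2  */
    bool freeze;        /* bit 1  */
    bool xbus;          /* bit 0: DMA source select */
} dpc_status_t;

/* Extract a single bit of a raw register value as a bool. */
static bool dpc_bit(uint32_t raw, unsigned bit)
{
    return ((raw >> bit) & 1u) != 0;
}

/* Decode a raw DPC_STATUS value into named flags. */
static dpc_status_t dpc_status_decode(uint32_t raw)
{
    dpc_status_t s;
    s.start_pending = dpc_bit(raw, 10);
    s.end_pending   = dpc_bit(raw, 9);
    s.dma_busy      = dpc_bit(raw, 8);
    s.cbuf_ready    = dpc_bit(raw, 7);
    s.busy          = dpc_bit(raw, 6);
    s.pipe_busy     = dpc_bit(raw, 5);
    s.tmem_busy     = dpc_bit(raw, 4);
    s.start_gclk    = dpc_bit(raw, 3);
    s.flush         = dpc_bit(raw, 2);
    s.freeze        = dpc_bit(raw, 1);
    s.xbus          = dpc_bit(raw, 0);
    return s;
}
```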
{{#invoke:Register table|head|1200px|DPC_STATUS <code>0x0410 000C</code> - Write access}}
{{#invoke:Register table|head|1400px|DPC_STATUS <code>0x0410 000C</code> (<code>c11</code>) - Write access}}
{{#invoke:Register table|row|31:24}}
{{#invoke:Register table|row|31:24}}
| U-? || U-? || U-? || U-? || U-? || U-? || U-? || U-?
| U-? || U-? || U-? || U-? || U-? || U-? || U-? || U-?
Line 233: Line 233:
| W-? || W-? || W-? || W-? || W-? || W-? || W-? || W-?
| W-? || W-? || W-? || W-? || W-? || W-? || W-? || W-?
|-
|-
| CLR_PIPE_BUSY || CLR_TMEM_BUSY || SET_FLUSH || CLR_FLUSH || SET_FREEZE || CLR_FREEZE || SET_SOURCE || CLR_SOURCE
| CLR_PIPE_BUSY || CLR_TMEM_BUSY || SET_FLUSH || CLR_FLUSH || SET_FREEZE || CLR_FREEZE || SET_XBUS || CLR_XBUS
{{#invoke:Register table|foot}}
{{#invoke:Register table|foot}}
{{#invoke:Register table|definitions
{{#invoke:Register table|definitions
| 9 | CLR_CLOCK | Reset the `DP_CLOCK` to zero.
| 9 | CLR_CLOCK | Reset '''DPC_CLOCK''' to zero.
| 8 | CLR_BUFFER_BUSY | ?
| 8 | CLR_BUFFER_BUSY | Reset '''DPC_BUF_BUSY''' to zero.
| 7 | CLR_PIPE_BUSY | ?
| 7 | CLR_PIPE_BUSY | Reset '''DPC_PIPE_BUSY''' to zero.
| 6 | CLR_TMEM_BUSY | ?
| 6 | CLR_TMEM_BUSY | Reset '''DPC_TMEM_BUSY''' to zero.
| 5 | SET_FLUSH | Set the FLUSH bit to 1
| 5 | SET_FLUSH | Set the FLUSH bit to 1
| 4 | CLR_FLUSH | Clear the FLUSH bit to 0
| 4 | CLR_FLUSH | Clear the FLUSH bit to 0
Line 248: Line 248:
}}
}}
'''Extra Details:'''
'''Extra Details:'''
:'''FREEZE''' During freeze, the RDP DMA engine is suspended (paused). If a transfer was ongoing, it is paused and will resume as soon as the freeze bit is reset to 0. During the freeze, it is still possible to write DP_START or DP_END, and the writes will still affect the START_PENDING / END_PENDING bits, but no transfer will be initiated.
:'''FREEZE''' During freeze, the RDP DMA engine is suspended (paused). If a transfer was ongoing, it is paused and will resume as soon as the freeze bit is reset to 0. During the freeze, it is still possible to write DPC_START or DPC_END, and the writes will still affect the START_PENDING / END_PENDING bits, but no transfer will be initiated.
:'''FLUSH''' While FLUSH is set, all DMA transfers are instantly terminated (flushed). Pulsing the FLUSH bit is a good way to force-reset the RDP DMA engine and make sure the RDP is ready to initiate a new transfer.
:'''FLUSH''' While FLUSH is set, all DMA transfers are instantly terminated (flushed). Pulsing the FLUSH bit is a good way to force-reset the RDP DMA engine and make sure the RDP is ready to initiate a new transfer.
:'''BUSY''' This bit seems to refer to a more general state of RDP, which is not related to actually performing any task. When the bit is 0, the RDP is like "turned off". As long as a single primitive is sent to RDP, this bit goes to 1 and will stay there even if after that single primitive, you wait for seconds, far beyond the actual execution time of that primitive. It looks like the RDP is now "turned on". To turn it off again, the only known way is sending a SYNC_FULL. After processing a SYNC_FULL, the BUSY bit goes back to 0 again.
:'''BUSY''' This bit seems to reflect a more general state of the RDP rather than whether it is actually performing a task. When the bit is 0, the RDP is effectively "turned off". As soon as a single command is sent to the RDP, the bit goes to 1 and stays there even if you then wait for seconds, far beyond the actual execution time of that command; the RDP now appears to be "turned on". The only known way to turn it off again is to send a <code>SYNC_FULL</code>: after processing it, the BUSY bit goes back to 0.
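
Pulsing FLUSH and toggling FREEZE could look like the sketch below. The helper names are hypothetical. The <code>SET_FLUSH</code> and <code>CLR_FLUSH</code> positions (bits 5 and 4) come from the write-access definitions; the <code>SET_FREEZE</code> and <code>CLR_FREEZE</code> positions (bits 3 and 2) are inferred from the order of the 7:0 row in the write-access table and should be treated as an assumption. Whether a delay is needed between the two writes of the pulse is not stated on this page.

```c
#include <stdbool.h>
#include <stdint.h>

/* DPC_STATUS write-access bits. */
#define DPC_SET_FLUSH  (1u << 5) /* from the definitions table */
#define DPC_CLR_FLUSH  (1u << 4) /* from the definitions table */
#define DPC_SET_FREEZE (1u << 3) /* assumed from the table row order */
#define DPC_CLR_FREEZE (1u << 2) /* assumed from the table row order */

/*
 * Hypothetical helper: set FLUSH and immediately clear it again, which the
 * text describes as a way to force-reset the RDP DMA engine. status_reg is
 * a caller-supplied pointer mapped to DPC_STATUS (0x0410 000C).
 */
static void rdp_pulse_flush(volatile uint32_t *status_reg)
{
    *status_reg = DPC_SET_FLUSH;
    *status_reg = DPC_CLR_FLUSH;
}

/* Word to write to DPC_STATUS to pause (true) or resume (false) the DMA. */
static uint32_t rdp_freeze_word(bool freeze)
{
    return freeze ? DPC_SET_FREEZE : DPC_CLR_FREEZE;
}
```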


====<span style="display:none;">0x0410 0010 (c12) - DP_CLOCK ====
====<span style="display:none;">0x0410 0010 (c12) - DPC_CLOCK ====
----{{#invoke:Register table|head|1200px|DP_CLOCK <code>0x0410 0010</code>}}
----{{#invoke:Register table|head|1200px|DPC_CLOCK <code>0x0410 0010</code> (<code>c12</code>)}}
{{#invoke:Register table|row|31:24}}
{{#invoke:Register table|row|31:24}}
| U-? || U-? || U-? || U-? || U-? || U-? || U-? || U-?
| U-? || U-? || U-? || U-? || U-? || U-? || U-? || U-?
Line 275: Line 275:
}}
}}
'''Extra Details:'''
'''Extra Details:'''
:'''CLOCK''' This register accesses a read-only 24-bit clock that runs at the RCP frequency (which is 62.5 Mhz on standard N64, and 96 Mhz on iQue). The counter starts ticking from boot and does not stop (not even if you freeze the RDP via the `FREEZE` bit in `DP_STATUS`). The only possible interaction from the CPU/RSP is to reset it to 0 by writing the `CLR_CLOCK` bit in `DP_STATUS`. Being the only counter available directly from RSP, it can be useful to perform benchmarks on it.
:'''CLOCK''' This register accesses a read-only 24-bit clock that runs at the RCP frequency (which is 62.5 MHz on a standard N64, and 96 MHz on iQue). The counter starts ticking from boot and does not stop (not even if you freeze the RDP via the <code>FREEZE</code> bit in <code>DPC_STATUS</code>). The only possible interaction from the CPU/RSP is to reset it to 0 by writing the <code>CLR_CLOCK</code> bit in <code>DPC_STATUS</code>. Since it is the only counter directly accessible from the RSP, it can be useful for benchmarking code running there.
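
Because the counter is only 24 bits wide, a benchmark has to account for wraparound. The sketch below shows one way to do that, with hypothetical function names; it assumes the measured interval is shorter than one full wrap (2^24 ticks, roughly 268 ms at 62.5 MHz) and that the clock runs at the standard N64 rate of 62.5 MHz, i.e. 16 ns per tick, which does not hold on iQue.

```c
#include <stdint.h>

/* DPC_CLOCK is a 24-bit counter. */
#define DPC_CLOCK_MASK 0x00FFFFFFu

/*
 * Ticks elapsed between two DPC_CLOCK samples. The subtraction is done
 * modulo 2^24, so a single wrap between the samples is handled.
 */
static uint32_t dpc_clock_elapsed(uint32_t before, uint32_t after)
{
    return (after - before) & DPC_CLOCK_MASK;
}

/*
 * Convert ticks to nanoseconds, assuming 62.5 MHz (16 ns per tick).
 * The maximum input (2^24 - 1) times 16 still fits in 32 bits.
 */
static uint32_t dpc_ticks_to_ns(uint32_t ticks)
{
    return ticks * 16u;
}
```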


====<span style="display:none;">0x0410 0014 - DPC_BUSY ====
====<span style="display:none;">0x0410 0014 (c13) - DPC_BUF_BUSY ====
----{{#invoke:Register table|head|1200px|DPC_BUSY <code>0x0410 0014</code>}}
----{{#invoke:Register table|head|1200px|DPC_BUF_BUSY <code>0x0410 0014</code> (<code>c13</code>)}}
{{#invoke:Register table|row|31:24}}
{{#invoke:Register table|row|31:24}}
| U-? || U-? || U-? || U-? || U-? || U-? || U-? || U-?
| U-? || U-? || U-? || U-? || U-? || U-? || U-? || U-?
Line 286: Line 286:
| R-? || R-? || R-? || R-? || R-? || R-? || R-? || R-?
| R-? || R-? || R-? || R-? || R-? || R-? || R-? || R-?
|-
|-
| colspan=8|BUSY[23:16]
| colspan=8|BUF_BUSY[23:16]
{{#invoke:Register table|row|15:8}}
{{#invoke:Register table|row|15:8}}
| R-? || R-? || R-? || R-? || R-? || R-? || R-? || R-?
| R-? || R-? || R-? || R-? || R-? || R-? || R-? || R-?
|-
|-
| colspan=8|BUSY[15:0]
| colspan=8|BUF_BUSY[15:8]
{{#invoke:Register table|row|7:0}}
{{#invoke:Register table|row|7:0}}
| R-? || R-? || R-? || R-? || R-? || R-? || R-? || R-?
| R-? || R-? || R-? || R-? || R-? || R-? || R-? || R-?
|-
|-
| colspan=8|BUSY[7:0]
| colspan=8|BUF_BUSY[7:0]
{{#invoke:Register table|foot}}
{{#invoke:Register table|foot}}
{{#invoke:Register table|definitions
{{#invoke:Register table|definitions
| 23-0 | BUSY[23:0] | ?
| 23-0 | BUF_BUSY[23:0] | ?
}}
}}
====<span style="display:none;">0x0410 0018 - DPC_PIPE_BUSY ====
====<span style="display:none;">0x0410 0018 (c14) - DPC_PIPE_BUSY ====
----{{#invoke:Register table|head|1200px|DPC_PIPE_BUSY <code>0x0410 0018</code>}}
----{{#invoke:Register table|head|1200px|DPC_PIPE_BUSY <code>0x0410 0018</code> (<code>c14</code>)}}
{{#invoke:Register table|row|31:24}}
{{#invoke:Register table|row|31:24}}
| U-? || U-? || U-? || U-? || U-? || U-? || U-? || U-?
| U-? || U-? || U-? || U-? || U-? || U-? || U-? || U-?
Line 321: Line 321:
| 23-0 | PIPE_BUSY[23:0] | ?
| 23-0 | PIPE_BUSY[23:0] | ?
}}
}}
====<span style="display:none;">0x0410 001C - DPC_TMEM_BUSY ====
====<span style="display:none;">0x0410 001C (c15) - DPC_TMEM_BUSY ====
----{{#invoke:Register table|head|1200px|DPC_TMEM_BUSY <code>0x0410 001C</code>}}
----{{#invoke:Register table|head|1200px|DPC_TMEM_BUSY <code>0x0410 001C</code> (<code>c15</code>)}}
{{#invoke:Register table|row|31:24}}
{{#invoke:Register table|row|31:24}}
| U-? || U-? || U-? || U-? || U-? || U-? || U-? || U-?
| U-? || U-? || U-? || U-? || U-? || U-? || U-? || U-?