Reality Signal Processor/CPU Core

|}
 
=== Instructions overview ===
 
==== Loads and stores ====
|<code>VCE</code>
|}
 
=== Instruction details ===
 
==== Scalar loads: LBV, LSV, LLV, LDV ====
{| class="wikitable"
!31..26
!25..21
!20..16
!15..11
!10..7
!6..0
|-
|<code>LWC2</code>
|<code>base</code>
|<code>vt</code>
|<code>opcode</code>
|<code>element</code>
|<code>offset</code>
|}
'''Assembly:'''<syntaxhighlight lang="asm">
lsv $v01,e(2), 0,s0 ; Load the 16-bit value at s0 into the third lane of $v01
lbv $v04,8, 0,s1 ; Load the byte at s1 into the 9th byte of $v04 (MSB of lane 4)
</syntaxhighlight>Note that the lane syntax can be used in the <code>element</code> field to refer to a specific lane, but if the access is made using <code>llv</code> or <code>ldv</code> (4 or 8 bytes), it will spill over into the following lanes.
 
'''Pseudo-code:'''<syntaxhighlight lang="c">
addr = GPR[base] + offset * access_size
last = min(element + access_size - 1, 15)   // writes stop at the last byte of the register
data = DMEM[addr..addr + (last - element)]
VPR[vt][element..last] = data
</syntaxhighlight>
 
===== Description =====
These instructions load a scalar value (1, 2, 4, or 8 bytes) from DMEM into a VPR. Loads affect only a portion of the vector register (which is 128 bits wide); the other bytes in the register are not modified.
 
The address in DMEM from which the value is fetched is computed as <code>GPR[base] + (offset * access_size)</code>, where <code>access_size</code> is the number of bytes being accessed (e.g. 4 for <code>llv</code>). The address can be misaligned: unlike ordinary MIPS memory accesses, these instructions perform unaligned memory accesses.
 
The part of the vector register being accessed is <code>VPR[vt][element..element+access_size-1]</code>; that is, <code>element</code> selects the first accessed byte within the vector register. When <code>element+access_size</code> is bigger than 16, the access would run past the last byte of the register (byte 15), so fewer bytes are processed (e.g. <code>llv</code> with <code>element=13</code> loads only 3 bytes from memory into <code>VPR[vt][13..15]</code>).
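
The address computation and the truncation rule described above can be modelled in C. This is a minimal sketch under stated assumptions, not reference emulator code: the names <code>rsp_scalar_load</code>, <code>dmem</code> and <code>vpr</code> are hypothetical, DMEM is assumed to be 4 KiB with addresses wrapping at that size, and <code>offset</code> is assumed to be already sign-extended from the instruction field.<syntaxhighlight lang="c">
#include <stdint.h>

#define DMEM_SIZE 4096 /* assumption: 4 KiB of DMEM, addresses wrap around */

/* Hypothetical model of lbv/lsv/llv/ldv.
 * access_size is 1, 2, 4 or 8; element is the first byte written in the
 * 16-byte vector register; offset is the sign-extended instruction field. */
static void rsp_scalar_load(uint8_t vpr[16], const uint8_t dmem[DMEM_SIZE],
                            uint32_t gpr_base, int offset,
                            unsigned element, unsigned access_size)
{
    /* The offset is scaled by the access size; no alignment is required. */
    uint32_t addr = gpr_base + (uint32_t)(offset * (int)access_size);

    for (unsigned i = 0; i < access_size; i++) {
        if (element + i > 15)
            break; /* loads stop at the last byte of the register */
        vpr[element + i] = dmem[(addr + i) % DMEM_SIZE];
    }
}
</syntaxhighlight>With <code>element=13</code> and <code>access_size=4</code>, the loop writes bytes 13, 14 and 15 only, matching the <code>llv</code> example above; all other bytes of the register keep their previous contents.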
 
'''Usage'''
 
These instructions are seldom used. Normally, it is better to structure RSP code to work across full vectors to maximize parallelism. Input data should already be provided in vectorized format by the CPU, so that it is possible to use a vector load (<code>lqv</code>, in case the input is made of 16-bit data) or a packed load (<code>luv</code>/<code>lpv</code>, in case the input is made of 8-bit data). Consider also using <code>mtc2</code> to load a 16-bit value into a lane of a VPR when the value is available in a GPR.
 
A possible use case for these instructions is reversing the order of the lanes. For instance, in audio codecs, windowing algorithms often combine sequences of audio samples with other sequences in reverse order. The RSP does not have an instruction to reverse the order of the lanes, so in that case it might be necessary to manually reverse the lanes while loading using <code>lsv</code>:<syntaxhighlight lang="asm">
lqv $v00, 0,s0 ; Load 8 16-bit samples from DMEM at address s0
lsv $v01,e(7), 0,s1 ; Load 8 16-bit samples from DMEM at address s1 in reverse order
lsv $v01,e(6), 2,s1
lsv $v01,e(5), 4,s1
lsv $v01,e(4), 6,s1
lsv $v01,e(3), 8,s1
lsv $v01,e(2), 10,s1
lsv $v01,e(1), 12,s1
lsv $v01,e(0), 14,s1
</syntaxhighlight>
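The net effect of the eight <code>lsv</code> instructions can be sketched in C as a reference for checking results. This is an illustrative model only: the function name <code>load_lanes_reversed</code> is hypothetical, <code>mem</code> stands for the 16 bytes of DMEM starting at <code>s1</code>, and lane <code>n</code> is assumed to occupy bytes <code>2n</code> and <code>2n+1</code> of the register.<syntaxhighlight lang="c">
#include <stdint.h>

/* Hypothetical model: vpr is the 16-byte vector register,
 * mem points at the 16 bytes of DMEM starting at address s1. */
static void load_lanes_reversed(uint8_t vpr[16], const uint8_t mem[16])
{
    for (unsigned k = 0; k < 8; k++) {
        unsigned lane = 7 - k; /* e(7), e(6), ..., e(0) */

        /* Each lsv copies one 16-bit sample; its two bytes keep their order. */
        vpr[lane * 2 + 0] = mem[k * 2 + 0];
        vpr[lane * 2 + 1] = mem[k * 2 + 1];
    }
}
</syntaxhighlight>Only the order of the lanes is reversed; the two bytes inside each 16-bit sample are copied unchanged.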
 
==== Scalar stores: SBV, SSV, SLV, SDV ====
{| class="wikitable"
!31..26
!25..21
!20..16
!15..11
!10..7
!6..0
|-
|<code>SWC2</code>
|<code>base</code>
|<code>vt</code>
|<code>opcode</code>
|<code>element</code>
|<code>offset</code>
|}
'''Assembly:'''<syntaxhighlight lang="asm">
ssv $v01,e(2), 0,s0 ; Store the 16-bit value in the third lane of $v01 into DMEM at address s0
sbv $v04,8, 0,s1 ; Store the 9th byte of $v04 (MSB of lane 4) into DMEM at address s1
</syntaxhighlight>'''Pseudo-code:'''<syntaxhighlight lang="c">
addr = GPR[base] + offset * access_size
data = VPR[vt][element..element+access_size-1]   // byte indices wrap modulo 16
DMEM[addr..addr+access_size-1] = data
</syntaxhighlight>
 
===== Description =====
These instructions store a scalar value (1, 2, 4, or 8 bytes) from a VPR into DMEM.
 
The address in DMEM where the value will be stored is computed as <code>GPR[base] + (offset * access_size)</code>, where <code>access_size</code> is the number of bytes being accessed (e.g. 4 for <code>slv</code>). The address can be misaligned: unlike ordinary MIPS memory accesses, these instructions perform unaligned memory accesses.
 
The part of the vector register being accessed is <code>VPR[vt][element..element+access_size-1]</code>; that is, <code>element</code> selects the first accessed byte within the vector register. When <code>element+access_size</code> is bigger than 16, the element access wraps within the vector and a full-size store is always performed (e.g. <code>slv</code> with <code>element=15</code> stores <code>VPR[vt][15,0..2]</code> into memory, for a total of 4 bytes).
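
The wrap-around behaviour of stores can be modelled in C in the same spirit. This is a minimal sketch under stated assumptions: the names <code>rsp_scalar_store</code>, <code>dmem</code> and <code>vpr</code> are hypothetical, DMEM is assumed to be 4 KiB with addresses wrapping at that size, and <code>offset</code> is assumed to be already sign-extended.<syntaxhighlight lang="c">
#include <stdint.h>

#define DMEM_SIZE 4096 /* assumption: 4 KiB of DMEM, addresses wrap around */

/* Hypothetical model of sbv/ssv/slv/sdv.
 * access_size is 1, 2, 4 or 8; element is the first byte read from the
 * 16-byte vector register; offset is the sign-extended instruction field. */
static void rsp_scalar_store(uint8_t dmem[DMEM_SIZE], const uint8_t vpr[16],
                             uint32_t gpr_base, int offset,
                             unsigned element, unsigned access_size)
{
    uint32_t addr = gpr_base + (uint32_t)(offset * (int)access_size);

    /* Always store access_size bytes; the register index wraps modulo 16. */
    for (unsigned i = 0; i < access_size; i++)
        dmem[(addr + i) % DMEM_SIZE] = vpr[(element + i) % 16];
}
</syntaxhighlight>With <code>element=15</code> and <code>access_size=4</code>, the bytes written to memory are register bytes 15, 0, 1 and 2, in that order, matching the <code>slv</code> example above.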
 