Reality Signal Processor/CPU Core: Difference between revisions

Line 66: Line 66:
return accum</code>
return accum</code>
Notice that in unsigned clamping, the saturating threshold is 15-bit, but the saturated value is 16-bit.

=== Element field ===
Most VU instructions have a 3-register format with an additional modifier called "element field". For instance (using GNU assembly syntax):<syntaxhighlight lang="asm">
opcode v0, v1, v2,e(7)

</syntaxhighlight><code>e(7)</code> is the "element modifier". Normally (and especially in GNU syntax, which is more orthogonal and uniform), it refers to a specific lane of the third register, which is why it is commonly written without whitespace before it. In this example, it "selects" lane 7 of register <code>v2</code>. The exact meaning of the element modifier varies between instruction groups, and so does the way it is assembled. Refer to the description of each instruction group to check what the element modifier means and how it is encoded in the opcode.


=== Broadcast modifier ===
Some families of VU instructions (specifically, the computational instructions and the select instructions) allow a "broadcast modifier" to be applied to one of the input registers. For instance, in this instruction:<syntaxhighlight lang="asm">
One of the most common uses of the element field is the broadcast modifier. This modifier is used by computational instructions and select instructions, and allows one or more lanes to be "broadcast" (duplicated) to other lanes, only for the purpose of the current opcode. For instance:<syntaxhighlight lang="asm">
vaddc $v01, $v04, e(1)
vaddc $v01, $v04,e(1)
</syntaxhighlight><code>e(1)</code> is the broadcast modifier. Normally, the instruction would add the two registers lane by lane; with the modifier, the second lane (index 1) of <code>$v04</code> is added to all lanes of <code>$v01</code>.
</syntaxhighlight><code>e(1)</code> is the broadcast modifier. Normally, the instruction would add the two registers lane by lane; with the modifier, the second lane (index 1) of <code>$v04</code> is added to all lanes of <code>$v01</code>.
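
The effect of <code>e(1)</code> described above can be modelled with a minimal Python sketch. It assumes 8 lanes of 16 bits per register and a plain wrapping add; the carry handling of <code>vaddc</code> is deliberately left out, and the helper names <code>broadcast_lane</code> and <code>add_lanes</code> are hypothetical, chosen only for this illustration.

```python
LANES = 8          # a VU register holds 8 lanes
MASK16 = 0xFFFF    # each lane is 16 bits wide


def broadcast_lane(vt, lane):
    """Return a copy of vt in which every lane holds vt[lane]."""
    return [vt[lane]] * LANES


def add_lanes(vs, vt):
    """Lane-by-lane 16-bit wrapping add (carry handling is omitted here)."""
    return [(a + b) & MASK16 for a, b in zip(vs, vt)]


v01 = [10, 20, 30, 40, 50, 60, 70, 80]
v04 = [1, 2, 3, 4, 5, 6, 7, 8]

# Without a modifier, lanes are paired up one to one.
plain = add_lanes(v01, v04)

# With e(1), lane 1 of v04 is used for every lane of the addition.
with_e1 = add_lanes(v01, broadcast_lane(v04, 1))
```

Modelling the broadcast as a separate step applied to the input mirrors the description: the register itself is not changed, only the way the opcode reads it.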


Line 368: Line 374:
|}
|}
Instructions have this general format:
<code>VINSN vd, vs, vt, e(…)</code>
<code>VINSN vd, vs, vt,e(…)</code>
where <code>e(…)</code> is the [[Reality Signal Processor/CPU Core#Broadcast modifier|broadcast modifier]] (as found in other SIMD architectures), which modifies the access to <code>vt</code> by duplicating some lanes and hiding others.


Line 533: Line 539:
!20..16
!20..16
!15..11
!15..11
!10..8
!10..7
!7..0
!6..0
|-
|-
|<code>COP2</code>
|<code>COP2</code>
Line 543: Line 549:
|0
|0
|}
|}
These are the standard MIPS opcodes for moving data in/out the coprocessor registers
These are the standard MIPS opcodes for moving data in and out of the coprocessor registers.
{| class="wikitable"
{| class="wikitable"
!<code>opcode</code>
!<code>opcode</code>
Line 565: Line 571:
|Copy a GPR into a VU control register
|Copy a GPR into a VU control register
|}
|}
Vector moves follow the same format as standard MIPS coprocessor moves, but use part of the lower 11 bits (which are normally unused) to specify which lane of the VPR is accessed. Notice that <code>vs_elem</code> specifies a byte offset (not a lane index!), so to copy a lane, <code>lane*2</code> must be specified.
Vector moves follow the same format as standard MIPS coprocessor moves, but use part of the lower 11 bits (which are normally unused) to specify the element field, selecting which lane of the VPR is accessed. Notice that <code>vs_elem</code> in this case is not a broadcast modifier: it specifies a byte offset (not a lane index!), so to copy a lane, <code>lane*2</code> must be specified.


This is an example using GNU syntax:<syntaxhighlight lang="asm">
This is an example using GNU syntax:<syntaxhighlight lang="asm">
mtc2 a1, $v04,e(4)
mtc2 a1, $v04,e(4)
</syntaxhighlight>This example will copy the lower 16 bits of GPR <code>a1</code> into the fifth lane of <code>$v04</code>.
</syntaxhighlight>This example will copy the lower 16 bits of GPR <code>a1</code> into the fifth lane of <code>$v04</code>. This opcode is assembled with <code>vs_elem = 8</code>, as explained above.
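
The lane-to-offset rule used in this example can be written as a one-line helper. This is only an illustrative sketch: the function name <code>lane_to_vs_elem</code> is hypothetical, and the 0..7 range check assumes 8 lanes per register.

```python
def lane_to_vs_elem(lane):
    """Convert a lane index (0..7) to the byte offset stored in vs_elem."""
    if not 0 <= lane <= 7:
        raise ValueError("lane must be in 0..7")
    return lane * 2  # each lane is two bytes wide


# e(4) in the mtc2 example above selects the fifth lane.
offset = lane_to_vs_elem(4)
```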


<code>mtc2</code> moves the lower 16 bits of the general purpose register <code>rt</code> to the bytes <code>VS[vs_elem+1..vs_elem]</code>.
<code>mtc2</code> moves the lower 16 bits of the general purpose register <code>rt</code> to the bytes <code>VS[vs_elem+1..vs_elem]</code>. If <code>vs_elem</code> is 15, only <code>VS[vs_elem]</code> is written (with <code>rt[15..8]</code>).


<code>mfc2</code> moves the 2 bytes <code>VS[vs_elem+1..vs_elem]</code> to GPR <code>rt</code>, sign-extending the 16-bit value to 64 bits.
<code>mfc2</code> moves the 2 bytes <code>VS[vs_elem+1..vs_elem]</code> to GPR <code>rt</code>, sign-extending the 16-bit value to 64 bits. If <code>vs_elem</code> is 15, the lower byte is taken from byte 0 of the register (that is, it wraps around).
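
The byte-level behaviour of the two moves, including the <code>vs_elem = 15</code> edge cases, can be sketched in Python as follows. This is a model under stated assumptions rather than a reference implementation: the register is a 16-byte <code>bytearray</code>, the high byte of the 16-bit value lives at <code>VS[vs_elem]</code> (inferred from the edge cases above), and the function names simply reuse the opcode names.

```python
def mtc2(vs, vs_elem, rt):
    """Write the lower 16 bits of rt into bytes vs_elem and vs_elem+1 of vs."""
    value = rt & 0xFFFF
    vs[vs_elem] = value >> 8            # rt[15..8]
    if vs_elem < 15:                    # at offset 15 the second byte is not written
        vs[vs_elem + 1] = value & 0xFF  # rt[7..0]


def mfc2(vs, vs_elem):
    """Read two bytes starting at vs_elem and return them sign-extended."""
    hi = vs[vs_elem]
    lo = vs[(vs_elem + 1) % 16]         # offset 15 wraps around to byte 0
    value = (hi << 8) | lo
    # Python integers are unbounded, so a negative int stands in for the
    # sign-extended GPR value.
    return value - 0x10000 if value & 0x8000 else value


reg = bytearray(16)
mtc2(reg, 8, 0x1234ABCD)   # same offset as mtc2 a1, $v04,e(4)
readback = mfc2(reg, 8)    # 0xABCD read back as a negative number
```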


<code>ctc2</code> moves the lower 16 bits of GPR <code>rt</code> into the control register specified by <code>vs</code>, while <code>cfc2</code> does the reverse, moving the control register specified by <code>vs</code> into GPR <code>rt</code>, sign extending to 64 bits. Note that both <code>ctc2</code> and <code>cfc2</code> ignore the <code>vs_elem</code> field. For these instructions, the control register is specified as follows: