Reality Signal Processor/CPU Core: Difference between revisions

m (A few value corrections)
 
(3 intermediate revisions by one other user not shown)
Line 66:
return accum</code>
Notice that in unsigned clamping, the saturating threshold is 15-bit, but the saturated value is 16-bit.
 
=== Element field ===
Most VU instructions have a 3-register format with an additional modifier called "element field". For instance (using GNU assembly syntax):<syntaxhighlight lang="asm">
opcode v0, v1, v2,e(7)
 
</syntaxhighlight><code>e(7)</code> is the "element modifier". Normally (and especially in GNU syntax, which is more orthogonal and uniform), it refers to a specific lane of the third register, which is why it is commonly written without a leading space. In this example, it "selects" lane 7 of register <code>v2</code>. The exact meaning of the element modifier varies between instruction groups, and so does the way it is encoded. Refer to the description of each instruction group to check what the element modifier means and how it is encoded in the opcode.
 
=== Broadcast modifier ===
One of the most common uses of the element field is the broadcast modifier. This modifier is used by the computational and select instructions, and broadcasts (duplicates) one or more lanes to other lanes, just for the duration of the current opcode. For instance, in this instruction:<syntaxhighlight lang="asm">
vaddc $v01, $v04, e(1)
</syntaxhighlight><code>e(1)</code> is the broadcast modifier. Normally, the instruction would add the two registers lane by lane; with the modifier, the second lane (index 1) of <code>$v04</code> is added to all lanes of <code>$v01</code>.
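The single-lane case described above can be sketched in C. This is only an illustration: the function name <code>add_broadcast_lane</code> and the array layout are hypothetical, it assumes a VPR is modelled as 8 lanes of 16 bits, and it covers only the broadcast of one lane to all lanes (not every broadcast pattern the modifier may encode). The carry flags produced by <code>vaddc</code> are deliberately left out.

```c
#include <stdint.h>

#define VU_LANES 8 /* assumption: one VPR modelled as 8 x 16-bit lanes */

/* Hypothetical helper: adds one selected lane of b to every lane of a.
 * Models only the lane selection of a single-lane broadcast; the carry
 * flags of the real instruction are not modelled. */
static void add_broadcast_lane(uint16_t out[VU_LANES],
                               const uint16_t a[VU_LANES],
                               const uint16_t b[VU_LANES],
                               unsigned lane)
{
    /* The same source lane is reused for every destination lane. */
    uint16_t scalar = b[lane % VU_LANES];

    for (unsigned i = 0; i < VU_LANES; i++) {
        /* 16-bit wrapping add, one lane at a time. */
        out[i] = (uint16_t)(a[i] + scalar);
    }
}
```

Without the modifier, the loop body would read <code>b[i]</code> instead of the fixed <code>scalar</code>.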
 
Line 368 ⟶ 374:
|}
Instructions have this general format:
<code>VINSN vd, vs, vt, e(…)</code>
where <code>e(…)</code> is the [[Reality Signal Processor/CPU Core#Broadcast modifier|broadcast modifier]] (as found in other SIMD architectures), that modifies the access to <code>vt</code> duplicating some lanes and hiding others.
 
Line 494 ⟶ 500:
!Description
|-
|0x20
|0x00
|<code>vlt</code>
|Select the lower value between two VPR
|-
|0x21
|0x01
|<code>veq</code>
|Compare two VPR to check if they are equal
|-
|0x22
|0x02
|<code>vne</code>
|Compare two VPR to check if they are different
|-
|0x23
|0x03
|<code>vge</code>
|Select the greater or equal value between two VPR
|-
|0x24
|0x04
|<code>vcl</code>
|Clip a VPR against two bounds (lower 16-bits)
|-
|0x25
|0x05
|<code>vch</code>
|Clip a VPR against two bounds (higher 16-bits)
|-
|0x26
|0x06
|<code>vcr</code>
|Clip a VPR against a pow-2 bound
|-
|0x27
|0x07
|<code>vmrg</code>
|Merge two VPR selecting each lane according to flags
Line 533 ⟶ 539:
!20..16
!15..11
!10..7
!6..0
|-
|<code>COP2</code>
Line 543 ⟶ 549:
|0
|}
These are the standard MIPS opcodes for moving data in and out of the coprocessor registers.
{| class="wikitable"
!<code>opcode</code>
Line 565 ⟶ 571:
|Copy a GPR into a VU control register
|}
Vector moves follow the same format as standard MIPS coprocessor moves, but use part of the lower 11 bits (which are normally unused) to specify the element field, selecting which lane of the VPR is accessed. Notice that <code>vs_elem</code> in this case is not a broadcast modifier: it specifies a byte offset (not a lane index!), so to copy a lane, <code>lane*2</code> must be specified.
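As a rough sketch of how the element field could be pulled out of a vector move opcode, the C helpers below extract the fields from a 32-bit instruction word. The helper names are hypothetical, and the bit positions are assumptions: <code>rt</code> in bits 20..16, <code>vs</code> in bits 15..11, and a 4-bit <code>vs_elem</code> in bits 10..7.

```c
#include <stdint.h>

/* Hypothetical field extractors for a vector move instruction word.
 * Assumed layout: rt = bits 20..16, vs = bits 15..11, vs_elem = bits 10..7. */
static unsigned move_rt(uint32_t insn)      { return (insn >> 16) & 0x1F; }
static unsigned move_vs(uint32_t insn)      { return (insn >> 11) & 0x1F; }
static unsigned move_vs_elem(uint32_t insn) { return (insn >> 7) & 0x0F; }

/* vs_elem is a byte offset, so a lane index must be doubled. */
static unsigned lane_to_vs_elem(unsigned lane) { return lane * 2; }
```

With this layout, selecting lane 4 of a register means storing the value 8 in the <code>vs_elem</code> field.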
 
This is an example using GNU syntax:<syntaxhighlight lang="asm">
mtc2 a1, $v04,e(4)
</syntaxhighlight>This example will copy the lower 16 bits of GPR <code>a1</code> into the fifth lane of <code>$v04</code>. This opcode is assembled with <code>vs_elem = 8</code>, as explained above.
 
<code>mtc2</code> moves the lower 16 bits of the general purpose register <code>rt</code> to the bytes <code>VS[vs_elem+1..vs_elem]</code>. If <code>vs_elem</code> is 15, only <code>VS[vs_elem]</code> is written (with <code>rt[15..8]</code>).
 
<code>mfc2</code> moves the 2 bytes <code>VS[vs_elem+1..vs_elem]</code> to GPR <code>rt</code>, sign extending the 16-bit value to 32 bits. If <code>vs_elem</code> is 15, the lower byte is taken from byte 0 of the register (that is, it wraps around).
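The two behaviours just described, including the <code>vs_elem = 15</code> edge cases, can be sketched in C as follows. The type <code>vpr_t</code> and the function names are hypothetical. The sketch assumes a VPR is modelled as 16 bytes, with byte <code>vs_elem</code> holding the upper half of the 16-bit value, as implied by <code>rt[15..8]</code> being the byte written when <code>vs_elem</code> is 15.

```c
#include <stdint.h>

/* Hypothetical model of one VPR as 16 bytes. */
typedef struct { uint8_t b[16]; } vpr_t;

/* Sketch of mtc2: writes the lower 16 bits of rt at byte offset vs_elem.
 * When vs_elem is 15, only the upper byte (rt[15..8]) is stored. */
static void mtc2_sketch(vpr_t *vs, unsigned vs_elem, uint32_t rt)
{
    vs_elem &= 15;
    vs->b[vs_elem] = (uint8_t)(rt >> 8);
    if (vs_elem < 15) {
        vs->b[vs_elem + 1] = (uint8_t)rt;
    }
}

/* Sketch of mfc2: reads 2 bytes at byte offset vs_elem and sign extends
 * them to 32 bits. When vs_elem is 15, the lower byte wraps to byte 0. */
static int32_t mfc2_sketch(const vpr_t *vs, unsigned vs_elem)
{
    vs_elem &= 15;
    uint16_t value = (uint16_t)((vs->b[vs_elem] << 8) |
                                vs->b[(vs_elem + 1) & 15]);
    return (int32_t)(int16_t)value;
}
```

Note the asymmetry at offset 15: the write drops the lower byte, while the read wraps around to byte 0.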
 
<code>ctc2</code> moves the lower 16 bits of GPR <code>rt</code> into the control register specified by <code>vs</code>, while <code>cfc2</code> does the reverse, moving the control register specified by <code>vs</code> into GPR <code>rt</code>, sign extending to 32 bits. Note that both <code>ctc2</code> and <code>cfc2</code> ignore the <code>vs_elem</code> field. For these instructions, the control register is specified as follows:
{| class="wikitable"
!<code>vs</code>
Line 719 ⟶ 725:
|}
 
===== Assembly =====
<syntaxhighlight lang="asm">
// Standard 128-bit load from DMEM aligned address s0 into $v08
Line 735 ⟶ 741:
 
</syntaxhighlight>Notice that the element field is optional (defaults to 0) and is usually not specified because these instructions are meant to affect the whole vector. The element field can be specified using the lane syntax (<code>e(N)</code>) or a raw number which maps to the byte offset inside the vector.
 
===== Pseudo-code =====
<syntaxhighlight lang="c">
// lqv
addr = GPR[base] + (offset * 16)
end = addr | 15
size = MIN(end-addr, 15-element)
VPR[vt][element..element+size] = DMEM[addr..addr+size]
 
</syntaxhighlight><syntaxhighlight lang="c">
// lrv
end = GPR[base] + (offset * 16)
addr = end & ~15
size = MIN(end-addr, 15-element)
VPR[vt][element..addr+size] = DMEM[addr..addr+size]
 
</syntaxhighlight>
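The <code>lqv</code> pseudo-code above can be turned into a small runnable C sketch. The function name and signature are hypothetical. It assumes DMEM is modelled as a 4096-byte array, that addresses wrap within it, and that the ranges in the pseudo-code are inclusive (so <code>size + 1</code> bytes are copied).

```c
#include <stdint.h>

#define DMEM_SIZE 4096u /* assumption: DMEM modelled as a 4 KiB byte array */

/* Sketch of lqv following the pseudo-code: copies bytes from addr up to the
 * end of its 16-byte block into the register, starting at byte `element`.
 * Returns the number of bytes copied. */
static unsigned lqv_sketch(uint8_t vpr[16], const uint8_t dmem[DMEM_SIZE],
                           uint32_t base, int32_t offset, unsigned element)
{
    uint32_t addr = (base + (uint32_t)(offset * 16)) & (DMEM_SIZE - 1);
    uint32_t end = addr | 15;          /* last byte of the 16-byte block */
    uint32_t size = end - addr;        /* inclusive distance to block end */

    element &= 15;
    if (size > 15 - element) {
        size = 15 - element;           /* do not write past byte 15 */
    }

    for (uint32_t i = 0; i <= size; i++) {
        vpr[element + i] = dmem[addr + i];
    }
    return size + 1;
}
```

For an aligned address and element 0, all 16 bytes are loaded. For an address 4 bytes into a block, only the remaining 12 bytes are loaded and the rest of the register is left untouched.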
 
===== Description =====