Reality Signal Processor/CPU Core


See [[Reality Signal Processor/CPU Core#Scalar loads: LBV, LSV, LLV, LDV|scalar loads]] for an example of a use-case (reversing a vector) that can also be implemented via <code>ssv</code>.

==== 128-bit vector loads: LQV, LRV ====
{| class="wikitable"
!31..26
!25..21
!20..16
!15..11
!10..7
!6..0
|-
|<code>LWC2</code>
|<code>base</code>
|<code>vt</code>
|<code>opcode</code>
|<code>element</code>
|<code>offset</code>
|}
{| class="wikitable"
!Insn
!<code>opcode</code>
!Desc
|-
|<code>LQV</code>
|0x04
|load (up to) 16 bytes into vector, left-aligned
|-
|<code>LRV</code>
|0x05
|load (up to) 16 bytes into vector, right-aligned
|}
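
The bit layout in the tables above can be sketched as a small encoder/decoder. This is only an illustrative Python model, not part of any toolchain: the helper names <code>encode</code> and <code>decode</code> are hypothetical, <code>0x32</code> is the standard MIPS primary opcode for <code>LWC2</code>, and treating <code>offset</code> as a signed 7-bit value is an assumption of this sketch.

```python
LWC2 = 0x32                       # standard MIPS primary opcode for LWC2 (0b110010)
OPCODES = {0x04: "lqv", 0x05: "lrv"}


def encode(base, vt, opcode, element=0, offset=0):
    """Pack the fields into a 32-bit instruction word (hypothetical helper)."""
    return ((LWC2 << 26) | ((base & 0x1F) << 21) | ((vt & 0x1F) << 16)
            | ((opcode & 0x1F) << 11) | ((element & 0xF) << 7) | (offset & 0x7F))


def decode(word):
    """Split a 32-bit instruction word into the fields shown in the table."""
    if (word >> 26) != LWC2:
        raise ValueError("not an LWC2 instruction")
    offset = word & 0x7F
    if offset & 0x40:             # assumption: offset is a signed 7-bit value
        offset -= 0x80
    opcode = (word >> 11) & 0x1F
    return {
        "insn": OPCODES.get(opcode, "unknown"),
        "base": (word >> 21) & 0x1F,
        "vt": (word >> 16) & 0x1F,
        "element": (word >> 7) & 0xF,
        "offset": offset,         # in units of 16 bytes for lqv/lrv
    }


# lqv $v08, 0,s0   (s0 is GPR 16 in the standard MIPS register numbering)
word = encode(base=16, vt=8, opcode=0x04)
fields = decode(word)
```

Here <code>fields</code> holds the decoded instruction name together with the <code>base</code>, <code>vt</code>, <code>element</code> and <code>offset</code> fields.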

===== Assembly =====
<syntaxhighlight lang="asm">
// Standard 128-bit load from DMEM aligned address s0 into $v08
lqv $v08, 0,s0

// Loading a misaligned 128-bit vector from DMEM
// (a0 is 128-bit aligned in this example)
lqv $v00, 0x08,a0 // read bytes 0x08(a0) - 0x0F(a0) into left part of the vector (VPR[vt][0..7])
lrv $v00, 0x18,a0 // read bytes 0x10(a0) - 0x17(a0) into right part of the vector (VPR[vt][8..15])

// Advanced example using the "element" field
lqv $v08,e(2), 0x08,a0 // read bytes 0x08(a0) - 0x0F(a0) into VPR[vt][4..11]
lrv $v08,e(2), 0x18,a0 // read bytes 0x10(a0) - 0x13(a0) into VPR[vt][12..15]


</syntaxhighlight>Note that the element field is optional (it defaults to 0) and is usually not specified, because these instructions are meant to affect the whole vector. The element field can be specified using the lane syntax (<code>e(N)</code>) or as a raw number, which maps to the byte offset inside the vector. In the example above, <code>e(2)</code> corresponds to byte offset 4, since each lane is 16 bits wide.

===== Description =====
Roughly, these instructions behave like <code>lwl</code> and <code>lwr</code>: combined, they allow reading 128 bits of data into a vector register, irrespective of the alignment.

When the data to be loaded is 128-bit aligned within DMEM, <code>lqv</code> is sufficient to read the whole vector (<code>lrv</code> in this case is redundant because it becomes a no-op).

The actual bytes accessed in DMEM depend on the instruction: for <code>lqv</code>, the bytes are those starting at <code>GPR[base] + (offset * 16)</code>, up to and ''excluding'' the next 128-bit aligned byte (<code>a0+0x10</code> in the above example); for <code>lrv</code>, the bytes are those starting at the previous 128-bit aligned byte (<code>a0+0x10</code> in the above example) up to and ''excluding'' <code>GPR[base] + (offset * 16)</code>. Again, this mirrors the behavior of <code>lwl</code> and <code>lwr</code>, but for 128-bit loads.

<code>element</code> is used as a byte offset within the vector register to specify the first byte affected by the operation; that is, the part of the vector being loaded with the instruction pair is <code>VPR[vt][element..15]</code>. Thus a non-zero element means that fewer bytes are loaded.
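
The behavior described in this section can be summarized with a minimal Python model. It is a sketch under stated assumptions, not a hardware-accurate emulator: the function names and the <code>bytearray</code> representation of a vector register are hypothetical, <code>addr</code> stands for the already computed <code>GPR[base] + (offset * 16)</code>, and DMEM address wrap-around is ignored.

```python
def lqv(vreg, dmem, addr, element=0):
    """Model of lqv: copy from addr up to (excluding) the next 16-byte boundary.

    vreg is a 16-byte bytearray standing in for VPR[vt]; dmem is bytes-like.
    """
    count = 16 - (addr & 15)              # bytes left before the next boundary
    count = min(count, 16 - element)      # never write past byte 15 of the vector
    for i in range(count):
        vreg[element + i] = dmem[addr + i]


def lrv(vreg, dmem, addr, element=0):
    """Model of lrv: copy from the previous 16-byte boundary up to (excluding) addr."""
    aligned = addr & ~15                  # previous 128-bit aligned address
    start = 16 - (addr & 15) + element    # first vector byte that gets written
    for i in range(start, 16):
        vreg[i] = dmem[aligned + (i - start)]


# Misaligned load from the Assembly example; a0 = 0x20 is an arbitrary aligned address.
dmem = bytes(range(256))
a0 = 0x20
v = bytearray(16)
lqv(v, dmem, a0 + 0x08)                   # fills VPR bytes 0..7
lrv(v, dmem, a0 + 0x18)                   # fills VPR bytes 8..15
```

After the two calls, <code>v</code> holds the 16 bytes found at <code>a0+0x08</code> through <code>a0+0x17</code>. With an aligned <code>addr</code>, the <code>lrv</code> model writes nothing, matching the no-op case mentioned above.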

===== Usage =====
<code>lqv</code> is the standard way to fill a full VPR vector register by loading its contents from DMEM. Given that it's usually possible to define the layout of data in DMEM, it is advisable to design it so that vectors are always aligned to 128 bits (16 bytes), using the <code>.align 4</code> directive: this allows reading the vector using just <code>lqv</code>, in 1 cycle (though the load has a 3-cycle latency like all instructions that write to a VPR).<syntaxhighlight lang="asm">
.data

.align 4
CONST: .half 3, 2, 7, 0, 0x4000, 0x8000, 0x7F, 0xFFF # Several constants used for an algorithm

.text
lqv $v31, %lo(CONST),r0 # Load the constants

</syntaxhighlight>One example of using <code>lqv</code> and <code>lrv</code> as a pair is to perform a fast memcpy from a possibly misaligned address to an aligned destination buffer: