Reality Signal Processor/CPU Core: Difference between revisions

|}
|}


===== Assembly =====
<syntaxhighlight lang="asm">
lsv $v01,e(2), 0,s0 ; Load the 16-bit word at s0 into the third lane of $v01
</syntaxhighlight>


===== Description =====
These instructions load a scalar value (1, 2, 4, or 8 bytes) from DMEM into a VPR. A load affects only part of the 128-bit vector register; the other bytes in the register are not modified.


The part of the vector register being accessed is <code>VPR[vt][element..element+access_size-1]</code>; that is, <code>element</code> selects the first accessed byte within the vector register. When the access would extend past byte 15 (that is, when <code>element+access_size</code> is bigger than 16), fewer bytes are processed (e.g. <code>llv</code> with <code>element=13</code> loads only 3 bytes from memory into <code>VPR[vt][13..15]</code>).
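
The byte-range rule above can be illustrated with a small C model. This is only a sketch for emulator-style reasoning, not the hardware's implementation: <code>vpr_t</code> and <code>rsp_scalar_load</code> are hypothetical names, the effective address is assumed to be already computed, and the 4 KiB DMEM wrap-around mask is an assumption of the model.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of one 128-bit VPR as 16 bytes (byte 0 first). */
typedef struct { uint8_t b[16]; } vpr_t;

/*
 * Sketch of a scalar load (access_size = 1, 2, 4 or 8 bytes).
 * Copies bytes from dmem starting at addr into vt->b[element..],
 * stopping at byte 15 so that a load never wraps inside the register.
 * Bytes outside the accessed range are left untouched.
 * Returns the number of bytes actually loaded.
 * Assumption: dmem points to a 4096-byte buffer and addresses wrap at 0xFFF.
 */
static unsigned rsp_scalar_load(vpr_t *vt, unsigned element, unsigned access_size,
                                const uint8_t *dmem, uint32_t addr)
{
    unsigned end = element + access_size;   /* one past the last byte requested */
    if (end > 16)
        end = 16;                           /* truncate at the end of the register */

    unsigned n = 0;
    for (unsigned i = element; i < end; i++) {
        vt->b[i] = dmem[(addr + n) & 0xFFF];
        n++;
    }
    return n;
}
```

In this model, calling <code>rsp_scalar_load</code> with <code>element=13</code> and <code>access_size=4</code> (the <code>llv</code> case above) writes only bytes 13 to 15 and leaves bytes 0 to 12 unchanged.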


===== Usage =====
These instructions are seldom used. Normally, it is better to structure RSP code to work across full vectors to maximize parallelism. Input data should already be provided in vectorized format by the CPU, so that it is possible to use a vector load (<code>lqv</code>, in case the input is made of 16-bit data) or a packed load (<code>luv</code>/<code>lpv</code>, in case the input is made of 8-bit data). Consider also using <code>mtc2</code> to load a 16-bit value into a lane of a VPR when the value is available in a GPR.