Reality Signal Processor/CPU Core: Difference between revisions

Line 608: Line 608:
|<code>offset</code>
|}

===== '''Assembly''' =====
<syntaxhighlight lang="asm">
lsv $v01,e(2), 0,s0 ; Load the 16-bit word at s0 into the third lane of $v01
lbv $v04,8, 0,s1 ; Load the byte at s1 into the 9th byte of $v04 (MSB of lane 4)
Line 629: Line 631:
The part of the vector register being accessed is <code>VPR[vt][element..element+access_size-1]</code>; that is, <code>element</code> selects the first accessed byte within the vector register. When the access would extend past byte 15 (that is, when <code>element+access_size</code> is bigger than 16), fewer bytes are processed (e.g. <code>llv</code> with <code>element=13</code> loads only 3 bytes from memory into <code>VPR[vt][13..15]</code>).
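
The truncation rule above can be sketched in C. This is only an illustrative model, not a description of the hardware implementation: the function name <code>rsp_scalar_load</code>, the representation of a vector register as a 16-byte array, and the wrapping of addresses within a 4 KiB DMEM array are all assumptions made for the example.
<syntaxhighlight lang="c">
#include <stdint.h>

/* Hypothetical model: a VPR is 16 bytes, DMEM is a 4 KiB byte array. */
enum { VPR_BYTES = 16, DMEM_SIZE = 4096 };

/*
 * Model of a scalar load (lbv/lsv/llv/ldv with access_size 1/2/4/8).
 * Bytes are copied from dmem[addr..] into vpr[element..], and the copy
 * stops once byte 15 of the register has been written.
 * Returns the number of bytes actually loaded.
 */
static int rsp_scalar_load(uint8_t vpr[VPR_BYTES], const uint8_t *dmem,
                           uint32_t addr, int element, int access_size)
{
    int loaded = 0;
    for (int i = 0; i < access_size && element + i < VPR_BYTES; i++) {
        /* Assumption: DMEM addresses wrap within the 4 KiB array. */
        vpr[element + i] = dmem[(addr + (uint32_t)i) % DMEM_SIZE];
        loaded++;
    }
    return loaded;
}
</syntaxhighlight>
With this model, a 4-byte access at <code>element=13</code> returns 3 and leaves bytes 0..12 of the register untouched, while the same access at <code>element=12</code> loads all 4 bytes.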


===== '''Usage''' =====

These instructions are seldom used. It is normally better to structure RSP code to work across full vectors to maximize parallelism. Input data should already be provided in vectorized format by the CPU, so that it can be read with a vector load (<code>lqv</code>, if the input is made of 16-bit data) or a packed load (<code>luv</code>/<code>lpv</code>, if the input is made of 8-bit data). Also consider using <code>mtc2</code> to load a 16-bit value into a lane of a VPR when the value is already available in a GPR.


Line 661: Line 662:
|<code>offset</code>
|}

===== '''Assembly''' =====
<syntaxhighlight lang="asm">
ssv $v01,e(2), 0,s0 ; Store the 16-bit word in the third lane of $v01 into DMEM at address s0
sbv $v04,8, 0,s1 ; Store the 9th byte of $v04 (MSB of lane 4) into DMEM at address s1
</syntaxhighlight>Notice that the lane syntax can be used in the <code>element</code> field to refer to a specific lane, but if the access is made using <code>slv</code> or <code>sdv</code> (4 or 8 bytes), it will spill over into the following lanes.
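
To make the spill-over concrete, the following C sketch computes which 16-bit lanes a store touches when the lane syntax is used. It assumes that <code>e(n)</code> corresponds to byte offset <code>2*n</code> (as the examples above suggest, where <code>e(2)</code> is the third lane and byte 8 is the MSB of lane 4) and that store accesses wrap within the 16-byte register; the helper name <code>lanes_touched</code> is hypothetical.
<syntaxhighlight lang="c">
/*
 * Returns a bitmask of the lanes (bit n = lane n) covered by an access
 * of access_size bytes that starts at the first byte of the given lane.
 * Assumption: lane n starts at byte 2*n, and the byte index wraps
 * around within the 16-byte vector register.
 */
static unsigned lanes_touched(int lane, int access_size)
{
    unsigned mask = 0;
    int element = lane * 2;               /* first byte of the lane */
    for (int i = 0; i < access_size; i++) {
        int byte = (element + i) % 16;    /* wrap within the register */
        mask |= 1u << (byte / 2);         /* two bytes per lane */
    }
    return mask;
}
</syntaxhighlight>
Under these assumptions, a 2-byte access at <code>e(2)</code> covers lane 2 only, a 4-byte access covers lanes 2 and 3, and an 8-byte access covers lanes 2 to 5.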


===== '''Pseudo-code''' =====
<syntaxhighlight lang="c">
addr = GPR[base] + offset * access_size
data = VPR[vt][element..element+access_size-1]
Line 680: Line 684:
The part of the vector register being accessed is <code>VPR[vt][element..element+access_size-1]</code>; that is, <code>element</code> selects the first accessed byte within the vector register. When the access would extend past byte 15 (that is, when <code>element+access_size</code> is bigger than 16), the element access wraps around within the vector and a full-size store is always performed (e.g. <code>slv</code> with <code>element=15</code> stores <code>VPR[vt][15,0..2]</code> into memory, for a total of 4 bytes).
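
The wrap-around behaviour can be sketched in C as follows. As with any such model, this is an illustration under stated assumptions rather than the actual implementation: the name <code>rsp_scalar_store</code>, the 16-byte array standing in for a vector register, and the wrapping of addresses within a 4 KiB DMEM array are all chosen for the example.
<syntaxhighlight lang="c">
#include <stdint.h>

/* Hypothetical model: a VPR is 16 bytes, DMEM is a 4 KiB byte array. */
enum { VPR_BYTES = 16, DMEM_SIZE = 4096 };

/*
 * Model of a scalar store (sbv/ssv/slv/sdv with access_size 1/2/4/8).
 * Exactly access_size bytes are always written to memory; the source
 * byte index wraps from 15 back to 0 within the vector register.
 */
static void rsp_scalar_store(const uint8_t vpr[VPR_BYTES], uint8_t *dmem,
                             uint32_t addr, int element, int access_size)
{
    for (int i = 0; i < access_size; i++) {
        /* Assumption: DMEM addresses wrap within the 4 KiB array. */
        dmem[(addr + (uint32_t)i) % DMEM_SIZE] =
            vpr[(element + i) % VPR_BYTES];
    }
}
</syntaxhighlight>
In this model, a 4-byte store with <code>element=15</code> writes register bytes 15, 0, 1 and 2 to four consecutive memory locations, matching the <code>slv</code> example above.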


===== '''Usage''' =====

These instructions are seldom used. It is normally better to structure RSP code to work across full vectors to maximize parallelism. Data flow between RSP and VR4300 should be structured in vectorized format, so that it is possible to use a vector store (<code>sqv</code>, if the output is made of 16-bit data) or a packed store (<code>suv</code>/<code>spv</code>, if the output is made of 8-bit data). Also consider using <code>mfc2</code> to move a 16-bit value from a lane of a VPR into a GPR.


See [[Reality Signal Processor/CPU Core#Scalar loads: LBV, LSV, LLV, LDV|scalar loads]] for an example of a use case (reversing a vector) that can also be implemented via <code>ssv</code>.