Reality Signal Processor/CPU Core: Difference between revisions

Line 613: Line 613:
</syntaxhighlight>Notice that it is possible to specify the lane syntax for the <code>element</code> field to refer to a specific lane, but if the access is made using <code>llv</code> or <code>ldv</code> (4 or 8 bytes), it will overflow into the following lanes.


===== '''Pseudo-code''' =====
<syntaxhighlight lang="c">
addr = GPR[base] + offset * access_size
data = DMEM[addr..addr+access_size-1]
Line 663: Line 664:
ssv $v01,e(2), 0,s0 ; Store the 16-bit word in the third lane of $v01 into DMEM at address s0
sbv $v04,8, 0,s1 ; Store the 8-bit word in the 9th byte of $v04 (MSB of lane 4) into DMEM at address s1
</syntaxhighlight>Notice that it is possible to specify the lane syntax for the <code>element</code> field to refer to a specific lane, but if the access is made using <code>slv</code> or <code>sdv</code> (4 or 8 bytes), it will overflow into the following lanes.
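
The overflow into the following lanes can be illustrated with a small C model. This is only a sketch, not the hardware implementation: <code>vreg_t</code>, <code>dmem</code>, <code>lane_to_element</code> and <code>store_scalar</code> are hypothetical names introduced here. It assumes a vector register is 16 bytes with byte 0 as the MSB of lane 0, that DMEM addresses wrap at 4 KiB, and that the effective address has already been computed. It deliberately does not model the case where <code>element + access_size</code> goes past the end of the register.

```c
#include <assert.h>
#include <stdint.h>

#define DMEM_SIZE 4096              /* assumption: 4 KiB of DMEM, addresses wrap */

/* Hypothetical model of a vector register: 16 bytes, byte 0 = MSB of lane 0. */
typedef struct { uint8_t b[16]; } vreg_t;

static uint8_t dmem[DMEM_SIZE];

/* Map the lane syntax e(n) to a byte index: each lane is 16 bits wide. */
static unsigned lane_to_element(unsigned lane)
{
    return lane * 2;
}

/*
 * Model of sbv/ssv/slv/sdv (access_size = 1, 2, 4 or 8 bytes).
 * Copies access_size bytes starting at byte `element` of the register,
 * so a 4- or 8-byte access starting at a lane also takes bytes from the
 * lanes that follow it.
 * Assumption: element + access_size <= 16 (the sketch rejects other cases).
 */
static void store_scalar(const vreg_t *vt, unsigned element,
                         uint32_t addr, unsigned access_size)
{
    assert(element + access_size <= 16);
    for (unsigned i = 0; i < access_size; i++)
        dmem[(addr + i) % DMEM_SIZE] = vt->b[element + i];
}
```

For example, <code>store_scalar(&v, lane_to_element(3), addr, 4)</code> models an <code>slv</code> with <code>e(3)</code>: it writes the two bytes of lane 3 followed by the two bytes of lane 4.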

'''Pseudo-code:'''<syntaxhighlight lang="c">
addr = GPR[base] + offset * access_size
data = VPR[vt][element..element+access_size-1]
Line 679: Line 682:
'''Usage'''


These instructions are seldom used. Normally, it is better to structure RSP code to work across full vectors to maximize parallelism. Input data should already be provided in vectorized format by the CPU, so that it is possible to use a vector load (<code>lqv</code>, in case the input is made of 16-bit data) or a packed load (<code>luv</code>/<code>lpv</code>, in case the input is made of 8-bit data). Consider also using <code>mtc2</code> to load a 16-bit value into a lane of a VPR when the value is available in a GPR.
These instructions are seldom used. Normally, it is better to structure RSP code to work across full vectors to maximize parallelism. Data flow between the RSP and the VR4300 should be structured in vectorized format, so that it is possible to use a vector store (<code>sqv</code>, in case the output is made of 16-bit data) or a packed store (<code>suv</code>/<code>spv</code>, in case the output is made of 8-bit data). Consider also using <code>mfc2</code> to move a 16-bit value from a lane of a VPR into a GPR.


See
A possible use case for these instructions is reversing the order of the lanes. For instance, windowing algorithms in audio codecs often combine sequences of audio samples with other sequences in reverse order. The RSP has no instruction that reverses the order of the lanes, so in that case it might be necessary to reverse them manually while loading, using <code>lsv</code>:<syntaxhighlight lang="asm">
lqv $v00, 0,s0 ; Load 8 16-bit samples from DMEM at address s0
lsv $v01,e(7), 0,s1 ; Load 8 16-bit samples from DMEM at address s1 in reverse order
lsv $v01,e(6), 2,s1
lsv $v01,e(5), 4,s1
lsv $v01,e(4), 6,s1
lsv $v01,e(3), 8,s1
lsv $v01,e(2), 10,s1
lsv $v01,e(1), 12,s1
lsv $v01,e(0), 14,s1
</syntaxhighlight>
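
The effect of the eight <code>lsv</code> instructions above can be checked against a small C model. This is a sketch under stated assumptions rather than a description of the hardware: <code>vreg_t</code>, <code>dmem</code>, <code>load_lane16</code>, <code>load_reversed</code> and <code>lane_value</code> are hypothetical names, a vector register is modeled as 16 bytes with byte 0 as the MSB of lane 0, and DMEM addresses are assumed to wrap at 4 KiB.

```c
#include <stdint.h>

#define DMEM_SIZE 4096              /* assumption: 4 KiB of DMEM, addresses wrap */

/* Hypothetical model of a vector register: 16 bytes, byte 0 = MSB of lane 0. */
typedef struct { uint8_t b[16]; } vreg_t;

static uint8_t dmem[DMEM_SIZE];

/* Model of lsv with the e(lane) syntax: load 2 bytes from DMEM into one lane. */
static void load_lane16(vreg_t *vt, unsigned lane, uint32_t addr)
{
    vt->b[lane * 2]     = dmem[addr % DMEM_SIZE];
    vt->b[lane * 2 + 1] = dmem[(addr + 1) % DMEM_SIZE];
}

/*
 * Mirrors the sequence of eight lsv instructions: the i-th 16-bit sample
 * in memory (byte offset 2*i) ends up in lane 7 - i.
 */
static void load_reversed(vreg_t *vt, uint32_t addr)
{
    for (unsigned i = 0; i < 8; i++)
        load_lane16(vt, 7 - i, addr + 2 * i);
}

/* Read a lane back as a big-endian 16-bit value. */
static uint16_t lane_value(const vreg_t *vt, unsigned lane)
{
    return (uint16_t)((vt->b[lane * 2] << 8) | vt->b[lane * 2 + 1]);
}
```

After <code>load_reversed(&v, addr)</code>, <code>lane_value(&v, 0)</code> holds the last of the eight samples in memory and <code>lane_value(&v, 7)</code> holds the first.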