COP1: Difference between revisions

Revision as of 19:55, 17 August 2023

Revision as of 07:15, 1 November 2022 (view source) Lemmy (talk \| contribs) (First draft - not linking this from other pages yet as it's not yet done)		Latest revision as of 19:55, 17 August 2023 (view source) Lemmy (talk \| contribs) (→‎Full Mode vs Half Mode)
(13 intermediate revisions by 2 users not shown)
Line 16:
| Single || S || 32 bit float
|-
| Double || D || 64 bit float
|-
| Word || W || 32 bit integer
Line 49:
== Rounding modes and inexact results ==
Most of the conversions mentioned above, as well as most regular instructions, can be lossy. When that happens, the COP1 has to perform some sort of rounding to fit the result into the destination. It provides four modes:
* ROUND: Round towards the nearest number (e.g. 4.4 => 4 and 4.6 => 5),
* TRUNC: Round towards zero (e.g. 4.9 => 4 and -4.9 => -4),
Line 55:
* FLOOR: Round towards the smaller number (e.g. 4.9 => 4 and -4.9 => -5).

The COP1 has a configurable rounding mode in FCSR (see below), which is applied for most instructions where it is applicable. For the specific case of float->int conversions, it provides specialized instructions that override the global rounding mode: ROUND.x.y, TRUNC.x.y, CEIL.x.y, FLOOR.x.y (where x is either W or L and y is either S or D; all 16 combinations are supported).

When rounding happens, inexact is signaled (see exceptions below).

== FCSR ==
In addition to the data registers, the COP1 also provides the Floating Point Status Register (FCSR), which is read via CFC1 and written through CTC1 (using index 31). It provides the following bits:
{| class="wikitable"
Line 109:
The COP1 supports 6 exceptions:
* Inexact: The destination can't hold the full result, so some data loss occurred and rounding was performed.
* Underflow: The resulting number was so small it was rounded to 0. This is always in combination with inexact. (The COP1 has a quirk here: Unlike other CPUs (e.g. x64 or arm64), the rounding modes FLOOR/CEIL are taken literally even on underflow; if the result is smaller than the smallest possible float, it might not be rounded to 0 but to the minimum regular float).
* Overflow: The resulting number was so large it couldn't be represented as a regular number and was instead "rounded up to infinity" (which is a special floating point value). This is always in combination with inexact.
* Division By Zero: This only happens for DIV.S and DIV.D when the divisor is 0.
* Invalid Operation: This happens in a bunch of special cases (see "Special Cases" below).
* Unimplemented Operation: This happens in a bunch of special cases (see "Special Cases" below).

Instructions that can fire exceptions (e.g. ADD.S, CVT.S.W) always clear all Cause bits that aren't being signaled in this specific instruction. For example, CVT.W.S from 5.5 to an int would affect the bits in the following way:
* Clear all Cause bits
* Perform operation
* Set "Cause: Inexact"
* If "Enable: Inexact" is true, fire exception. Otherwise, set "Flag: Inexact" and put the result value into the destination register.

This means that Cause can be looked at to see the result of the directly preceding instruction. Flags however are cumulative: They are true if any instruction since the last clear signaled that exception, assuming the exception was disabled.

Unimplemented Operation is special as it can't be disabled - if it happens, it will always fire.

== Floating Point Numbers ==
At this point, it makes sense to take a quick look at what floats actually are.
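Before turning to the bit layout, the Cause/Enable/Flag handling from the exceptions section above can be sketched roughly as follows. This is a minimal sketch, not the actual FCSR layout: <code>FpuStatus</code>, <code>signal</code> and the bit constants are hypothetical names chosen for illustration, and the behavior when several exceptions are raised at once with only some of them enabled is an assumption (here no Flags are updated if anything fires).

```rust
// Hypothetical bit assignments inside each field; NOT the real FCSR bit positions.
const INEXACT: u8 = 1 << 0;
const UNIMPLEMENTED: u8 = 1 << 5;

#[derive(Debug, Default)]
struct FpuStatus {
    cause: u8,  // exceptions signaled by the most recent instruction only
    enable: u8, // exceptions that fire instead of being recorded
    flags: u8,  // sticky record of disabled exceptions
}

/// Applies the exceptions `raised` by one instruction.
/// Returns true if an exception fires (the destination register is then not written).
fn signal(status: &mut FpuStatus, raised: u8) -> bool {
    // Cause is overwritten, so bits not signaled by this instruction end up cleared.
    status.cause = raised;
    // Unimplemented Operation cannot be disabled, so treat it as always enabled.
    let firing = raised & (status.enable | UNIMPLEMENTED);
    if firing != 0 {
        return true;
    }
    // Disabled exceptions accumulate in Flags until software clears them.
    status.flags |= raised;
    false
}
```

With all enables at 0, calling <code>signal</code> with <code>INEXACT</code> and then with <code>0</code> leaves Cause empty but keeps the Inexact flag set, which mirrors the "Cause is per instruction, Flags are cumulative" distinction.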
The following is the bit representation of a single (doubles work exactly the same, but have more bits in the exponent and the mantissa):
{| class="wikitable"
Line 133:
! 31 !! 30 - 23 !! 22 - 0
|-
| Sign (1 bit) || Exponent (8 bits) || Mantissa (23 bits)
|}
Line 139:
There are some special cases for the exponent and mantissa:
{| class="wikitable"
|+ Special Numbers
|-
! Sign bit !! Exponent !! Mantissa !! Description
|-
| 0 || 0 || 0 || Regular zero
|-
| 1 || 0 || 0 || "Negative zero", which is considered equal to regular zero
|-
| any || 0 || != 0 || Denormal/subnormal
|-
| 0 || 0xFF || 0 || Positive Infinity
|-
| 1 || 0xFF || 0 || Negative Infinity
|-
| any || 0xFF || != 0 with highest bit 0 || sNAN (signaling Not-A-Number)
|-
| any || 0xFF || != 0 with highest bit 1 || qNAN (quiet Not-A-Number)
|-
| 0 || 0<x<0xFF || any || A regular positive number
|-
| 1 || 0<x<0xFF || any || A regular negative number
|}

Knowing this, detecting special cases and performing certain operations can be done through simple bit operations:
<code><pre>
// All helpers operate on the raw 32 bit pattern of a single.
fn is_zero(f: u32) -> bool { (f & 0x7FFF_FFFF) == 0 }
fn is_subnormal(f: u32) -> bool { ((f & 0x7F80_0000) == 0) && ((f & 0x7F_FFFF) != 0) }
fn is_nan(f: u32) -> bool { ((f & 0x7F80_0000) == 0x7F80_0000) && ((f & 0x7F_FFFF) != 0) }
fn is_quiet_nan(f: u32) -> bool { (f & 0x7FC0_0000) == 0x7FC0_0000 }
// These two return the modified bit pattern, not a bool.
fn absolute_value(f: u32) -> u32 { f & 0x7FFF_FFFF }
fn negate(f: u32) -> u32 { f ^ 0x8000_0000 }
</pre></code>

== Special Cases ==
The COP1 will never on its own produce either a subnormal or a qNAN.

The following rules apply to calculating instructions (ADD.X, SUB.X, DIV.X, MUL.X, SQRT.X, ABS.X, NEG.X, CVT.Y.X, ROUND.Y.X, TRUNC.Y.X, FLOOR.Y.X, CEIL.Y.X, where X is S/D):
* If an input is sNAN, fire Unimplemented Operation.
* If an input is subnormal, fire Unimplemented Operation.
* If an input is qNAN, signal Invalid Operation and set the result to sNAN (specifically 0x7FBFFFFF for floats or 0x7FF7FFFFFFFFFFFF for doubles). Exceptions: CVT.W.x and CVT.L.x also fire Unimplemented Operation, as NAN cannot be represented as an integer.
* Perform the operation.
* If the operation underflowed, the following happens: If "Flush Denorm To Zero" is 1 AND "Enable: Underflow" is 0 AND "Enable: Inexact" is 0, the result is flushed and Underflow and Inexact are signaled. In most cases this means that it is set to 0 or "negative 0". (Two exceptions: If the rounding mode is "Ceil" and the result is positive, it will be set to the smallest positive value instead; similarly, "Floor" will set a negative result to the negative value that is closest to 0). Otherwise, fire Unimplemented Operation.
* If the operation is invalid (for example: Infinity-Infinity, 0.0 / 0.0 or SQRT(-2)), set the result to sNAN (specifically 0x7FBFFFFF for floats or 0x7FF7FFFFFFFFFFFF for doubles) and signal Invalid Operation.
* If the operation was a division by zero, signal Division By Zero.
* If the operation overflowed, signal Overflow and Inexact and set the result to Infinity or -Infinity.
* If the operation was inexact, signal Inexact.

MOV.S and MOV.D are special: They just copy the bits and never fire or signal exceptions. They also don't clear the Cause bits.

== Comparisons ==
The COP1 in total has 16 single compare instructions with some pretty confusing names (and another 16 for doubles). The 16 instructions are all possible combinations of the following 4 bits:
* Unordered (Bit 0): Comparison is considered true if one or both of the operands is NAN.
* Equal (Bit 1): Comparison is considered true if both operands are equal (note that 0 is equal to -0, but NAN is always different from another NAN).
* Smaller (Bit 2): Comparison is considered true if the first operand is smaller than the second.
* SignalOnSNAN (Bit 3): If either operand is sNAN, this will signal Invalid Operation.

If multiple bits are set, the conditions are ORed together: For example, UEQ is considered true if the two operands are equal or unordered.

Note that inputs of qNAN always signal Invalid Operation.

Using all bit combinations, this gives the following instructions:
{| class="wikitable"
|+ Compare encoding
|-
! SignalOnSNAN (Bit 3) !! Smaller (Bit 2) !! Equal (Bit 1) !! Unordered (Bit 0) !! Name !! Result formula !! Invalid Operation Condition
|-
| 0 || 0 || 0 || 0 || F || Result = false || isQNAN(arg1) OR isQNAN(arg2)
|-
| 0 || 0 || 0 || 1 || UN || Result = unordered(arg1, arg2) || isQNAN(arg1) OR isQNAN(arg2)
|-
| 0 || 0 || 1 || 0 || EQ || Result = arg1 == arg2 || isQNAN(arg1) OR isQNAN(arg2)
|-
| 0 || 0 || 1 || 1 || UEQ || Result = unordered(arg1, arg2) OR (arg1 == arg2) || isQNAN(arg1) OR isQNAN(arg2)
|-
| 0 || 1 || 0 || 0 || OLT || Result = arg1 < arg2 || isQNAN(arg1) OR isQNAN(arg2)
|-
| 0 || 1 || 0 || 1 || ULT || Result = unordered(arg1, arg2) OR (arg1 < arg2) || isQNAN(arg1) OR isQNAN(arg2)
|-
| 0 || 1 || 1 || 0 || OLE || Result = arg1 <= arg2 || isQNAN(arg1) OR isQNAN(arg2)
|-
| 0 || 1 || 1 || 1 || ULE || Result = unordered(arg1, arg2) OR (arg1 <= arg2) || isQNAN(arg1) OR isQNAN(arg2)
|-
| 1 || 0 || 0 || 0 || SF || Result = false || isNAN(arg1) OR isNAN(arg2)
|-
| 1 || 0 || 0 || 1 || NGLE || Result = unordered(arg1, arg2) || isNAN(arg1) OR isNAN(arg2)
|-
| 1 || 0 || 1 || 0 || SEQ || Result = arg1 == arg2 || isNAN(arg1) OR isNAN(arg2)
|-
| 1 || 0 || 1 || 1 || NGL || Result = unordered(arg1, arg2) OR (arg1 == arg2) || isNAN(arg1) OR isNAN(arg2)
|-
| 1 || 1 || 0 || 0 || LT || Result = arg1 < arg2 || isNAN(arg1) OR isNAN(arg2)
|-
| 1 || 1 || 0 || 1 || NGE || Result = unordered(arg1, arg2) OR (arg1 < arg2) || isNAN(arg1) OR isNAN(arg2)
|-
| 1 || 1 || 1 || 0 || LE || Result = arg1 <= arg2 || isNAN(arg1) OR isNAN(arg2)
|-
| 1 || 1 || 1 || 1 || NGT || Result = unordered(arg1, arg2) OR (arg1 <= arg2) || isNAN(arg1) OR isNAN(arg2)
|}

== Full Mode vs Half Mode ==
The COP1 can run in one of two modes, which is controlled via COP0.Status Bit 26. In "Full Mode", 32 registers that are each 64 bits wide are available. In "Half Mode", only the 16 even registers are legal to be used; using odd numbered registers is considered undefined behavior.

Older software usually ran in "Half Mode". A reason for that could be that context switches (for multithreading) can be performed more cheaply, as only 16 FPU registers need to be stored.

When using "Half Mode" it is important to not use any FPU registers with odd indices. For compiled code this has to be configured accordingly ("+nooddspreg" in clang).

''The remainder of this section documents undefined behavior. Skip this unless you are an emulator developer who cares about accuracy a little bit too much.''

If software decides to use odd indices in "Half Mode", different things happen, depending on the instruction:
{| class="wikitable"
|+ Illegal register indexing in "Half Mode" (normally undocumented behavior - do not use)
|-
! Actual Register Index !! MFC1/MTC1/LWC1/LDC1 !! fd (32 bit), ft (32 bit) or any 64 bit !! fs (32 bit)
|-
| 0 || 1 (high 32 bits) / 0 (low 32 bits) || 0 (low 32 bits) || 0 or 1
|-
| 1 || unused || 1 (low 32 bits) || unused
|-
| 2 || 3 (high 32 bits) / 2 (low 32 bits) || 2 (low 32 bits) || 2 or 3
|-
| 3 || unused || 3 (low 32 bits) || unused
|-
| 4 || 5 (high 32 bits) / 4 (low 32 bits) || 4 (low 32 bits) || 4 or 5
|-
| 5 || unused || 5 (low 32 bits) || unused
|}
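For emulator authors, one possible reading of the table above can be written down as a small lookup. This is only a sketch of that reading for 32 bit accesses: the names <code>Half</code>, <code>Operand</code> and <code>half_mode_access</code> are hypothetical, the pattern is assumed to continue unchanged for indices above 5, and the fs column is assumed to read the low 32 bits (the table does not state which half). LDC1 and other 64 bit accesses are left out.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Half {
    Low,
    High,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Operand {
    /// Register field of MFC1/MTC1/LWC1 (first column of the table).
    Transfer,
    /// fd or ft of a 32 bit instruction (second column).
    DestOrTarget,
    /// fs of a 32 bit instruction (third column).
    Source,
}

/// Maps the register index encoded in the instruction to the
/// (actual register, 32 bit half) that is accessed in "Half Mode".
fn half_mode_access(index: u8, operand: Operand) -> (u8, Half) {
    let is_odd = index & 1 == 1;
    match operand {
        // Odd indices land in the high half of the even register below them.
        Operand::Transfer => (index & !1, if is_odd { Half::High } else { Half::Low }),
        // fd/ft use the encoded index directly, even when it is odd.
        Operand::DestOrTarget => (index, Half::Low),
        // fs ignores the lowest index bit (assumed: low half is read).
        Operand::Source => (index & !1, Half::Low),
    }
}
```

For example, under these assumptions <code>half_mode_access(1, Operand::Transfer)</code> resolves to the high half of register 0, while <code>half_mode_access(1, Operand::DestOrTarget)</code> resolves to the low half of register 1.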