Partial_Reg is a warning issued during static analysis. The VTune™ Performance Analyzer detects partial register stalls within a basic block. It also detects partial register stalls between basic blocks, when a write to a partial register in one block is followed by a jump to a block that reads from the corresponding large register.
The instruction for which Partial_Reg is issued reads from a large register (for example, EAX) after some previous instruction wrote to one of its partial registers (for example, AL, AH, AX). The read stalls until the write retires, even if the instructions are not adjacent.
This applies to all register pairs involving either a larger register with any of its partial registers, or two partial registers in the same set.
Examples of a larger register with one of its partial registers are: AX with EAX, BL with BX, and SI with ESI.
Examples of two partial registers in the same set are: AL with AH, and CL with CH.
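The register pairs listed above can be illustrated with a short sketch in the same assembly style used on this page. The operand values and the memory reference through ESI are placeholders chosen for illustration; each commented read is one that the rule described above would be expected to flag.

```asm
; Larger register read after a write to one of its partial registers
mov al, [esi]      ; writes partial register AL (ESI is a placeholder pointer)
mov edx, eax       ; reads large register EAX: the AL with EAX pair

mov bl, 5          ; writes partial register BL
mov di, bx         ; reads BX: the BL with BX pair

; Two partial registers in the same set
mov cl, 1          ; writes CL
mov dh, ch         ; reads CH: the CL with CH pair
```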
The stall does not occur if the write has already retired when the read begins executing. (In static simulation, there is no way to know exactly when the write retires. Therefore, a fixed distance between the write and the read is used for simulation purposes only. The fixed distance used is usually long enough to enable the write to retire.)
A partial register stall also occurs in the following cases because the processor operates on 32 bits internally (even though it seems to be operating on only 16 bits):
A MOV instruction writes to any partial register, and subsequently, a MOVSX or MOVZX instruction reads from the same partial register. Example:
    mov ax, 7
    movsx ebx, ax
A MOV instruction writes to any partial register, and subsequently, the contents of the partial register are copied to any segment register. Example:
    mov ax, 7
    mov ss, ax
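One possible way to sidestep these two cases is to write the full 32-bit register instead of the 16-bit one, so that the later read of AX follows a full-register write. This is a sketch under that assumption, not a fix documented on this page, and it is only appropriate when nothing depends on the previous upper half of EAX.

```asm
; MOVSX case: same constant as the example above, written to the full register
mov eax, 7         ; full 32-bit write; AX still holds 7
movsx ebx, ax      ; reads AX after a full-register write

; Segment register case: mirrors the example above
mov eax, 7         ; full 32-bit write
mov ss, ax         ; copies AX to SS after a full-register write
```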
Advice
Try to avoid using partial registers. If you must use partial registers, you can still prevent penalties as follows:
Try to avoid using a large register after writing to one of its partial registers.
Use an XOR or SUB instruction to clear the upper bits of a large register before writing to one of its partial registers. When the upper bits of the larger register are cleared in this way, reading it after writing one of its partial registers does not cause a stall. Other ways of clearing the upper bits of the large register do not prevent a stall.
Mispredicted branches cancel the effect of the XOR and SUB instructions. Therefore, to use an XOR or SUB instruction to prevent this stall, position the XOR or SUB after the branch.
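The placement rule for mispredicted branches might look like the following sketch. The label name, the branch condition, and the ESI pointer are hypothetical; the point is only that the clearing instruction sits on the path after the conditional branch, immediately before the partial write.

```asm
    test ecx, ecx      ; hypothetical condition
    jz   skip_load     ; conditional branch that may be mispredicted

    xor  eax, eax      ; clear EAX after the branch, not before it
    mov  al, [esi]     ; write partial register AL
    inc  eax           ; read EAX after the clear and the partial write

skip_load:             ; execution continues here when the branch is taken
```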
Example: Avoiding the Use of Partial Registers
Original: Here, the second MOV instruction writes to just the lower portion of the EAX register, AL. The third MOV instruction reads the whole AX register. This causes a partial stall.
Optimized: Here, the full ECX register is copied into EAX. The code could be further optimized by rescheduling.
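A minimal sketch consistent with the descriptions in this example follows. It is not the original listing: the first and third instructions, and the ESI and EDI pointers, are assumptions made for illustration.

```asm
; Original pattern (sketch): partial write followed by a wider read
mov ecx, [esi]     ; hypothetical source of the value
mov al, cl         ; second MOV: writes only AL
mov [edi], ax      ; third MOV: reads AX after the AL write

; Optimized pattern (sketch): copy the full register instead
mov ecx, [esi]     ; same hypothetical source
mov eax, ecx       ; full ECX copied into EAX, no partial write
mov [edi], ax      ; AX read follows a full-register write
```

The two sequences store the same value only when the old AH equals CH, for instance when both are zero, so the substitution assumes the surrounding code does not rely on the previous contents of AH.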
Example: Using XOR to Prevent Partial Register Stalls
The XOR and SUB instructions can be used to clear the upper bits of a large register before writing to one of its partial registers. When the upper bits of the larger register are cleared in this way, reading it after writing to one of its partial registers does not cause a stall. Other ways of clearing the upper bits of the large register do not prevent a stall.
Original: The INC instruction uses the entire EAX register. The preceding MOV instruction used just the lower portion of the EAX register, AL. This causes a partial stall.
Optimized: Using the XOR instruction before writing to the partial register clears all bits in EAX to 0, and prevents the stall.
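The pattern described in this example can be sketched as follows. The source operand of the MOV (a byte loaded through ESI) is a placeholder, since only the instruction names and registers are given in the descriptions.

```asm
; Original pattern (sketch)
mov al, [esi]      ; writes only AL (ESI is a placeholder pointer)
inc eax            ; reads all of EAX: partial register stall

; Optimized pattern (sketch)
xor eax, eax       ; clears all of EAX before the partial write
mov al, [esi]      ; writes only AL
inc eax            ; reads EAX without the stall
```

The optimized sequence also zeroes the upper 24 bits of EAX, so it matches the original only when those bits were meant to be zero.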
Example: Using SUB to Prevent Partial Register Stalls
Original: The MOV in the first line clears EAX. The MOV in the second line writes to the partial register AL. The third line increments EAX, causing a stall.
Optimized: Using the SUB instruction before writing to the partial register clears all bits in EAX to 0, and prevents the stall.
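A sketch of the three-line sequence described in this example, with the clearing MOV replaced by SUB. The source operand of the second instruction (a byte loaded through ESI) is a placeholder.

```asm
; Original pattern (sketch)
mov eax, 0         ; clears EAX, but not in a way that prevents the stall
mov al, [esi]      ; writes partial register AL
inc eax            ; reads all of EAX: partial register stall

; Optimized pattern (sketch)
sub eax, eax       ; clears EAX with SUB before the partial write
mov al, [esi]      ; writes partial register AL
inc eax            ; reads EAX without the stall
```

Unlike MOV, SUB modifies the flags, so the replacement is safe only where no later instruction depends on flag values set before the clear.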