This event counts retired load instructions that experienced memory order buffer (MOB) replays because store-to-load forwarding restrictions were not observed. MOB replays can occur for several reasons; this event is programmed to count only those caused by loads for which store-to-load forwarding is blocked.
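When a store-to-load forwarding restriction is not observed, the load cannot take its data directly from the pending store, so the load operation is stalled until that data is available from the memory system.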
This pitfall is common in code that manipulates 32-bit packed RGBA color values. An operation may compute a new red value, store this 8-bit value to memory, and then read back the entire 32-bit value. Storing a small data value and then loading a larger operand that contains it blocks store-to-load forwarding: only part of the loaded data comes from the pending store, and the rest may reside anywhere in the memory system (caches or DRAM).
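The following C sketch illustrates the blocked pattern: a 1-byte store of the red channel followed by a 4-byte load of the word that contains it. The helper name `rgba_set_red_blocked` is hypothetical, and the sketch assumes a little-endian x86 target on which red is the lowest-addressed byte (bits 0 to 7). The pointer is `volatile` so that the compiler keeps the store and the reload as real memory operations instead of merging them in a register.

```c
#include <stdint.h>

/* Hypothetical helper showing the pitfall described above.
 * Assumes little-endian x86: byte 0 of the word holds the red channel. */
static uint32_t rgba_set_red_blocked(volatile uint32_t *pixel, uint8_t red)
{
    volatile uint8_t *bytes = (volatile uint8_t *)pixel;

    bytes[0] = red;   /* small store: only 1 byte is written */
    return *pixel;    /* larger load: 4 bytes overlapping the pending 1-byte store,
                         so store-to-load forwarding is blocked */
}
```

On such a target, calling `rgba_set_red_blocked(&p, 0xAA)` with `p` holding `0x11223344` returns `0x112233AA`. The result is correct; the cost is the stalled load, which this event counts as a MOB replay.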
To avoid blocking in this example, complete the processing of all four color channels in a 32-bit integer, then write the full 32 bits to memory in a single store. To enable store-to-load forwarding, a dependent load must load data of the same size as or smaller than the preceding store, and the load and the store must start at the same address.
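A minimal C sketch of the recommended approach follows, under the same assumed layout (red in bits 0 to 7). The names `rgba_store` and `can_forward` are hypothetical. `rgba_store` builds all four channels in a register and writes them with one 32-bit store, so a later 32-bit load of the same address matches the store in both size and starting address. `can_forward` encodes only the simplified rule stated in the text; it is not a complete model of any particular processor's forwarding logic.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Combine all four channels in a register, then perform one 4-byte store.
 * Assumed layout: r in bits 0-7, g in 8-15, b in 16-23, a in 24-31. */
static void rgba_store(uint32_t *pixel, uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
    uint32_t value = (uint32_t)r
                   | ((uint32_t)g << 8)
                   | ((uint32_t)b << 16)
                   | ((uint32_t)a << 24);
    *pixel = value;   /* single full-width store */
}

/* Simplified forwarding rule from the text: the load must start at the
 * same address as the store and be no larger than the store. */
static bool can_forward(uintptr_t store_addr, size_t store_size,
                        uintptr_t load_addr, size_t load_size)
{
    return load_addr == store_addr && load_size <= store_size;
}
```

Under this rule, the pattern in the pitfall (a 1-byte store followed by a 4-byte load at the same address) gives `can_forward(addr, 1, addr, 4) == false`, while a 4-byte store followed by a 4-byte or 1-byte load at the same address can forward.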
Changing an application's memory access patterns to observe store-to-load forwarding restrictions can deliver significant performance improvements on Pentium 4 processors as well as P6 family processors. Compiling with the Intel Compiler (version 5.0 or later) or the Microsoft* Visual C++* compiler (version 7.0 or later) can eliminate many store-to-load forwarding violations.
For more information, see the latest processor optimization reference manual.