MOB Load Replays Retired (Blocked Store-to-Load Forwards Retired)

Pentium® 4 processor topic

This event counts the number of retired load instructions that experienced memory order buffer (MOB) replays because store-to-load forwarding restrictions were not observed. A MOB replay can occur for several reasons. This event is programmed to count only the replays caused by loads for which store-to-load forwarding is blocked.

Intel® Pentium® 4 processors use a store-to-load forwarding technique to enable certain memory load operations (loads from an address whose data has just been modified by a preceding store operation) to complete without waiting for the data to be written to the cache. Forwarding succeeds only when the load and the store meet certain size and alignment restrictions.

[Figure: Illustration of store-to-load forwarding restrictions]

When a store-to-load forwarding restriction is not observed, the memory load operation is stalled.
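In place of the missing illustration, the restriction can be expressed as a small predicate. This is a minimal sketch of a simplified model, assuming only the rule this topic states in its example: a load can be forwarded from a preceding store only if both start at the same address and the load is no larger than the store. The names `mem_access`, `can_forward`, `overlaps`, and `forward_blocked` are hypothetical. Real processors have additional special cases that this model does not capture.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one memory access. */
typedef struct {
    uintptr_t addr;  /* starting byte address */
    size_t size;     /* access size in bytes */
} mem_access;

/* Simplified rule (assumption): same starting address, and the load
   is the same size as or smaller than the store. */
bool can_forward(mem_access store, mem_access load)
{
    return load.addr == store.addr && load.size <= store.size;
}

/* True if the load reads at least one byte written by the store. */
bool overlaps(mem_access store, mem_access load)
{
    return load.addr < store.addr + store.size &&
           store.addr < load.addr + load.size;
}

/* A load that depends on the store's data but breaks the rule is
   predicted to have its forward blocked, so the load stalls. */
bool forward_blocked(mem_access store, mem_access load)
{
    return overlaps(store, load) && !can_forward(store, load);
}
```

For example, a 1-byte store followed by a 4-byte load from the same address is predicted to be blocked, while a 4-byte store followed by a 4-byte load from the same address can be forwarded. A load that does not overlap the store is unaffected either way.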

Example

Code that works with 32-bit packed RGBA color values commonly runs into this pitfall. An operation may generate a new red value, store this 8-bit value to a memory location, and then read back the entire 32-bit value. Storing a small data value and then loading a larger operand that contains it blocks store-to-load forwarding, because the rest of the data being loaded is not part of the store and may reside anywhere in the memory system (caches or DRAM).
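The pitfall can be sketched in C as follows. This sketch assumes a packed layout of 0xAABBGGRR (red in the least-significant byte) and a little-endian target such as x86, so the red component is also the byte at the lowest address of the pixel. The function name `set_red_blocked` is hypothetical. The `volatile` qualifier is used to discourage the compiler from merging or removing the two accesses. The exact instructions emitted still depend on the compiler and its options.

```c
#include <stdint.h>

/* PITFALL: an 8-bit store followed by a 32-bit load of the same pixel.
   The load is larger than the store, so the store alone cannot supply
   all four bytes and store-to-load forwarding is blocked.
   Assumes red is the byte at the lowest address (little-endian, 0xAABBGGRR). */
uint32_t set_red_blocked(volatile uint32_t *pixel, uint8_t red)
{
    volatile unsigned char *bytes = (volatile unsigned char *)pixel;
    bytes[0] = red;   /* 8-bit store of the new red value */
    return *pixel;    /* 32-bit load that contains the stored byte */
}
```

The result is correct. Only the timing suffers, because the 32-bit load must wait instead of receiving its data from the pending byte store.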

To avoid blocking in this example, complete the processing of all four color components within a 32-bit integer, then write the full 32 bits to memory. To enable store-to-load forwarding, a dependent load must be the same size as or smaller than the preceding store, and the load and the store must start at the same address.
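A minimal sketch of the recommended pattern, under the same assumed 0xAABBGGRR layout: components are combined in a register and written with a single 32-bit store, so a later 32-bit load of the pixel starts at the same address and is the same size as the store. The names `pack_rgba`, `set_rgba`, and `set_red_forwarded` are hypothetical. Because the code uses shifts and masks on the 32-bit value rather than byte addresses, its results do not depend on byte order.

```c
#include <stdint.h>

/* Hypothetical helper: pack four 8-bit components into one 32-bit value
   using the assumed 0xAABBGGRR layout. */
static uint32_t pack_rgba(uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
    return (uint32_t)r | ((uint32_t)g << 8) |
           ((uint32_t)b << 16) | ((uint32_t)a << 24);
}

/* Process all four components first, then write the full 32 bits
   with a single store. */
void set_rgba(volatile uint32_t *pixel, uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
    *pixel = pack_rgba(r, g, b, a);
}

/* Changing only red: merge it into the 32-bit value in a register
   instead of storing the single byte. */
uint32_t set_red_forwarded(volatile uint32_t *pixel, uint8_t red)
{
    uint32_t value = *pixel;                  /* 32-bit load */
    value = (value & ~UINT32_C(0xFF)) | red;  /* replace the red byte in a register */
    *pixel = value;                           /* one 32-bit store */
    return *pixel;                            /* 32-bit load: same address and size as the store */
}
```

Compared with the byte-store version, the final load here satisfies both conditions of the forwarding rule, so it can receive its data directly from the preceding store.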

Arranging an application's memory accesses to observe store-to-load forwarding restrictions can deliver a significant performance improvement on Pentium 4 processors as well as on P6 family processors. The Intel Compiler (version 5.0 or later) and the Microsoft* Visual C++* compiler (version 7.0 or later) eliminate many cases of store-to-load forwarding violations.

For more information, see the latest processor optimization reference manual.

See Also

Additional Events Indicating Coding Pitfalls:

Split Loads Retired

Streaming SIMD Extensions (SSE) Input Assists

x87 Input Assists

x87 Output Assists

64k Aliasing Conflicts