Pentium® 4 processor topic

Split Loads Retired

This event counts the number of retired instructions that caused split loads. A split load occurs when a single data value being read lies partly in one cache line and partly in the next. Split loads reduce performance because the processor must read the two cache lines separately and then combine the two parts of the data. Reading data that spans two cache lines is several times slower than reading data from a single cache line, even when the data within that single line is not otherwise properly aligned.

Generally, the compiler aligns data to avoid placing values across cache-line boundaries. However, if you cast a C or C++ pointer to a pointer to a larger data type (for example, when using SIMD intrinsics), or otherwise manipulate a pointer's address, you increase the chance that an access crosses a line boundary. You can reduce the likelihood of split loads by using the __declspec(align(n)) attribute to align arrays or structures of small data values that will later be accessed as larger values through such casts. See your compiler documentation for information on using this attribute.
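The following is a minimal sketch of the alignment advice above, not compiler-specific guidance. It assumes a compiler that accepts C11 `alignas` (from `<stdalign.h>`), used here as a portable stand-in for `__declspec(align(64))`; it also assumes the 64-byte line size given in the Note for the Pentium® 4 processor. The names `samples`, `load_splits`, and `count_split_chunks` are hypothetical. The code walks a byte array in 16-byte steps (the width of an SSE register) and counts how many of those wide reads would straddle a line boundary.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed first-level data-cache line size (Pentium 4). */
#define CACHE_LINE 64

/* Hypothetical buffer of small (byte) values that is later read 16 bytes
 * at a time. C11 equivalent of the MSVC/Intel form:
 *   __declspec(align(64)) unsigned char samples[256];                   */
alignas(CACHE_LINE) unsigned char samples[256];

/* Nonzero if a load of `size` bytes starting at `p` spans two cache lines
 * (valid for size <= CACHE_LINE). */
int load_splits(const void *p, size_t size) {
    uintptr_t offset = (uintptr_t)p % CACHE_LINE;
    return offset + size > CACHE_LINE;
}

/* Count how many 16-byte reads, taken at 16-byte steps from `base`,
 * would be split loads. */
int count_split_chunks(const unsigned char *base, size_t len) {
    int splits = 0;
    for (size_t off = 0; off + 16 <= len; off += 16)
        splits += load_splits(base + off, 16);
    return splits;
}
```

With `samples` aligned to 64 bytes, `count_split_chunks(samples, sizeof samples)` finds no split loads, because every 16-byte read starts at offset 0, 16, 32, or 48 within a line. Starting the same walk 8 bytes into the buffer, `count_split_chunks(samples + 8, sizeof samples - 8)`, puts one read in every 64 bytes at offset 56, where it crosses into the next line.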

Note

On the Pentium® 4 processor, each first-level data-cache line holds 64 bytes of data, so every line begins at an address that is a multiple of 64.
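Because each line begins at a multiple of 64, the line that holds a given byte is simply its address divided by 64. A load is split when its first and last bytes fall in different lines. The sketch below expresses that arithmetic; the function names `line_index` and `is_split_load` are hypothetical, and addresses are treated as plain integers for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LINE_BYTES 64u  /* Pentium 4 first-level data-cache line size */

/* Index of the cache line holding byte address `addr`; line n covers
 * addresses n*64 through n*64 + 63. */
uint64_t line_index(uint64_t addr) {
    return addr / LINE_BYTES;
}

/* A load of `size` bytes (size >= 1) at `addr` is split when its first
 * and last bytes land in different cache lines. */
bool is_split_load(uint64_t addr, size_t size) {
    return line_index(addr) != line_index(addr + size - 1);
}
```

For example, address 0x1000 is the start of line 64. A 4-byte load at 0x103E covers bytes 0x103E through 0x1041, which fall in lines 64 and 65, so it is a split load; a 4-byte load at 0x103C, or an 8-byte load at 0x1038, stays entirely within line 64.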

See also additional events that indicate coding pitfalls:

MOB Load Replays Retired (Blocked Store-to-Load Forwards Retired)

Streaming SIMD Extensions (SSE) Input Assists

x87 Input Assists

x87 Output Assists

64k Aliasing Conflicts