Write Combining Buffer (WCB) Full Evictions
Thread Specificity: TI
This event counts when all Write Combining Buffers (WCBs) are occupied and an entry must be evicted to handle a new request.
Counting WCB Full Evictions can provide the following tuning insights:
Such evictions are distinguished from evictions
due to aliasing conflicts. Subtracting WCB Full Evictions from the WCB
All Evictions event provides an indication of the more expensive form
of 64k aliasing. 64k aliasing can occur either between loads, leading
to conflicts in the 1st-level cache that incur delays to get data from
the 2nd-level cache, or between stores and other memory
references. The latter case causes WCB thrashing, and is indicated by
the result of subtracting WCB full evictions from all WCB evictions. If
the ratio of the resulting count to the number of retired instructions
is high, avoiding memory references that are a multiple of 64 KB apart
may boost performance.
Note that this count is still only an indication of a possible problem;
the hardware does not permit a definitive count.
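
The subtraction and ratio described above can be sketched in C as follows. This is only an illustration: the function name and parameters are hypothetical, and the three counter values are assumed to have been collected already by whatever profiling tool is in use. What counts as a "high" ratio is not specified here and is left to the reader.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not part of any profiler API).
 * Returns (WCB All Evictions - WCB Full Evictions) / instructions retired,
 * the rough indicator of store-related 64k aliasing described in the text.
 * Returns 0.0 if the inputs cannot yield a meaningful ratio.
 */
double wcb_aliasing_ratio(uint64_t wcb_all_evictions,
                          uint64_t wcb_full_evictions,
                          uint64_t instructions_retired)
{
    if (instructions_retired == 0 || wcb_all_evictions < wcb_full_evictions)
        return 0.0;

    /* Evictions not explained by all WCBs being full. */
    uint64_t non_full_evictions = wcb_all_evictions - wcb_full_evictions;

    return (double)non_full_evictions / (double)instructions_retired;
}
```

As the note above says, the result is only a hint that aliasing may be present, not a definitive count.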
For a sequence of stores, this event can sometimes
be used as an indication of how efficiently the WCBs are being used. WCBs
combine data from stores to a set of contiguous addresses (such as those
in the same cache line). But if an inner loop attempts to interleave stores
from more write streams than the number of WCBs then the WCBs will be
thrashed, and WCBs will have to be evicted before they are filled. In
writeback memory, this simply leads to slight delays as WCBs are deallocated
and allocated again. With write combining memory and non-temporal stores,
it leads to partial writes, which make much less efficient use of the bus.
This problem can be detected by comparing the ratio of the number of
stores retired to WCB Full Evictions with the number of stores made to
each cache line. If this ratio of ratios is greater than one, WCBs may
not be used for streaming as efficiently as possible. To fix this,
spread the write streams across several inner loops instead of one. This
type of optimization is called loop fission.
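
A minimal sketch of loop fission in C is shown below. The stream count, the group size, and the function names are hypothetical. In particular, NUM_STREAMS is simply assumed to exceed the number of WCBs on the target processor, and STREAMS_PER_LOOP is assumed to fit within it; neither value comes from this document.

```c
#include <stddef.h>

#define NUM_STREAMS      8  /* hypothetical: assumed to exceed the number of WCBs */
#define STREAMS_PER_LOOP 4  /* hypothetical: assumed to fit within the WCBs */

/* Before: each iteration of the loop over i interleaves stores to all
 * NUM_STREAMS output arrays, so every write stream competes for a WCB. */
void fill_interleaved(float *out[NUM_STREAMS], const float *in, size_t n)
{
    for (size_t i = 0; i < n; i++)
        for (size_t s = 0; s < NUM_STREAMS; s++)
            out[s][i] = in[i] * (float)(s + 1);
}

/* After loop fission: the loop over i is run once per group of streams,
 * so only STREAMS_PER_LOOP write streams are active at any one time. */
void fill_fissioned(float *out[NUM_STREAMS], const float *in, size_t n)
{
    for (size_t base = 0; base < NUM_STREAMS; base += STREAMS_PER_LOOP)
        for (size_t i = 0; i < n; i++)
            for (size_t s = base; s < base + STREAMS_PER_LOOP; s++)
                out[s][i] = in[i] * (float)(s + 1);
}
```

Both functions produce the same output. The fissioned version reads the input array once per group, so whether it is faster in practice depends on the workload and should be confirmed by measurement.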
Note that since WCBs may be shared among logical processors, the number
of WCB Full Evictions may increase when using Hyper-Threading technology.