Write Behavior

Before we look at cache behavior when multiple execution contexts (threads or processes) use the same memory, we have to explore a detail of cache implementations.

Caches are supposed to be coherent, and this coherency is supposed to be completely transparent to user-level code.

Kernel code is a different story; it occasionally requires cache flushes.

This specifically means that, if a cache line is modified, the result for the system after this point in time is the same as if there were no cache at all and the main memory location itself had been modified.


This can be implemented in two ways or policies:

  • write-through cache implementation;

  • write-back cache implementation.


The write-through cache is the simplest way to implement cache coherency.

If the cache line is written to, the processor immediately also writes the cache line into main memory.

This ensures that, at all times, the main memory and cache are in sync.



The cache content can simply be discarded whenever a cache line is replaced.


This cache policy is simple but not very fast.

A program which, for instance, modifies a local variable over and over again would create a lot of traffic on the FSB even though the data is likely not used anywhere else and might be short-lived.



The write-back policy is more sophisticated.

Here the processor does not immediately write the modified cache line back to main memory.

Instead, the cache line is only marked as dirty.

When the cache line is dropped from the cache at some point in the future, the dirty bit instructs the processor to write the data back at that time instead of just discarding the content.
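The difference between the two policies can be made concrete with a toy model. The sketch below is only an illustration under simplifying assumptions: the cache holds a single line, and the names `ToyCache` and `count_memory_writes` are hypothetical, not part of any real hardware or library interface. It counts how many writes reach main memory when the same location is modified repeatedly.

```python
class ToyCache:
    """Single-line toy cache that counts writes reaching main memory."""

    def __init__(self, write_back):
        self.write_back = write_back
        self.memory = {}         # address -> value, stands in for main memory
        self.line_addr = None    # address held in the one cache line
        self.line_value = None
        self.dirty = False
        self.memory_writes = 0   # traffic on the memory bus

    def _write_memory(self, addr, value):
        self.memory[addr] = value
        self.memory_writes += 1

    def _evict(self):
        # Write-back: a dirty line must be stored before it is dropped.
        # Write-through: memory is already in sync, so the line is discarded.
        if self.line_addr is not None and self.dirty:
            self._write_memory(self.line_addr, self.line_value)
        self.line_addr = None
        self.dirty = False

    def write(self, addr, value):
        if self.line_addr != addr:
            self._evict()
            self.line_addr = addr
        self.line_value = value
        if self.write_back:
            self.dirty = True                 # defer the memory write
        else:
            self._write_memory(addr, value)   # write through immediately

    def flush(self):
        self._evict()


def count_memory_writes(write_back, updates=1000):
    """Modify one 'local variable' repeatedly, then evict the line."""
    cache = ToyCache(write_back)
    for i in range(updates):
        cache.write(0x100, i)
    cache.flush()
    return cache.memory_writes, cache.memory[0x100]
```

In this model, `count_memory_writes(write_back=False)` performs one memory write per update, while `count_memory_writes(write_back=True)` performs a single write when the line is evicted. Both leave the same final value in memory.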



Write-back caches can perform significantly better, which is why most memory in a system with a decent processor is cached this way.

ps: this policy performs better, which is why many caches use it.

The processor can even take advantage of free capacity on the FSB to store the content of a cache line before the line has to be evacuated.

This allows the dirty bit to be cleared and the processor can just drop the cache line when the room in the cache is needed.




But there is a significant problem with the write-back implementation.

When more than one processor (or core or hyper-thread) is available and accessing the same memory, it must still be assured that all processors see the same memory content at all times.

ps: cache coherency must be guaranteed when multiple threads are running.

If a cache line is dirty on one processor (i.e., it has not been written back yet) and a second processor tries to read the same memory location, the read operation cannot just go out to the main memory.

Instead the content of the first processor’s cache line is needed.

In the next section we will see how this is currently implemented.
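The problem can be illustrated with a toy model of two cores that share main memory but have private write-back caches. This is only a sketch of the problem, not of how real hardware solves it; the `Core` class and both read methods are hypothetical names introduced for illustration.

```python
class Core:
    """Toy core with a private write-back cache over shared memory."""

    def __init__(self, memory):
        self.memory = memory   # shared dict: address -> value
        self.lines = {}        # private cache: address -> (value, dirty)

    def write(self, addr, value):
        # Write-back: only the private line is updated and marked dirty.
        self.lines[addr] = (value, True)

    def read_naive(self, addr):
        # Ignores other caches: on a miss, go straight to main memory.
        if addr in self.lines:
            return self.lines[addr][0]
        return self.memory[addr]

    def read_checking_others(self, addr, others):
        # Hypothetical fix: consult the other cores' dirty lines first.
        if addr in self.lines:
            return self.lines[addr][0]
        for other in others:
            line = other.lines.get(addr)
            if line is not None and line[1]:
                return line[0]
        return self.memory[addr]


def stale_read_demo():
    memory = {0x40: 0}
    first, second = Core(memory), Core(memory)
    first.write(0x40, 42)   # dirty in the first core, memory still holds 0
    return (second.read_naive(0x40),
            second.read_checking_others(0x40, [first]))
```

Here `stale_read_demo()` returns `(0, 42)`: the naive read sees the outdated value in main memory, while the read that consults the first core's dirty line sees the current one.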


Before we get to this there are two more cache policies to mention:

  • write-combining; and

  • uncacheable.

Both these policies are used for special regions of the address space which are not backed by real RAM.

The kernel sets up these policies for the address ranges (on x86 processors using the Memory Type Range Registers, MTRRs) and the rest happens automatically.

The MTRRs are also usable to select between write-through and write-back policies.


Write-combining is a limited caching optimization more often used for RAM on devices such as graphics cards.

Since the transfer costs to the devices are much higher than those of local RAM access, it is even more important to avoid doing too many transfers.

Transferring an entire cache line just because a word in the line has been written is wasteful if the next operation modifies the next word.

One can easily imagine that this is a common occurrence: the memory locations for horizontally neighboring pixels on a screen are, in most cases, neighbors too.


As the name suggests, write-combining combines multiple write accesses before the cache line is written out.

In the ideal case the entire cache line is modified word by word and, only after the last word is written, the cache line is written to the device.

This can speed up access to RAM on devices significantly.
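The combining step can be sketched as a small buffer that collects word writes and sends a line to the device only when the line is complete or a write to a different line arrives. The line size, word size, and the names `WriteCombiningBuffer` and `fill_pixels` are assumptions made for this illustration; real hardware differs in detail.

```python
LINE_SIZE = 64   # bytes per cache line (assumed)
WORD_SIZE = 4    # bytes per write, e.g. one pixel (assumed)


class WriteCombiningBuffer:
    """Toy buffer that merges word writes into whole-line transfers."""

    def __init__(self):
        self.line_base = None   # base address of the line being combined
        self.pending = {}       # offset within the line -> value
        self.transfers = []     # one entry per transfer to the device

    def write(self, addr, value):
        base = addr - addr % LINE_SIZE
        if self.line_base is not None and base != self.line_base:
            self.flush()        # a different line: send what we have
        self.line_base = base
        self.pending[addr - base] = value
        if len(self.pending) == LINE_SIZE // WORD_SIZE:
            self.flush()        # last word written: send the full line

    def flush(self):
        if self.pending:
            self.transfers.append((self.line_base, dict(self.pending)))
        self.line_base = None
        self.pending = {}


def fill_pixels(start, count):
    """Write `count` neighboring words and return the transfer count."""
    buf = WriteCombiningBuffer()
    for i in range(count):
        buf.write(start + i * WORD_SIZE, i)
    buf.flush()
    return len(buf.transfers)
```

With these assumed sizes, writing 32 neighboring words starting at a line boundary takes two transfers in this model, where sending each word individually would take 32.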


Finally there is uncacheable memory.

This usually means the memory location is not backed by RAM at all.

It might be a special address which is hardcoded to have some functionality implemented outside the CPU.

For commodity hardware this is most often the case for memory-mapped address ranges which translate to accesses to cards and devices attached to a bus (PCIe, etc.).

On embedded boards one sometimes finds such a memory address which can be used to turn an LED on and off.

Caching such an address would obviously be a bad idea.






LEDs in this context are used for debugging or status reports and one wants to see this as soon as possible.

The memory on PCIe cards can change without the CPU’s interaction, so this memory should not be cached.
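Why caching such memory is harmful can be shown with a toy model in which a device changes its own state while the CPU holds a cached copy. The `Device` class and the two read helpers are hypothetical names used only for this sketch; they do not correspond to any real driver interface.

```python
class Device:
    """Toy device whose status register changes without CPU involvement."""

    def __init__(self):
        self.status = 0

    def tick(self):
        self.status += 1   # the device updates its own memory


def read_cached(device, cache):
    # First access fills the cache; later accesses never reach the device.
    if "status" not in cache:
        cache["status"] = device.status
    return cache["status"]


def read_uncached(device):
    # Every access goes to the device itself.
    return device.status


def device_read_demo():
    device, cache = Device(), {}
    before = read_cached(device, cache)
    device.tick()
    return before, read_cached(device, cache), read_uncached(device)
```

In this model `device_read_demo()` returns `(0, 0, 1)`: after the device changes its status, the cached read still reports the old value and only the uncached read observes the new one.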

ps: some data can be modified directly without going through the CPU. Such data should not be cached.