Bandwidth Considerations
When many threads are used and they do not cause cache contention by using the same cache lines on different cores, there are still potential problems.
Each process...
Atomicity Optimizations
If multiple threads modify the same memory location concurrently, processors do not guarantee any specific result.
This is a deliberate decision made to ...
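If concurrent updates must all take effect, the program has to request an atomic operation explicitly. The sketch below assumes C11 `<stdatomic.h>` and POSIX threads. Several threads increment a shared counter with `atomic_fetch_add_explicit`, so no increment is lost. A plain `counter++` on a shared `long` would be a data race with an unspecified result. `NTHREADS`, `NITERS`, `worker`, and `run_counter` are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define NTHREADS 4      /* hypothetical thread count */
#define NITERS 100000   /* hypothetical increments per thread */

static atomic_long counter;

/* Each worker performs NITERS indivisible read-modify-write increments. */
static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < NITERS; i++)
        /* Relaxed ordering is enough here: we only need each increment to be
           atomic, not to order any other memory accesses around it. */
        atomic_fetch_add_explicit(&counter, 1, memory_order_relaxed);
    return NULL;
}

/* Starts NTHREADS workers, waits for all of them, returns the final count. */
long run_counter(void)
{
    pthread_t tids[NTHREADS];
    atomic_store(&counter, 0);
    for (int t = 0; t < NTHREADS; t++) {
        int rc = pthread_create(&tids[t], NULL, worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int t = 0; t < NTHREADS; t++)
        pthread_join(tids[t], NULL);
    return atomic_load(&counter);
}
```

Because every increment is a single atomic fetch-and-add, `run_counter()` returns `NTHREADS * NITERS` on every run. A separate load, add, and store would not guarantee that.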
Prefetching
Purpose
The purpose of prefetching is to hide the latency of a memory access.
Fetching data before it is actually needed is what hides that latency.
The command pipeline and out-of-order (OOO) execution capabilities of today’s processors c...
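The sketch below shows software prefetching. It assumes GCC or Clang, which provide `__builtin_prefetch(addr, rw, locality)`. The loop sums values reached through an index array. Each iteration requests the element it will need `PREFETCH_DIST` iterations later, so that cache line can load while the current iterations compute. `PREFETCH_DIST` and `gather_sum` are hypothetical names. The best distance depends on the machine and has to be measured.

```c
#include <assert.h>
#include <stddef.h>

#define PREFETCH_DIST 16  /* hypothetical prefetch distance; tune by measuring */

/* Sums data[idx[i]] for i in [0, n). Indirect accesses like these are hard
   for a hardware prefetcher to predict, so we hint the future address. */
long gather_sum(const long *data, const size_t *idx, size_t n)
{
    assert(n == 0 || (data != NULL && idx != NULL));
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DIST < n)
            /* rw = 0: prefetch for reading; locality = 1: little reuse expected. */
            __builtin_prefetch(&data[idx[i + PREFETCH_DIST]], 0, 1);
        sum += data[idx[i]];
    }
    return sum;
}
```

The prefetch is only a hint. It does not change the result, so removing it or changing the distance affects speed, never correctness.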
Optimizing TLB Usage
There are two ways to optimize TLB usage.
Reducing the number of pages a program uses
The first optimization is to reduce the number of pages a program has to use.
This automatically results i...
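The sketch below makes "fewer pages" concrete. It counts how many distinct pages a run of accesses touches when consecutive items are `stride` bytes apart. It assumes a page-aligned start and a 4 KiB page. `PAGE_SIZE` is an assumption; on POSIX systems the real value comes from `sysconf(_SC_PAGESIZE)`. `pages_touched` is a hypothetical helper. Densely packed data touches fewer pages and so needs fewer TLB entries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u  /* assumed page size */

/* Counts distinct pages touched by the byte offsets 0, stride, 2*stride, ...
   (n accesses) from a page-aligned base. The offsets never decrease, so a
   new page is entered exactly when the page number changes. */
size_t pages_touched(size_t n, size_t stride)
{
    assert(n == 0 || stride <= SIZE_MAX / n);  /* i * stride must not overflow */
    size_t pages = 0, last = (size_t)-1;
    for (size_t i = 0; i < n; i++) {
        size_t page = (i * stride) / PAGE_SIZE;
        if (page != last) {
            pages++;
            last = page;
        }
    }
    return pages;
}
```

For 1024 four-byte integers, packing them back to back (`stride` 4) touches one page. Spreading them one per 64-byte line touches 16 pages. Placing one per page touches 1024 pages, each of which needs its own TLB entry.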
Optimizing Level 2 and Higher Cache Access
Everything said about optimizations for level 1 caches also applies to level 2 and higher cache accesses.
PS: the underlying optimization ideas carry over from one cache level to the next.
Additional Points
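Here is one way the L1 blocking idea might carry over to L2. Suppose two passes run over a large array. Running both passes on one chunk at a time keeps that chunk in L2 between the passes. Otherwise the whole array streams in from memory twice. This is a sketch. `L2_BUDGET` is the number of L2 bytes we assume the loop may use; a real program would derive it from the actual cache size. `scale_then_offset` is a hypothetical function.

```c
#include <assert.h>
#include <stddef.h>

#define L2_BUDGET (256u * 1024u)  /* assumed usable L2 bytes */

/* Pass 1 scales every element, pass 2 adds an offset. Processing the array
   in L2-sized chunks lets pass 2 reuse the data pass 1 just loaded into L2. */
void scale_then_offset(double *a, size_t n, double k, double c)
{
    const size_t chunk = L2_BUDGET / sizeof *a;  /* elements per chunk */
    assert(chunk > 0);
    for (size_t start = 0; start < n; start += chunk) {
        size_t end = (n - start < chunk) ? n : start + chunk;
        for (size_t i = start; i < end; i++)  /* pass 1 on this chunk */
            a[i] *= k;
        for (size_t i = start; i < end; i++)  /* pass 2 on the same chunk */
            a[i] += c;
    }
}
```

In this toy case the two passes could simply be fused into one loop. Chunking matters when they cannot be fused, for example when the second pass needs the first pass's results for a whole chunk.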
There are two additional...
Optimizing L1 Instruction Cache Access
Preparing code for good L1i use requires techniques similar to those for good L1d use.
The problem is, though, that the programmer usually does not directly influence the way L1i is used unless s/he ...
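One indirect lever is telling the compiler which branch is the common one. The compiler can then keep the hot path contiguous and move the rare path out of line. The sketch assumes GCC or Clang, which provide `__builtin_expect(exp, c)`. The `likely`/`unlikely` macro names are a widespread convention, not a standard API, and `sum_digits` is a hypothetical function.

```c
#include <assert.h>
#include <stddef.h>

/* Conventional helper macros; !!(x) turns any nonzero value into 1. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Sums the decimal digits in s, or returns -1 if a non-digit appears.
   We assume bad input is rare, so that branch is marked unlikely and the
   compiler may place it away from the hot loop body. */
long sum_digits(const char *s)
{
    assert(s != NULL);
    long sum = 0;
    for (; *s != '\0'; s++) {
        if (unlikely(*s < '0' || *s > '9'))
            return -1;
        sum += *s - '0';
    }
    return sum;
}
```

The hint only influences code layout and branch prediction defaults. The function returns the same values with or without it.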
Cache Access
Programmers wishing to improve their programs’ performance will find it best to focus on changes affecting the level 1 cache, since those will likely yield the best results.
We will di...
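A typical L1d-focused change is loop tiling. Matrices are multiplied in small blocks so each block of the operands is reused while it is still in L1d. This is a minimal sketch for row-major matrices. `N`, `TILE`, and `matmul_tiled` are hypothetical, and in practice `TILE` would be chosen so the blocks fit the actual L1d.

```c
#include <assert.h>
#include <stddef.h>

#define N 64     /* hypothetical matrix dimension */
#define TILE 16  /* hypothetical tile edge; tune so the tiles fit in L1d */

static_assert(N % TILE == 0, "N must be a multiple of TILE");

/* res = a * b for row-major N x N matrices, computed tile by tile so the
   pieces of a, b and res being combined stay in L1d while they are reused. */
void matmul_tiled(const double *a, const double *b, double *res)
{
    for (size_t i = 0; i < N * N; i++)
        res[i] = 0.0;

    for (size_t ii = 0; ii < N; ii += TILE)
        for (size_t kk = 0; kk < N; kk += TILE)
            for (size_t jj = 0; jj < N; jj += TILE)
                for (size_t i = ii; i < ii + TILE; i++)
                    for (size_t k = kk; k < kk + TILE; k++) {
                        double aik = a[i * N + k];  /* reused across the j loop */
                        /* Innermost loop walks b and res row-wise (sequentially). */
                        for (size_t j = jj; j < jj + TILE; j++)
                            res[i * N + j] += aik * b[k * N + j];
                    }
}
```

The result matches a naive triple loop. Only the order in which memory is visited changes.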
What Programmers Can Do
After the descriptions in the previous sections it is clear that there are many, many opportunities for programmers to influence a program’s performance, positively or nega...