Page Fault Optimization
On operating systems like Linux with demand-paging support, an mmap call only modifies the page tables.
It makes sure that, for file-backed pages, the underlying data can be found and, for a...
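The demand-paging behavior described here can be observed with a small sketch. It assumes Linux with glibc; the helper name `count_touch_faults` is hypothetical and not part of the original text. The helper maps an anonymous region, writes one byte to each page, and reports how many minor page faults the process accumulated while doing so.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* for MAP_ANONYMOUS under strict -std= modes */
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/* Hypothetical helper: map `pages` anonymous pages, touch each one, and
 * store in *faults the number of minor page faults seen while touching.
 * Returns 0 on success and -1 on failure. */
int count_touch_faults(size_t pages, long *faults)
{
    long page_size = sysconf(_SC_PAGESIZE);
    if (page_size <= 0 || pages == 0 || faults == NULL)
        return -1;
    size_t length = pages * (size_t)page_size;

    /* mmap only sets up the mapping; physical pages are assigned later. */
    volatile unsigned char *region = mmap(NULL, length,
                                          PROT_READ | PROT_WRITE,
                                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (region == MAP_FAILED)
        return -1;

    struct rusage before, after;
    if (getrusage(RUSAGE_SELF, &before) != 0) {
        munmap((void *)region, length);
        return -1;
    }

    /* The first write to a page is what makes the kernel back it. */
    for (size_t i = 0; i < pages; ++i)
        region[i * (size_t)page_size] = 1;

    int rc = getrusage(RUSAGE_SELF, &after);
    munmap((void *)region, length);
    if (rc != 0)
        return -1;

    *faults = after.ru_minflt - before.ru_minflt;
    return 0;
}
```

The reported count depends on the kernel configuration (transparent huge pages, for example, can back many small pages with a single fault), so it is best read as an indication rather than an exact figure.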
Improving Branch Prediction
In section 6.2.2, two methods to improve L1i use through branch prediction and block reordering were mentioned:
static prediction through __builtin_expect and profile ...
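A minimal sketch of static prediction with `__builtin_expect`, assuming GCC or Clang. The `likely`/`unlikely` macro names are a widespread convention rather than part of the compiler, and `sum_nonnegative` is a hypothetical function made up for illustration.

```c
#include <stddef.h>

/* Conventional wrappers; !! normalizes any nonzero value to 1. */
#define likely(expr)   __builtin_expect(!!(expr), 1)
#define unlikely(expr) __builtin_expect(!!(expr), 0)

/* Hypothetical example: sum an array of values that are expected to be
 * non-negative. A negative value is treated as a rare error, so the hint
 * lets the compiler keep the error path out of the hot loop body.
 * Returns the sum, or -1 with *error set to 1 on a negative input. */
long sum_nonnegative(const int *values, size_t count, int *error)
{
    long total = 0;
    *error = 0;
    for (size_t i = 0; i < count; ++i) {
        if (unlikely(values[i] < 0)) {
            *error = 1;
            return -1;
        }
        total += values[i];
    }
    return total;
}
```

The hint changes only code layout, not behavior: the function returns the same result whether or not the prediction turns out to be right.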
Measuring Memory Usage
Knowing how much memory a program allocates and possibly where the allocation happens is the first step to optimizing its memory use.
There are, fortunately, some easy-to-u...
Simulating CPU Caches
While the technical description of how a cache works is relatively easy to understand, it is not so easy to see how an actual program behaves with respect to the cache.
Prog...
Memory Performance Tools
A wide variety of tools is available to help programmers understand the performance characteristics of a program, including its cache and memory use.
Modern processors hav...
Explicit NUMA Optimizations
None of the local memory and affinity rules can help if all threads on all the nodes need access to the same memory regions.
It is, of course, possible to simply re...
Querying Node Information
The get_mempolicy interface can be used to query a variety of facts about the state of NUMA for a given address.
#include <numaif.h>
long get_mempolicy(int *policy...
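As a rough illustration of one such query, the sketch below asks which node holds the page behind a given address. It assumes Linux with the kernel headers installed and, so that it builds without libnuma, invokes the system call directly instead of going through the `<numaif.h>` wrapper. The helper name `node_of_address` is hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stddef.h>
#include <linux/mempolicy.h>   /* MPOL_F_NODE, MPOL_F_ADDR */
#include <sys/syscall.h>       /* SYS_get_mempolicy */
#include <unistd.h>            /* syscall */

/* Hypothetical helper: return the ID of the NUMA node backing `addr`,
 * or -1 if the query fails (for example when the kernel lacks NUMA
 * support or a sandbox blocks the system call). */
int node_of_address(void *addr)
{
    int node = -1;

    /* With MPOL_F_NODE | MPOL_F_ADDR the first argument receives the ID
     * of the node holding the page at addr instead of a policy mode.
     * No node mask is requested, so the mask pointer is NULL. */
    long rc = syscall(SYS_get_mempolicy, &node, NULL, 0UL, addr,
                      (unsigned long)(MPOL_F_NODE | MPOL_F_ADDR));

    return rc == 0 ? node : -1;
}
```

Callers should treat -1 as "unknown" rather than as an error worth aborting on, since many container runtimes restrict this system call.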
Swapping and Policies
If physical memory runs out, the system has to drop clean pages and save dirty pages to swap.
The Linux swap implementation discards node information when it writes page...