For NUMA programming everything said so far about cache optimizations applies as well.
The differences only start below that level.
NUMA introduces different costs when accessing different parts of the address space.
With uniform memory access we can optimize to minimize page faults (see section 7.5), but that is about it.
All pages are created equal.
NUMA changes this.
Access costs can depend on the page which is accessed.
Differing access costs also increase the importance of optimizing for memory page locality.
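On Linux, these differing costs are exposed through the per-node distance table that the kernel takes from the firmware (ACPI SLIT). By convention, a node's distance to itself is 10, and larger numbers mean more expensive access. The values are relative costs, not latencies. The following is a minimal sketch that assumes the `/sys/devices/system/node/nodeN/distance` files of a typical kernel and node numbers without gaps. The helper names are hypothetical.

```python
from pathlib import Path

NODE_ROOT = Path("/sys/devices/system/node")


def parse_distance_rows(rows):
    """Build a matrix from one whitespace-separated distance line per node.

    matrix[i][j] is the relative cost for a CPU on node i to reach memory
    on node j (10 = local, following the ACPI SLIT convention).
    """
    matrix = [[int(tok) for tok in row.split()] for row in rows]
    if any(len(row) != len(matrix) for row in matrix):
        raise ValueError("distance table is not square")
    return matrix


def nodes_by_cost(matrix, home):
    """Order all nodes from cheapest to most expensive as seen from `home`."""
    return sorted(range(len(matrix)), key=lambda node: (matrix[home][node], node))


def read_distance_table(root=NODE_ROOT):
    """Read the kernel's table; returns [] where the directory is missing.

    Assumes node directories are numbered 0..N-1 without gaps.
    """
    if not root.is_dir():
        return []
    node_dirs = sorted(
        (p for p in root.iterdir()
         if p.name.startswith("node") and p.name[4:].isdigit()),
        key=lambda p: int(p.name[4:]),
    )
    return parse_distance_rows((p / "distance").read_text() for p in node_dirs)
```

Take a hypothetical four-node ring with the rows `"10 21 31 21"`, `"21 10 21 31"`, `"31 21 10 21"` and `"21 31 21 10"`. Here `nodes_by_cost(m, 0)` yields `[0, 1, 3, 2]`: local memory comes first, then the two neighbors, and the opposite node comes last. That is the order in which a thread running on node 0 would prefer its pages to be placed.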
NUMA is inevitable for most SMP machines since both Intel with CSI (for x86, x86-64, and IA-64) and AMD (for Opteron) use it.
With an increasing number of cores per processor we are likely to see a sharp reduction in the use of SMP systems (at least outside data centers and offices of people with terribly high CPU usage requirements).
Most home machines will be fine with just one processor and hence no NUMA issues.
But this a) does not mean programmers can ignore NUMA, and b) does not mean there are no related issues.
If one thinks about generalizations of NUMA, one quickly realizes the concept extends to processor caches as well.
Two threads on cores using the same cache will collaborate faster than threads on cores not sharing a cache.
This is not a fabricated case:
(1) Early dual-core processors had no L2 sharing.
(2) Intel's Core 2 QX 6700 and QX 6800 quad-core chips, for instance, have two separate L2 caches.
(3) As speculated earlier, with more cores on a chip and the desire to unify caches, we will have more levels of caches.
Multi-level caches share the same problems as NUMA
Caches form their own hierarchy; placement of threads on cores becomes important for sharing (or not sharing) the various caches.
This is not very different from the problems NUMA is facing and, therefore, the two concepts can be unified.
Even people only interested in non-SMP machines should therefore read this section.
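On Linux, the kernel records which CPUs share each cache in `/sys/devices/system/cpu/cpuN/cache/indexM/`, using the `level`, `type` and `shared_cpu_list` files. The following is a minimal sketch that assumes this layout. It groups CPUs by the cache they share and pins the caller to one group with `os.sched_setaffinity`. The function names are hypothetical, and the sketch is not a complete placement strategy.

```python
import os
from pathlib import Path

CPU_ROOT = Path("/sys/devices/system/cpu")


def parse_cpulist(text):
    """Expand the kernel's CPU list format, e.g. "0-1,4" -> {0, 1, 4}."""
    cpus = set()
    for part in text.strip().split(","):
        if part:
            lo, _, hi = part.partition("-")
            cpus.update(range(int(lo), int(hi or lo) + 1))
    return cpus


def read_shared_lists(level, root=CPU_ROOT):
    """Map each CPU to the shared_cpu_list of its data/unified cache at `level`."""
    shared = {}
    if not root.is_dir():
        return shared
    for cache_dir in root.glob("cpu[0-9]*/cache/index[0-9]*"):
        cpu = int(cache_dir.parent.parent.name[3:])  # "cpu12" -> 12
        try:
            if int((cache_dir / "level").read_text()) != level:
                continue
            if (cache_dir / "type").read_text().strip() == "Instruction":
                continue
            shared[cpu] = (cache_dir / "shared_cpu_list").read_text()
        except (OSError, ValueError):
            continue  # incomplete or unreadable entry; skip it
    return shared


def cache_groups(shared):
    """Group CPUs that report the same sharing set, i.e. the same cache."""
    groups = {frozenset(parse_cpulist(text)) for text in shared.values()}
    return sorted(sorted(group) for group in groups)


def pin_to_group(group):
    """Restrict the caller (pid 0) to the CPUs of one cache-sharing group."""
    os.sched_setaffinity(0, group)
```

For illustration, suppose a chip is laid out like the QX 6700 mentioned above, with two L2 caches each shared by two cores, and the CPUs are numbered 0 to 3 (a hypothetical numbering). In that case `cache_groups` returns `[[0, 1], [2, 3]]`. Two threads that work on the same data would then be pinned to the same group, and two independent threads would be pinned to different groups.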
In section 5.3 we have seen that the Linux kernel provides a lot of information which is useful (and needed) in NUMA programming.
Collecting this information is not that easy, though.
The currently available NUMA library on Linux is wholly inadequate for this purpose.
A much more suitable version is currently under construction by the author.
The existing NUMA library, libnuma, part of the numactl package, provides no access to system architecture information.
It is only a wrapper around the available system calls together with some convenience interfaces for commonly used operations.
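Until a library covers this, the architecture information can be read from sysfs directly. The following is a sketch that assumes the `cpulist` and `meminfo` files typical Linux kernels provide under `/sys/devices/system/node/nodeN/`. The helpers are hypothetical and are not part of libnuma.

```python
from pathlib import Path

NODE_ROOT = Path("/sys/devices/system/node")


def parse_cpulist(text):
    """Expand the kernel's CPU list format, e.g. "0-3,8" -> {0, 1, 2, 3, 8}."""
    cpus = set()
    for part in text.strip().split(","):
        if part:
            lo, _, hi = part.partition("-")
            cpus.update(range(int(lo), int(hi or lo) + 1))
    return cpus


def parse_memtotal_kb(meminfo_text):
    """Pick the size out of a line like "Node 0 MemTotal:  16318944 kB"."""
    for line in meminfo_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[2] == "MemTotal:":
            return int(fields[3])
    return None


def read_node_topology(root=NODE_ROOT):
    """Return {node: (set_of_cpus, memtotal_kb)}; {} without NUMA sysfs."""
    topology = {}
    if not root.is_dir():
        return topology
    for node_dir in root.iterdir():
        name = node_dir.name
        if not (name.startswith("node") and name[4:].isdigit()):
            continue  # skip files such as "online" or "possible"
        cpus = parse_cpulist((node_dir / "cpulist").read_text())
        mem_kb = parse_memtotal_kb((node_dir / "meminfo").read_text())
        topology[int(name[4:])] = (cpus, mem_kb)
    return topology


def cpu_to_node(topology):
    """Invert the topology so a CPU number maps to its home node."""
    return {cpu: node for node, (cpus, _) in topology.items() for cpu in cpus}
```

A thread that knows which CPU it runs on can then use `cpu_to_node` to find its home node. From there it can decide where its memory should live.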
Linux system calls available today
The system calls available on Linux today are:
mbind: Select binding of specified memory pages.
set_mempolicy: Set the default memory binding policy.
get_mempolicy: Get the default memory binding policy.
migrate_pages: Migrate all pages of a process on a given set of nodes to a different set of nodes.
move_pages: Move selected pages to a given node or request node information about pages.
These interfaces are declared in the <numaif.h> header, which comes with the libnuma library.
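Since libnuma implements these declarations as thin wrappers around the system calls, they can also be reached from Python through `ctypes`. The following is a minimal sketch that assumes libnuma is installed; if it is not, the sketch returns None. It only queries the calling thread's default policy with `get_mempolicy`. The numeric mode values are restated from the kernel's `<linux/mempolicy.h>`, and the helper names are hypothetical. The meaning of the individual modes belongs to the discussion of memory policies.

```python
import ctypes
import ctypes.util
import os

# Mode numbers as defined in the kernel's <linux/mempolicy.h>, restated here
# so the sketch does not depend on C headers being installed.
MPOL_NAMES = {
    0: "MPOL_DEFAULT",
    1: "MPOL_PREFERRED",
    2: "MPOL_BIND",
    3: "MPOL_INTERLEAVE",
    4: "MPOL_LOCAL",
}


def policy_name(mode):
    """Translate a raw mode number into its symbolic name."""
    return MPOL_NAMES.get(mode, f"unknown mode {mode}")


def load_libnuma():
    """Return a ctypes handle to libnuma, or None if it is not installed."""
    path = ctypes.util.find_library("numa")
    if path is None:
        return None
    lib = ctypes.CDLL(path, use_errno=True)
    # Prototype from <numaif.h>:
    # long get_mempolicy(int *mode, unsigned long *nmask,
    #                    unsigned long maxnode, void *addr, unsigned flags);
    lib.get_mempolicy.argtypes = [
        ctypes.POINTER(ctypes.c_int),
        ctypes.POINTER(ctypes.c_ulong),
        ctypes.c_ulong,
        ctypes.c_void_p,
        ctypes.c_uint,
    ]
    lib.get_mempolicy.restype = ctypes.c_long
    return lib


def default_policy():
    """Name of the calling thread's default policy, or None without libnuma."""
    lib = load_libnuma()
    if lib is None:
        return None
    mode = ctypes.c_int(-1)
    # addr == NULL and flags == 0 ask for the thread's default policy;
    # nmask == NULL with maxnode == 0 skips the node mask.
    if lib.get_mempolicy(ctypes.byref(mode), None, 0, None, 0) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return policy_name(mode.value)
```

If nothing has changed the policy, `default_policy()` would typically return `"MPOL_DEFAULT"`. A kernel built without NUMA support, or a sandbox that filters these calls, makes it raise `OSError` instead.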
Before we go into more details we have to understand the concept of memory policies.
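NUMA (Non-Uniform Memory Access) technology lets many servers operate like a single system while keeping the advantages of small systems: ease of programming and management.
It was developed in the 1990s by vendors including Burroughs (later Unisys), Convex Computer (later Hewlett-Packard), Honeywell Information Systems Italy (HISI, later Groupe Bull), Silicon Graphics (later Silicon Graphics International), Sequent Computer Systems (later IBM), Data General (later EMC), and Digital (later Compaq, then HP).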