Socket/Chip Clause Samples
Socket/Chip. Coupling several cores in the same integrated circuit is a common approach, usually referred to as a multicore or many-core processor (depending on the number of cores it aggregates). One of the main advantages is that the cores share some levels of cache memory. Cache memory is static random access memory (SRAM), which the chip can access faster than regular random access memory (RAM). The shared caches can improve the reuse of data by different threads running on cores in the same socket, with the added advantage of the cores being close together on the same die (higher clock rates, less signal degradation, lower power). Having several cores in the same socket enables thread-level parallelism, as each core can run a different sequence of instructions in parallel while having access to the same data. These two deepest levels, "Core/CPU" and "Socket/Chip", will certainly remain present in any future architecture, being the deepest layers over which a programmer has control. The three patterns should therefore devote equal optimization effort here: in all three cases, data and process ordering strongly influence, for instance, memory access and vectorization.

Accelerators/GPUs: Accelerators are specialised hardware consisting of hundreds of simpler computing units that work in parallel to perform specific calculations over large pieces of data. They include their own memory. Accelerators need a central processing unit (CPU) to run the main code and off-load the specific kernels to them. To exploit the massive parallelism available within GPUs, the application kernels must be rewritten. The dominant cross-platform programming language is OpenCL (Open Computing Language), while other alternatives are vendor-dependent, such as NVIDIA's Compute Unified Device Architecture language, widely known as CUDA.
In monolithic patterns, codes can run either completely on the accelerator or off-load part of the work (for instance the solver) to the accelerator while the rest runs on the host. The two situations pose different challenges; in the latter, host/accelerator data transfer becomes critical. Data transfer is also critical in heterogeneous coupled patterns.

Node: A computational node can include one or several sockets and accelerators, along with main memory and Input/Output. A computational node is, therefore, the minimum autonomous computation unit, as it includes cores to comput...
