Network on Chip: The Future of Processor Microarchitecture Today
Network on Chip (NoC) is a scheme for organizing communication between functional blocks located on the same chip. It is intended to connect computing cores of various kinds (general-purpose, graphics, physics, etc.), device controllers, ROM and RAM modules, stand-alone devices, sensors, and anything else that can be placed on a silicon die. NoC is currently one of the most promising directions in microprocessor technology in general and in single-chip systems in particular. Its evolution mirrors that of “large” communication systems. In telephony, two devices were first connected directly by wire (the on-chip analog is the bus); then came the first matrix switches, or crossbars; then relay switching of signals (the analog is the current generation of NoC); and only then digital switching based on packet transmission, for example over TCP/IP on the Internet (the NoC analog is in active development).
In this article, you will learn about the features of Network on Chip and about the problems that made its development necessary.
What are the Differences between NoC and SoC?
Modern semiconductors are true scientific wonders, and the subtleties of microprocessor architectures can be quite confusing for beginners. Let us briefly review the differences between the concepts of Network on Chip and System on Chip.
Network on Chip is a scheme for establishing links between the end components within an SoC or a processor. It aims at high data transfer speeds while reducing the total number of physical connections required. As noted above, it also allows several IP blocks with different purposes and from different suppliers to be placed on the same die.
System on Chip (SoC) is a single chip that contains a full set of diverse, interconnected units designed to solve a certain range of tasks. An SoC typically includes several computing cores, memory controllers, I/O subsystems, the connections between them, and the means of switching (buses, crossbars, NoC elements).
What Brought the NoC Idea to Life: The Problems of Processor Performance Increase
Only a dozen years ago, the main way to improve processor performance was to raise its clock frequency (the extensive approach). After reaching 2 GHz, however, engineers ran into the first problems. One was the physical limits of the materials used: when the silicon dioxide layer is too thin, precise switching of a transistor cannot be guaranteed, and the resulting “leakage currents” cause overheating of the die, excessive power consumption, and even incorrect operation. The other was the manufacturing process: power consumption can be lowered by shrinking the logic elements, which is difficult to achieve with existing lithography methods and semiconductor materials. The search for alternative solutions therefore began. A range of innovative techniques allows modern CPUs to run stably at 3.8-4 GHz, but any further increase produces extreme heat output that cannot be brought down to acceptable levels under home conditions; it requires complex cooling systems, up to and including liquid nitrogen.
The first alternative solution was the introduction of instruction pipelines and branch prediction. The idea of pipelining is that the instruction stream is split into micro-operations, which are fed into the pipeline. While micro-operations move along the pipeline toward the execution unit, other processor modules prepare the data they need by fetching it from memory, accessing devices and ports, and so on. The branch predictor is a mechanism that examines the instruction stream, finds conditional branches in it, and predicts the outcome of their conditions based on previous operations. Together, these two technologies make better use of computational resources and reduce idle time. It should be noted, however, that at the beginning of 2018 two serious vulnerabilities in how these mechanisms are implemented in the processors of most manufacturers were disclosed: Spectre and Meltdown. Fixing devices that have already been released is nontrivial and reduces processor performance by 10 to 25%. The first patches for processor microcode (Intel) and Windows security systems (Microsoft) have already led to a series of failures and problems.
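To make "predicts the result based on previous operations" concrete, here is a minimal sketch of one classic textbook scheme, a 2-bit saturating counter per branch. The article does not say which predictor real processors use, so the class name, the initial counter value, and the sample loop pattern are all assumptions chosen for illustration.

```python
class TwoBitPredictor:
    """One 2-bit saturating counter per branch address (illustrative model)."""

    def __init__(self):
        # Counter values 0-1 predict "not taken", 2-3 predict "taken".
        self.counters = {}

    def predict(self, branch_addr):
        # Unknown branches start at 1 ("weakly not taken"), an arbitrary choice.
        return self.counters.get(branch_addr, 1) >= 2

    def update(self, branch_addr, taken):
        # Move the counter toward the real outcome, saturating at 0 and 3.
        value = self.counters.get(branch_addr, 1)
        value = min(value + 1, 3) if taken else max(value - 1, 0)
        self.counters[branch_addr] = value


def accuracy(outcomes, branch_addr=0x40):
    """Fraction of outcomes predicted correctly for a single branch."""
    predictor = TwoBitPredictor()
    hits = 0
    for taken in outcomes:
        if predictor.predict(branch_addr) == taken:
            hits += 1
        predictor.update(branch_addr, taken)
    return hits / len(outcomes)


# A loop branch: taken 9 times, then not taken once, repeated 10 times.
loop_pattern = ([True] * 9 + [False]) * 10
print(f"accuracy on the loop branch: {accuracy(loop_pattern):.2f}")
```

The counter needs two wrong guesses in a row to flip its prediction, so a single loop exit does not disturb the prediction for the next run of the loop.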
The next step was to parallelize computation by integrating several processor cores in a single package. Parallel computing itself is not a new topic: the first theoretical justification for parallelizing a task was given by Gaspard de Prony when he computed logarithmic and trigonometric tables for the French Cadastre at the end of the 18th century. In modern computing, parallel calculations have been developed since the advent of “large” computers in the 1950s, with several separate processors, up to thousands in some cases, working jointly on a calculation. Nevertheless, until the mid-1980s such methods remained the preserve of mainframes, until the first supercomputer based on desktop processors (Intel 8086) was developed as part of the Caltech Concurrent Computing project.
Instead of concentrating on the maximum possible number of operations per second in a single execution thread, multi-core processors execute several threads simultaneously, up to two per core. The idea is sound, with one caveat: the performance of parallel computing depends heavily on how well a particular task can be parallelized and on how well the developers handle that.
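The dependence on "how well a task can be parallelized" is usually quantified with Amdahl's law, which the article does not name. The sketch below computes the theoretical upper bound on speedup; the function name and the 90% parallel fraction are assumptions picked for the example.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Upper bound on speedup when only part of the work can run in parallel."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part takes the same time on any number of cores;
    # only the parallel part is divided among them.
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Hypothetical task in which 90% of the work can be parallelized.
for cores in (2, 4, 8, 16, 64):
    print(f"{cores:>2} cores: at most {amdahl_speedup(0.9, cores):.2f}x faster")
```

Even with an unlimited number of cores, a task that is 10% serial can never run more than ten times faster, which is why adding cores alone does not solve the performance problem.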
There is one more problem. From the beginning, the element connecting these units (cores) was the processor bus, a legacy of single-core processors. Its main drawback is that only one block can transmit data at a time; all other blocks can only receive. When N blocks need to transmit, N-1 of them must wait their turn in every time slot. These waits add latency, which is a critical disadvantage for systems designed for high loads. In particular, this began to hold back processor manufacturers that sought to increase the number of cores in their products: a large number of cores gives no advantage if the cores constantly contend for the bus. The next stage in the evolution of the bus was the matrix scheme, also called a crossbar switch. In essence, though, it merely increases the number of links between individual blocks, so crossbars are not a definitive solution either. Such a scheme simply defers the problem for a while by adding more cross-links.
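The difference between a shared bus and a crossbar can be sketched with a deliberately simplified timing model. The assumptions are mine, not the article's: every transfer takes exactly one time slot, the bus carries one transfer per slot, and the crossbar is idealized and perfectly scheduled, so only transfers that share a source or destination port have to wait for each other.

```python
from collections import Counter


def bus_slots(transfers):
    """A shared bus carries one transfer per time slot, whoever the sender is."""
    return len(transfers)


def crossbar_slots(transfers):
    """Idealized crossbar: the busiest port determines the finish time."""
    if not transfers:
        return 0
    sources = Counter(src for src, _ in transfers)
    destinations = Counter(dst for _, dst in transfers)
    return max(max(sources.values()), max(destinations.values()))


# Eight hypothetical blocks, each sending one message to its neighbour.
ring = [(i, (i + 1) % 8) for i in range(8)]
print("bus:", bus_slots(ring), "slots; crossbar:", crossbar_slots(ring), "slot(s)")

# Seven blocks all sending to block 0: the crossbar no longer helps.
all_to_one = [(i, 0) for i in range(1, 8)]
print("bus:", bus_slots(all_to_one), "slots; crossbar:", crossbar_slots(all_to_one), "slots")
```

The model ignores arbitration overhead and wiring cost. In a real crossbar the number of cross-links grows roughly with the square of the number of blocks, which is the scaling limit described above.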
Unfortunately, these two factors severely constrain software developers. They have to look for workarounds to implement their tasks.
Here we come to the root of the problem. It has three components, and with them three possible routes to a solution:
- Clock frequency increase. The current state of the art barely allows clock rates above 5 GHz without costly cooling systems. Research is ongoing, but results will not come quickly.
- Multiplication and specialization of processing cores. Raising processor performance merely by adding cores, when communication between them relies on the bus architecture and its derivatives, exhausts itself at around 16 cores per die.
- Parallel methods in programming. The methods of parallel programming are quite mature, but:
  - Firstly, not every task can be parallelized. Many problems must be solved sequentially.
  - Secondly, everything here depends on programmers, their skill, and their tools. Chipmakers can do little, although the leading vendors (Intel, AMD, and Qualcomm) try to help in every way they can.
Recently, while looking for ways forward along the second route, engineers arrived at the idea of packet-based switching and routing, as used in modern communication systems (TCP/IP, cellular networks, Wi-Fi, Bluetooth, etc.). The idea of connecting processor blocks by routing data packets between them is what is called Network on Chip. The features of its technical implementation are discussed below.
NoC Architecture: What Are the Characteristic Features of the Fully Developed Concept?
First, let us look at the typical features of a fully developed Network on Chip architecture. Instead of the traditional scheme, in which blocks are linked directly or attached to one or more common buses (where signals travel over the same lines at different times), each block is connected to an internal network built from switches, which route the signals among themselves and to and from the functional blocks.
Each switch is a network node, similar to its counterparts in traditional data networks such as LANs and WANs. The switch splits the data into smaller parts (packets) and sends them to the recipient through a chain of subsequent switches. Each intermediate node inspects the headers of incoming packets to find their destinations and forwards them to the next node according to its routing table. Note that packets can reach the final switch by different paths, because intermediate nodes can change routes depending on the load on specific links (dynamic routing). The final switch reassembles the original message from the packets and passes it to the recipient. Such a switching system also makes it possible to prioritize different kinds of traffic (for example, streaming media, which is sensitive to delays, must be transmitted with the highest priority) and even to implement Quality of Service algorithms within a single-chip system.
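The packet path described above (split, forward hop by hop using routing tables, reassemble) can be sketched in a few lines. Everything specific here is an assumption made for illustration: the 3x3 mesh topology, the 4-byte payload size, the packet fields, and the use of breadth-first search to fill static routing tables. Dynamic, load-dependent routing and traffic priorities are not modelled.

```python
from collections import deque


def mesh_links(width, height):
    """Neighbour lists for a width x height mesh of switches (assumed topology)."""
    links = {}
    for x in range(width):
        for y in range(height):
            candidates = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
            links[(x, y)] = [
                (cx, cy) for cx, cy in candidates
                if 0 <= cx < width and 0 <= cy < height
            ]
    return links


def build_routing_table(links, node):
    """Breadth-first search from `node`: maps each destination to the next hop."""
    table = {}
    queue = deque((n, n) for n in links[node])  # (reached switch, first hop used)
    seen = {node} | set(links[node])
    while queue:
        current, first_hop = queue.popleft()
        table[current] = first_hop
        for nxt in links[current]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, first_hop))
    return table


def packetize(message, dest, payload_size=4):
    """Split a byte string into packets carrying a destination and a sequence number."""
    chunks = [message[i:i + payload_size] for i in range(0, len(message), payload_size)]
    return [{"dest": dest, "seq": seq, "payload": chunk} for seq, chunk in enumerate(chunks)]


def send(message, src, dest, tables):
    """Forward every packet switch by switch, then reassemble at the destination."""
    received, hops_per_packet = [], []
    for packet in packetize(message, dest):
        node, hops = src, 0
        while node != packet["dest"]:
            node = tables[node][packet["dest"]]  # header lookup in the routing table
            hops += 1
        received.append(packet)
        hops_per_packet.append(hops)
    received.sort(key=lambda p: p["seq"])  # restore the original packet order
    return b"".join(p["payload"] for p in received), hops_per_packet


links = mesh_links(3, 3)
tables = {node: build_routing_table(links, node) for node in links}
message, hops = send(b"hello, network on chip", (0, 0), (2, 2), tables)
print(message, hops)
```

A real on-chip router would do this lookup in hardware and usually in a single cycle; the point of the sketch is only the division of labour between packet headers, per-switch tables, and reassembly at the destination.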
The topology of Network on Chip in its current concept is based on three types of elements:
- Switches.
- Conductors.
- Network interfaces.
Switches take on the task of routing signals between the nodes of the network according to the selected topology. Note that the NoC architecture assumes a fairly powerful switch that keeps delays to a minimum (down to nanoseconds). In return, besides increased processing power, it offers flexibility: IP cores from different vendors (including graphics cores, controllers, and narrowly specialized signal processors) can be integrated, provided they are fitted with compatible network interfaces, and the network topology can be chosen freely, since communication between individual functional blocks can be established in various ways.
Conductors provide the physical connections between switches and network interfaces. Some variants also include communication buffers.
Finally, network interfaces connect the individual SoC blocks to the common internal network.
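The relationship between the three element types can be written down as a small data model. This is only a sketch: the class names, the `buffer_slots` field, the switch names, and the four example IP blocks are hypothetical and do not come from any real NoC toolkit.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Link:
    """A conductor between two switches; buffer_slots models optional buffering."""
    a: str
    b: str
    buffer_slots: int = 0


@dataclass(frozen=True)
class NetworkInterface:
    """Adapter that attaches one IP block to one switch."""
    ip_block: str
    switch: str


@dataclass
class NoC:
    switches: set = field(default_factory=set)
    links: list = field(default_factory=list)
    interfaces: dict = field(default_factory=dict)

    def add_link(self, a, b, buffer_slots=0):
        # Switches are created implicitly the first time a link mentions them.
        self.switches.update((a, b))
        self.links.append(Link(a, b, buffer_slots))

    def attach(self, ip_block, switch):
        if switch not in self.switches:
            raise ValueError(f"unknown switch: {switch}")
        self.interfaces[ip_block] = NetworkInterface(ip_block, switch)

    def neighbours(self, switch):
        """Switches directly connected to `switch` by a link."""
        return {
            link.b if link.a == switch else link.a
            for link in self.links
            if switch in (link.a, link.b)
        }


# A hypothetical 2x2 mesh with four IP blocks from different suppliers.
noc = NoC()
for a, b in (("s00", "s01"), ("s00", "s10"), ("s01", "s11"), ("s10", "s11")):
    noc.add_link(a, b, buffer_slots=2)
for ip_block, switch in (("cpu0", "s00"), ("gpu", "s01"), ("dsp", "s10"), ("mem_ctrl", "s11")):
    noc.attach(ip_block, switch)

print(sorted(noc.neighbours("s00")))
```

Because every IP block sees only its network interface, the topology behind it (mesh, ring, tree) can change without touching the block itself.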
NoC Development Prospects
The need for NoC arose and grew together with the transition to processors and single-chip systems that scale by adding computational cores and functional modules of various kinds. In 2007, as part of a research project, Intel publicly presented a conceptual 80-tile processor with a performance of one teraflop. It consumed a surprisingly small amount of power: 62 watts. Each tile contained a computing element and a router. That was impressive performance for the time. Only ten years later, in 2017, did the first production desktop processor break the same barrier: the 18-core Intel Core i9-7980XE, with fewer computing cores and a TDP of 165 watts.
The main directions in the development of modern solutions based on NoC in the short term are virtual bypassing and low-swing signaling. These technologies can further reduce power consumption and minimize transmission delays, allowing the size of systems to be scaled and the number of cores to be increased as needed (without compromising their functionality).
Currently, the main field of application for NoC is still supercomputers (for example, those based on the 260-core ShenWei microprocessor family from China's Jiāngnán Computing Lab). However, all the leading chipmakers are already using individual NoC elements in their desktop and server chips. Examples include Infinity Fabric from AMD, the basis of the newest Zen (CPU) and Vega (GPU) processor families, and the Kaby Lake-G chips developed jointly by AMD and Intel.
As for the longer-term directions of NoC development, the main one is the implementation of the layered OSI network model. In the case of the Internet, the separation of communication into layers (physical, data link, protocol layers, and so on) was one of the reasons for its rapid development and opened the door to competition among hundreds of companies and ideas. We have no doubt that the same will happen with NoC.
Leading Developers of NoC Systems
The most famous developers of NoC systems and elements are NetSpeed Systems, Arteris, Sonics, and Aims Technology Inc. They create theoretical and hardware solutions for such world-renowned chip vendors as Qualcomm and Intel.
Let us conclude. As you can see, Network on Chip is a new stage in the development of processor microarchitectures, because it allows the number of cores in a single-chip system to grow while minimizing contention. However, you cannot rely entirely on the technical characteristics of the processor: the approach the developers take when building the application also plays an important role in computation speed. If you have an idea for a solution with complex business logic that requires parallel computations, please contact us today! Our team specializes in creating such software, so we will happily take up your project.