Pick a core, any core, says Intel – we’ll magically put the right workload onto one in a hybrid SoC or accelerator

Intel has revealed some details about its upcoming chip designs, and claims they are the biggest and most significant change to its products for years. They include new CPU cores and the Alder Lake family, which places different types of core alongside each other.

The chip giant is kinda taking a leaf out of the Arm world’s book, here.

Arm-compatible smartphones, tablets, and PCs typically have a mix of CPU core types: some lightweight ones that are battery friendly and focused on running things like background tasks and less-demanding apps, and some that draw more power and are used only when applications need a surge in processing performance. The operating system should assign apps to the appropriate cores, ensuring software gets the right level of performance without draining the battery too much.

Intel is going the same route: one of its new CPU core types is called the Efficient Core – aka Gracemont – and Intel says that when compared to its 2015-era Skylake architecture, it “achieves 40 percent more performance at the same power or delivers the same performance while consuming less than 40 percent of the power.”
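To put numbers on those two claims, here is a back-of-the-envelope sketch in Python. It assumes the quoted figures can be read as simple ratios against Skylake, and it treats "less than 40 percent" as exactly 0.40, so the second result is a lower bound. The function name is ours, not Intel's.

```python
def perf_per_watt_gain(relative_perf, relative_power):
    """Performance-per-watt ratio versus the baseline (baseline = 1.0 for both)."""
    return relative_perf / relative_power

# Claim 1: 40 percent more performance at the same power.
iso_power_gain = perf_per_watt_gain(1.40, 1.00)

# Claim 2: the same performance at (less than) 40 percent of the power.
# 0.40 is the assumed worst case, so this is a lower bound.
iso_perf_gain = perf_per_watt_gain(1.00, 0.40)

print(f"Same power:       {iso_power_gain:.2f}x performance per watt")
print(f"Same performance: at least {iso_perf_gain:.2f}x performance per watt")
```

The two figures differ because they describe different operating points: one holds power steady, the other holds performance steady.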

As the name implies, Intel intends this E core to be used in devices like thin and light laptops in which power management is essential.

Some notable features: the E core has a 64KB L1 instruction cache, 32KB of L1 data cache, and up to 4MB of L2 cache shared by four cores. Spectre-be-damned, Intel is leaning hard into speculative execution to speed up software. This means improvements, we’re told, to the branch prediction and prefetchers, and dual three-wide x86 instruction decoders – so up to six instructions per cycle queued up. The pipeline has a 256-entry out-of-order window, and 17 execution ports into the integer ALUs, numerous floating-point and vector math units, and memory access units.

The E core supports AVX vector math with extensions to accelerate integer-based machine-learning calculations.
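Intel's summary here does not spell out which instructions are involved, so treat the following as a generic illustration rather than a description of the E core. Integer machine-learning maths usually boils down to multiplying small 8-bit values and summing the products into wider 32-bit accumulators. The function name and the four-values-per-lane layout are assumptions made for the sketch, and overflow is not modelled.

```python
def dot4_accumulate(acc, a_bytes, b_bytes):
    """For each accumulator lane, add the sum of four 8-bit products.

    acc      -- list of wide (32-bit style) running totals, one per lane
    a_bytes  -- list of small integers, four per lane
    b_bytes  -- list of small integers, four per lane
    """
    if not (len(a_bytes) == len(b_bytes) == 4 * len(acc)):
        raise ValueError("expected four 8-bit values per accumulator lane")
    out = []
    for lane, total in enumerate(acc):
        for k in range(4):
            i = 4 * lane + k
            total += a_bytes[i] * b_bytes[i]
        out.append(total)
    return out

# Two lanes: the first starts at 0, the second at 10.
result = dot4_accumulate(
    [0, 10],
    [1, 2, 3, 4, 1, 1, 1, 1],
    [1, 1, 1, 1, -1, -2, -3, -4],
)
print(result)
```

Hardware that fuses the multiply and the accumulate into one vector operation does this for many lanes at once, which is where the speed-up comes from.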

Intel’s overview of its Efficient Core

Complementing the E core is the Intel Performance Core, previously discussed under the name Golden Cove. The P core includes Advanced Matrix Extensions, a new acceleration engine designed to speed AI workloads, and is supposed to scale from laptops and desktop PCs to servers.

As with the E core, Intel says the P core uses all manner of new techniques and speculative execution to anticipate processing requirements and get stuff done efficiently. The P core has, among other things, six instruction decoders, a bigger and wider micro-op cache, 12 execution ports, bigger register files, improved branch prediction and prefetching, a 512-entry reorder buffer, faster math operations, 32KB of L1 instruction cache, 48KB of L1 data cache, and up to 2MB of L2 cache. More instructions are executed at the rename and allocation stage of the pipeline.

Overview of the Intel P core

All of which is lovely – but Intel’s next-gen architectures typically improve that sort of thing. If you want the full details, Intel’s presentations, slides, and so forth on its new architectures are here.

Crucially, this time around, Intel will put E cores and P cores on the same system-on-a-chip, and operating systems can decide which core and which type of core to use.

That trick is only possible with forthcoming Alder Lake systems-on-a-chip that incorporate new Intel tech called Thread Director. Intel explained that Thread Director can detect a demanding workload like a game starting up and give it some P core time. If email is syncing in the background, it gets an E core. Thread Director can schedule to both types of core, and can detect idling workloads on a P core and shunt them off to an E core until they can justify use of the higher-performance part of the chip.
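The placement behaviour Intel described can be pictured with a toy model. This is a sketch only: Intel did not detail the mechanism at this level, and none of the names below come from Intel or any operating system. The Task class, the utilisation figure, and the 0.5 threshold are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical cut-off: at or above this demand, a task earns a P core.
DEMAND_THRESHOLD = 0.5

@dataclass
class Task:
    name: str
    utilisation: float  # hypothetical demand metric, 0.0 (idle) to 1.0 (flat out)
    core: str = "E"     # which core type the task currently sits on

def place(task):
    """Put demanding tasks on a P core and everything else on an E core."""
    task.core = "P" if task.utilisation >= DEMAND_THRESHOLD else "E"
    return task.core

def rebalance(tasks):
    """Re-evaluate every task, so an idling P-core task drops back to an E core."""
    for task in tasks:
        place(task)
    return {task.name: task.core for task in tasks}

game = Task("game", utilisation=0.9)
mail = Task("mail sync", utilisation=0.1)

first_pass = rebalance([game, mail])   # game is busy, mail is not

game.utilisation = 0.05                # the game goes idle, e.g. sits paused
second_pass = rebalance([game, mail])  # game is shunted to an E core

print(first_pass)
print(second_pass)
```

A real scheduler would weigh far more than one number, but the shape is the same: measure demand, place accordingly, and keep re-checking.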

While talking this up as a new dawn of hybrid chips, Intel acknowledged that other chipmakers have been here before. Chipzilla’s point of difference is a belief that its hybrids are all about performance, while rivals mix and match cores to control power consumption.

Windows 11 will be ready for Thread Director out of the box. Linux developers are aware it’s coming.

Whether it matters to Linux or other OSes aimed at servers is moot: while Intel advances the P core as just the thing for “large code footprint applications,” Alder Lake is a client architecture.

Servers take a P, too

Happily, Intel is also using P cores in its new server silicon – Sapphire Rapids – to take on the mantle of the Xeon Scalable Processor range.

Sapphire Rapids puts Intel’s “tiles” tech to work. Tiles are essentially single CPUs, but Intel has figured out how to place multiple tiles in a single package that uses “embedded multi-die interconnect bridge” packaging to present all the tiles as a single logical processor.

Tiles are how Intel plans to serve clouds and workloads like AI or microservices that require scale.

Sapphire Rapids therefore represents a riposte to the many-core Arm-powered CPUs that the likes of Oracle and AWS are currently advancing as just the ticket for microservices in a one-core-one-container world.

The new Xeon architecture includes three accelerators designed to speed servers.

The Accelerator Interfacing Architecture (AIA) might be the most significant because it’s how Intel will offload housekeeping workloads like handling storage I/O or running virtual switches into what it now prefers to call Infrastructure Processing Units (IPUs – née DPUs, out of SmartNICs).

Intel execs at the 2.5-hour Architecture Day presentation The Register attended were cagey about exactly how AIA will put IPUs to work, mentioning collaboration with Microsoft and VMware. They were bullish, though, at the prospect of reclaiming the power of those lovely P cores, pointing out that some microservices at Facebook use between 31 and 83 per cent of server power on overheads.

One possibility afforded by AIA was diskless servers: apparently the architecture and an IPU can define a virtual NVMe device that uses external storage – even as a boot drive!

Intel suggested that IPUs and AIA will be big in clouds and among communications service providers, and that as-yet-unrevealed alliances will make IPUs relevant to even mainstream data centres.

And in a related non-surprise, Intel is now selling IPUs. The company revealed “Mount Evans”, its first dedicated ASIC-based IPU, and an FPGA-based IPU reference platform called “Oak Springs Canyon”.

The mere fact that Intel has made a server CPU ready for IPUs matters because hype about the devices and a new data centre architecture they can enable has built for years, with little sign of how it will be realised. AIA shows IPUs/SmartNICs/DPUs are on their way to real use.

Faster, chippycat! Drill, drill!

Another important accelerator in Sapphire Rapids is “Advanced Matrix Extensions” (AMX), silicon dedicated to tensor processing and therefore to deep learning algorithms. AMX can work even as a Sapphire Rapids CPU’s P cores go about other business. AMX has its own instruction set to speed things along.

The third accelerator is the Data Streaming Accelerator (DSA) that takes care of moving data. This is another bottleneck-clearer aimed at ensuring data flows among the CPU, memory, caches, attached storage and networked storage devices without leaving CPUs waiting for something to do.

If you need even more grunt, Intel has a bridge to sell you – the Ponte Vecchio GPU.

Ponte Vecchio is dripping with customisations for high-performance computing but is as notable for its implementation of tiles and collaboration with TSMC.

The GPU features five tiles, each specialising in different chores and each made using different processes. The Compute Tile is built by Intel’s Taiwanese rival TSMC, and Chipzilla is therefore keen to claim that Ponte Vecchio demonstrates its new IDM 2.0 strategy under which it just gets stuff done by working with whatever foundry is best-suited to the job, rather than only using stuff it built all by itself.

Ponte Vecchio

Intel also has new GPUs and a new brand for them – “Arc” – mostly aimed at gamers and content creators.

Don’t whip out the chequebook just yet

All of the above sounds like it’ll be fun to play with.

Curb your enthusiasm, readers, because Intel can’t quite say when much of it will land.

Ponte Vecchio is a sometime-in-2022 affair. Alder Lake will arrive “later this year” and it is unclear if PCs featuring it will arrive in time to let Windows 11 take advantage of Thread Director when the OS launches in “late 2021”.

Intel CEO Pat Gelsinger only appeared at the end of the Architecture Day presentation and had little to say other than that the new stuff represents a change in the way we need to think about chips. In the past, he said, new processes defined important advances in silicon technology. Packaging silicon into hybrid machines is where the action is today, he opined.

It didn’t seem at all odd for Gelsinger to just put the cherry on top of all the technology announcements mentioned above, because he wasn’t at the company when they were decided and developed.

Gelsinger is, however, in the big chair now that Intel must sell what it believes are market-making innovations, even as Arm presses deeper into Intel territory, Qualcomm advances its ambitions to run everywhere, AWS pushes its own silicon ahead of Xeons, AMD finds novel ways to attack, and Nvidia tries to muscle into almost every computing niche. ®
