Arm Cortex-A76AE: A new age of high-performance processing with advanced safety

September 26, 2018

The appetite for further autonomy in our vehicles is spurring advances in the underlying technology. As the number of sensors in and around the vehicle increases, so must the capability of CPUs for large-scale data processing. Traditional system designs with separate Electronic Control Units (ECUs) for the gateway, infotainment and Advanced Driver-Assistance Systems (ADAS), are making way for innovative approaches in fully integrated systems. These systems will consist of multiple applications running at different levels of Automotive Safety Integrity Level (ASIL). In order to realize fully autonomous vehicles in mass production, scalable, high performance computing with inherent safety is required.

The Cortex-A line of products has seen some very exciting innovation in microarchitecture design over the last decade. Some of my personal highlights include the introduction of coherent heterogeneous processing with big.LITTLE, native 64-bit support and a brand-new memory system with DynamIQ technology. Today sees another step change in the Cortex-A microarchitecture with the introduction of the automotive enhanced Arm Cortex-A76AE, the first high performance Cortex-A CPU with Split-Lock capability. Split for performance, lock for safety.

Introducing the world’s first autonomous-class processor with integrated safety

As the name suggests, the Cortex-A76AE is based on the recently announced Cortex-A76 design. It is a superscalar, out-of-order processor that delivers levels of performance similar to the Cortex-A76 across integer, floating point, memory and machine learning workloads, and achieves similar levels of energy efficiency. Where the Cortex-A76AE differs is in its microarchitectural upgrades for functional safety and added application flexibility.

[Figure: Arm Cortex-A76AE capabilities]

Main benefits of Cortex-A76AE

The Cortex-A76AE is purpose-built for functional safety applications such as ADAS and autonomous vehicles. Let’s have a look at the three main benefits of the Cortex-A76AE.

1. Safety for autonomous systems

Where the Cortex-A76AE really stands out is in its ability to deliver the aforementioned performance at high safety integrity. It achieves this through a significant redesign of the Cortex-A76, becoming the first high performance Cortex-A CPU to include the Dual Core Lock-Step (DCLS) and Split-Lock features. Configuring two CPU cores in ‘Lock-Step’ is a traditional way of achieving high levels of diagnostic coverage, that is, the ability to detect the occurrence of an error condition. In Lock-Step, both cores execute the same instructions and their outputs are compared, so a mismatch reveals a fault.

The flexibility offered through Split-Lock also has a safety benefit. It can be extended to support potential fail-operational modes, meaning the ability to continue to operate in a degraded mode rather than completely shutting the system down. For example, when running in lock mode, if one core starts to exhibit a failure condition, the system could be quiesced and the faulty core taken off-line (split), allowing operation to continue in a degraded mode. This ‘split available’ capability is critical for any autonomous system. To find out more about how Split-Lock enables safer systems, check out our blog.
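
To make the lock-to-split fallback more concrete, here is a minimal software sketch in Python. It is only an illustration of the idea, not how the hardware is implemented: the names `Core`, `SplitLockPair` and the `fault` injector are hypothetical, and the known-answer self-test used to decide which core is faulty is an assumption, since the text above does not say how the faulty core is identified.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Core:
    """Toy stand-in for one CPU core; `fault` is a hypothetical fault injector."""
    name: str
    fault: Optional[Callable[[int], int]] = None

    def run(self, task: Callable[[int], int], x: int) -> int:
        result = task(x)
        return self.fault(result) if self.fault else result


class SplitLockPair:
    """Two cores that start in lock mode and may fall back to a degraded mode."""

    def __init__(self, primary: Core, redundant: Core) -> None:
        self.cores = [primary, redundant]
        self.mode = "lock"

    def _passes_self_test(self, core: Core) -> bool:
        # Assumed diagnostic: a known-answer test (3 + 1 must equal 4).
        return core.run(lambda v: v + 1, 3) == 4

    def execute(self, task: Callable[[int], int], x: int) -> int:
        if self.mode == "lock":
            a, b = (core.run(task, x) for core in self.cores)
            if a == b:  # comparator: both outputs agree
                return a
            # Divergence detected: quiesce and keep a core that passes the test.
            healthy = [c for c in self.cores if self._passes_self_test(c)]
            if not healthy:
                raise RuntimeError("no healthy core left; enter a safe state")
            self.cores = healthy[:1]
            self.mode = "degraded"
        # Degraded mode: a single core runs without redundancy checking.
        return self.cores[0].run(task, x)


# core1 flips the lowest bit of every result, so the pair diverges.
pair = SplitLockPair(Core("core0"), Core("core1", fault=lambda r: r ^ 1))
print(pair.execute(lambda v: v * 2, 21), pair.mode)
```

In this sketch the pair keeps producing results after the fault, but without the redundancy check, which is the trade-off a degraded mode implies.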

The following are the main microarchitectural highlights of Cortex-A76AE for safety:

Dual Core Lock-Step (DCLS): The Cortex-A76AE can run in DCLS mode, and hence is able to contribute towards a system’s ASIL D hardware diagnostic coverage requirements.
Memory protection: The Cortex-A76AE comes with memory protection as standard. It supports Single Error Correction, Double Error Detection (SECDED) ECC and parity protection in the L1 cache, and SECDED ECC protection with in-line correction on the L2 and L3 caches.
RAS features: As part of the Armv8.2 architecture extension, Cortex-A76AE has Reliability, Availability and Serviceability (RAS) features built in. These include standardized error reporting across the core and the DynamIQ Shared Unit (DSU), error injection as a means of testing fault management, and data poisoning as a way of deferring error aborts until the point of execution.
Integrated comparators: The Cortex-A76AE includes comparators that are integrated into the design. These blocks compare outputs from the logical and redundant processing elements to detect divergence. They follow the error reporting scheme defined in the Armv8.2 RAS architecture.
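
For readers unfamiliar with SECDED, the following Python sketch shows the principle on a toy scale: an extended Hamming code that protects 4 data bits with 4 check bits. The 4-bit width and the function names `encode` and `decode` are assumptions chosen for brevity; real cache ECC protects much wider words, and nothing here describes the actual Cortex-A76AE implementation.

```python
from functools import reduce
from operator import xor
from typing import List, Tuple


def encode(data: List[int]) -> List[int]:
    """Encode 4 data bits into an 8-bit extended Hamming (SECDED) codeword."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4            # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4            # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4            # covers positions 4, 5, 6, 7
    body = [p1, p2, d1, p3, d2, d3, d4]   # positions 1..7
    p0 = reduce(xor, body)                # overall parity at position 0
    return [p0] + body


def decode(word: List[int]) -> Tuple[str, List[int]]:
    """Return (status, data); status is 'ok', 'corrected' or 'uncorrectable'."""
    word = list(word)
    # The syndrome is the XOR of the positions of all set bits in 1..7.
    syndrome = reduce(xor, (pos for pos in range(1, 8) if word[pos]), 0)
    overall = reduce(xor, word)
    if overall == 1:
        # Odd number of flips: assume one, located at `syndrome`
        # (a syndrome of 0 means the overall parity bit itself flipped).
        word[syndrome] ^= 1
        status = "corrected"
    elif syndrome != 0:
        # Even number of flips with a non-zero syndrome: double error.
        return "uncorrectable", []
    else:
        status = "ok"
    return status, [word[3], word[5], word[6], word[7]]


codeword = encode([1, 0, 1, 1])
single = list(codeword)
single[5] ^= 1                   # inject one bit flip
double = list(single)
double[6] ^= 1                   # inject a second bit flip
print(decode(codeword)[0], decode(single)[0], decode(double)[0])
```

A single flipped bit is repaired transparently, while two flipped bits are reported rather than silently miscorrected, which is the property that matters for diagnostic coverage.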

Apart from the hardware features above, the Cortex-A76AE has been developed following an advanced process designed to avoid systematic faults. This enables it to meet the ASIL D systematic requirements as standard.

2. Performance for ADAS and Autonomous Driving

Cortex-A76AE has been designed to act as the decision engine in next generation ADAS and Autonomous Vehicle systems. It delivers a 30% uplift in performance over its predecessor, the Cortex-A75, and a whopping 60% increase in performance over Cortex-A72. This massive boost in performance meets the emerging CPU requirements for autonomous driving of more than 250K DMIPS at less than 15 Watts for the compute cluster. This fits well within an SoC power budget of 30 Watts.
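
As a quick sanity check on the figures quoted above, the short Python sketch below works out what they imply. It uses only the numbers from the paragraph (250K DMIPS, 15 Watts, 30 Watts); the function names are made up for illustration, and the results are bounds implied by "more than" and "less than", not measured values.

```python
def min_efficiency_dmips_per_watt(dmips: float, watts: float) -> float:
    """Lower bound on efficiency implied by a DMIPS target within a power cap."""
    return dmips / watts


def budget_share(cluster_watts: float, soc_watts: float) -> float:
    """Fraction of the SoC power budget taken by the compute cluster."""
    return cluster_watts / soc_watts


TARGET_DMIPS = 250_000     # "more than 250K DMIPS"
CLUSTER_POWER_W = 15       # "less than 15 Watts for the compute cluster"
SOC_POWER_W = 30           # "an SoC power budget of 30 Watts"

efficiency = min_efficiency_dmips_per_watt(TARGET_DMIPS, CLUSTER_POWER_W)
share = budget_share(CLUSTER_POWER_W, SOC_POWER_W)
print(f"at least {efficiency:,.0f} DMIPS per Watt")
print(f"at most {share:.0%} of the SoC power budget")
```

In other words, the requirement amounts to roughly 16,667 DMIPS per Watt or better, with the compute cluster taking no more than half of the SoC power budget.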

The microarchitecture of Cortex-A76AE is largely based on Cortex-A76, with the following highlights that deliver on performance:

Decoupled branch prediction and instruction fetch: Built to hide latency at high bandwidth, the in-order Cortex-A76AE front-end is able to fetch 4 to 8 instructions per cycle, using multi-level branch target caches and a hybrid indirect predictor to sustain the maximum throughput.
A wider machine: This is the first 4-wide decode core, which increases the maximum instructions-per-cycle capability. Up to 8 operations per cycle can then be dispatched to the out-of-order core, supporting a wider area-/power-optimized instruction window.
More integer and vector execution throughput: Quad-issue integer units are integrated in the core, including 3x simple ALU and 1x multi-cycle integer unit. Moreover, Cortex-A76AE supports dual-issue native 16B (128-bit) vector and floating-point units, giving twice the throughput of any previous Arm CPU and a 4x ML uplift over Cortex-A75.
Enhanced memory system: The full cache hierarchy is co-optimized for latency and bandwidth, with a sophisticated fourth generation prefetcher and deep memory-level parallelism.

3. Flexibility in mixed criticality systems

As mentioned in the introduction, next generation ADAS and autonomous driving systems will consist of multiple applications running at different levels of safety criticality (i.e. different levels of ASIL). This presents a challenge to silicon providers when scoping the compute complex. How do you size up the performance requirements five to six years ahead of knowing the exact mix of safety critical applications in a vehicle?

Cortex-A76AE solves the challenge of mixed criticality applications through its ability to operate in two modes: performance mode and safety mode. In performance mode, all cores within a cluster operate as a Symmetric Multiprocessing (SMP) system. In other words, a user is able to utilize all the compute resources within a cluster, coherently. In safety mode, pairs of Cortex-A76AE cores in a cluster are configured to run in Lock-Step.

A functionally-safe coherent interconnect, such as the Arm CoreLink CMN-600AE, can support multiple clusters of Cortex-A76AE. In such a system, any mix of clusters can be run in performance mode and safety mode, to achieve a fine-grained balance that matches the mix of safety critical applications. The mode of operation can be changed at any time through reset. This means that a Tier 1 or car manufacturer is able to tune the platform to fit any mix of safety critical applications, post production.
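
The trade-off between the two modes can be sketched with a small Python model. This is a bookkeeping illustration only: the `Cluster` and `Mode` names, the `reset` method and the choice of three four-core clusters are all hypothetical, and the only facts taken from the text are that safety mode pairs cores in Lock-Step and that the mode is selected through reset.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Mode(Enum):
    PERFORMANCE = "performance"   # every core is independently usable
    SAFETY = "safety"             # cores are paired in Lock-Step


@dataclass
class Cluster:
    """Hypothetical model of one cluster; not a real configuration API."""
    physical_cores: int
    mode: Mode = Mode.PERFORMANCE

    def reset(self, mode: Mode) -> None:
        # In this model the mode can only be selected as part of a reset.
        if mode is Mode.SAFETY and self.physical_cores % 2:
            raise ValueError("safety mode needs an even number of cores")
        self.mode = mode

    @property
    def logical_cores(self) -> int:
        # Each Lock-Step pair appears to software as one logical core.
        if self.mode is Mode.SAFETY:
            return self.physical_cores // 2
        return self.physical_cores


def total_logical_cores(platform: List[Cluster]) -> int:
    return sum(cluster.logical_cores for cluster in platform)


# Assumed platform: three clusters of four cores each.
platform = [Cluster(4), Cluster(4), Cluster(4)]
platform[2].reset(Mode.SAFETY)    # dedicate one cluster to safety-critical work
print(total_logical_cores(platform))
```

Moving a cluster into safety mode halves the number of cores it exposes to software in exchange for redundancy, so the mix of modes is effectively a dial between throughput and safety integrity.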

This flexibility drastically improves the usability of platforms based on Cortex-A76AE, across multiple generations and market segments. To further aid configurability, the Cortex-A76AE is based on Arm DynamIQ technology, meaning that it is also extremely scalable in terms of performance, power and area.

The journey does not end here

It is an exciting time to be involved in the automotive market. The launch of the Cortex-A76AE and the Safety Ready program will enable countless innovations as we journey towards a fully autonomous future. Like to find out more? Then visit our Automotive Solutions page and keep a look out for our soon-to-be-announced webinars, where we will discuss this in more detail. Watch this space!

To learn more about Cortex-A76AE, please visit our page below:

Cortex-A76AE
