Alder Lake Core Latency

Now that I have had a few days to play around with my i9-12900K, we can take a deeper dive into the microarchitecture. Since this is a completely different microarchitecture from previous Intel releases, I have tried to gather as much accurate information as I could. Some of the information I present below could change as I continue to learn more about Intel’s Hybrid Technology. Latency plays a big part in performance, and it has to be balanced against bandwidth. Let’s take a look at Alder Lake’s core latency.

Performance Core Latency

P-Core to P-Core Latency Average = 27.59 ns
P-Core to P-Core Latency Minimum = 4.25 ns
P-Core to P-Core Latency Maximum = 33.75 ns

Starting with the Performance Cores (logical cores #0 to #15), we see that the average latency is 27.59ns. The lowest latency recorded was 4.25ns (rounded to 4.3) on Core #7. The highest latency recorded was 33.75ns (rounded to 33.8) between Core #2 and Core #13. Overall this isn’t bad at all, at least until a P-Core needs to communicate with an E-Core. At that point the latency begins to climb for certain P-Cores. As expected, the further a P-Core is from the E-Cores, the higher the latency. For example, Performance Core #3 has 40.3ns latency to Efficient Core #16, whereas Performance Core #15 has only 33.8ns latency to Efficient Core #16.

Efficient Core Latency

E-Core to E-Core Latency Average = 40.2 ns
E-Core to E-Core Latency Minimum = 31.0 ns
E-Core to E-Core Latency Maximum = 51.5 ns

Moving down to the Efficient Cores shows much higher latencies. The Efficient Cores are smaller Intel Atom Cores that are clustered together in groups of 4. The i9-12900K has two clusters, with each cluster containing 4 Efficient Atom Cores (2 Clusters x 4 Atom Cores = 8 Cores). Cores #16 to #23 are our Atom Cores. The average latency amongst the E-Cores is 40.2ns, with the lowest latency recorded at 31ns. Since the 8 Atom Cores are split into two clusters of 4, the maximum latency (51.5ns) is very high. Earlier in the article (Gracemont Core – Architectural Deep Dive) I explained how each cluster (4 E-Cores) shares a single L2 cache. This can cause high latency when compared to the P-Cores. This benchmark also shows us how the Atom Cores are paired within the architecture. For data to travel from one cluster to the next, it must cross the shared L2 Cache and the L3 Cache (LLC), unlike the Performance Cores, which don’t need to share their L2 Cache with other Cores.
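The cluster layout described above can be sketched in a few lines of Python. This is only an illustration: the function names are my own, and it assumes the numbering is contiguous, so that Cores #16 to #19 form one cluster and Cores #20 to #23 form the other. The benchmark results suggest this grouping, but I have not confirmed it against Intel documentation.

```python
# Logical CPU numbering as described in this article for the i9-12900K:
# #0-#15 are P-Core threads, #16-#23 are E-Cores in clusters of 4.
# ASSUMPTION: clusters are numbered contiguously (#16-#19 and #20-#23).
FIRST_E_CORE = 16
E_CORE_COUNT = 8
CLUSTER_SIZE = 4


def e_core_cluster(cpu):
    """Return the E-Core cluster index (0 or 1), or None for a P-Core thread."""
    if not FIRST_E_CORE <= cpu < FIRST_E_CORE + E_CORE_COUNT:
        return None
    return (cpu - FIRST_E_CORE) // CLUSTER_SIZE


def shares_l2(cpu_a, cpu_b):
    """True when both CPUs are E-Cores in the same cluster (one shared L2)."""
    cluster_a = e_core_cluster(cpu_a)
    return cluster_a is not None and cluster_a == e_core_cluster(cpu_b)


if __name__ == "__main__":
    print(shares_l2(16, 19))  # same cluster
    print(shares_l2(19, 20))  # neighbouring numbers, different clusters
```

Under this assumption, a transfer between Core #19 and Core #20 has to leave one cluster's L2 and go through the L3, which is where the higher E-Core to E-Core figures would come from.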

Individual Core & Cache Latency

Prior to Alder Lake’s release I was fascinated with Intel’s Hybrid Technology and their new take on the well-known big.LITTLE architecture. I had many questions and I had to find the answers. Perhaps my findings will help others who have had the same thoughts. Now we are going to dive a little deeper into the microarchitecture and take a look at other latencies within Alder Lake-S. It is extremely hard to benchmark cache performance. Normally I would just run various well-known memory benchmark tools, but Intel’s Hybrid Technology is so new that some of the popular tools cannot properly benchmark the Alder Lake uarch. Therefore I will try my best to determine how well Alder Lake’s microarchitecture performs. Let’s take a look at Alder Lake-S latency to system memory (DRAM).

Alder Lake-S CPU Latency to System Memory (DRAM – DDR5-4800MHz CAS40) = 75.4 ns

Prior to the Alder Lake release we knew that DDR5 would bring higher latencies and much higher CAS values, so high latency was expected. Due to how Intel developed Alder Lake, the latency numbers will differ slightly from run to run. The cores can quickly enter and leave C-states (low-power states), which can affect the latency. During my benchmarks I noticed that the Cores would be in certain C-states, since the Intel Thread Director and Windows scheduling don’t always consider every workload a major task. All of the tests presented show my PC running stock settings. Now we will check out the latencies for each individual core to the system memory (DRAM).
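To put the CAS number in perspective, it helps to convert it from clock cycles to nanoseconds. The sketch below uses the standard conversion (DDR transfers data twice per clock, so one cycle lasts 2000 divided by the data rate in MT/s nanoseconds). The function name is my own, and the DDR4-3200 CAS16 kit is only a hypothetical comparison point, not something I tested.

```python
def cas_latency_ns(data_rate_mts, cas_cycles):
    """Convert a CAS latency from clock cycles to nanoseconds.

    DDR memory transfers twice per clock, so the memory clock in MHz is
    data_rate / 2 and one cycle lasts 2000 / data_rate nanoseconds.
    """
    return cas_cycles * 2000.0 / data_rate_mts


if __name__ == "__main__":
    # The DDR5 configuration used in this article.
    print(f"DDR5-4800 CAS40: {cas_latency_ns(4800, 40):.2f} ns")
    # Hypothetical DDR4 kit, included only for comparison.
    print(f"DDR4-3200 CAS16: {cas_latency_ns(3200, 16):.2f} ns")
```

For DDR5-4800 at CAS40 this works out to roughly 16.67 ns. CAS is only one part of the 75.4 ns measured above; the rest comes from the other memory timings, the memory controller, and the trip through the cache hierarchy.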

Alder Lake Individual Cores Latency To DRAM

We see that the P-Cores (Core0 to Core15) have much higher latency than the E-Cores (Core16 to Core23). I thought this was very interesting. The Efficient Cores can access data in DRAM up to 1.9 times faster than the Performance Cores. Put another way, the Performance Cores’ latency is 92% higher than the Efficient Cores’. It should be noted that this benchmark tests random memory locations as well. Now we see how the Efficient Atom Cores obtain efficiency and high throughput. It also helps that the Efficient Cores are single-threaded (no Hyper-Threading/SMT). Now let’s move down the memory hierarchy and check out the lower layers.
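For readers wondering what "random memory locations" means in practice, latency tests of this kind are commonly built as a pointer chase: every load depends on the result of the previous one, and the order is shuffled so the prefetchers cannot guess the next address. The sketch below shows only how such a chain can be built and walked. It is not the tool I used, the function names are my own, and Python's interpreter overhead is far larger than the hardware latency, so the timing it prints says nothing about the CPU. A real test would be written in C or assembly and pinned to one core.

```python
import random
import time


def random_cycle(n, seed=0):
    """Build a permutation of range(n) forming one single cycle (Sattolo's algorithm)."""
    rng = random.Random(seed)
    nxt = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # j < i guarantees the result is one big cycle
        nxt[i], nxt[j] = nxt[j], nxt[i]
    return nxt


def chase(nxt, steps):
    """Follow the chain for `steps` dependent loads and return the final index."""
    idx = 0
    for _ in range(steps):
        idx = nxt[idx]
    return idx


def cycle_length(nxt):
    """Count the hops needed to return to index 0."""
    idx, hops = nxt[0], 1
    while idx != 0:
        idx = nxt[idx]
        hops += 1
    return hops


if __name__ == "__main__":
    chain = random_cycle(1 << 16)
    start = time.perf_counter()
    chase(chain, len(chain))
    elapsed = time.perf_counter() - start
    # Dominated by interpreter overhead; shown only to illustrate the method.
    print(f"{elapsed / len(chain) * 1e9:.1f} ns per hop (Python overhead included)")
```

The single-cycle property matters: it guarantees every element is visited exactly once per lap, so the working set really is the whole buffer and the size of that buffer decides whether the test lands in L1, L2, L3, or DRAM.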

Alder Lake Individual Cores Latency To L3 Cache

I have updated my chart above to reflect my latest benchmarks. While still using random memory locations, we can see various changes in latency for each core. We must remember that each core can run at a different frequency from the others; this was introduced with Intel Turbo Boost Max Technology 3.0. Therefore we can see different results for each core. For example, while the average latency for all Cores was 17ns, the lowest latency recorded was 13.3ns and the highest was 21.7ns. I purposely wanted to use these results to show how much of a difference we can expect when leaving idle periods and then sending workloads to the cores. Despite the high latency on one core (#13), the P-Cores average 18.55ns while the E-Cores average 13.96ns.
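As a quick sanity check, the overall average can be reproduced from the two group averages. The sketch below assumes every logical core carries equal weight (16 P-Core threads and 8 E-Cores); the function name is my own.

```python
def weighted_average(groups):
    """Average several (count, mean) groups, weighting each mean by its count."""
    total = sum(count for count, _ in groups)
    return sum(count * mean for count, mean in groups) / total


# L3 figures from this article: 16 P-Core threads and 8 E-Cores.
# ASSUMPTION: each logical core is weighted equally.
l3_groups = [(16, 18.55), (8, 13.96)]
overall_ns = weighted_average(l3_groups)

if __name__ == "__main__":
    print(f"Overall L3 latency: {overall_ns:.2f} ns")
```

This gives about 17.02 ns, which lines up with the 17ns average quoted for all Cores.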

Alder Lake Individual Cores Latency To L2 Cache

The P-Cores show 2.0ns – 2.1ns. As I go deeper into the uarch it becomes harder to benchmark the E-Cores, but we can see that they all share a similar latency (5.2ns). This could be because they are clustered and each cluster shares one L2 cache. As a reminder, there are two L2 caches for the Efficient Atom Cores, and each L2 cache is shared by 4 Atom Cores.

Alder Lake Individual Cores Latency To L1 Cache

Now that we are directly on the core we can take a look at how low the L1 cache latency can get. The P-Cores show approx. 1.1ns and all of the Efficient Cores show the same latency at 0.8ns. It is possible that these numbers will vary, but they appear to be fairly accurate. In this benchmark the Efficient Cores’ latency is roughly 27% lower than the Performance Cores’, based on these rounded figures.
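Percentages like these depend on which number is used as the baseline, so it is worth spelling out the arithmetic. The helper names below are my own, and the inputs are the rounded L1 figures quoted above, so the result is approximate.

```python
def percent_lower(value, reference):
    """How much lower `value` is than `reference`, as a percentage of `reference`."""
    return (reference - value) / reference * 100.0


def percent_higher(value, reference):
    """How much higher `value` is than `reference`, as a percentage of `reference`."""
    return (value - reference) / reference * 100.0


if __name__ == "__main__":
    e_core_l1_ns = 0.8  # rounded figure from this article
    p_core_l1_ns = 1.1  # approximate figure from this article
    print(f"E-Core L1 latency is {percent_lower(e_core_l1_ns, p_core_l1_ns):.1f}% lower")
    print(f"P-Core L1 latency is {percent_higher(p_core_l1_ns, e_core_l1_ns):.1f}% higher")
```

With these rounded inputs the E-Cores come out about 27.3% lower, or equivalently the P-Cores about 37.5% higher. The unrounded measurements could shift both figures.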

Alder Lake Individual Cores Latency To All Levels

Here is a picture wrapping up my benchmark results. Here we can see all of the different levels and latencies for the Alder Lake microarchitecture. The Efficient Core results were surprising in relation to the system memory (DRAM), since I expected the E-Cores to have much higher latency, or at least latency near that of the Performance Cores. Instead the Efficient Cores appear to be, well, efficient. I haven’t even begun to overclock the platform yet, and I can’t wait to see what I can accomplish as I continue to learn about the Alder Lake uarch.