We measure the latency of sending a message from one CPU core to another via the cache coherence protocol.
By pinning two threads to two different CPU cores, we can have them perform compare-exchange operations in a ping-pong fashion and measure the resulting latency.
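The ping-pong idea can be sketched as follows. This is a minimal illustration, not the tool's actual implementation: it skips thread pinning for brevity (the real measurement pins each thread to a specific core, e.g. with an affinity API) and does no statistics, but it shows the compare-exchange round trip that bounces one cache line between two threads.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Instant;

/// Ping-pong a shared flag between two threads and return the estimated
/// one-way latency in nanoseconds (round-trip time divided by 2).
fn measure_one_way_ns(round_trips: u32) -> f64 {
    let flag = Arc::new(AtomicBool::new(false));
    let f = flag.clone();

    // "Pong" thread: waits for the flag to become true, then flips it back.
    let pong = thread::spawn(move || {
        for _ in 0..round_trips {
            while f
                .compare_exchange(true, false, Ordering::AcqRel, Ordering::Acquire)
                .is_err()
            {
                std::hint::spin_loop();
            }
        }
    });

    // "Ping" side: waits for false, sets it to true. Each pair of successful
    // flips is one full round trip through the cache coherence protocol.
    let start = Instant::now();
    for _ in 0..round_trips {
        while flag
            .compare_exchange(false, true, Ordering::AcqRel, Ordering::Acquire)
            .is_err()
        {
            std::hint::spin_loop();
        }
    }
    pong.join().unwrap();

    start.elapsed().as_nanos() as f64 / round_trips as f64 / 2.0
}

fn main() {
    println!("~{:.0} ns one-way", measure_one_way_ns(100_000));
}
```

Without pinning, the two threads may land on the same core or migrate mid-run, so the number printed here is only indicative; pinning is what makes the per-core-pair matrix below meaningful.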
How to run:

```
$ cargo install core-to-core-latency
$ core-to-core-latency
```
| CPU | Release Date | Median Latency |
|-----|--------------|----------------|
| Intel Core i9-12900K @ 8+8 Cores (Alder Lake, 12th gen) | 2021-Q4 | 35ns, 44ns, 50ns |
| Intel Xeon Platinum 8375C @ 2.90GHz 32 Cores (Ice Lake, 3rd gen) | 2021-Q2 | 51ns |
| Intel Xeon Platinum 8275CL @ 3.00GHz 24 Cores (Cascade Lake, 2nd gen) | 2019-Q2 | 47ns |
| Intel Core i7-1165G7 @ 2.80GHz 4 Cores (Tiger Lake, 11th gen) | 2020-Q3 | 27ns |
| Intel Core i9-9900K @ 3.60GHz 8 Cores (Coffee Lake, 9th gen) | 2018-Q4 | 21ns |
| Intel Xeon E5-2695 v4 @ 2.10GHz 18 Cores (Broadwell, 5th gen) | 2016-Q1 | 44ns |
| Intel Core i5-4590 @ 3.30GHz 4 Cores (Haswell, 4th gen) | 2014-Q2 | 21ns |
| AMD EPYC 7R13 @ 48 Cores (Milan, 3rd gen) | 2021-Q1 | 23ns and 107ns |
| AMD Ryzen 9 5950X @ 3.40GHz 16 Cores (Zen3, 4th gen) | 2020-Q4 | 17ns and 85ns |
| AMD Ryzen 7 2700X @ 3.70GHz 8 Cores (Zen+, 2nd gen) | 2018-Q3 | 24ns and 92ns |
| AWS Graviton3 @ 64 Cores (Arm Neoverse, 3rd gen) | 2021-Q4 | 46ns |
| AWS Graviton2 @ 64 Cores (Arm Neoverse, 2nd gen) | 2020-Q1 | 47ns |
| Apple M1 | 2020-Q4 | 39ns |
| Apple M1 Max | 2021-Q4 | 39ns |
See the notebook results/results.ipynb for additional CPU graphs, including hyperthreading and dual-socket configurations.
Data provided by bizude.
This CPU has 8 performance cores and 2 groups of 4 efficiency cores. CPU 8 shows fast access to all other cores.
From an AWS c6i.metal machine.
From an AWS c5.metal machine.
Data provided by Jonas Wunderlich
My gaming machine; it's about twice as fast as the server-oriented CPUs.
From a machine provided by GTHost
Data provided by Felipe Lube de Bragança
From an AWS c6a.metal machine.
We can see cores arranged in 6 groups of 8, with excellent latency within a group (23ns). When data crosses groups, the latency jumps to around 110ns. Note that the last 3 groups have better cross-group latency (~90ns) than the first 3.
Data provided by John Schoenick.
We can see 2 groups of 8 cores with latencies of 17ns intra-group, and 85ns inter-group.
Data provided by David Hoppenbrouwers.
We can see 2 groups of 4 cores with latencies of 24ns intra-group, and 92ns inter-group.
From an AWS c7g.16xlarge machine.
From an AWS c6gd.metal machine.
First install Rust and gcc on Linux, then:
```
$ cargo install core-to-core-latency
$ core-to-core-latency

Num cores: 10
Using RDTSC to measure time: false
Num round trips per samples: 1000
Num samples: 300
Showing latency=round-trip-time/2 in nanoseconds:

       0       1       2       3       4       5       6       7       8       9
  0
  1   52±6
  2   38±6    39±4
  3   39±5    39±6    38±6
  4   34±6    38±4    37±6    36±5
  5   38±5    38±6    38±6    38±6    37±6
  6   38±5    37±6    39±6    36±4    49±6    38±6
  7   36±6    39±5    39±6    37±6    35±6    36±6    38±6
  8   37±5    38±6    35±5    39±5    38±6    38±5    37±6    37±6
  9   48±6    39±6    36±6    39±6    38±6    36±6    41±6    38±6    39±6

Min  latency: 34.5ns ±6.1 cores: (4,0)
Max  latency: 52.1ns ±9.4 cores: (1,0)
Mean latency: 38.4ns
```
Use `core-to-core-latency 5000 --csv > output.csv` to instruct the program to use 5000 iterations per sample to reduce the noise, and save the results. The output can then be used in the Jupyter notebook results/results.ipynb for rendering graphs.
Create a GitHub issue with the generated output.csv file and I'll add your results.
This software is licensed under the MIT license.