RandomX v2

A stronger, fairer way to secure cryptocurrency - and a benchmark tool to measure it

What is RandomX?

RandomX is a proof-of-work algorithm - a mathematical puzzle that computers solve to validate and secure cryptocurrency transactions. Think of it like a combination lock that takes real effort to open, ensuring nobody can cheat the system.

What makes RandomX special is that it's designed to run best on everyday CPUs - the same processors in regular desktops and laptops - rather than expensive specialized hardware. This keeps mining accessible and decentralized.

How RandomX works (simplified)
Transaction Data → RandomX Puzzle → CPU Solves It → Block Confirmed ✓

Each "hash" is one attempt at solving the puzzle. More hashes = more attempts per second.

What Changed in Version 2?

RandomX v2 makes the puzzle 50% harder per hash by increasing the number of computational steps from ~4.2 million to ~6.3 million operations. But "harder" here is a good thing - it means each solution proves more real work was done.

| Property | v1 (Original) | v2 (New) |
| --- | --- | --- |
| Operations per hash | ~4.2 million | ~6.3 million |
| Program size | 256 instructions | 384 instructions |
| Work increase | Baseline | +50% more work per hash |
| Best hardware | CPUs | CPUs (even more so) |
Key insight: Even though v2 produces fewer hashes per second, each hash represents 50% more computational work. The total useful work per second can actually be higher with v2 on modern CPUs.
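A quick back-of-the-envelope check of that insight. The hashrates below are hypothetical round numbers, not measurements; only the ops-per-hash figures come from this section:

```python
# Hypothetical hashrates for illustration; ops-per-hash figures are
# the ~4.2M (v1) and ~6.3M (v2) values quoted above.
OPS_PER_HASH_V1 = 4.2e6
OPS_PER_HASH_V2 = 6.3e6   # 1.5x the v1 work

h_v1 = 5000   # hashes/second under v1 (hypothetical)
h_v2 = 4500   # hashes/second under v2, i.e. 10% fewer hashes (hypothetical)

work_v1 = h_v1 * OPS_PER_HASH_V1
work_v2 = h_v2 * OPS_PER_HASH_V2

print(f"total work change: {work_v2 / work_v1 - 1:+.1%}")   # +35.0%
```

Even a 10% hashrate drop still leaves 35% more useful work per second, because each v2 hash carries 1.5x the computation.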

Work per Hash - Visual Comparison

v1: 4.2M ops per hash  ·  v2: 6.3M ops per hash (+50%)

Why Upgrade to v2?

🛡️

Stronger Security

More operations per hash means the puzzle is harder to shortcut with specialized chips (ASICs/FPGAs).

⚖️

Fairer Mining

Keeps everyday CPUs competitive, so regular people can participate - not just large mining farms.

⚡

Better Efficiency

Modern CPUs can do more useful work per watt of energy with the v2 algorithm.

🔍

Proven by Testing

The benchmark suite rigorously tests performance, stability, and power consumption before adoption.

What the Benchmark Tool Does

This project includes an automated benchmarking script that runs both v1 and v2 side by side on your hardware and compares the results. Here's what it measures:

Benchmark Pipeline
Detect CPU → Optimize Settings → Run v1 & v2 → Measure & Compare

Speed

How many hashes per second each version achieves - the raw throughput of the algorithm.

Power Draw

Watts consumed during the test, measured directly from CPU power sensors (RAPL).

Efficiency

RandomX instructions per joule of energy - the most important metric for real-world viability.

Stability

Tracks crashes and errors over hundreds of runs to ensure the algorithm is rock-solid.

Total Work

Combines hashrate with operations-per-hash to show real computational output per second.

Smart Tuning

Automatically detects your CPU type and applies optimal settings for AMD Ryzen, EPYC, and Intel processors.
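The headline metric the tool reports, operations per joule, reduces to a simple formula. A minimal sketch, with illustrative numbers (the actual script derives energy from the CPU's RAPL power sensors rather than taking a wattage as input):

```python
# Efficiency metric sketch: RandomX operations per joule of energy.
# All numeric inputs here are hypothetical examples.
def work_per_joule(hashrate_hs: float, ops_per_hash: float, avg_watts: float) -> float:
    # (ops/second) divided by (joules/second, i.e. watts) = ops per joule
    return hashrate_hs * ops_per_hash / avg_watts

v1 = work_per_joule(hashrate_hs=5000, ops_per_hash=4.2e6, avg_watts=90)
v2 = work_per_joule(hashrate_hs=4600, ops_per_hash=6.3e6, avg_watts=92)
print(f"v1: {v1:.2e} ops/J, v2: {v2:.2e} ops/J, gain: {v2 / v1 - 1:+.1%}")
```

Note that v2 can win this metric even while producing fewer hashes per second, because each hash embodies more operations.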

What v2 Aims to Achieve

The ideal outcome of adopting RandomX v2:

  • +50% work per hash
  • High ASIC resistance
  • Better energy efficiency

An Everyday Analogy

Imagine a spelling bee where contestants prove their skill by spelling words correctly.

  • v1 is like spelling 4-letter words. Fast, but someone could build a machine to buzz in first.
  • v2 is like spelling 6-letter words. Slightly slower per word, but it tests real ability more deeply - making it much harder to cheat with a gadget.

The benchmark tool is the stopwatch and scorecard - it times each contestant, tracks their accuracy, and figures out who is the most efficient speller.

In a Nutshell

RandomX v2 makes cryptocurrency mining puzzles harder to cheat on, fairer for regular computer owners, and more energy-efficient on modern hardware. The benchmark tool in this project proves these benefits with real measurements of speed, power draw, and stability across hundreds of test runs on your own CPU.

Deep Dive: The v1 Idle-Cycle Problem

To understand why v2 exists, you need to understand a fundamental tension inside your CPU: computation is fast, but fetching data from memory is slow.

RandomX's main loop works by repeatedly executing a small random program and then reading a chunk of data (64 bytes) from a large Dataset stored in RAM. Each program iteration is deliberately tuned to take about 50–55 nanoseconds - roughly the same time it takes for DRAM to deliver data. The idea is that the CPU computes while the next piece of data is being fetched in the background (prefetched), so neither the CPU nor the memory bus sits idle.

The problem? While CPU cores got faster over the years, RAM latency stayed basically the same. CPUs have advanced roughly 1.5x since RandomX was designed in 2019, but DRAM still takes ~50–55 ns to respond. Modern processors like AMD Zen 5 finish the v1 program well before the data arrives - and then sit idle, burning power while waiting.

v1: The Idle Gap
CPU:    Compute (256 instructions) → XOR mix → [Idle - waiting for data]
Memory: Fetching next dataset (~50 ns)

The memory fetch takes ~50 ns. The CPU finishes its work in less time, then stalls waiting for data.

This idle window is the core inefficiency of v1. The CPU has finished all its work but the next piece of dataset hasn't arrived yet. Those wasted cycles still consume power - your CPU is burning electricity while doing zero useful work.

How v2 Fills the Gap

RandomX v2 makes two key changes to eliminate the idle window:

1. Bigger Programs (256 → 384 instructions)

CPUs have gotten roughly 1.5x faster since RandomX was designed in 2019, but RAM latency stayed at ~50-55 ns. So v2 increases the program to 384 instructions (1.5x), matching the CPU-to-memory speed ratio again and filling the gap with real computation.
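The 1.5x ratio can be sanity-checked with simple timing arithmetic. The 50 ns latency and 1.5x speedup figures come from the text; everything else follows from them:

```python
# Why 384 instructions rebalances compute time against memory latency.
DRAM_LATENCY_NS = 50.0    # roughly constant since 2019
V1_TIME_2019_NS = 50.0    # v1's 256 instructions were tuned to fill the fetch
CPU_SPEEDUP = 1.5         # modern cores vs. 2019 cores (approximate)

v1_time_modern = V1_TIME_2019_NS / CPU_SPEEDUP    # ~33.3 ns
idle_ns = DRAM_LATENCY_NS - v1_time_modern        # ~16.7 ns of stall per iteration

# Scaling the program by the same 1.5x (256 -> 384) restores the balance:
v2_time_modern = v1_time_modern * (384 / 256)

print(f"v1 idle gap: {idle_ns:.1f} ns")             # 16.7 ns
print(f"v2 program time: {v2_time_modern:.1f} ns")  # 50.0 ns
```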

2. AES Register Mixing (XOR → 16 AES rounds)

In v1, the F and E registers were mixed with a simple XOR - nearly instant. In v2, they're mixed with 16 AES encryption rounds. This uses the CPU's dedicated AES hardware during what was previously dead time, and also improves data entropy before scratchpad writes.

v2: The Gap is Gone
CPU:    Compute (384 instructions) → 16× AES mixing
Memory: Fetching next dataset (~50 ns)

Same memory fetch time, but now the CPU stays busy until data arrives. No wasted cycles. No wasted energy.

The AES trick is especially clever: it doesn't just fill idle time - it also strengthens ASIC resistance. In v1, a custom chip could skip AES in the main loop because it was only used for scratchpad initialization. In v2, AES is woven directly into the VM's core register mixing, so any hardware implementation must include a full AES engine inside the RandomX virtual machine itself.

The Prefetch Distance Change

The second major optimization is about how far ahead the CPU looks when pre-loading data from RAM.

v1: Prefetch 1 Iteration Ahead

While running iteration N, the CPU requests the data for iteration N+1. On faster modern CPUs, this isn't enough lead time - the data may not arrive before it's needed, causing a stall.

v2: Prefetch 2 Iterations Ahead

While running iteration N, the CPU requests data for iteration N+2. This gives DRAM twice as long to deliver, virtually guaranteeing the data is ready when needed - even on the fastest modern processors.
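The effect of the deeper lookahead can be reproduced with a toy timing model. All constants are illustrative and the scheduler below is a simplification, not the real RandomX loop:

```python
# Toy model: a fetch issued at time t completes at t + LATENCY, and an
# iteration can start only once its data has arrived.
LATENCY = 50.0   # ns for DRAM to return a dataset line
COMPUTE = 33.0   # ns a fast modern CPU spends per program iteration

def total_time(lookahead: int, iters: int) -> float:
    issue = [0.0] * lookahead + [None] * iters   # warm-up fetches at t=0
    t = 0.0
    for n in range(iters):
        start = max(t, issue[n] + LATENCY)   # stall if data isn't there yet
        issue[n + lookahead] = start         # kick off the fetch for n+lookahead
        t = start + COMPUTE                  # execute the random program
    return t

print(f"lookahead 1: {total_time(1, 1000) / 1000:.1f} ns/iter")  # ~50: stalls
print(f"lookahead 2: {total_time(2, 1000) / 1000:.1f} ns/iter")  # ~33: no stalls
```

With one iteration of lead time, every iteration waits on memory; with two, the steady state is bounded by compute alone.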

Prefetch Timeline

v1 (1 iteration lookahead):
Iter N (prefetch N+1) → Iter N+1 (data may not be ready!) → Iter N+2

v2 (2 iteration lookahead):
Iter N (prefetch N+2) → Iter N+1 (prefetch N+3) → Iter N+2 (data ready ✓)

By requesting data two iterations in advance, the memory system has twice as long to deliver - eliminating stalls on fast modern CPUs.

The CFROUND Optimization

A third, more subtle change targets floating-point performance. The CFROUND instruction controls the CPU's rounding mode for floating-point math. In v1, it executed frequently - but changing the rounding mode is expensive on modern CPUs because it forces the processor to flush its instruction pipeline.

In v2, CFROUND becomes conditional, executing only 1/16th of the time. This dramatically reduces pipeline flushes. On AMD Zen 1 CPUs, this single change yielded an 8.4% hashrate increase, with 5–10% gains expected across AMD architectures.
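A small simulation of why the conditional helps. The 1-in-16 test below is a stand-in for the actual condition in the v2 spec, and the flush counts come from a toy model, not real hardware:

```python
import random

# Toy model: each executed CFROUND picks one of the four x86 FP
# rounding modes; actually changing the mode flushes the pipeline.
def count_flushes(n_cfround: int, conditional: bool, seed: int = 0) -> int:
    rng = random.Random(seed)
    flushes = 0
    mode = 0
    for _ in range(n_cfround):
        if conditional and rng.randrange(16) != 0:
            continue                 # v2: 15/16 of CFROUNDs are no-ops
        new_mode = rng.randrange(4)  # pick a rounding mode
        if new_mode != mode:
            flushes += 1             # a mode change flushes the pipeline
            mode = new_mode
    return flushes

print(count_flushes(10_000, conditional=False))  # v1: flushes constantly
print(count_flushes(10_000, conditional=True))   # v2: roughly 1/16 as often
```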

Pipeline Flush Frequency

v1 - CFROUND fires often (every ~16 instructions): FLUSH · FLUSH · FLUSH · FLUSH · ...

Each flush forces the CPU to throw away in-progress work and restart its pipeline.

v2 - CFROUND conditional (1/16 chance): ... · FLUSH · ...

Rarely interrupted. The CPU's pipeline stays full almost all the time.

Real-World Benchmark Results

The total operations per hash tell the story: v1 performs 4,456,448 total ops (4.2M VM + 262K AES), while v2 performs 6,815,744 total ops (6.3M VM + 524K AES) - a 52.9% increase in work per hash.
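These totals can be reproduced directly. The per-component op counts below are derived from the figures in the sentence above (each is an exact power of two):

```python
# Reproducing the quoted per-hash totals.
v1_total = 4_194_304 + 262_144    # ~4.2M VM ops + ~262K AES ops
v2_total = 6_291_456 + 524_288    # ~6.3M VM ops + ~524K AES ops

assert v1_total == 4_456_448
assert v2_total == 6_815_744
print(f"work increase per hash: {v2_total / v1_total - 1:+.1%}")   # +52.9%
```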

Raw hashrate (hashes/second) may drop slightly on some CPUs, but the work per joule - the metric that actually matters - improves dramatically:

| Processor | Arch | Hashrate Change | Work/Joule Improvement |
| --- | --- | --- | --- |
| Ryzen AI 9 365 | Zen 5 | +9.2% | +67.0% |
| Ryzen AI 9 HX 370 | Zen 5 | +8.0% | +65.1% |
| Ryzen 7 1700X | Zen 1 | +0.8% | +54.1% |
| Core i9-12900K | Alder Lake | -3.9% | +47.0% |
| Ryzen 9 7945HX | Zen 4 | -5.1% | +45.2% |
| Ryzen 9 3950X | Zen 2 | -7.9% | +40.9% |
| Ryzen 5 8600G | Zen 4 | -8.5% | +39.9% |
| Ryzen 9 5950X | Zen 3 | -12.5% | +38.2% |
| Ryzen 7 3700X | Zen 2 | -14.7% | +30.5% |
| Core i7-8650U | Kaby Lake-R | -22.7% | +18.2% |
| Core i7-6820HQ | Skylake | -24.4% | +15.6% |
(Chart: Work/Joule Improvement by CPU, measured in VM+AES ops per joule - same figures as the table above.)

Every tested CPU shows a significant efficiency gain - newer architectures benefit the most.

Why "work per joule" matters more than hashrate: A hash in v2 proves 53% more computational work than a hash in v1. Even if your CPU produces fewer hashes per second, it's doing more total work - and doing it more efficiently per watt of power consumed. For miners, this means lower electricity bills for the same level of network security contribution.
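The hashrate and work/joule columns are mutually consistent, as a quick check on one row shows. Only the Ryzen 9 5950X figures and the per-hash op totals from this section are used:

```python
# Worked example using the Ryzen 9 5950X row from the table above.
WORK_RATIO = 6_815_744 / 4_456_448        # v2 does ~1.529x the work per hash

hashrate_ratio = 1 - 0.125                # -12.5% hashrate on the 5950X
work_per_sec = hashrate_ratio * WORK_RATIO
print(f"total work/second: {work_per_sec - 1:+.1%}")       # +33.8%

# The table reports +38.2% work/joule for this chip, which implies its
# average power draw also fell slightly under v2:
implied_power = work_per_sec / 1.382
print(f"implied power change: {implied_power - 1:+.1%}")   # about -3%
```

So even the chip with the steepest hashrate drop still delivers a third more useful work each second, at slightly lower power.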

The Bigger Picture: ASIC Resistance

All of these changes share a common theme: making it harder to build a shortcut.

The net effect: a CPU designed for everyday computing (browsing, compiling, gaming) is also the ideal machine for RandomX v2. No specialized hardware can do it significantly better.

Layers of ASIC Resistance in v2
  • Conditional CFROUND - a flexible FP pipeline is required
  • Deeper prefetch - real DRAM timing behavior is required
  • AES in the main loop - a full AES engine inside the VM is required
  • 384-instruction programs - a complex execution engine is needed

An ASIC must defeat every layer to gain an advantage.
Each layer pushes the design closer to a general-purpose CPU.

Sources & Further Reading