Mining for Tor v3 onions in the cloud

Tor has supported a new hidden service protocol since v0.3.2.1-alpha, released back in October 2017, and that protocol is now in the stable branches. Dubbed the "v3" onion service protocol, it replaces SHA1/DH/RSA1024 with SHA3/ed25519/curve25519, among other changes, for much improved cryptographic security.

I already had a v2 onion site up at tbrindus6tjv6wpi.onion, so I thought it would be an interesting exercise to mine a v3 vanity domain prefixed with tbrindus. For this, I set up 15 servers to mine for a matching prefix — more on this below!

It took well over a week of mining, but as of today, this site can also be accessed through the v3 hidden service tbrindusxnnqwmzov5qof56hyion6usmciqwykffxqsawswhk73aq5yd.onion!

A bit of background

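Tor hidden service domain "names" aren't really domain names in the sense most people are used to. You can enter them in your (Tor) browser, but you can't buy a particular domain you want: a hidden service hostname is derived from the service's public key. For v2 it is the base32 encoding of a truncated SHA1 hash of the key, and for v3 it is the base32 encoding of the public key itself, followed by a checksum and a version byte.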
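To make the v3 encoding concrete, here is a minimal sketch of how a hostname is assembled from a public key, based on a reading of the v3 onion service specification. The function name `v3_onion_address` is made up for illustration, and random bytes stand in for a real ed25519 public key (an assumption: a real key would come from an ed25519 key generator).

```python
import base64
import hashlib
import os


def v3_onion_address(pubkey: bytes) -> str:
    """Encode a 32-byte ed25519 public key as a v3 .onion hostname."""
    if len(pubkey) != 32:
        raise ValueError("ed25519 public keys are 32 bytes long")
    version = b"\x03"
    # Checksum: first 2 bytes of SHA3-256(".onion checksum" || pubkey || version).
    checksum = hashlib.sha3_256(b".onion checksum" + pubkey + version).digest()[:2]
    # 35 bytes encode to exactly 56 base32 characters, so no padding appears.
    label = base64.b32encode(pubkey + checksum + version).decode("ascii").lower()
    return label + ".onion"


# Illustrative stand-in for a real public key.
fake_pubkey = os.urandom(32)
print(v3_onion_address(fake_pubkey))
```

Because the public key occupies the leading bytes, a vanity prefix constrains the key itself, while the checksum and version byte only affect the tail of the hostname. The version byte is also why v3 hostnames end in the letter d.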

If you want a particular onion, you must randomly generate billions of keys until one happens to hash into a string starting with the prefix you're looking for. In the case of tbrindus, an 8-letter prefix, there are $32^8 = 1\,099\,511\,627\,776$ possible combinations. Every additional letter increases the space (and hence expected computation time) by a factor of 32.
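The growth of the search space, and the "keep drawing until one matches" nature of the search, can be sketched in a few lines. This is a toy simulation only: random base32 strings stand in for real key-derived hostnames (an assumption), and the helper names `keyspace` and `attempts_until_match` are hypothetical.

```python
import base64
import os


def keyspace(prefix_len: int) -> int:
    """Number of distinct base32 prefixes of the given length."""
    return 32 ** prefix_len


def attempts_until_match(prefix: str) -> int:
    """Draw random base32 labels until one starts with `prefix`."""
    attempts = 0
    while True:
        attempts += 1
        # 35 random bytes encode to 56 base32 characters, like a v3 hostname.
        label = base64.b32encode(os.urandom(35)).decode("ascii").lower()
        if label.startswith(prefix):
            return attempts


# A 2-character prefix is small enough to brute force in a toy loop.
trials = [attempts_until_match("tb") for _ in range(20)]
mean_attempts = sum(trials) / len(trials)
print(f"2 chars: expected {keyspace(2):,} attempts, observed mean {mean_attempts:,.0f}")

# Each extra character multiplies the expected work by 32.
for n in range(5, 10):
    print(f"{n} chars: {keyspace(n):>18,} candidates")
```

The observed mean for a 2-character prefix should hover around 1,024, with plenty of run-to-run variance; the same logic scaled up to 8 characters is what makes the real search so expensive.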

V2 onions have been around for a long time, so there exist GPU-based miners like Scallion which can hash at frightening rates (several gigahashes per second). In fact, Scallion was used to brute force 32-bit GPG key ids to demonstrate that 32-bit ids are insecure (see evil32.com for more on that).

Tor's switch to ed25519 means that existing tools for generating vanity names like Scallion can't be used — at the time of writing, the best bet for v3 vanity names is mkp224o, a CPU-based miner.

I expected mkp224o to be orders of magnitude slower than GPU-based mining, so I spun up 15 servers across several providers (I'm looking for a new host, and thought this would be a good opportunity to test some new ones out).

Setting up the servers

Getting mkp224o set up and running is fairly simple. On most development machines you'd probably have everything required preinstalled, with perhaps the exception of libsodium-dev.

On a typical Debian-based distro, you can get everything you need to get running with:

$ apt install autoconf build-essential git libsodium-dev
$ git clone https://github.com/cathugger/mkp224o.git
$ cd mkp224o
$ ./autogen.sh
$ ./configure # see below
$ make

For ARM servers, I passed --enable-donna to configure, while for x86_64 boxes I used either --enable-amd64-51-30k or --enable-amd64-64-24k, whichever provided the greatest hashrate.

For mining, I specified a filter for tbrindus:

$ ./mkp224o -s -T tbrindus

…and waited. I waited a long time.

Mining results

V2 onions can be hashed incredibly fast on common GPUs with Scallion, with many cards capable of several gigahashes per second. On my laptop's GTX 960M, Scallion pulled in 1 GH/s, and mined tbrindus6tjv6wpi.onion in under 10 minutes.

For comparison, the 15 servers I ran mkp224o on for 6 days pulled in an aggregate of roughly 5 MH/s (4.82 MH/s, going by the table below), or about 0.5% of what my fairly standard laptop graphics card can compute.

Below, I've put together a table of the setups I ran to compute tbrindusxnnqwmzov5qof56hyion6usmciqwykffxqsawswhk73aq5yd.onion.

| Host | Plan | OS | CPU | RAM | Hashes/s | Contrib. |
|---|---|---|---|---|---|---|
| Scaleway [1] | C2S | Debian 9.0 | 4x Intel Atom C2550 @ 2.3GHz | 8GB | 229,400 | 4.76% |
| Scaleway | ARM64-16GB | Debian 9.0 | 16x ARMv8 Cavium ThunderX | 16GB | 1,300,000 | 26.97% |
| Scaleway | ARM64-8GB | Ubuntu 16.04 | 8x ARMv8 Cavium ThunderX | 8GB | 626,000 | 12.99% |
| Scaleway | ARM64-2GB | Ubuntu 16.04 | 4x ARMv8 Cavium ThunderX | 2GB | 314,000 | 6.51% |
| Scaleway [2] | ARM64-2GB | Debian 9.3 | 4x ARMv8 Cavium ThunderX | 2GB | 218,000 | 4.52% |
| Scaleway | C1 | Debian 9.0 | 2x Intel Atom C2750 @ 2.3GHz | 2GB | 113,500 | 2.35% |
| DigitalOcean | Compute 4GB | Debian 9.4 | 2x Intel Xeon E5-2697A v4 @ 2.5GHz | 4GB | 470,000 | 9.75% |
| Azure | Standard B2s | Ubuntu 16.04 | 2x Intel Xeon E5-2673 v4 @ 2.294GHz | 4GB | 68,000 | 1.41% |
| Azure | Standard B2s | Debian 9.3 | 2x Intel Xeon E5-2673 v4 @ 2.294GHz | 4GB | 80,000 | 1.66% |
| Azure | Standard B2s | FreeBSD 11.1 | 2x Intel Xeon E5-2673 v4 @ 2.294GHz | 4GB | 69,000 | 1.43% |
| SSDNodes [3] | 8GB KVM | Debian 9.3 | 2x Intel (Skylake, IBRS) @ 2.299GHz | 8GB | 274,500 | 5.69% |
| SSDNodes [3] | 16GB KVM | Debian 9.3 | 4x Intel (Skylake, IBRS) @ 2.299GHz | 16GB | 540,000 | 11.20% |
| SSDNodes | 8GB Container | Debian 9.4 | 4x Intel Xeon E5-2697 v3 @ 766MHz | 8GB | 78,000 | 1.62% |
| Home [4] | Raspberry Pi 3 | Raspbian 9.1 | 4x ARM Cortex-A53 @ 1.2GHz | 1GB | 70,000 | 1.45% |
| Home [4] | Optiplex 960 | Ubuntu 16.04 | 4x Intel 2 Quad Q9400 @ 2.659GHz | 4GB | 370,000 | 7.68% |
| Total | | | | | 4,820,400 | 100.00% |

1. This was a dedicated machine.

2. This machine was provisioned with the same specs as the other ARM64-2GB instance, but was also running a Tor relay, which explains the difference in hashrate.

3. CPU steal time on these machines was constantly at 20% or higher.

4. I ran these machines uninterrupted at home.
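The contribution column can be recomputed from the hashrates alone. The sketch below copies the per-machine figures from the table; the short labels are made up here to keep the entries distinguishable.

```python
# Hashes per second for each of the 15 machines, as listed in the table.
hashrates = [
    ("Scaleway C2S", 229_400),
    ("Scaleway ARM64-16GB", 1_300_000),
    ("Scaleway ARM64-8GB", 626_000),
    ("Scaleway ARM64-2GB (Ubuntu)", 314_000),
    ("Scaleway ARM64-2GB (Debian, Tor relay)", 218_000),
    ("Scaleway C1", 113_500),
    ("DigitalOcean Compute 4GB", 470_000),
    ("Azure B2s (Ubuntu)", 68_000),
    ("Azure B2s (Debian)", 80_000),
    ("Azure B2s (FreeBSD)", 69_000),
    ("SSDNodes 8GB KVM", 274_500),
    ("SSDNodes 16GB KVM", 540_000),
    ("SSDNodes 8GB Container", 78_000),
    ("Raspberry Pi 3", 70_000),
    ("Optiplex 960", 370_000),
]

total = sum(rate for _, rate in hashrates)
print(f"Total: {total:,} hashes/s")

# Print machines from largest to smallest contribution.
for name, rate in sorted(hashrates, key=lambda item: item[1], reverse=True):
    print(f"{name:40s} {rate:>10,} {100 * rate / total:6.2f}%")

# Share of the aggregate provided by the four Cavium ThunderX instances.
arm64_share = sum(rate for name, rate in hashrates if "ARM64" in name) / total
print(f"Scaleway ARM64 share: {arm64_share:.1%}")
```

Going by these numbers, the four Scaleway ARM64 instances account for roughly half of the aggregate hashrate.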

A quick statistical analysis

OK, so it took a long time. I accumulated far more in server expenses than I had originally planned on, but at least I got a sense of pride and accomplishment from it.

The search for a hash prefix of tbrindus is probabilistic and memoryless: you never get "closer" to mining a hash; every hash has an equal probability $\frac 1 {32^{\text{length(prefix)}}} = \frac 1 {32^8}$ of matching. Matches therefore arrive as essentially a Poisson process, and we can use an exponential distribution to model how long it takes for the first match to be found.

The CDF of an exponential distribution has the form $1 - e^{-\lambda x}$.

We can perform 4,820,400 hashes per second, there are 86,400 seconds in a day, and each hash has a probability of $\frac 1 {32^8}$ of matching. The probability that we'll find a match within $x$ days (let's call it $f(x)$ for simplicity) is therefore $f(x) = 1 - e^{-\lambda x}$ with $\lambda = \frac{86\,400 \times 4\,820\,400}{32^8} \approx 0.379$ expected matches per day.
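The exponential form can be justified directly from the per-hash probability. Writing $p = \frac 1 {32^8}$ for the chance that a single hash matches and $r = 86\,400 \times 4\,820\,400$ for the number of hashes per day (both symbols are introduced here purely for illustration), the chance of seeing no match after $x$ days is $(1 - p)^{r x}$. Since $p$ is tiny, $\ln(1 - p) \approx -p$, which gives:

```latex
\begin{aligned}
P(\text{no match within } x \text{ days}) &= (1 - p)^{r x} = e^{\, r x \ln(1 - p)} \approx e^{-p r x}, \\
P(\text{match within } x \text{ days}) &\approx 1 - e^{-\lambda x}, \qquad \lambda = p\, r = \frac{86\,400 \times 4\,820\,400}{32^8}.
\end{aligned}
```

Strictly speaking the number of hashes until a match is geometrically distributed; the exponential distribution is its continuous approximation, which is extremely accurate at probabilities this small.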

Since I like graphs, let's graph this function.
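For readers without the plot in front of them, the same function can be evaluated numerically. This is a small sketch using only the figures quoted above; the constant and function names are chosen here for illustration.

```python
import math

HASHES_PER_SECOND = 4_820_400
SECONDS_PER_DAY = 86_400
KEYSPACE = 32 ** 8

# Expected number of matches per day of mining.
LAMBDA = SECONDS_PER_DAY * HASHES_PER_SECOND / KEYSPACE


def f(days: float) -> float:
    """Probability of at least one match within `days` days of mining."""
    return 1.0 - math.exp(-LAMBDA * days)


for days in (1, 2, 3, 6, 10):
    print(f"{days:2d} days: {f(days):6.1%}")

# The median is the point where a match is as likely as not.
median_days = math.log(2) / LAMBDA
print(f"median: {median_days:.2f} days")
```

Under this model there is roughly a 90% chance of a match within 6 days, and the 50/50 point sits a little under 2 days.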

The expected value of an exponential distribution is given by $\frac 1 \lambda$, so we can plug in our $\lambda$ to find the expected number of days needed to generate a prefix of 8 characters: $\frac 1 \lambda = \frac{32^8}{86\,400 \times 4\,820\,400} \approx 2.64$ days.

Alright, so I definitely overshot that.

Bonus: UnixBench of the servers

Since I had all these servers up and running already, I figured it'd be interesting to compare UnixBench scores to see how well they correlate with hashrate. In the table below, I've included the hashrate of several servers I was particularly interested in, as well as their single-core and multi-core performance as determined by running UnixBench on an unloaded system.

| Host | Plan | OS | Hashes/s | Num. Cores | Single core perf. | Multi-core perf. |
|---|---|---|---|---|---|---|
| Scaleway | ARM64-16GB | Debian 9.0 | 1,300,000 | 16 | 401.2 | 1641.6 |
| Scaleway | ARM64-8GB | Ubuntu 16.04 LTS | 626,000 | 8 | 380.5 | 1514.1 |
| Scaleway | ARM64-2GB | Ubuntu 16.04 LTS | 314,000 | 4 | 400.9 | 1020.3 |
| Scaleway | C1 | Debian 9.0 | 113,500 | 2 | 621.0 | 1047.7 |
| Azure | Standard B2s | Ubuntu 16.04 | 68,000 | 2 | 472.2 | 340.0 |
| SSDNodes | 16GB KVM | Debian 9.3 | 540,000 | 4 | 472.3 | 1363.2 |
| SSDNodes | 8GB KVM | Debian 9.3 | 274,500 | 4 | 616.8 | 1382.8 |
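One rough way to quantify how the scores track hashrate is a Pearson correlation over the rows of this table. The sketch below is illustrative only: it uses the core counts exactly as listed here, implements the correlation by hand to avoid extra dependencies, and seven data points are far too few to draw firm conclusions from.

```python
import math

# (host and plan, hashes/s, cores, single-core score, multi-core score)
rows = [
    ("Scaleway ARM64-16GB", 1_300_000, 16, 401.2, 1641.6),
    ("Scaleway ARM64-8GB", 626_000, 8, 380.5, 1514.1),
    ("Scaleway ARM64-2GB", 314_000, 4, 400.9, 1020.3),
    ("Scaleway C1", 113_500, 2, 621.0, 1047.7),
    ("Azure Standard B2s", 68_000, 2, 472.2, 340.0),
    ("SSDNodes 16GB KVM", 540_000, 4, 472.3, 1363.2),
    ("SSDNodes 8GB KVM", 274_500, 4, 616.8, 1382.8),
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


hashrate = [row[1] for row in rows]
per_core = [row[1] / row[2] for row in rows]
single = [row[3] for row in rows]
multi = [row[4] for row in rows]

# Total hashrate against the multi-core score, per-core hashrate against single-core.
r_multi = pearson(hashrate, multi)
r_single = pearson(per_core, single)
print(f"hashrate vs multi-core score:           r = {r_multi:+.2f}")
print(f"per-core hashrate vs single-core score: r = {r_single:+.2f}")
```

UnixBench mixes file copy, pipe and process creation tests into its score, none of which mkp224o exercises much, so a weak correlation here would not be surprising.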

I've also attached the raw UnixBench logs below, for convenience.

Scaleway — ARM64-16GB
========================================================================
   BYTE UNIX Benchmarks (Version 5.1.3)

   System: redacted: GNU/Linux
   OS: GNU/Linux -- 4.9.23-std-1 -- #1 SMP Mon Apr 24 13:18:14 UTC 2017
   Machine: aarch64 (unknown)
   Language: en_US.utf8 (charmap="UTF-8", collate="UTF-8")
   05:13:38 up 3 days,  1:08,  1 user,  load average: 11.74, 15.14, 15.76; runlevel 2018-03-15

------------------------------------------------------------------------
Benchmark Run: Sun Mar 18 2018 05:13:38 - 05:41:33
16 CPUs in system; running 1 parallel copy of tests

Dhrystone 2 using register variables        8372406.5 lps   (10.0 s, 7 samples)
Double-Precision Whetstone                     1825.0 MWIPS (9.9 s, 7 samples)
Execl Throughput                               1014.4 lps   (30.0 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks        181638.7 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks           51750.8 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks        422317.9 KBps  (30.0 s, 2 samples)
Pipe Throughput                              476739.6 lps   (10.0 s, 7 samples)
Pipe-based Context Switching                  29308.4 lps   (10.0 s, 7 samples)
Process Creation                               2046.2 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                   2597.0 lpm   (60.0 s, 2 samples)
Shell Scripts (8 concurrent)                   1107.5 lpm   (60.0 s, 2 samples)
System Call Overhead                         863802.9 lps   (10.0 s, 7 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0    8372406.5    717.4
Double-Precision Whetstone                       55.0       1825.0    331.8
Execl Throughput                                 43.0       1014.4    235.9
File Copy 1024 bufsize 2000 maxblocks          3960.0     181638.7    458.7
File Copy 256 bufsize 500 maxblocks            1655.0      51750.8    312.7
File Copy 4096 bufsize 8000 maxblocks          5800.0     422317.9    728.1
Pipe Throughput                               12440.0     476739.6    383.2
Pipe-based Context Switching                   4000.0      29308.4     73.3
Process Creation                                126.0       2046.2    162.4
Shell Scripts (1 concurrent)                     42.4       2597.0    612.5
Shell Scripts (8 concurrent)                      6.0       1107.5   1845.8
System Call Overhead                          15000.0     863802.9    575.9
                                                                   ========
System Benchmarks Index Score                                         401.2

------------------------------------------------------------------------
Benchmark Run: Sun Mar 18 2018 05:41:33 - 06:09:37
16 CPUs in system; running 16 parallel copies of tests

Dhrystone 2 using register variables      132993486.5 lps   (10.0 s, 7 samples)
Double-Precision Whetstone                    29057.8 MWIPS (10.0 s, 7 samples)
Execl Throughput                               7995.3 lps   (29.9 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks        137360.5 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks           29373.3 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks        630759.5 KBps  (30.0 s, 2 samples)
Pipe Throughput                             7424668.0 lps   (10.0 s, 7 samples)
Pipe-based Context Switching                 401144.7 lps   (10.0 s, 7 samples)
Process Creation                              10546.1 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                  15213.0 lpm   (60.0 s, 2 samples)
Shell Scripts (8 concurrent)                   2003.4 lpm   (60.2 s, 2 samples)
System Call Overhead                        1277419.8 lps   (10.0 s, 7 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0  132993486.5  11396.2
Double-Precision Whetstone                       55.0      29057.8   5283.2
Execl Throughput                                 43.0       7995.3   1859.4
File Copy 1024 bufsize 2000 maxblocks          3960.0     137360.5    346.9
File Copy 256 bufsize 500 maxblocks            1655.0      29373.3    177.5
File Copy 4096 bufsize 8000 maxblocks          5800.0     630759.5   1087.5
Pipe Throughput                               12440.0    7424668.0   5968.4
Pipe-based Context Switching                   4000.0     401144.7   1002.9
Process Creation                                126.0      10546.1    837.0
Shell Scripts (1 concurrent)                     42.4      15213.0   3588.0
Shell Scripts (8 concurrent)                      6.0       2003.4   3339.0
System Call Overhead                          15000.0    1277419.8    851.6
                                                                   ========
System Benchmarks Index Score                                        1641.6
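A note on reading these logs: each INDEX value is ten times the ratio of RESULT to BASELINE, and the final System Benchmarks Index Score is the geometric mean of the individual indices. The sketch below recomputes the single-copy score for the ARM64-16GB log above from its BASELINE and RESULT columns; the short test names are abbreviations chosen here.

```python
import math

# (baseline, result) pairs from the single-copy run of the ARM64-16GB log.
tests = {
    "dhrystone": (116700.0, 8372406.5),
    "whetstone": (55.0, 1825.0),
    "execl": (43.0, 1014.4),
    "file_copy_1024": (3960.0, 181638.7),
    "file_copy_256": (1655.0, 51750.8),
    "file_copy_4096": (5800.0, 422317.9),
    "pipe_throughput": (12440.0, 476739.6),
    "context_switching": (4000.0, 29308.4),
    "process_creation": (126.0, 2046.2),
    "shell_1": (42.4, 2597.0),
    "shell_8": (6.0, 1107.5),
    "syscall_overhead": (15000.0, 863802.9),
}


def index(baseline: float, result: float) -> float:
    """Per-test index: result relative to the baseline machine, times 10."""
    return 10.0 * result / baseline


def geometric_mean(values) -> float:
    """Geometric mean, computed via logarithms for numerical stability."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


indices = {name: index(base, res) for name, (base, res) in tests.items()}
score = geometric_mean(indices.values())
print(f"Recomputed index score: {score:.1f}")
```

The recomputed value should land very close to the 401.2 reported in the log. The geometric mean also explains why one very low index, such as pipe-based context switching, drags the overall score down noticeably.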
Scaleway — ARM64-8GB
========================================================================
   BYTE UNIX Benchmarks (Version 5.1.3)

   System: redacted: GNU/Linux
   OS: GNU/Linux -- 4.4.121-mainline-rev1 -- #1 SMP Sun Mar 11 16:44:34 UTC 2018
   Machine: aarch64 (aarch64)
   Language: en_US.utf8 (charmap="UTF-8", collate="UTF-8")
   05:13:17 up 2 days, 53 min,  1 user,  load average: 5.56, 7.47, 7.82; runlevel 2018-03-16

------------------------------------------------------------------------
Benchmark Run: Sun Mar 18 2018 05:13:17 - 05:41:24
8 CPUs in system; running 1 parallel copy of tests

Dhrystone 2 using register variables        8502417.0 lps   (10.0 s, 7 samples)
Double-Precision Whetstone                     1741.0 MWIPS (10.1 s, 7 samples)
Execl Throughput                               1112.8 lps   (29.9 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks        165427.5 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks           54377.8 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks        343939.2 KBps  (30.0 s, 2 samples)
Pipe Throughput                              462211.7 lps   (10.0 s, 7 samples)
Pipe-based Context Switching                  14746.0 lps   (10.0 s, 7 samples)
Process Creation                               2370.8 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                   2677.5 lpm   (60.0 s, 2 samples)
Shell Scripts (8 concurrent)                   1050.2 lpm   (60.0 s, 2 samples)
System Call Overhead                         998124.5 lps   (10.0 s, 7 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0    8502417.0    728.6
Double-Precision Whetstone                       55.0       1741.0    316.6
Execl Throughput                                 43.0       1112.8    258.8
File Copy 1024 bufsize 2000 maxblocks          3960.0     165427.5    417.7
File Copy 256 bufsize 500 maxblocks            1655.0      54377.8    328.6
File Copy 4096 bufsize 8000 maxblocks          5800.0     343939.2    593.0
Pipe Throughput                               12440.0     462211.7    371.6
Pipe-based Context Switching                   4000.0      14746.0     36.9
Process Creation                                126.0       2370.8    188.2
Shell Scripts (1 concurrent)                     42.4       2677.5    631.5
Shell Scripts (8 concurrent)                      6.0       1050.2   1750.4
System Call Overhead                          15000.0     998124.5    665.4
                                                                   ========
System Benchmarks Index Score                                         380.5

------------------------------------------------------------------------
Benchmark Run: Sun Mar 18 2018 05:41:24 - 06:09:38
8 CPUs in system; running 8 parallel copies of tests

Dhrystone 2 using register variables       67785992.9 lps   (10.0 s, 7 samples)
Double-Precision Whetstone                    13990.1 MWIPS (10.1 s, 7 samples)
Execl Throughput                               5098.5 lps   (29.8 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks        285233.4 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks           73046.0 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks       1005166.1 KBps  (30.0 s, 2 samples)
Pipe Throughput                             3663311.5 lps   (10.0 s, 7 samples)
Pipe-based Context Switching                 222918.1 lps   (10.0 s, 7 samples)
Process Creation                               8125.0 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                  10717.2 lpm   (60.0 s, 2 samples)
Shell Scripts (8 concurrent)                   1391.7 lpm   (60.2 s, 2 samples)
System Call Overhead                        3636949.3 lps   (10.0 s, 7 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0   67785992.9   5808.6
Double-Precision Whetstone                       55.0      13990.1   2543.7
Execl Throughput                                 43.0       5098.5   1185.7
File Copy 1024 bufsize 2000 maxblocks          3960.0     285233.4    720.3
File Copy 256 bufsize 500 maxblocks            1655.0      73046.0    441.4
File Copy 4096 bufsize 8000 maxblocks          5800.0    1005166.1   1733.0
Pipe Throughput                               12440.0    3663311.5   2944.8
Pipe-based Context Switching                   4000.0     222918.1    557.3
Process Creation                                126.0       8125.0    644.8
Shell Scripts (1 concurrent)                     42.4      10717.2   2527.6
Shell Scripts (8 concurrent)                      6.0       1391.7   2319.6
System Call Overhead                          15000.0    3636949.3   2424.6
                                                                   ========
System Benchmarks Index Score                                        1514.1
Scaleway — ARM64-2GB
========================================================================
   BYTE UNIX Benchmarks (Version 5.1.3)

   System: redacted: GNU/Linux
   OS: GNU/Linux -- 4.4.121-mainline-rev1 -- #1 SMP Sun Mar 11 16:44:34 UTC 2018
   Machine: aarch64 (aarch64)
   Language: en_US.utf8 (charmap="UTF-8", collate="UTF-8")
   05:14:10 up 3 days,  7:45,  1 user,  load average: 2.75, 3.74, 3.91; runlevel 2018-03-14

------------------------------------------------------------------------
Benchmark Run: Sun Mar 18 2018 05:14:10 - 05:42:12
4 CPUs in system; running 1 parallel copy of tests

Dhrystone 2 using register variables        8555429.5 lps   (10.0 s, 7 samples)
Double-Precision Whetstone                     1747.9 MWIPS (10.1 s, 7 samples)
Execl Throughput                               1224.4 lps   (29.9 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks        184524.9 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks           58246.7 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks        438788.5 KBps  (30.0 s, 2 samples)
Pipe Throughput                              465226.2 lps   (10.0 s, 7 samples)
Pipe-based Context Switching                  14792.3 lps   (10.0 s, 7 samples)
Process Creation                               2629.9 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                   3095.2 lpm   (60.0 s, 2 samples)
Shell Scripts (8 concurrent)                    884.2 lpm   (60.0 s, 2 samples)
System Call Overhead                        1011139.0 lps   (10.0 s, 7 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0    8555429.5    733.1
Double-Precision Whetstone                       55.0       1747.9    317.8
Execl Throughput                                 43.0       1224.4    284.7
File Copy 1024 bufsize 2000 maxblocks          3960.0     184524.9    466.0
File Copy 256 bufsize 500 maxblocks            1655.0      58246.7    351.9
File Copy 4096 bufsize 8000 maxblocks          5800.0     438788.5    756.5
Pipe Throughput                               12440.0     465226.2    374.0
Pipe-based Context Switching                   4000.0      14792.3     37.0
Process Creation                                126.0       2629.9    208.7
Shell Scripts (1 concurrent)                     42.4       3095.2    730.0
Shell Scripts (8 concurrent)                      6.0        884.2   1473.6
System Call Overhead                          15000.0    1011139.0    674.1
                                                                   ========
System Benchmarks Index Score                                         400.9

------------------------------------------------------------------------
Benchmark Run: Sun Mar 18 2018 05:42:12 - 06:10:18
4 CPUs in system; running 4 parallel copies of tests

Dhrystone 2 using register variables       34136207.1 lps   (10.0 s, 7 samples)
Double-Precision Whetstone                     6989.1 MWIPS (10.2 s, 7 samples)
Execl Throughput                               3526.3 lps   (29.6 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks        218968.8 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks           61412.5 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks        830973.8 KBps  (30.0 s, 2 samples)
Pipe Throughput                             1848545.0 lps   (10.0 s, 7 samples)
Pipe-based Context Switching                 121851.3 lps   (10.0 s, 7 samples)
Process Creation                               6271.4 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                   7046.2 lpm   (60.0 s, 2 samples)
Shell Scripts (8 concurrent)                    955.8 lpm   (60.1 s, 2 samples)
System Call Overhead                        3570647.2 lps   (10.0 s, 7 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0   34136207.1   2925.1
Double-Precision Whetstone                       55.0       6989.1   1270.7
Execl Throughput                                 43.0       3526.3    820.1
File Copy 1024 bufsize 2000 maxblocks          3960.0     218968.8    553.0
File Copy 256 bufsize 500 maxblocks            1655.0      61412.5    371.1
File Copy 4096 bufsize 8000 maxblocks          5800.0     830973.8   1432.7
Pipe Throughput                               12440.0    1848545.0   1486.0
Pipe-based Context Switching                   4000.0     121851.3    304.6
Process Creation                                126.0       6271.4    497.7
Shell Scripts (1 concurrent)                     42.4       7046.2   1661.8
Shell Scripts (8 concurrent)                      6.0        955.8   1593.0
System Call Overhead                          15000.0    3570647.2   2380.4
                                                                   ========
System Benchmarks Index Score                                        1020.3
Scaleway — C1
========================================================================
   BYTE UNIX Benchmarks (Version 5.1.3)

   System: redacted: GNU/Linux
   OS: GNU/Linux -- 4.9.20-std-1 -- #1 SMP Tue Apr 4 12:56:17 UTC 2017
   Machine: x86_64 (unknown)
   Language: en_US.utf8 (charmap="UTF-8", collate="UTF-8")
   CPU 0: Intel(R) Atom(TM) CPU C2750 @ 2.40GHz (4787.8 bogomips)
          x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
   CPU 1: Intel(R) Atom(TM) CPU C2750 @ 2.40GHz (4787.8 bogomips)
          x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
   05:14:11 up 3 days,  1:28,  1 user,  load average: 2.01, 2.14, 2.06; runlevel 2018-03-15

------------------------------------------------------------------------
Benchmark Run: Sun Mar 18 2018 05:14:12 - 05:42:08
2 CPUs in system; running 1 parallel copy of tests

Dhrystone 2 using register variables       12323865.3 lps   (10.0 s, 7 samples)
Double-Precision Whetstone                     2014.1 MWIPS (9.9 s, 7 samples)
Execl Throughput                               1223.1 lps   (29.8 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks        415672.5 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks          120361.9 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks        985611.5 KBps  (30.0 s, 2 samples)
Pipe Throughput                             1170708.3 lps   (10.0 s, 7 samples)
Pipe-based Context Switching                  46541.0 lps   (10.0 s, 7 samples)
Process Creation                               3049.4 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                   3348.8 lpm   (60.0 s, 2 samples)
Shell Scripts (8 concurrent)                    685.8 lpm   (60.1 s, 2 samples)
System Call Overhead                        1446516.0 lps   (10.0 s, 7 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0   12323865.3   1056.0
Double-Precision Whetstone                       55.0       2014.1    366.2
Execl Throughput                                 43.0       1223.1    284.4
File Copy 1024 bufsize 2000 maxblocks          3960.0     415672.5   1049.7
File Copy 256 bufsize 500 maxblocks            1655.0     120361.9    727.3
File Copy 4096 bufsize 8000 maxblocks          5800.0     985611.5   1699.3
Pipe Throughput                               12440.0    1170708.3    941.1
Pipe-based Context Switching                   4000.0      46541.0    116.4
Process Creation                                126.0       3049.4    242.0
Shell Scripts (1 concurrent)                     42.4       3348.8    789.8
Shell Scripts (8 concurrent)                      6.0        685.8   1142.9
System Call Overhead                          15000.0    1446516.0    964.3
                                                                   ========
System Benchmarks Index Score                                         621.0

------------------------------------------------------------------------
Benchmark Run: Sun Mar 18 2018 05:42:08 - 06:10:06
2 CPUs in system; running 2 parallel copies of tests

Dhrystone 2 using register variables       24552470.3 lps   (10.0 s, 7 samples)
Double-Precision Whetstone                     4016.2 MWIPS (10.0 s, 7 samples)
Execl Throughput                               2918.3 lps   (29.8 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks        485532.6 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks          131304.9 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks       1365028.4 KBps  (30.0 s, 2 samples)
Pipe Throughput                             2329059.7 lps   (10.0 s, 7 samples)
Pipe-based Context Switching                 116038.9 lps   (10.0 s, 7 samples)
Process Creation                               7104.8 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                   5589.5 lpm   (60.0 s, 2 samples)
Shell Scripts (8 concurrent)                    722.6 lpm   (60.1 s, 2 samples)
System Call Overhead                        2260798.6 lps   (10.0 s, 7 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0   24552470.3   2103.9
Double-Precision Whetstone                       55.0       4016.2    730.2
Execl Throughput                                 43.0       2918.3    678.7
File Copy 1024 bufsize 2000 maxblocks          3960.0     485532.6   1226.1
File Copy 256 bufsize 500 maxblocks            1655.0     131304.9    793.4
File Copy 4096 bufsize 8000 maxblocks          5800.0    1365028.4   2353.5
Pipe Throughput                               12440.0    2329059.7   1872.2
Pipe-based Context Switching                   4000.0     116038.9    290.1
Process Creation                                126.0       7104.8    563.9
Shell Scripts (1 concurrent)                     42.4       5589.5   1318.3
Shell Scripts (8 concurrent)                      6.0        722.6   1204.3
System Call Overhead                          15000.0    2260798.6   1507.2
                                                                   ========
System Benchmarks Index Score                                        1047.7
Azure — Standard B2S
========================================================================
   BYTE UNIX Benchmarks (Version 5.1.3)

   System: redacted: GNU/Linux
   OS: GNU/Linux -- 4.13.0-1011-azure -- #14-Ubuntu SMP Thu Feb 15 16:15:39 UTC 2018
   Machine: x86_64 (x86_64)
   Language: en_US.utf8 (charmap="UTF-8", collate="UTF-8")
   CPU 0: Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz (4589.4 bogomips)
          Hyper-Threading, x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
   CPU 1: Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz (4589.4 bogomips)
          Hyper-Threading, x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
   05:22:33 up 6 days,  8:37,  1 user,  load average: 0.08, 0.62, 1.38; runlevel 2018-03-11

------------------------------------------------------------------------
Benchmark Run: Sun Mar 18 2018 05:22:33 - 05:50:38
2 CPUs in system; running 1 parallel copy of tests

Dhrystone 2 using register variables       28065805.5 lps   (10.0 s, 7 samples)
Double-Precision Whetstone                     3310.3 MWIPS (8.7 s, 7 samples)
Execl Throughput                               2546.1 lps   (29.8 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks        257690.1 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks           55889.7 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks        535177.7 KBps  (30.0 s, 2 samples)
Pipe Throughput                              315663.7 lps   (10.0 s, 7 samples)
Pipe-based Context Switching                  25281.3 lps   (10.0 s, 7 samples)
Process Creation                               3911.9 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                   2343.0 lpm   (60.0 s, 2 samples)
Shell Scripts (8 concurrent)                    862.3 lpm   (60.0 s, 2 samples)
System Call Overhead                         268361.9 lps   (10.0 s, 7 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0   28065805.5   2405.0
Double-Precision Whetstone                       55.0       3310.3    601.9
Execl Throughput                                 43.0       2546.1    592.1
File Copy 1024 bufsize 2000 maxblocks          3960.0     257690.1    650.7
File Copy 256 bufsize 500 maxblocks            1655.0      55889.7    337.7
File Copy 4096 bufsize 8000 maxblocks          5800.0     535177.7    922.7
Pipe Throughput                               12440.0     315663.7    253.7
Pipe-based Context Switching                   4000.0      25281.3     63.2
Process Creation                                126.0       3911.9    310.5
Shell Scripts (1 concurrent)                     42.4       2343.0    552.6
Shell Scripts (8 concurrent)                      6.0        862.3   1437.2
System Call Overhead                          15000.0     268361.9    178.9
                                                                   ========
System Benchmarks Index Score                                         472.2

------------------------------------------------------------------------
Benchmark Run: Sun Mar 18 2018 05:50:38 - 06:18:55
2 CPUs in system; running 2 parallel copies of tests

Dhrystone 2 using register variables       12561408.5 lps   (10.0 s, 7 samples)
Double-Precision Whetstone                     1364.4 MWIPS (10.5 s, 7 samples)
Execl Throughput                               1285.0 lps   (30.0 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks        108284.8 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks           29067.9 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks        813617.5 KBps  (30.0 s, 2 samples)
Pipe Throughput                              195193.3 lps   (10.0 s, 7 samples)
Pipe-based Context Switching                  59307.3 lps   (10.0 s, 7 samples)
Process Creation                               2751.5 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                   3681.4 lpm   (60.0 s, 2 samples)
Shell Scripts (8 concurrent)                    322.3 lpm   (60.1 s, 2 samples)
System Call Overhead                         280762.9 lps   (10.0 s, 7 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0   12561408.5   1076.4
Double-Precision Whetstone                       55.0       1364.4    248.1
Execl Throughput                                 43.0       1285.0    298.8
File Copy 1024 bufsize 2000 maxblocks          3960.0     108284.8    273.4
File Copy 256 bufsize 500 maxblocks            1655.0      29067.9    175.6
File Copy 4096 bufsize 8000 maxblocks          5800.0     813617.5   1402.8
Pipe Throughput                               12440.0     195193.3    156.9
Pipe-based Context Switching                   4000.0      59307.3    148.3
Process Creation                                126.0       2751.5    218.4
Shell Scripts (1 concurrent)                     42.4       3681.4    868.3
Shell Scripts (8 concurrent)                      6.0        322.3    537.1
System Call Overhead                          15000.0     280762.9    187.2
                                                                   ========
System Benchmarks Index Score                                         340.0

SSDNodes — KVM 16GB
========================================================================
   BYTE UNIX Benchmarks (Version 5.1.3)

   System: redacted: GNU/Linux
   OS: GNU/Linux -- 4.9.0-5-amd64 -- #1 SMP Debian 4.9.65-3+deb9u2 (2018-01-04)
   Machine: x86_64 (unknown)
   Language: en_US.utf8 (charmap="UTF-8", collate="UTF-8")
   CPU 0: Intel Core Processor (Skylake, IBRS) (4600.0 bogomips)
          Hyper-Threading, x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
   CPU 1: Intel Core Processor (Skylake, IBRS) (4600.0 bogomips)
          Hyper-Threading, x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
   CPU 2: Intel Core Processor (Skylake, IBRS) (4600.0 bogomips)
          Hyper-Threading, x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
   CPU 3: Intel Core Processor (Skylake, IBRS) (4600.0 bogomips)
          Hyper-Threading, x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
   05:32:38 up 24 days,  9:44,  2 users,  load average: 0.86, 0.95, 2.01; runlevel 2018-02-21

------------------------------------------------------------------------
Benchmark Run: Sun Mar 18 2018 05:32:39 - 06:00:50
4 CPUs in system; running 1 parallel copy of tests

Dhrystone 2 using register variables       18638854.5 lps   (10.0 s, 7 samples)
Double-Precision Whetstone                     3603.5 MWIPS (9.3 s, 7 samples)
Execl Throughput                                543.0 lps   (29.9 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks        326203.0 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks          107831.8 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks        782124.6 KBps  (30.0 s, 2 samples)
Pipe Throughput                              772372.4 lps   (10.0 s, 7 samples)
Pipe-based Context Switching                  27040.7 lps   (10.0 s, 7 samples)
Process Creation                               1912.9 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                   1867.0 lpm   (60.0 s, 2 samples)
Shell Scripts (8 concurrent)                    685.7 lpm   (60.1 s, 2 samples)
System Call Overhead                         603214.3 lps   (10.0 s, 7 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0   18638854.5   1597.2
Double-Precision Whetstone                       55.0       3603.5    655.2
Execl Throughput                                 43.0        543.0    126.3
File Copy 1024 bufsize 2000 maxblocks          3960.0     326203.0    823.7
File Copy 256 bufsize 500 maxblocks            1655.0     107831.8    651.6
File Copy 4096 bufsize 8000 maxblocks          5800.0     782124.6   1348.5
Pipe Throughput                               12440.0     772372.4    620.9
Pipe-based Context Switching                   4000.0      27040.7     67.6
Process Creation                                126.0       1912.9    151.8
Shell Scripts (1 concurrent)                     42.4       1867.0    440.3
Shell Scripts (8 concurrent)                      6.0        685.7   1142.9
System Call Overhead                          15000.0     603214.3    402.1
                                                                   ========
System Benchmarks Index Score                                         472.3

------------------------------------------------------------------------
Benchmark Run: Sun Mar 18 2018 06:00:50 - 06:29:14
4 CPUs in system; running 4 parallel copies of tests

Dhrystone 2 using register variables       63227839.0 lps   (10.0 s, 7 samples)
Double-Precision Whetstone                    14671.5 MWIPS (9.4 s, 7 samples)
Execl Throughput                               4394.5 lps   (30.0 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks        347374.8 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks          109273.0 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks        830966.2 KBps  (30.0 s, 2 samples)
Pipe Throughput                             2702931.3 lps   (10.0 s, 7 samples)
Pipe-based Context Switching                 307088.8 lps   (10.0 s, 7 samples)
Process Creation                               4009.3 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                   6331.9 lpm   (60.0 s, 2 samples)
Shell Scripts (8 concurrent)                   1825.1 lpm   (60.1 s, 2 samples)
System Call Overhead                        2090415.5 lps   (10.0 s, 7 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0   63227839.0   5418.0
Double-Precision Whetstone                       55.0      14671.5   2667.5
Execl Throughput                                 43.0       4394.5   1022.0
File Copy 1024 bufsize 2000 maxblocks          3960.0     347374.8    877.2
File Copy 256 bufsize 500 maxblocks            1655.0     109273.0    660.3
File Copy 4096 bufsize 8000 maxblocks          5800.0     830966.2   1432.7
Pipe Throughput                               12440.0    2702931.3   2172.8
Pipe-based Context Switching                   4000.0     307088.8    767.7
Process Creation                                126.0       4009.3    318.2
Shell Scripts (1 concurrent)                     42.4       6331.9   1493.4
Shell Scripts (8 concurrent)                      6.0       1825.1   3041.8
System Call Overhead                          15000.0    2090415.5   1393.6
                                                                   ========
System Benchmarks Index Score                                        1363.2

SSDNodes — KVM 8GB
========================================================================
   BYTE UNIX Benchmarks (Version 5.1.3)

   System: redacted: GNU/Linux
   OS: GNU/Linux -- 4.9.0-5-amd64 -- #1 SMP Debian 4.9.65-3+deb9u2 (2018-01-04)
   Machine: x86_64 (unknown)
   Language: en_US.utf8 (charmap="UTF-8", collate="UTF-8")
   CPU 0: Intel Core Processor (Skylake, IBRS) (4600.0 bogomips)
          Hyper-Threading, x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
   CPU 1: Intel Core Processor (Skylake, IBRS) (4600.0 bogomips)
          Hyper-Threading, x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
   05:27:18 up 24 days,  9:39,  2 users,  load average: 1.83, 2.76, 2.63; runlevel 2018-02-21

------------------------------------------------------------------------
Benchmark Run: Sun Mar 18 2018 05:27:18 - 05:55:29
2 CPUs in system; running 1 parallel copy of tests

Dhrystone 2 using register variables       20712375.2 lps   (10.0 s, 7 samples)
Double-Precision Whetstone                     4089.5 MWIPS (10.0 s, 7 samples)
Execl Throughput                                869.8 lps   (29.6 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks        414717.4 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks          118528.4 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks       1037781.8 KBps  (30.0 s, 2 samples)
Pipe Throughput                              839599.1 lps   (10.0 s, 7 samples)
Pipe-based Context Switching                  39673.2 lps   (10.0 s, 7 samples)
Process Creation                               2367.3 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                   3917.3 lpm   (60.0 s, 2 samples)
Shell Scripts (8 concurrent)                   1015.1 lpm   (60.0 s, 2 samples)
System Call Overhead                         646058.8 lps   (10.0 s, 7 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0   20712375.2   1774.8
Double-Precision Whetstone                       55.0       4089.5    743.5
Execl Throughput                                 43.0        869.8    202.3
File Copy 1024 bufsize 2000 maxblocks          3960.0     414717.4   1047.3
File Copy 256 bufsize 500 maxblocks            1655.0     118528.4    716.2
File Copy 4096 bufsize 8000 maxblocks          5800.0    1037781.8   1789.3
Pipe Throughput                               12440.0     839599.1    674.9
Pipe-based Context Switching                   4000.0      39673.2     99.2
Process Creation                                126.0       2367.3    187.9
Shell Scripts (1 concurrent)                     42.4       3917.3    923.9
Shell Scripts (8 concurrent)                      6.0       1015.1   1691.8
System Call Overhead                          15000.0     646058.8    430.7
                                                                   ========
System Benchmarks Index Score                                         616.8

------------------------------------------------------------------------
Benchmark Run: Sun Mar 18 2018 05:55:29 - 06:23:42
2 CPUs in system; running 2 parallel copies of tests

Dhrystone 2 using register variables       38935462.3 lps   (10.0 s, 7 samples)
Double-Precision Whetstone                     8156.1 MWIPS (10.0 s, 7 samples)
Execl Throughput                               4726.3 lps   (29.9 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks        692577.9 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks          203840.1 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks       1799195.8 KBps  (30.0 s, 2 samples)
Pipe Throughput                             1621602.5 lps   (10.0 s, 7 samples)
Pipe-based Context Switching                 211656.3 lps   (10.0 s, 7 samples)
Process Creation                               9135.5 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                   7138.5 lpm   (60.0 s, 2 samples)
Shell Scripts (8 concurrent)                   1195.9 lpm   (60.1 s, 2 samples)
System Call Overhead                        1202392.1 lps   (10.0 s, 7 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0   38935462.3   3336.4
Double-Precision Whetstone                       55.0       8156.1   1482.9
Execl Throughput                                 43.0       4726.3   1099.1
File Copy 1024 bufsize 2000 maxblocks          3960.0     692577.9   1748.9
File Copy 256 bufsize 500 maxblocks            1655.0     203840.1   1231.7
File Copy 4096 bufsize 8000 maxblocks          5800.0    1799195.8   3102.1
Pipe Throughput                               12440.0    1621602.5   1303.5
Pipe-based Context Switching                   4000.0     211656.3    529.1
Process Creation                                126.0       9135.5    725.0
Shell Scripts (1 concurrent)                     42.4       7138.5   1683.6
Shell Scripts (8 concurrent)                      6.0       1195.9   1993.2
System Call Overhead                          15000.0    1202392.1    801.6
                                                                   ========
System Benchmarks Index Score                                        1382.8

These benchmarks should be taken with a grain of salt, since UnixBench tests a fair bit more than just CPU throughput. Still, one thing seems fairly clear: although the ARMv8 cores are 20-30% slower than the competing x86_64 cores in single-core performance, they win out in multi-core hashrate simply because there are more of them.
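
For anyone trying to read the index columns in the reports above: as far as I can tell, each index value is the result divided by the baseline and multiplied by 10, and the final score is the geometric mean of the twelve index values. The sketch below is my own reconstruction in Python, not UnixBench code (the names `ROWS`, `index_values` and `index_score` are made up for illustration). It is fed with the 2-copy run above that ends in a score of 340.0.

```python
import math

# (baseline, result) pairs copied from the 2-copy run that scored 340.0.
ROWS = {
    "Dhrystone 2 using register variables": (116700.0, 12561408.5),
    "Double-Precision Whetstone": (55.0, 1364.4),
    "Execl Throughput": (43.0, 1285.0),
    "File Copy 1024 bufsize 2000 maxblocks": (3960.0, 108284.8),
    "File Copy 256 bufsize 500 maxblocks": (1655.0, 29067.9),
    "File Copy 4096 bufsize 8000 maxblocks": (5800.0, 813617.5),
    "Pipe Throughput": (12440.0, 195193.3),
    "Pipe-based Context Switching": (4000.0, 59307.3),
    "Process Creation": (126.0, 2751.5),
    "Shell Scripts (1 concurrent)": (42.4, 3681.4),
    "Shell Scripts (8 concurrent)": (6.0, 322.3),
    "System Call Overhead": (15000.0, 280762.9),
}


def index_values(rows):
    """Assumed formula: index = result / baseline * 10."""
    return {name: result / baseline * 10
            for name, (baseline, result) in rows.items()}


def index_score(rows):
    """Assumed formula: overall score = geometric mean of the index values."""
    values = list(index_values(rows).values())
    return math.exp(sum(math.log(v) for v in values) / len(values))


if __name__ == "__main__":
    for name, value in index_values(ROWS).items():
        print(f"{name:<45}{value:10.1f}")
    print(f"{'Index score (geometric mean)':<45}{index_score(ROWS):10.1f}")
```

Since the score is a geometric mean over file copy, pipe and process tests as well as Dhrystone and Whetstone, two machines with similar scores can still differ quite a bit in raw compute.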

I suppose this isn't really a thrilling discovery — it makes immediate sense — but I found it fairly interesting that it's cheaper to scale out in number of cores rather than up in per-core performance… at least when it comes to mining vanity Tor domains.
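
One rough way to see how well a box scales out is to divide its all-cores index score by its single-copy score. The sketch below does that for the two SSDNodes reports above. `speedup` and `per_core_efficiency` are hypothetical helper names of mine, and the whole-benchmark score is only a proxy for mkp224o's actual hashrate.

```python
# (cores, single-copy score, all-cores score) copied from the reports above.
RUNS = {
    "SSDNodes KVM 16GB": (4, 472.3, 1363.2),
    "SSDNodes KVM 8GB": (2, 616.8, 1382.8),
}


def speedup(single, parallel):
    """How many times higher the all-cores score is than the 1-copy score."""
    return parallel / single


def per_core_efficiency(cores, single, parallel):
    """Speedup divided by core count; 1.0 would be perfect linear scaling."""
    return speedup(single, parallel) / cores


if __name__ == "__main__":
    for name, (cores, single, parallel) in RUNS.items():
        print(f"{name}: {speedup(single, parallel):.2f}x on {cores} cores, "
              f"{per_core_efficiency(cores, single, parallel):.2f} per core")
```

An efficiency above 1.0, as on the 8GB box, most likely means the single-copy run was disturbed by other load on the VM rather than that two cores are more than twice as fast as one. That is a guess on my part.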

Conclusion

Overall, this was a larger undertaking than I first expected, and I spent a long time monitoring (nonexistent) progress. In the end, it was fun to do, so hopefully it was fun to read about too!