Most modern CPUs are multi-core – it is not easy to find even a laptop with less than two cores these days. And for desktops, 4 cores are usual now.
Password recovery is one of most CPU-intensive tasks, and it fits best into multi-processor architecture. Every CPU (or CPU core) get its own portion of passwords to try (i.e. to check their validness), and they all work in parallel. As simple as that.
So what we’re doing in our software is running multiple threads – as many as the number of CPUs (or cores) available. And the rest is being done by the operating system, that assigns the threads to cores (well, in most cases we don’t care what particular core is going to execute a particular thread, because they are all equal; the only exception is when one or more of the cores is doing something already, I mean something CPU-intensive as well).
There is also such technology as Hyper-threading. With it, CPU exposes two virtual cores for each physical one it has, allowing operating system to run more threads simultaneously thus better utilizing CPU resources.
Now how it looks in practice. We have built the system based on one of the top Intel desktop CPUs – Intel Core i7-970. It has as many as 6 (yes, six!) cores running at 3.2 GHz, plus hyper-threading, so the number of virtual processors is twelve. This CPU is produced using (relatively new) 32 nm process, so power consumption is surprisingly not very high (130 W), but well designed cooling is still strictly recommended (even if you’re not playing the overclocker’s game).
Obviously, there is no reason to run more than 12 compute-intensive threads on this CPU, while the minimum number of threads is just one (you bet! 🙂 ); we have not tried ALL variations from 1 to 12, but just some. What we were doing on this number-crunching twelve-heads monster is cracking the Blackberry backup file (using Elcomsoft Phone Password Breaker, of course). Here are the results:
The numbers are thousands passwords per second – so yes, the maximum performance we have got is well over 7 million (passwords per second). That means if the password is 7 chars long and contains small and capital letters, it will be cracked in a day and a half. Or if small (or capital) letters plus digits – less than 3 hours. And as you can see, the speed increases absolutely linearly up to 6 threads (according to the number of physical, not virtual cores). But when we double the number of threads (up to 12), the speed increases from 6,44 million passwords per second to 7,44 million per second only. So just 15% performance increase – but I wanted to remind you that the number of ALUs (arithmetic logic units) is still six. So hyper-threading does not double the speed, but still helps.
But 12 (virtual) cores is not the only strong side of this CPU. Also, it has the Westmere microarchitecture. That means (above many other things) that it features the new instruction set called Intel Advanced Encryption Standard Instruction Set (AES-NI), which is intended for hardware acceleration of AES. And as far as password verification for Blackberry backups uses AES, we do have hardware acceleration in EPPB with AES-NI (when it is supported by the hardware) as well. The results are (for one and twelve threads):
The first bar shows the speed with “old style” code (SSE2, in fact), and the second one with AES-NI. So the speed improvement is about 1.5 times – may be it sounds not so impressive as GPU acceleration, but it is just a free “bonus”! If, of course, you have an appropriate processor, in particular one of these:
• Gulftown (Core i7-9xx, Xeon 36xx, Xeon 56xx)
• Clarkdale (except Core i3, so just Core i5-6xx, and some Xeons)
• Arrandale (except Core i3 and Core i5-4xxM)
So, to make it simpler: most Core i5 processors have AES-NI supported, as well as six-core Intel Core i7 (9xx). If you have one of those, you can now break Blackberry backups about 1.5 times faster than you thought :).
There is one [minor] problem, though. If the time needed to verify the password is comparable to the time needed to generate the [next] password, the performance start to drop. With the brute-force attack, it starts with 8 million passwords per second; with the dictionary attack much earlier – even with 12 threads, the resulting speed is about 2 million passwords per second only. The reason is pretty simple: we are not able to generate passwords that fast, especially when we perform all those nice mutations of wordlists passwords (changing the letter case, adding or replacing symbols etc). CPU verifies the password faster than we provide it with the new one :).