13 Years of GPU Acceleration

October 22nd, 2020 by Oleg Afonin
Category: «Clouds», «General», «GPU acceleration»

Today, we have an important date. It’s been 13 years since we invented a technique that reshaped the landscape of modern password recovery. 13 years ago, we introduced GPU acceleration in our then-current password recovery tool, enabling the use of consumer-grade gaming video cards for breaking passwords orders of magnitude faster.

With today’s proliferation of everything AI relying on GPUs, our 13-year-old achievement seems obvious in retrospect. Back then, it was not only far from obvious, but also took us years to design and implement. It was so innovative that TheInquirer wrote an article about our tool. The original is no longer available, but we saved a quote:

PESKY RUSSIANS have come up with a novel way of using Nvidia’s graphics hardware – cracking passwords.
INQ readers will long be familiar with the concept of the GPGPU, and the green team’s latest CUDA development kit. Folks batting for the greens have been lauding the processing power of the 8800 when it comes to complex oil, gas and financial simulations.
But those crazy Ruskis have come up with a use that is rather more nefarious. Elcomsoft, based in Moscow, has created a password cracking technique that uses the same parallel processing concepts to speed up dictionary and brute force attacks on things like Windows Vista password logins. The firm says that the 8800 is up to 25 times faster than a CPU, normally used for such tasks.

Why would anyone want to use a video card to break passwords? Modern software (in most cases, with several notable exceptions) uses strong encryption to protect data. Hundreds of thousands of rounds of hashing are commonly used to derive the encryption key from the password, making such passwords extremely difficult and time-consuming to break. The resources of the computer’s CPU are quickly exhausted when attacking passwords. This is expected behavior, as manufacturers tune the strength of their security settings so that it takes only a split (but noticeable) second on an “average” computer to unlock the encrypted file, document, database or disk volume.
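To make the cost of iterated hashing concrete, here is a minimal Python sketch. It assumes PBKDF2-HMAC-SHA256 as the key derivation function; real formats use a variety of schemes and iteration counts, and the function names are made up for illustration.

```python
import hashlib
import os
import time


def derive_key(password: str, salt: bytes, iterations: int) -> bytes:
    """Derive a 32-byte key from a password with PBKDF2-HMAC-SHA256.

    PBKDF2 is an assumption made for this sketch; it stands in for
    whatever iterated hashing a particular file format uses.
    """
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations, dklen=32
    )


def seconds_per_attempt(iterations: int, samples: int = 3) -> float:
    """Measure the average CPU time needed to test one candidate password."""
    salt = os.urandom(16)
    start = time.perf_counter()
    for i in range(samples):
        derive_key(f"candidate{i}", salt, iterations)
    return (time.perf_counter() - start) / samples


if __name__ == "__main__":
    # More iterations mean a proportionally slower check per candidate.
    for rounds in (1_000, 100_000):
        print(rounds, "iterations:", seconds_per_attempt(rounds), "s per attempt")
```

The legitimate user pays this cost once per unlock. An attacker pays it once for every candidate password.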

These “split seconds” accumulate quickly, making the CPU choke on trillions of instructions. While there is only so much time the user is willing to wait for a password-protected document to open or an encrypted volume to mount, a delay of about 0.3 seconds is generally viewed as acceptable. It is important to note that every manufacturer relies exclusively on the computer’s CPU when it comes to verifying the password. Moreover, they are aiming at an “average” system, which usually has an older and almost never a high-end CPU. Even on experts’ computers, which are modern and use high-end hardware, it is not uncommon to see a high-end CPU crunching passwords at around 10 to 15 passwords per second. This was the case 13 years ago with the software used back then, and this is the case today with modern software. Obviously, these kinds of speeds are far from acceptable.
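A quick back-of-the-envelope calculation shows why such speeds are unacceptable. The sketch below is an illustration only: the alphabet and password length are assumptions chosen for the example, not figures from any real case.

```python
SECONDS_PER_DAY = 86_400


def keyspace_size(alphabet_size: int, min_len: int, max_len: int) -> int:
    """Count all passwords of min_len..max_len characters over an alphabet."""
    return sum(alphabet_size ** n for n in range(min_len, max_len + 1))


def worst_case_days(keyspace: int, passwords_per_second: float) -> float:
    """Days needed to try every password in the keyspace at a given speed."""
    return keyspace / passwords_per_second / SECONDS_PER_DAY


if __name__ == "__main__":
    # Assumed example: exactly six lowercase Latin letters (26**6 candidates).
    space = keyspace_size(26, 6, 6)
    print(space, "candidates")
    print(worst_case_days(space, 15), "days at 15 passwords per second")
```

Under these assumptions, even a trivially simple six-letter password takes roughly 238 days to exhaust at 15 passwords per second.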

This is where we needed a new technology to break passwords faster. When it comes to the choice of hardware that can be used to speed up the recovery, there aren’t really a lot of options available. Supercomputers are great, but they are more of a theoretical possibility for a humble law enforcement expert. FPGA-based solutions (e.g. Tableau) are better. Specifically tailored to these tasks, they demonstrated commendable performance with high reliability and low power consumption. The problem? Lots of them. Non-standard software APIs, highly custom everything and the high price point made these dedicated adapters pale in comparison to… video cards.

Yes, your regular gaming video cards.

Video cards can break passwords

The speed of a single CPU (or multiple CPUs if that matters) is not nearly enough to break today’s passwords. Hundreds of thousands or even millions of hash iterations slow the recovery down to a crawl. In the end, the CPU becomes a bottleneck. 13 years ago, we saw the need for additional computation power. Lots of it.

At the time, video cards were mostly used for gaming. Then-current models such as the NVIDIA GeForce 8500GT were of little use for anything else.

Then, also in 2007, NVIDIA created CUDA.

CUDA is a hardware-accelerated parallel computing platform that offers developers an API, which in turn enables the use of GPU cores for general-purpose computing.

GPUs had evolved into highly parallel multi-core systems, allowing very efficient manipulation of large blocks of data. This design is more effective than general-purpose central processing units (CPUs) for algorithms in which large blocks of data are processed in parallel. (Source)

Processing large blocks of data is exactly what we needed. This is where we started thinking about using video cards (at the time, only NVIDIA) to accelerate password recovery. With GPU acceleration offloading the most computationally intensive calculations onto the highly scalable video cards, we hoped to achieve at least 10x the performance of a CPU. The reality far exceeded our expectations. The hundreds of GPU cores, used in parallel, were able to exceed the speed of a high-end CPU by a factor of 100 to 250, depending on the format.

Rest assured, the journey was neither easy nor straightforward. The most difficult thing to do was to break down the job into tiny pieces, then correctly parallelize each piece across the hundreds of individual computational units. There are thousands of processors in a single video card, and one must select the right number of threads for the specific processor.
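The basic idea of cutting a job into pieces and sizing the thread grid can be sketched in a few lines of Python. This is a simplified illustration, not our production code: the function names are hypothetical, and the figure of 256 threads per block is an assumption that would be tuned for each GPU in practice.

```python
from typing import Iterator, Tuple


def split_keyspace(total: int, chunk_size: int) -> Iterator[Tuple[int, int]]:
    """Yield half-open (start, end) index ranges that cover [0, total)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, total, chunk_size):
        yield start, min(start + chunk_size, total)


def blocks_needed(work_items: int, threads_per_block: int) -> int:
    """Ceiling division: thread blocks required so every item gets a thread."""
    return (work_items + threads_per_block - 1) // threads_per_block


if __name__ == "__main__":
    # Assumed values: one million candidates per chunk, 256 threads per block.
    for start, end in split_keyspace(2_500_000, 1_000_000):
        print(start, end, blocks_needed(end - start, 256))
```

Each range can be handed to a device independently. The last range is simply shorter when the total does not divide evenly.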

If the GPU is vastly outperforming the CPU, the latter becomes a bottleneck. If the given algorithm for a certain format is too fast for the CPU to handle, then the potential passwords must be generated directly on the GPU itself. If that is not done, the CPU chokes, while the GPU idles. This was one of the most difficult things to do. Making the GPU calculate a hash value is one thing; making it apply the various rules and mutations from which the next password in the pipeline is derived is quite another.
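For the simplest case, a plain brute-force attack, generating candidates on the device boils down to each thread turning its own index into a password, so the CPU never has to send any strings. Here is a minimal Python sketch of that mapping; the function name is hypothetical, and rule-based mutations are considerably more involved than this.

```python
def index_to_candidate(index: int, alphabet: str, length: int) -> str:
    """Map a work-item index to a fixed-length candidate password.

    The index is read as a number in base len(alphabet); each digit
    selects one character. On a GPU, every thread would run this for
    its own global thread index.
    """
    base = len(alphabet)
    if not 0 <= index < base ** length:
        raise ValueError("index is outside the keyspace")
    chars = []
    for _ in range(length):
        index, digit = divmod(index, base)
        chars.append(alphabet[digit])
    # Digits were produced least significant first, so reverse them.
    return "".join(reversed(chars))


if __name__ == "__main__":
    for i in range(9):
        print(i, index_to_candidate(i, "abc", 2))
```

With this mapping, the host only needs to send a start index and a count to the device instead of a list of passwords.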

AMD, Intel and OpenCL

Then came AMD support. Back then, the technology was still owned by the company named ATI (now part of AMD). We were the first on the market to support AMD boards, first via their native API.

Then things began standardizing. In 2015, NVIDIA added OpenCL support to their range of CUDA-capable video cards. This trend was backed by other players including AMD and Intel. OpenCL is an open standard. Its cross-platform framework covers heterogeneous code execution on multiple CPU and GPU units, as well as some DSP and FPGA hardware that could previously be addressed exclusively via their own private APIs. For us, OpenCL became a standard, cross-platform interface for parallel computing using AMD, NVIDIA and Intel cards.

We implemented OpenCL for AMD boards and never looked back. Then we added support for the integrated GPUs found in most Intel processors. These GPUs are not very powerful, but they do speed up calculations by about 2 to 2.5 times compared to not using the integrated graphics.

After the initial excitement was over, we compared the performance of our OpenCL implementation on NVIDIA boards to what we had with CUDA, and found that CUDA-based acceleration outperformed OpenCL by a margin in the low tens of percent. This was thanks to the tight cooperation with NVIDIA engineers, who had helped us throughout these years to optimize our code on their cards. So we went back to CUDA for NVIDIA boards, and are using OpenCL for everything else.

Utilizing old video cards

What if you have an existing video card installed, but want to upgrade to a newer and higher performing one? The constant upgrade cycle must come to an end. Five years ago, back in 2015, we released a solution for mixing different video cards: Breaking the Vicious Cycle of Hardware Upgrades.

What is this all about? Our invention allows using multiple video cards together even if they are running at different clock speeds, and even if they are different video cards altogether. With asynchronous computing, the password recovery tool can break jobs into smaller pieces and feed each piece to an available video card. The asynchronous scheduler does not have to wait for all cards to complete their parts of the job: as soon as one of the video cards finishes its slice, it is fed the next piece in line.
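The scheduling idea can be illustrated with a small Python simulation. This is a sketch under stated assumptions rather than the actual scheduler: devices are simulated by threads that sleep, and the device names and speeds are made up.

```python
import queue
import threading
import time
from typing import Dict


def run_scheduler(total_items: int, chunk_size: int,
                  device_speeds: Dict[str, float]) -> Dict[str, int]:
    """Feed work slices to simulated devices as soon as each becomes free.

    device_speeds maps a (hypothetical) device name to its simulated
    speed in items per second. Returns the items processed per device.
    """
    work: "queue.Queue[tuple]" = queue.Queue()
    for start in range(0, total_items, chunk_size):
        work.put((start, min(start + chunk_size, total_items)))

    processed = {name: 0 for name in device_speeds}
    lock = threading.Lock()

    def worker(name: str, speed: float) -> None:
        while True:
            try:
                start, end = work.get_nowait()
            except queue.Empty:
                return  # no slices left for this device
            time.sleep((end - start) / speed)  # simulate the computation
            with lock:
                processed[name] += end - start

    threads = [threading.Thread(target=worker, args=(name, speed))
               for name, speed in device_speeds.items()]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return processed


if __name__ == "__main__":
    print(run_scheduler(1000, 50, {"new_card": 20_000, "old_card": 2_000}))
```

Because no device waits for another, the faster card naturally picks up more slices, while the slower card still contributes whatever it can.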

In layman’s terms, this technology allows using multiple video cards of different makes and models, effectively utilizing existing hardware and squeezing the last bit of performance out of every supported component.

Our tools can utilize all GPU cores in mix-and-match scenarios, even if the video cards are made by different manufacturers. Whether you have a mix of AMD and NVIDIA boards or just want to make use of your computer’s built-in Intel HD Graphics cores, all of these can be used together to speed up the recovery.

How much of a benefit would an old video card provide? It depends on how old it is. Let us see the following two benchmarks:

As you can see, a 7-year-old GeForce GTX 750Ti delivers some 2,000 RAR5 passwords per second. Comparing it directly to the previous-generation flagship GeForce GTX 1080, we see a 16-fold improvement with the newer card’s 32,700 passwords per second. This season’s RTX 2070, with its 50,400 passwords per second, is 25 times faster. Interestingly, the then top-of-the-line Intel Xeon E5 2603 CPU (116 passwords/s) is only 4 times slower than the current consumer-grade Intel Core i7-9700K (481 passwords/s).

Think about it for a moment. 7 years of progress in CPUs made the recovery speed 4 times better. The same 7 years in GPU land brought us a 25-fold speed increase. Conclusion: always invest in a GPU, not a CPU.
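The ratios above are easy to verify from the quoted RAR5 speeds. The short Python sketch below uses only the figures from this article; note that the GTX 750Ti number is approximate ("some 2,000").

```python
# RAR5 recovery speeds quoted in the article, in passwords per second.
SPEEDS = {
    "GeForce GTX 750Ti": 2_000,   # approximate figure
    "GeForce GTX 1080": 32_700,
    "GeForce RTX 2070": 50_400,
    "Xeon E5 2603": 116,
    "Core i7-9700K": 481,
}


def speedup(newer: str, older: str) -> float:
    """Return how many times faster the newer device is than the older one."""
    return SPEEDS[newer] / SPEEDS[older]


if __name__ == "__main__":
    print("GTX 1080 vs GTX 750Ti:", speedup("GeForce GTX 1080", "GeForce GTX 750Ti"))
    print("RTX 2070 vs GTX 750Ti:", speedup("GeForce RTX 2070", "GeForce GTX 750Ti"))
    print("i7-9700K vs Xeon E5 2603:", speedup("Core i7-9700K", "Xeon E5 2603"))
```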

GPU in the cloud

The parallel use of multiple video cards is great if you have a data center equipped with up-to-date hardware. New users can experience sticker shock when building a new system stocked with the maximum number of high-end video cards. If you rarely break passwords, you may want to consider offloading the job to the cloud. Recently, cloud service providers started equipping their virtual instances with GPU accelerators. For example, with Amazon P2, you can spec up to 8 GPUs per instance. Our tools are easy to deploy in the cloud; read Breaking Passwords in the Cloud: Using Amazon P2 Instances for details.

Other uses for GPU acceleration

GPU acceleration is not exclusive to password recovery; far from it. Today, GPU acceleration is used for many vastly different purposes. An incomplete list of some of the most interesting non-traditional uses for GPUs is below.

  • Mining cryptocurrencies. As controversial as it is, miners of various cryptocurrencies are among the most active users of gaming video cards.
  • Computational photography. Image stacking, ultra-high resolution imaging, HDR processing, shake reduction and even zooming are made possible by offloading parts of the processing onto GPU units.
  • AI-based still image and movie superscaling. Old, low-resolution images and some types of movies can be superscaled to UHD resolution using highly convincing AI-based processing.
  • Accelerated rendering of 3D graphics, face recognition and SfM (Structure from Motion) processing.
  • Accelerated encryption, decryption and compression. We have yet to see a general-use archiver that would use GPU acceleration for faster compression, but the technology is there.
  • Scientific purposes. These include bioinformatics, molecular dynamics, SETI@home, medical analysis and physical simulations, and a lot more.

REFERENCES:

Elcomsoft Distributed Password Recovery

Build high-performance clusters for breaking passwords faster. Elcomsoft Distributed Password Recovery offers zero-overhead scalability and supports GPU acceleration for faster recovery. Serving forensic experts and government agencies, data recovery services and corporations, Elcomsoft Distributed Password Recovery is here to break the most complex passwords and strong encryption keys within realistic timeframes.
