In Alder Lake, Intel introduced a hybrid architecture: large, hyperthreading-enabled Performance cores are complemented by smaller, single-threaded Efficiency cores. The host OS is responsible for assigning each thread to one type of core or the other. We discovered that the Windows 10 scheduler does not do a perfect job with password recovery, a workload that requires a careful approach to thread scheduling.
Alder Lake is Intel’s codename for the 12th generation of Intel Core processors. The architecture introduced a new hybrid design that combines high-performance P-cores and power-efficient E-cores in a single CPU. The P-cores support hyperthreading and can execute two threads simultaneously, while the smaller E-cores lack hyperthreading. This makes for some unusual combinations. For example, the Intel Core i9-12900K with 8 P-cores and 8 E-cores can run 24 threads in parallel. The low-end Core i3-12300 comes with 4 P-cores and no E-cores, making it an 8-thread CPU. The mid-range 12th-generation Core i5 CPUs are released in multiple SKUs: some with 6 P-cores and 4 E-cores (16 threads), and some with 6 P-cores only (12 threads).
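The thread counts above follow from simple arithmetic: each hyperthreaded P-core contributes two hardware threads, and each E-core contributes one. A minimal Python sketch of that calculation; the helper name `hardware_threads` is our own and purely illustrative.

```python
def hardware_threads(p_cores: int, e_cores: int) -> int:
    """Logical processors on an Alder Lake CPU: P-cores are hyperthreaded
    (2 threads each), E-cores are single-threaded (1 thread each)."""
    return 2 * p_cores + e_cores


# Configurations mentioned in the text: (P-cores, E-cores)
configs = {
    "Core i9-12900K": (8, 8),
    "Core i3-12300": (4, 0),
    "Core i5 (6P+4E)": (6, 4),
    "Core i5 (6P only)": (6, 0),
}

for name, (p, e) in configs.items():
    print(f"{name}: {p}P + {e}E = {hardware_threads(p, e)} threads")
```

Running it reproduces the 24, 8, 16 and 12 threads quoted for these CPUs.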
The new hybrid CPU topology has important performance implications. The host operating system is responsible for assigning each task to a particular core. The OS thread scheduler must make this decision in real time based on the task’s priority, its foreground or background status, the current load on each core, and many other parameters. For CPUs equipped with both P-cores and E-cores, Intel introduced a new technology called Intel Thread Director, designed to help the OS thread scheduler distribute the load more efficiently across heterogeneous CPU cores. Intel Thread Director only works if the operating system supports it.
Both Intel and Microsoft announced support for Intel Thread Director (ITD) in Windows 11. Windows 10 did not receive ITD support, but its thread scheduler can still differentiate between P-cores and E-cores. For certain workloads, this differentiation is either too little or far too much.
In today’s world, where high-performance video cards are used for almost everything including password recovery, using a computer’s central processor to break passwords may seem old-fashioned. It is not. While GPU-assisted recovery delivers 50 to 500 times the performance of a CPU alone, GPU acceleration is not always an option. Some algorithms are designed specifically to deter hardware-assisted attacks. For example, Scrypt, a password-based key derivation function, was specifically designed to make hardware-assisted attacks impractical by requiring large amounts of memory.
A GPU is faster than a CPU because it can perform massively parallel computations: hundreds or even thousands of threads can run on a video card at the same time. Each of these threads is typically slower than a single thread on a central processor, but the sheer number of threads executed on a video card makes the attack much faster than on any CPU.
The idea behind Scrypt and similar PBKDF algorithms is to require a certain amount of memory to calculate each hash value. This is not a problem if you run 8, 16 or even 64 threads on a computer with enough RAM. However, running 2560 threads on a video card with 8 GB of RAM is a no-go. Scrypt and similar algorithms are used in multiple products to increase resistance against massively parallel hardware-assisted attacks. For this reason, getting the fastest CPU may be the only way to break passwords protected with such algorithms in reasonable time.
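To see why memory rather than raw compute becomes the bottleneck, consider Scrypt’s working set: its ROMix step keeps N blocks of 128·r bytes each, so a single hash needs roughly 128·r·N bytes. The sketch below assumes illustrative parameters N = 2^14 and r = 8 (about 16 MiB per hash) and treats the card’s 8 GB as 8 GiB; real products pick their own parameters, and the helper names are hypothetical.

```python
def scrypt_memory_bytes(n: int, r: int) -> int:
    """Approximate memory for one Scrypt computation: ROMix stores N blocks of 128*r bytes."""
    return 128 * r * n


def max_parallel_hashes(device_memory_bytes: int, n: int, r: int) -> int:
    """How many Scrypt instances fit in memory at once (ignoring all other allocations)."""
    return device_memory_bytes // scrypt_memory_bytes(n, r)


# Assumed, illustrative parameters: N = 2**14, r = 8 (about 16 MiB per hash).
N, R = 2**14, 8
GIB = 2**30

per_hash = scrypt_memory_bytes(N, R)
gpu_slots = max_parallel_hashes(8 * GIB, N, R)  # video card with 8 GB of RAM
gpu_threads = 2560                              # thread count from the text

print(f"memory per hash: {per_hash // 2**20} MiB")
print(f"GPU: {gpu_threads} threads, but only {gpu_slots} fit in 8 GiB")
print(f"CPU: 24 threads need {24 * per_hash // 2**20} MiB in total")
```

Under these assumptions, only 512 of the 2560 GPU threads could hold their Scrypt state at the same time, while 24 CPU threads need a modest 384 MiB.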
How does Windows 10 handle multi-threaded loads on hybrid CPUs? Shortly after the official release of 12th-generation CPUs, we tested Elcomsoft Distributed Password Recovery on the new Core i9-12900K and got some very low numbers. Here’s what we saw:
Before optimization, the new 16-core, 24-thread CPU consuming 241 W at full load was noticeably slower than an AMD APU with a 65 W TDP. It was clear that only the E-cores of the Intel Core i9-12900K were loaded, while the P-cores were idling. The reason was the below-normal priority of the password recovery threads. In Windows, each process belongs to a certain priority class, and each thread has a priority level within that class. By default, a process belongs to the normal priority class. Elcomsoft Distributed Password Recovery was designed to run in the background, spawning password recovery jobs with the below-normal priority class. On computers with homogeneous CPU cores, this allowed Distributed Password Recovery to run without interfering with foreground apps.
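For reference, Windows derives a thread’s base priority (on a scale of 0 to 31) from its process priority class combined with the thread’s relative priority. The sketch below encodes that mapping for the non-real-time classes, following Microsoft’s “Scheduling Priorities” table, to show where below-normal job threads sit compared to the default. The Python names are illustrative only; real Windows code would call the Win32 `SetPriorityClass` and `SetThreadPriority` functions instead.

```python
# Base priority of each process priority class (thread at THREAD_PRIORITY_NORMAL).
CLASS_BASE = {
    "IDLE_PRIORITY_CLASS": 4,
    "BELOW_NORMAL_PRIORITY_CLASS": 6,
    "NORMAL_PRIORITY_CLASS": 8,  # default for new processes
    "ABOVE_NORMAL_PRIORITY_CLASS": 10,
    "HIGH_PRIORITY_CLASS": 13,
}

# Relative thread priority offsets applied on top of the class base.
THREAD_OFFSET = {
    "THREAD_PRIORITY_LOWEST": -2,
    "THREAD_PRIORITY_BELOW_NORMAL": -1,
    "THREAD_PRIORITY_NORMAL": 0,
    "THREAD_PRIORITY_ABOVE_NORMAL": 1,
    "THREAD_PRIORITY_HIGHEST": 2,
}


def base_priority(priority_class: str, thread_priority: str) -> int:
    """Base priority for non-real-time classes, per the Windows scheduling priority table."""
    if thread_priority == "THREAD_PRIORITY_IDLE":
        return 1  # same for every non-real-time class
    if thread_priority == "THREAD_PRIORITY_TIME_CRITICAL":
        return 15  # same for every non-real-time class
    return CLASS_BASE[priority_class] + THREAD_OFFSET[thread_priority]


old = base_priority("BELOW_NORMAL_PRIORITY_CLASS", "THREAD_PRIORITY_NORMAL")
new = base_priority("NORMAL_PRIORITY_CLASS", "THREAD_PRIORITY_NORMAL")
print(f"below-normal job thread: {old}, normal job thread: {new}")
```

A job thread in the below-normal class starts at base priority 6, two levels under a thread in a default, normal-class process (8).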
On heterogeneous CPUs, the Windows thread scheduler treats our password recovery threads as background tasks and assigns them to E-cores. This happens regardless of the number of threads or the number of available E-cores, which explains the poor performance.
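A deliberately simplified model of this behavior counts only how many job threads can execute at the same instant: one per logical processor they are allowed to use. It ignores the speed difference between P-cores and E-cores entirely, and the helper name is hypothetical.

```python
def running_at_once(job_threads: int, allowed_logical_cpus: int) -> int:
    """Threads that can execute simultaneously: at most one per allowed logical CPU."""
    return min(job_threads, allowed_logical_cpus)


P_CORES, E_CORES = 8, 8              # Core i9-12900K
ALL_LOGICAL = 2 * P_CORES + E_CORES  # 24 hardware threads
JOB_THREADS = ALL_LOGICAL            # agent configured to use every thread

# Scenario described in the text: job threads kept on the 8 E-cores only.
confined = running_at_once(JOB_THREADS, E_CORES)
# Scenario after the fix: job threads allowed on every logical processor.
spread = running_at_once(JOB_THREADS, ALL_LOGICAL)

print(f"E-cores only: {confined} of {JOB_THREADS} threads running, "
      f"{ALL_LOGICAL - confined} hardware threads idle")
print(f"all cores:    {spread} of {JOB_THREADS} threads running")
```

Even before accounting for the faster P-cores, confining the jobs to E-cores leaves 16 of the 24 hardware threads idle while the remaining job threads wait for time slices.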
We optimized Elcomsoft Distributed Password Recovery to fully support the new hybrid architecture, and improved some algorithms (e.g. compressed archive encryption) to unlock the full performance potential of Alder Lake. The results are presented below:
To set up your password recovery rig for the latest Alder Lake CPUs, all you need to do is update Elcomsoft Distributed Password Recovery to version 4.4 or newer. Make sure to roll out the update to all agents as well, since it is the agent that controls the CPU cores.
By default, Elcomsoft Distributed Password Recovery agents are configured to use all CPU threads. Since this update raises process and thread priority to ‘normal’ (as opposed to ‘below normal’ in previous builds), running a CPU-only attack can make the computers sluggish, because all CPU cores are loaded to 100%. If you need the computer(s) for other tasks, we recommend lowering the maximum number of CPU threads that Elcomsoft Distributed Password Recovery can use. This is especially important if you run an agent alongside the server on the same computer; in that case, set the number of CPU threads to (CPU_maximum – 2).
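The (CPU_maximum – 2) rule is easy to compute on the agent machine itself. A minimal Python sketch; the function name is hypothetical, and the sketch does not communicate with Elcomsoft Distributed Password Recovery in any way: the resulting number still has to be entered in the agent’s CPU thread setting by hand.

```python
import os


def agent_thread_limit(reserve: int = 2) -> int:
    """Suggested CPU thread limit for an agent sharing the machine with the server
    or with interactive work: all logical processors minus a reserve (at least 1)."""
    total = os.cpu_count() or 1  # os.cpu_count() may return None
    return max(1, total - reserve)


print(f"logical processors: {os.cpu_count()}, "
      f"suggested agent threads: {agent_thread_limit()}")
```

On a Core i9-12900K, which exposes 24 logical processors, this suggests 22 threads for the agent.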
Alder Lake CPUs are today’s gold standard for raw performance among consumer hardware. Elcomsoft Distributed Password Recovery 4.4 now fully supports the new heterogeneous architecture and brings other important optimizations that deliver the highest performance in CPU-only attacks.
Build high-performance clusters for breaking passwords faster. Elcomsoft Distributed Password Recovery offers zero-overhead scalability and supports GPU acceleration for faster recovery. Serving forensic experts and government agencies, data recovery services and corporations, Elcomsoft Distributed Password Recovery is here to break the most complex passwords and strong encryption keys within realistic timeframes.