A Word About Dictionaries

March 3rd, 2023 by Oleg Afonin

Dictionary attacks are among the most effective ones because they rely on human nature: people tend to select passwords that are easy to memorize, such as pet names, dates of birth, or football teams. The BBC counted 171,146 words in the English dictionary, while a typical native speaker (of any language) knows 15,000 to 20,000 word families (lemmas, or root words and inflections). With so few candidates, checking every English word takes little time even at a modest attack speed.

There is a difference between a ‘dictionary’ and a ‘wordlist’. Even simple passwords are not always “words” from the dictionary; they may also be common key combinations (like “qwerty”) or abbreviations (like “ROFL&SMC”). Several wordlists are included with Elcomsoft Distributed Password Recovery, and of course you can use your own; you can find many collections in the public domain or order specific dictionaries from the manufacturer.

Is there a difference between a common wordlist and an optimized dictionary?

Dictionary optimizations

A dictionary can be optimized to increase the probability of finding a password early in the attack. One way to optimize a dictionary is to order its entries deliberately. For example, one may place the most commonly used words at the top of the list (using a frequency list such as the “English Word Frequency” dataset on Kaggle), with less frequently used words further down.
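As a minimal sketch of frequency ordering, the Python snippet below sorts a wordlist using a word-to-count table. The function name `sort_by_frequency` and the counts are made up for illustration; a real table would come from a corpus or a published frequency list.

```python
def sort_by_frequency(words, frequency):
    """Return words ordered from most to least frequent.

    `frequency` maps a word to its count in some corpus (hypothetical data).
    Words missing from the table count as 0 and end up last; ties are
    broken alphabetically so the result is deterministic.
    """
    return sorted(words, key=lambda w: (-frequency.get(w, 0), w))


# Made-up counts, for illustration only.
counts = {"the": 1000, "password": 40, "dragon": 25}
ordered = sort_by_frequency(["dragon", "zyzzyva", "password", "the"], counts)
print(ordered)  # ['the', 'password', 'dragon', 'zyzzyva']
```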

Our tools use a different sorting scheme. Dictionaries provided with Elcomsoft Distributed Password Recovery are optimized for our software to deliver the fastest attacks. The dictionaries of natural languages and many specialized dictionaries we provide are sorted by the length of the entries; the shortest words are placed at the top of the list, while the longest entries are at the bottom of the dictionary. We chose this sorting because of the particular implementation of mutations in Elcomsoft Distributed Password Recovery; this may not be optimal for password recovery tools made by other vendors. Dictionaries of leaked passwords, on the other hand, are always sorted by popularity.

If you are composing your own dictionary, you may want to optimize it for your password recovery tasks. We recommend using frequency sorting for dictionaries composed of leaked passwords (e.g. “Top-100”, “Top-10000” and so on). If using dictionaries of natural languages, we only recommend frequency sorting if you are not using mutations. If using mutations, ordering entries by their length is more efficient.

Dictionary types

The dictionaries can be grouped as follows.

Common passwords. These dictionaries contain passwords that are common in certain communities or language groups. The “Top-100”, “Top-10000”, various lists of leaked passwords and similar dictionaries belong to this group. We recommend using these dictionaries for all attacks.

Specific dictionaries. Various argot and slang dictionaries (e.g. the “hacker’s slang”), dictionaries of common names and landmarks belong to this group. We only recommend using these dictionaries if you know that the user could be setting a password belonging to that specific group (e.g. after analyzing their existing passwords).

Dictionaries of natural language. These can be common English words or words that belong to the user’s native language. Various transliteration dictionaries for non-Latin alphabets also belong to this group. These dictionaries can be used for all types of attacks.

The use of mutations

You may or may not want to enable mutations depending on the dictionary used. For example, when using a dictionary that belongs to the “specific” or “natural language” group, mutations come in handy as they account for common password variations (e.g. appending one or more digits to the end of the password or varying the letter case). On the other hand, dictionaries of common passwords do not benefit from mutations because their entries are already the final, modified forms that users actually chose (where those passwords were based on dictionary words in the first place).

There are two major types of mutations: generic and specific.

We recommend using generic mutations for all types of attacks as these are commonly used when composing passwords. For example, the most common mutation turns a dictionary word “password” into something like “Password1”, capitalizing the first letter and adding a digit to the end. Generic mutations are especially handy for cold attacks.
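To illustrate the idea, here is a small Python sketch that produces a few generic variations of a lower-case dictionary word: case changes combined with an optional single trailing digit. It is an assumption-laden illustration of the concept, not the mutation engine of Elcomsoft Distributed Password Recovery; the function name and the exact set of variations are hypothetical.

```python
def generic_mutations(word, digits="0123456789"):
    """Yield simple variations of a lower-case dictionary word.

    Variations covered (an illustrative subset): the word as is,
    first letter capitalized, all upper-case, each with an optional
    single digit appended.
    """
    seen = set()
    for base in (word, word.capitalize(), word.upper()):
        for suffix in ("", *digits):
            candidate = base + suffix
            if candidate not in seen:  # skip repeats, e.g. for all-digit words
                seen.add(candidate)
                yield candidate


candidates = list(generic_mutations("password"))
print(len(candidates))              # 33
print("Password1" in candidates)    # True
```

Even this tiny rule set turns one entry into 33 candidates, which is why the size of the dictionary matters once mutations are enabled.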

Specific mutations, on the other hand, are rarely used in general. A good example is the “l00p” mutation, which transforms ordinary dictionary words into “hacker’s slang”. We only recommend using these specific mutation algorithms when you have reasons to believe that the user might have composed their passwords in that particular manner.
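A character-substitution mutation of this kind can be sketched in Python as follows. The substitution table `LEET` is an assumption chosen for illustration and does not reflect the actual rules used by the product; the function enumerates every combination of substituted and original characters.

```python
from itertools import product

# Assumed substitution table: each letter maps to the characters it may become.
LEET = {"a": "a4", "e": "e3", "i": "i1", "o": "o0", "s": "s5"}


def leet_variants(word):
    """Return all variants of `word` under the substitution table above."""
    choices = [LEET.get(ch, ch) for ch in word]
    return ["".join(combo) for combo in product(*choices)]


print(leet_variants("loop"))  # ['loop', 'lo0p', 'l0op', 'l00p']
```

The number of variants doubles with every substitutable letter, so a mutation like this multiplies the attack time quickly on longer words.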

Using free dictionaries

If you were wondering if you can use a text dictionary or wordlist obtained from the Internet to set up a password recovery attack, you most certainly can. However, you have to make sure that the dictionary is converted to a supported format (see below). If you are going to use mutations, we recommend making all entries lower-case; this does not apply to dictionaries of common passwords. You may further improve performance by optimizing the dictionary for the particular password recovery tool. The following settings are applicable to Elcomsoft Distributed Password Recovery.

For natural-language and specific dictionaries:

  • Save the dictionary as a “.udic” text file. The file must be using the Unicode LE (little-endian) encoding.
  • Make all entries lower-case.
  • Deduplicate (after the case conversion, which may itself produce duplicates).
  • Sort entries by their length. The shortest words should be placed on the top of the list, while longer words should be at the bottom.
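The preparation steps in the list above (everything except saving the file) can be sketched in a few lines of Python. The function name `prepare_wordlist` is hypothetical. Note that the sketch lower-cases entries before deduplicating, so that “Cat” and “cat” collapse into a single entry.

```python
def prepare_wordlist(entries):
    """Lower-case, deduplicate, and sort entries shortest-first.

    Blank lines and surrounding whitespace are dropped. Entries of equal
    length are ordered alphabetically to keep the output deterministic.
    """
    unique = {e.strip().lower() for e in entries if e.strip()}
    return sorted(unique, key=lambda w: (len(w), w))


raw = ["Dragon", "cat", "CAT", "  sun ", "", "apple"]
print(prepare_wordlist(raw))  # ['cat', 'sun', 'apple', 'dragon']
```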

If you are using a dictionary composed of existing (e.g. leaked) passwords, use a different workflow.

  • Save the dictionary as a “.udic” text file. The file must be using the Unicode LE (little-endian) encoding.
  • Do not change the case of the entries.
  • Sort entries by popularity. Entries that show up more frequently should be placed on the top of the list, with less popular entries at the bottom.
  • At this point, deduplicate.
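A possible Python sketch of this workflow is shown below. It assumes the raw input contains one line per occurrence of a password (so that repeats indicate popularity); the function name `rank_by_popularity` is hypothetical. Counting occurrences, sorting by count, and keeping one copy of each entry covers both the sorting and the deduplication steps while leaving the case untouched.

```python
from collections import Counter


def rank_by_popularity(passwords):
    """Return unique passwords, most frequent first, case preserved.

    Only line terminators are stripped, since leading or trailing spaces
    may be a genuine part of a password. Entries with equal counts keep
    the order in which they were first seen.
    """
    cleaned = (p.rstrip("\r\n") for p in passwords)
    counts = Counter(p for p in cleaned if p)
    return [p for p, _ in counts.most_common()]


leak = ["123456\n", "Password\n", "123456\n", "qwerty\n", "Password\n", "123456\n"]
print(rank_by_popularity(leak))  # ['123456', 'Password', 'qwerty']
```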

Making a custom dictionary

You can create custom dictionaries for your circumstances. For example, a custom dictionary may contain a list of phone numbers, dates, document numbers, names of family members or pets of a given user, and any other personal information that might have been used as part of a password.
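As an illustration, the following Python sketch builds a small custom dictionary from names and dates associated with a user. The date formats, function names, and sample data are assumptions made up for this example; adjust them to the conventions the user is likely to follow.

```python
from datetime import date


def date_candidates(d):
    """Return a date written in several common numeric styles."""
    formats = ["%d%m%Y", "%m%d%Y", "%Y%m%d", "%d%m%y", "%Y"]
    out = []
    for fmt in formats:
        text = d.strftime(fmt)
        if text not in out:
            out.append(text)
    return out


def build_custom_dictionary(names, dates):
    """Combine lower-cased names and date strings, dropping duplicates."""
    entries = [n.strip().lower() for n in names if n.strip()]
    for d in dates:
        entries.extend(date_candidates(d))
    return list(dict.fromkeys(entries))  # keeps first-seen order


# Hypothetical personal data, for illustration only.
custom = build_custom_dictionary(["Rex", "Anna"], [date(1990, 3, 7)])
print(custom)
# ['rex', 'anna', '07031990', '03071990', '19900307', '070390', '1990']
```

Names are lower-cased here on the assumption that mutations will restore case variations later; keep the original case if you plan to run the dictionary without mutations.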

We recommend preparing such dictionaries differently depending on whether or not you are planning to use mutations. Without mutations, the order of entries does not affect the recovery speed, but you may still want to place the more likely passwords closer to the top of the list so that they are tried first.

If, however, you are making a dictionary based on natural words, you may need to use mutations to produce realistic passwords. In this case, we recommend lower-case conversion and sorting by length (see “Using free dictionaries” above).

If possible, try keeping the size of the dictionary within limits. About 10,000 entries make a good dictionary that can be realistically used on most data formats together with the most common mutations. A significantly larger dictionary will likely choke the attack if you use any but the most basic mutations.
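A back-of-the-envelope estimate shows why size matters. In the Python sketch below, every number (variants per entry, attack speed) is a hypothetical value chosen for illustration; real figures depend on the data format, the hardware, and the mutations selected.

```python
def attack_estimate(entries, variants_per_entry, passwords_per_second):
    """Return (candidate count, seconds to try them all)."""
    candidates = entries * variants_per_entry
    return candidates, candidates / passwords_per_second


# Hypothetical: 1,000 mutated variants per entry on a slow format (100 passwords/s).
for size in (10_000, 1_000_000):
    total, seconds = attack_estimate(size, 1_000, 100)
    print(f"{size:>9,} entries -> {total:,} candidates, {seconds / 86_400:.1f} days")
```

Under these assumed numbers, the 10,000-entry dictionary finishes in a little over a day, while the million-entry dictionary needs more than a hundred days.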

Finally, Elcomsoft Distributed Password Recovery supports attacks that combine entries from two dictionaries. The two dictionaries can be different, or they can be the same dictionary; in the latter case the passwords will be produced as two-word combinations from that dictionary. Please note that if you use two dictionaries, the number of passwords to try will be the product of the two dictionary sizes.
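The growth is easy to see in a short Python sketch. It assumes the two entries are simply concatenated without a separator, which is an illustrative simplification rather than a description of how the product joins them.

```python
from itertools import product


def combine(first, second):
    """Yield every entry of `first` followed by every entry of `second`."""
    for a, b in product(first, second):
        yield a + b


words = ["red", "blue", "sun"]
pairs = list(combine(words, words))
print(len(pairs))   # 9, i.e. 3 * 3
print(pairs[:4])    # ['redred', 'redblue', 'redsun', 'bluered']
```

Combining a 10,000-entry dictionary with itself already yields 100,000,000 candidates before any mutations are applied.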

Specifications

Elcomsoft Distributed Password Recovery supports dictionaries that conform to the following specifications.

  • Format: text file, with one entry per line.
  • Encoding: Unicode LE (little-endian). Do not use ANSI or UTF-8 dictionaries as they are not supported.
  • Case: lower-case entries if you plan to use mutations; original case if you don’t.
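A plain-text wordlist in UTF-8 can be converted with a short Python script such as the sketch below. It interprets “Unicode LE” as UTF-16 little-endian. Writing a byte order mark and using Windows-style (CRLF) line endings are assumptions on our part, exposed here through the hypothetical `bom` parameter and the `newline` argument; verify both against your version of the software.

```python
def convert_to_udic(src_path, dst_path, src_encoding="utf-8-sig", bom=True):
    """Convert a text wordlist to UTF-16 LE, one entry per line.

    Returns the number of entries written. Blank lines are skipped.
    "utf-8-sig" reads UTF-8 input with or without a BOM.
    """
    count = 0
    with open(src_path, "r", encoding=src_encoding) as src, \
         open(dst_path, "w", encoding="utf-16-le", newline="\r\n") as dst:
        if bom:
            dst.write("\ufeff")  # assumed: BOM as written by Windows editors
        for line in src:
            entry = line.rstrip("\r\n")
            if entry:
                dst.write(entry + "\n")  # translated to CRLF on write
                count += 1
    return count


# Usage (file names are placeholders):
# convert_to_udic("wordlist.txt", "wordlist.udic")
```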

Conclusion

Dictionary attacks are among the most effective ways to crack passwords due to human nature. Most users tend to use easily memorable passwords based on words that can be found in a dictionary, making these attacks faster and more likely to succeed. There are different types of dictionaries, and choosing the right one and optimizing it correctly can cut the time of an attack significantly while improving the success rate. Mutations can further increase the chances of finding a password, at the cost of a longer attack. While free dictionaries from the Internet can be used, they must be converted to a supported format and optimized for the specific password recovery tool being used. By understanding the various types of dictionaries and mutations, one can better plan and execute successful password recovery attacks.


REFERENCES:

Elcomsoft Distributed Password Recovery

Build high-performance clusters for breaking passwords faster. Elcomsoft Distributed Password Recovery offers zero-overhead scalability and supports GPU acceleration for faster recovery. Serving forensic experts and government agencies, data recovery services and corporations, Elcomsoft Distributed Password Recovery is here to break the most complex passwords and strong encryption keys within realistic timeframes.

Elcomsoft Distributed Password Recovery official web page & downloads »