In this lab, we will cover the basics of password cracking, and hopefully convince you to change all of your passwords. For this lab you'll need to use a Kali Linux VM. Grab the latest release here. You can just use it as a Live CD (a common use case with Kali), no need to setup a virtual harddisk (but be careful if you want to save your work).
If one were to build a multi-user systems that requires logins, a first stab might be to store passwords for users in a file (in the clear) and perform a simple comparison operation when the user types in their password. This has the advantage of being very fast. However, for this system to be secure, the password file must always be protected. The assumption is that only a root user can read or modify the password file. However, what if the system administrator becomes a disgruntled employee and decides to trash other employees' passwords? What if the sysadmin is compromised or hacked? Now everyone's password will be disclosed. What we need is a way to store passwords such that (1) we can still authenticate users and (2) even if the stored passwords are disclosed, the password itself is not revealed.
This is where hashing (one-way functions) come in. The basic idea here is that we don't store a password at all. Instead, we hash the password using a strong, cryptographic hash function, e.g. SHA-2 or MD6 (see SEED Ch. 22), and store the hash value in our system or database. When a user attempts a login, the password they type is hashed using the same algorithm, and if the output matches what we stored, they are successfully authenticated. The security of this scheme is vastly better than before, but as you've probably already guessed, its strength relies on the assumption that given a hash value, an attacker cannot derive the password that produced that value. This is equivalent to saying that the hash function is truly a one-way function. If this isn't true, our security is no better than before. Luckily, we have plenty of hash functions that meet this requirement (another requirement is that our hash function is very unlikely to generate collisions, which is a harder requirement to meet. If there are collisions, that means that two or more passwords could produce the same hash).
Hash functions may make it seem as though there's no hope for an attacker. Of course, a hacker can just guess passwords until he's right. This is a brute force approach. Imagine an organization that implements a password policy that requires users to create passwords that are between 4 and 8 alphabetic characters in length. No numbers or symbols. This turns out to be about 217 billion possibilities (you should do the math on your own). That's peanuts for today's systems. So an attacker can just write a program to guess every possibility, and will likely find the correct answer in a few seconds.
Thankfully, this is easy to defeat. We can simply make our system either (1) limit the number of login attempts and/or (2) rate limit the attempts, for example allowing only one login attempt per second. This defeats the above attack, because to try the same number of combinations would take almost 7,000 years.
However, what if the attacker knows the hash. For example,
if an attacker can perform a privilege escalation attack on a Linux
system, he or she can read /etc/shadow
, which stores
these hashes. Then the attacker is back in business, because
the same brute forcing techniques can work. However, reasonable
password lengths and using numbers and symbols will make brute force
attacks take a much longer time.
On your Kali VM, create a new user with a short password by running the following:
$ useradd foo
$ passwd foo
... type in 'foo' to the prompts ...
This creates a user named foo
with the password foo
.
Take a look at how Linux stores the password entry:
$ grep foo /etc/shadow
foo:$6$kpm6CJMA$90AsSsKGeBdGHN3MVzHVW2SslM2k6YoOQ7g6SHYY5JXKQG9LX2uXdPPu8pznvtKMX1Uob1E.Z2Gn7G/dzQRjI1:18312:0:99999:7:::
Normally you'd have to do this with sudo
, but the default user on Kali is root. Notice that the output is delimited by both ':' and '$' characters. The ':' characters
delimit entries in the password file, so the first entry is the username, the second is the hashed password,
and the rest are things like timestamps, expiration dates, etc. The password itself is generated
by the crypt()
program. This program uses '$' to delimit attributes of the hashes it generates.
The first attribute is the type of hash algorithm (see here).
6
means SHA-512. The second attribute (in this case kpm6CJMA
) is a salt, which
we'll come back to later. The last component is the hash value itself, which is stored in a base64-like
encoding.
Since we know the password, we can recreate the entry above by using Python's API to crypt
. Try running
the following:
$ python3 -c 'import crypt; print(crypt.crypt("foo", "$6$kpm6CJMA$"))'
The first argument is the password, and the second is a salt. You should get the same result as what appears in /etc/shadow
. Of course,
in this case we knew the password, so it was easy.
Even if we hash passwords, there's a potential here for leaking information. What if two users choose the same password? This is actually pretty likely (1) because we aren't that original and (2) many systems have millions of users, increasing the likelyhood that there are duplicates. This means if we can somehow crack one person's password, we get a free lunch for another. To prevent this, we use salts. These are essentially random numbers appended to a password before the hash is computed, ensuring that duplicate passwords still produce distinct hashes. A salt is conceptually very similar to a nonce, which you'll recognize from the encryption lab. To authenticate users with salts, we store the salt in the clear along with the hash, i.e. if the password is P, the salt S, and the hash function H, we store S || H(S || P). Now that we know about salts, we can proceed to the next task.
Write a Python program to try every three-letter possibility to crack foo
's password.
Your program should take as its argument a user name, and it should open /etc/shadow
,
search for the username, and brute force its password (using 3-letter possibilities). Call your
program crack.py
. Report how long it takes to crack the password by using the following:
$ time ./crack.py foo
How long does it take if you include numerals in addition to letters?
Of course, we can reduce the size of our search space dramatically by using what we know about human nature, that is that people are pretty predictable and have limited memories. We can thus guide our search by using a dictionary, a word list that contains things like common words, common passwords, etc. We can then try out our dictionary words first, and only if this fails fall back to an exhaustive brute-force search.
To strengthen passwords, many users start with dictionary words,
but augment them with symbols and numbers. For example,
I might start with the password seedlabs
, but
augment it with numbers to produce s33dl4bs
. This
is not a dictionary word, but its easy enough to write programs
that take dictionary words and produce hybrid combinations
like this, essentially mutating our wordlist in pre-defined ways.
Let's do the same exercise, but using a mature tool, called John the Ripper, which employs a hierarchical approach like we outlined above. John comes pre-installed in your Kali image. You can see how to use it by running it without any arguments:
$ john
Running this is easy as giving it the file we want and letting it go to work:
$ time john -users:foo /etc/shadow
How did the time compare to your method?
Even with the dictionary and hybrid attacks above, it's still a computationally intensive approach. While we can speed things up with GPUs and custom cracking ASICs, we can also observe that a good deal of the time in cracking is spent computing the hashes. We can cut down on this time by pre-computing a table of hashes for our dictionaries and then instead perform straight comparisons. These tables are called rainbow tables. They trade off memory for performance, as these tables can grow quite large (often many GB).
If you're interested in exploring rainbow tables, check out RainbowCrack.
Now let's run some experiments with dictionary attacks.
We're going to use HashCat for this
exercise. This
is another program for password cracking that has a nice language for
generating mutation rules that can be applied to transform words in
the dictionary. HashCat also comes pre-installed on Kali. You can
run hashcat -h
to see the help screen.
Let's imagine a scenario where we've compromised the hashed passwords for a criminal organization in Chicago. We know that this criminal organization is involved in betting for Chicago sports teams. We also know from lurking on an IRC that its members frequent that passwords in the organizations internal systems have to be at least 8 characters long and must include at least one number.
Let's challenge our tools a bit by creating a new user:
$ useradd bar
$ passwd bar
This time, make the password whit3s0cks1994
.
Now, on the face of it, this is a pretty strong password.
It's 14 characters long, and includes numerals. A naive
brute-force search faces 36^14 possibilities, which is
pretty daunting. Even with john
, it's going
to take a while. Give it a try and go for a coffee:
$ john -users:bar /etc/shadow
Still waiting? You can see why password length (and mixed symbols) really does matter. However, our inside knowledge can help us to reduce the search space considerably. With this knowledge, we have all we need to perform a dictionary attack. We know that this organization probably has some serious sports fans, so why don't we include the Chicago-area sports teams in a word list? We'll also throw in some easy ones as well.
$ for i in bears cubs whitesocks fire blackhawks abcd qwert password; do echo $i >> wordlist; done
Let's see if we can use this wordlist with hashcat. We first need to tell hashcat a hash type to apply. From the man page, we can see that we need type 1800 (for sha512_crypt, which is what is used for /etc/shadow).
$ grep bar /etc/shadow > crackme # we don't care about the other users
$ hashcat --force -m 1800 crackme wordlist
This finishes quickly, but it fails saying "Exhausted," because nothing in the wordlist is in the file. The user has added his birthdate to the end and has applied a letter-number substitution scheme. Let's use HashCat's rule system to specify some transformations to apply to the password candidates in the wordlist before it hashes them.
So what do we know? We know that sports teams are proably a good bet. We know that users have to have numbers in their passwords. What are some easy numbers? password1, password2, etc. are always good bets. Also 1234 appended to the end. We'll also try caps, lowercase, We can encode those like so in a hashcat rules file:
$ cat > rules
:
u
l
c
$1
$1 $2 $3
$1 $2 $3 $4
(hit ctrl^d)
Each line here represents a transformation rule, and each rule will be applied to every word in our wordlist before hashing. Thus if we have N words in our dictionary and M rules, we will attempt NM password hashes. The first rule (:) means "apply no transformation," meaning the unchanged word. The second rule (u) means make the whole word upper-case. The third (l) means make the whole word lower-case. The fourth rule (c) means capitalize the first letter, and make the rest lower-case. The next three rules use the append ($) operator. Here, $1 means "append a 1 to the end of the word," and $1 $2 means "append 12 to the end of the word." We can now try this with hashcat:
$ hashcat --force -m 1800 crackme wordlist -r rules
Still no beans. We need to consider some other simple numbers. A really common one is to append your birth year to the end of a password. This is easy enough to encode. For example, if someone was born on 1994:
$1 $9 $9 $4
So really, all we need to do here is write a script that can create a rule for each birthday for anyone. There's another nice transformation rule we can use, in the form of sXY, where X is a pattern to search for and Y is its replacement. We can use this to guess easy substitutions. For example:
sa4 se3 sl1 so0
This will replace the letter 'a' with '4', and so on. People familiar with l33tsp34k will often use this in passwords. Let's go ahead and add this to our rules:
$ echo "sa4 se3 sl1 so0" >> rules
OK, back to birthdays. We need to autogen rules for every year (say since 1920 and 2020), with and without number substitution:
$ for i in $(seq 1920 2020); do echo $i | sed 's/\([0-9]\)/$\1/g' >> rules; done;
That gives us the birth year suffixed to the word. Now with substitution as well:
$ for i in $(seq 1920 2020); do echo "$(echo $i | sed 's/\([0-9]\)/$\1/g')" "sa4 se3 so0 sl1" >> rules; done;
Now let's give it a try again:
$ hashcat --force -m 1800 /etc/shadow wordlist -r rules
Bingo! You should see the password appear in hashcat's output now. Keep in mind this is running on a VM. It would go a bit faster on physical hardware, especially with more cores. HashCat really prefers to use a GPU cluster, an FPGA, or some other OpenCL accelerator to make the hashing blazingly fast.
I've put a salted UNIX password here. Can you crack it? Hint: can you find anything out about me from my website?
Here are some insights you should probably take away from this lab:
Please write your lab report according to the description. Please also list the important code snippets followed by your explanation. You will not receive credit if you simply attach code without any explanation. Upload your answers as a PDF to blackboard. You must turn this in by the Tuesday after Spring break, Tues. 3/24 @ 11:59PM.