Posted on Saturday, 20th February 2010 by Michael
Detecting Malware and other malicious files using md5 hashes
My interest in this research started after reading an article on the subject at http://enclavesecurity.com/ . In the article they talk about using the hashes of known malicious files to discover malware and other malicious files on their systems. They also take a deeper look into the recent APT and Aurora attacks on Google. What I found most interesting, though, was the challenge of automating this process for free while still producing usable information.
The most important thing to understand before continuing is that this is not a foolproof process, because even a simple change to a file changes its hash. For example, if you have the c99.php shell and change the password or add a single space to the PHP, the hash of the file changes, which makes detection via this method impossible. The other issue I have noticed with this methodology is that no one is willing to share all of their information. Many organizations share only bits and pieces, such as “The Malware Hash Registry” (http://www.team-cymru.org), considered the leading authority on this topic. They make part of their service available online: you submit hashes and get back the following information:
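To make the whitespace point concrete, here is a minimal sketch that hashes a small file, appends one space, and hashes it again. The file sample.php is a hypothetical stand-in created only for the demo, not the real c99 shell.

```shell
#!/bin/bash
# Sketch: show that appending a single space changes a file's md5 hash.
# sample.php is a hypothetical stand-in file created just for this demo.
workdir=$(mktemp -d)
printf '<?php echo "hello"; ?>\n' > "$workdir/sample.php"
before=$(md5sum "$workdir/sample.php" | cut -d ' ' -f 1)

# Append one space, as anyone modifying the file trivially could.
printf ' ' >> "$workdir/sample.php"
after=$(md5sum "$workdir/sample.php" | cut -d ' ' -f 1)

echo "before: $before"
echo "after:  $after"
rm -rf "$workdir"
```

The two printed hashes differ, so a lookup keyed on the first hash would miss the modified file.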
Ex:1: 7697561ccbbdd1661c25c86762117613 1258054790 NO_DATA
Ex:2: cbed16069043a0bf3c92fff9a99cccdc 1231802137 69
In example 1 you see the md5 hash, then the epoch date and time, then NO_DATA, meaning the registry could not tell whether this hash is malicious. In example 2 you see the same, except that instead of NO_DATA you see 69. This number means that 69% of the antivirus vendors they used to check this file found it to be malicious. This information is good, but I do not find it very helpful. It is nice to know that a file was detected as malicious, but is it truly malicious, and if so, what type of malicious file is it: a backdoor, a key logger, or something else? I have emailed them asking whether they could provide the detection type, with the understanding that most of their system is private and that they will not disclose the database or the vendors they use to scan the files. I have not heard back from them at this point.
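A response line in this format can be split into its three fields with a few lines of bash. This is only a sketch: the function name describe_mhr_line is made up for illustration, and the field layout is assumed from the two examples above.

```shell
#!/bin/bash
# Sketch: interpret one response line from the Malware Hash Registry.
# Assumed format (taken from the examples above): <md5> <epoch> <percent|NO_DATA>
# describe_mhr_line is a hypothetical helper name.
describe_mhr_line() {
    local hash epoch rate
    read -r hash epoch rate <<< "$1"
    if [ "$rate" = "NO_DATA" ]; then
        echo "$hash (epoch $epoch): no data"
    else
        echo "$hash (epoch $epoch): flagged by ${rate}% of AV engines"
    fi
}

describe_mhr_line "7697561ccbbdd1661c25c86762117613 1258054790 NO_DATA"
describe_mhr_line "cbed16069043a0bf3c92fff9a99cccdc 1231802137 69"
```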
This led me to search the internet for other sites that provide additional information along with the hash. I found one other site, http://malwarehash.com , a sub-site of NoVirusThanks.org. They provide an online utility to which you submit your hash, and if the hash is known to be malicious it gives you information back. See screen shot below:
As you can see, they provide an additional layer of detail over what you get from the Malware Hash Registry. On top of that, they use a simple PHP script for the query, which makes scripting this much easier:
http://www.malwarehash.com/result.php?hash=1E71DE2D6A89AA9796344BB7FA23AC7E
As you can see, the URL consists of the site, the script, and the hash. The only issue with this site is that it seems they have not updated their database since 6/2009. I have contacted them as well to ask about this and about their plans for the site, though I have not heard back from them either.
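Because the query is just a URL with the hash appended, building it in bash is straightforward. The sketch below also checks that the input looks like an md5 hash (32 hexadecimal characters) before building the URL. The function name build_lookup_url is hypothetical.

```shell
#!/bin/bash
# Sketch: validate an md5 hash and build the lookup URL shown above.
# build_lookup_url is a hypothetical helper name.
build_lookup_url() {
    local hash=$1
    # An md5 hash is exactly 32 hexadecimal characters.
    if [[ ! $hash =~ ^[0-9A-Fa-f]{32}$ ]]; then
        echo "not an md5 hash: $hash" >&2
        return 1
    fi
    echo "http://www.malwarehash.com/result.php?hash=$hash"
}

build_lookup_url 1E71DE2D6A89AA9796344BB7FA23AC7E
```

When passing the result to a downloader, quote it so the shell does not try to expand the ? in the URL as a wildcard.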
With this information in hand I set out to develop a script that would automate this process. We have found this methodology helpful at work even though it is not 100% accurate: we notice that most malware does not get detected by our antivirus, so by using the hashes and relying on the internet community we are able to improve our detection and remediation of malicious files.
To use this script you will need a Linux user account and some basic knowledge of Linux to set the variables properly. I wrote the script in bash for two reasons: 1) it is a piece of cake to do, and 2) you are forced to move the malicious file off a Windows environment, where you stand a higher chance of infecting yourself. First, access your shell and create a directory called whatever you want; in the code we use a directory called infect, which is set in a variable for easy changing. Next, copy the malware-hash.sh script to the directory one level above the directory you just created. Then copy the sed script to a file called clean inside the directory you created. Once you have done this, chmod the malware-hash.sh script so you can execute it and chmod the clean script so the malware-hash.sh script can read it. All you have to do now is copy the suspicious files to the directory you created and execute the script.
The script gets a listing of all the files in that directory, removes the clean script and any dupes from the listing, and then gets the md5 hash of each file. Once it has the hashes it creates a batch file to be processed against The Malware Hash Registry and saves the results in a clean, human-readable format. This includes adding the file name in front of each hash so you know which file the hash belongs to. We use the batch function to stay within the TOS of the site.
Next it takes the hashes and runs them through Malwarehash.com. We use the --random-wait option with wget here so as not to act like a bot or script. If it gets a hit for an infection, it grabs the page, scrapes out the data we want, and processes it into a human-readable report. Finally, it combines the results of both checks and emails the final report to the email address provided.
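The batch step described above can be sketched on its own. This is not the full script, only an illustration of the "begin ... end" batch format it sends. The function name build_batch and the demo file names are made up.

```shell
#!/bin/bash
# Sketch: build the "begin ... end" batch for every regular file in a directory.
# build_batch and the demo files below are hypothetical.
build_batch() {
    local dir=$1 f
    echo "begin"
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        # Skip the sed script that lives alongside the samples.
        [ "$(basename "$f")" = "clean" ] && continue
        md5sum "$f" | cut -d ' ' -f 1
    done | sort -u   # drop duplicate hashes
    echo "end"
}

# Demo on a throwaway directory.
work=$(mktemp -d)
echo "sample one" > "$work/a.bin"
echo "sample one" > "$work/copy-of-a.bin"   # same content, same hash
echo "sample two" > "$work/b.bin"
echo "s/x/y/" > "$work/clean"
build_batch "$work"
rm -rf "$work"
```

The demo prints begin, two hashes, and end. To submit a batch the same way the script does, the output would be piped to netcat: build_batch /path/to/infect | nc hash.cymru.com 43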
The script is written in bash and is heavily commented.
It is broken down into two sections: the actual script and the sed script file.
Part 1, the script: Copy this script to a file with a .sh extension or download it here: http://www.digitaloffensive.com/malware-hash.sh . I suggest downloading it, as the WordPress system will almost certainly destroy the formatting of the code. Place this script one directory up from the directory that you are using for the infected files.
#!/bin/bash
################################################
## MALWARE HASH BASH ##
## Written by Michael LaSalvia ##
## http://www.digitaloffensive.com ##
## Inspired by an article at enclave Security ##
################################################
#Variables and clean up
#Edit in Path to dir that contains file for analysis
inPath=/home/mike/virus/infect
#Path to your md5sum app to verify it is not compromised. I got the hash from a new install on fedora 12.
wmd5sum=/usr/bin/md5sum
"$wmd5sum" "$wmd5sum" > .tmp
mverify=$(cut -f 1 -d ' ' .tmp)
if [ "$mverify" == "019329f334fa7ef6116ad1a24271c8da" ]; then
echo "Your md5 hash matches"
else
echo " Your md5sum hash is not right, Please verify it before continuing. Press CTRL+C now to exit"
fi
rm -Rf .tmp
# I strongly urge you to make sure your md5 application is not compromised or the rest of this script is useless.
sleep 20
#Get a list of file to analyze and get their hash
ls $inPath > files.txt
for vfiles in $(cat files.txt)
do
cd $inPath
md5sum $vfiles >> hashes
sort hashes | uniq > $inPath/hashes.txt
done
#Clean up my files
cat $inPath/hashes.txt | grep -v hashes >> .tmp; mv .tmp $inPath/hashes.txt
cat $inPath/hashes.txt | grep -v md5 >> .tmp; mv .tmp $inPath/hashes.txt
cat $inPath/hashes.txt | grep -v clean >> .tmp; mv .tmp $inPath/hashes.txt
#Format file to submit to http://www.team-cymru.org as a batch
cut -f 1 -d ' ' $inPath/hashes.txt >> $inPath/md5hash.txt
cut -f 3 -d ' ' $inPath/hashes.txt >> $inPath/md5name.txt
echo "begin"| cat - $inPath/md5hash.txt > .tmp && mv .tmp $inPath/md5hash.txt
echo end >> $inPath/md5hash.txt
rm -Rf $inPath/hashes.txt
#Send batch request to the Malware Hash Registry (I Love netcat)
nc hash.cymru.com 43 < $inPath/md5hash.txt > $inPath/md5results.txt
#Clean up response and format it
cat $inPath/md5results.txt | grep -v "#" >> .bk; mv .bk $inPath/md5results.txt
paste $inPath/md5name.txt $inPath/md5results.txt > $inPath/results.txt
#cat $inPath/results.txt
cat $inPath/md5hash.txt | grep -v "begin" >> .tmp; mv .tmp $inPath/md5hash.txt
cat $inPath/md5hash.txt | grep -v "end" >> .tmp; mv .tmp $inPath/md5hash.txt
#Dirty web scraper and formatting (site may be out of date)
for whashes in $(cat $inPath/md5hash.txt)
do
wget --random-wait "http://www.malwarehash.com/result.php?hash=$whashes" -O "$whashes"
if grep "INFECTED" $whashes > /dev/null; then
cat $whashes | grep -m 1 a-squared >> $inPath/.tmp
cat $whashes | grep -m 1 "Avira AntiVir" >> $inPath/.tmp
cat $whashes | grep -m 1 "Avast<" >> $inPath/.tmp
cat $whashes | grep -m 1 AVG >> $inPath/.tmp
cat $whashes | grep -m 1 BitDefender >> $inPath/.tmp
cat $whashes | grep -m 1 ClamAV >> $inPath/.tmp
cat $whashes | grep -m 1 Comodo >> $inPath/.tmp
cat $whashes | grep -m 1 "Dr.Web" >> $inPath/.tmp
cat $whashes | grep -m 1 Ewido >> $inPath/.tmp
cat $whashes | grep -m 1 F-PROT >> $inPath/.tmp
cat $whashes | grep -m 1 "G DATA" >> $inPath/.tmp
cat $whashes | grep -m 1 IkarusT3 >> $inPath/.tmp
cat $whashes | grep -m 1 Kaspersky >> $inPath/.tmp
cat $whashes | grep -m 1 McAfee >> $inPath/.tmp
cat $whashes | grep -m 1 "Malware Hash Registry" >> $inPath/.tmp
cat $whashes | grep -m 1 NOD32 >> $inPath/.tmp
cat $whashes | grep -m 1 Norman >> $inPath/.tmp
cat $whashes | grep -m 1 Panda >> $inPath/.tmp
cat $whashes | grep -m 1 "QuickHeal" >> $inPath/.tmp
cat $whashes | grep -m 1 "Solo Antivirus" >> $inPath/.tmp
cat $whashes | grep -m 1 Sophos >> $inPath/.tmp
cat $whashes | grep -m 1 TrendMicro >> $inPath/.tmp
cat $whashes | grep -m 1 VBA32 >> $inPath/.tmp
cat $whashes | grep -m 1 "VirusBuster" >> $inPath/.tmp
#More Cleaning and report creation.
sed -f $inPath/clean $inPath/.tmp > $inPath/.tmp1; mv $inPath/.tmp1 $inPath/$whashes
rm -Rf .tmp .tmp1
echo "Results from MalwareHash.com" >> $inPath/final_report.txt
echo " ------------------------------------------------------" >> $inPath/final_report.txt
echo "$whashes : " >> $inPath/final_report.txt
echo " ------------------------------------------------------" >> $inPath/final_report.txt
cat $inPath/$whashes >> $inPath/final_report.txt
echo " ------------------------------------------------------" >> $inPath/final_report.txt
else
echo "Results from MalwareHash.com" >> $inPath/final_report.txt
echo "NO RESULTS FOUND for: $whashes" >> $inPath/final_report.txt
echo " ------------------------------------------------------" >> $inPath/final_report.txt
fi
rm -Rf $inPath/$whashes
rm -Rf $inPath/md5*
rm -Rf $inPath/hashes
done
cat $inPath/results.txt | cat - $inPath/final_report.txt > .tmp && mv .tmp $inPath/final_report.txt
echo "Results from The Malware Hash Registry" | cat - $inPath/final_report.txt > .tmp && mv .tmp $inPath/final_report.txt
mail -s"Malware" me@me.com < final_report.txt
Part 2, the sed script:
Copy this code into a file called clean, located in the directory that holds the files you want to analyze, and chmod it so the script can read it.
s/<tr><th>/AV Name:/
s/<tr><th width="150">/AV Name:/
s/<\/th><td width="83">/ Sig Version:/
s/<\/td><td width="100">/ Engine Version:/
s/<\/td><td width="116">/ Engine Version:/
s/<\/th> <td width="83">/ Sig Version:/
s/<\/td> <td width="116">/ Engine Version:/
s/<\/t<td width="213"><font color="#336600" size="3">-/ Virus Name: Nothing Found/
s/<\/t<td width="213"><font color="#336600" size="3">-/ Virus Name: Nothing Found/
s/<\/td><td width="213"> <font color="#336600" size="3">-/ Virus Name: Nothing Found/
s/<\/td><td width="213"> <font color="#CC0000" size="2">/ Virus Name: /
s/<\/td><td width="213"><font color="#CC0000" size="2">/ Virus Name: /
s/<\/td><td width="190"> <font color="#CC0000" size="2">/ Virus Name: /
s/<\/td> <td width="213"> <font color="#336600" size="3">-/ Virus Name: Nothing Found/
s/<\/td> <td width="213"> <font color="#CC0000" size="2">/ Virus Name: /
s/<\/font><\/td><//
s/\/tr>//
s/<\/font><\/t//
s/<\/font> <//
s/<\/font><\/td> <\/tr//
s/> <\/tr//
s/d>//
Though this methodology is a few years old, there are many things that can be done with it. For example, we are in the process of writing a tripwire-type script that will allow webmasters to monitor changes to their sites, quickly see what was added or modified, and run those files through the process above to search for infections or compromise.
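A rough sketch of what such a tripwire-style check could look like: take a snapshot of the md5 hash of every file, take another later, and list what is new or changed. This is only an illustration, not the script we are writing. The function names and demo files are hypothetical.

```shell
#!/bin/bash
# Sketch of a tripwire-style check: snapshot file hashes, then list what changed.
# snapshot, changed_files and the demo files are hypothetical names.

snapshot() {
    # Print "hash  ./path" for every file under directory $1, in a stable order.
    (cd "$1" && find . -type f -exec md5sum {} + | LC_ALL=C sort)
}

changed_files() {
    # $1 = old snapshot, $2 = new snapshot.
    # Lines present only in the new snapshot are added or modified files.
    LC_ALL=C comm -13 "$1" "$2" | cut -d ' ' -f 3-
}

# Demo on a throwaway directory standing in for a web root.
work=$(mktemp -d)
mkdir "$work/site"
echo '<?php echo "home"; ?>' > "$work/site/index.php"
echo 'body { color: black; }' > "$work/site/style.css"
snapshot "$work/site" > "$work/old.txt"

echo ' ' >> "$work/site/index.php"                        # modified file
echo '<?php /* dropped file */ ?>' > "$work/site/new.php"  # added file
snapshot "$work/site" > "$work/new.txt"

changed=$(changed_files "$work/old.txt" "$work/new.txt" | LC_ALL=C sort)
echo "$changed"
rm -rf "$work"
```

The demo lists index.php and new.php but not the untouched style.css. Deleted files are not reported by this sketch, since it only looks at lines that appear in the newer snapshot. The listed files could then be copied to the analysis directory and run through the hash lookups above.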
As always if you have any questions, comments or concerns please feel free to contact me.
Posted in Code | Comments (4)
February 21st, 2010 at 11:20 am
Nice work. I wrote a hash submission script like this but I used virustotal.com. Great work!
February 22nd, 2010 at 7:46 am
I got an email back from Roberto at http://www.malwarehash.com . Currently the project is on hold, though they are looking at releasing it again with new features.
September 29th, 2011 at 4:06 pm
We have released a new malwarehash.com service, we now offer an API to query our database of malicious hashes, service can be tested from this page: http://api.malwarehash.com/public/
September 30th, 2011 at 4:30 pm
Roberto,
Thanks for the update I will check this out.