Wednesday, January 22, 2014

An iFrame HTML Obfuscation Flavor

I was given an HTML file to look at the other day. It was believed to be redirecting clients to a malicious domain. I moved the file into my Linux-based sandbox for analysis and noticed that two of my favorite tools for starting an analysis (grep and less) were not able to display the text within this HTML file (first picture below). The next logical step was to check the file type (second picture below):


Notice that the file command claims that this is a UTF-16 encoded file. After some quick research, it turns out that the grep utility (standard build on an Ubuntu install) does not support UTF-16 encoded data.
Below is a utility (iconv) that can be used to convert the file from UTF-16 to UTF-8. Keep in mind that grep can, by default, read UTF-8 encoded data.
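For readers who would rather stay in a scripting language, the same conversion can be sketched in Python. This is only a rough equivalent of the iconv step, not the command used in the analysis; the function name and the file names in the usage note are hypothetical.

```python
from pathlib import Path


def utf16_to_utf8(src: str, dst: str) -> None:
    """Re-encode a UTF-16 file as UTF-8 (hypothetical helper, not from the post)."""
    # Work on raw bytes so that line endings are carried over untouched.
    raw = Path(src).read_bytes()
    # The "utf-16" codec consumes the byte order mark (BOM), if present,
    # and uses it to pick the endianness.
    text = raw.decode("utf-16")
    # Write the same text back out as UTF-8, which grep and less can read.
    Path(dst).write_bytes(text.encode("utf-8"))
```

Calling `utf16_to_utf8("suspect.html", "suspect-utf8.html")` would leave the original untouched and write a converted copy next to it, which is usually what you want when handling evidence.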


Now you can simply view the file with most Linux command line tools that support UTF-8 data encoding. Just in case you were wondering a little about UTF-8... UTF-8 is backwards compatible with ASCII: every ASCII character is encoded as the same single byte in UTF-8, which is why nearly all text editors and command line tools can handle UTF-8 text.
Now that I can easily view the file with my tools of choice, the next step is to search for any redirect functionality, such as meta refresh, server redirect, JavaScript, iframes, etc. to identify how users were navigating to another domain from this page. There are a couple of iframes in the HTML code and one of them stood out. Here's a snippet of this code:
Do you see any type of encoding in the screenshot (hint: look for patterns)? Part of the contents of that iframe is decimal-encoded HTML. This encoding may be there to deter simple static string analysis, among other things. The picture below shows what the characters after the "src=" are after converting them to ASCII with a Python script that I wrote.
Below is a screenshot of the decoded iframe:
This specific example is fairly unusual in that it used decimal HTML encoding, and once the decimal-encoded characters are decoded to ASCII text, the text that displays is a "bit.ly" address (yet another obfuscation layer). If you followed that "bit.ly" address through to the final domain, you would most likely have been served something malicious. I threw that domain into VirusTotal and at the time of my analysis, the detection ratio of 3 out of 51 indicated that this domain could very well be malicious.
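The decoding step described above can be sketched in a few lines of Python. This is not the script used in the analysis, just a minimal illustration of the idea: each `&#NNN;` reference is replaced by the character with that decimal code point. The function name and the sample string (which decodes to a harmless placeholder URL, not the address from the sample) are hypothetical.

```python
import re

# Matches decimal character references such as "&#104;". The trailing
# semicolon is treated as optional because browsers tolerate its absence.
DECIMAL_REF = re.compile(r"&#(\d+);?")


def decode_decimal_refs(value: str) -> str:
    """Replace decimal HTML character references with the characters they encode."""
    def to_char(match: re.Match) -> str:
        code_point = int(match.group(1))
        # Leave out-of-range values as they are instead of raising.
        if code_point > 0x10FFFF:
            return match.group(0)
        return chr(code_point)

    return DECIMAL_REF.sub(to_char, value)


# Hypothetical sample value, standing in for the obfuscated src attribute.
encoded = ("&#104;&#116;&#116;&#112;&#58;&#47;&#47;"
           "&#101;&#120;&#97;&#109;&#112;&#108;&#101;&#46;&#99;&#111;&#109;")
print(decode_decimal_refs(encoded))  # prints: http://example.com
```

The standard library's `html.unescape` would also do the job, and additionally handles hexadecimal and named references; the explicit regex here just keeps the decimal case visible.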
Below is a quick regular expression that could be used to identify iframes that contain decimal-encoded data. Please note that there are *many* obfuscation possibilities that could bypass this regular expression. My intent was not to provide an exhaustive detection signature, but instead to show you a simple yet unusual way of bypassing detection mechanisms in the enterprise, and then a way of detecting it.
iframe src="(&#\d{2,3}){3,}
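As a rough sketch of how the expression above might be applied, here it is compiled with Python's `re` module. As written, the pattern expects the references to follow one another directly, with no semicolon in between. Whether the original sample used semicolons is not visible here, so the second pattern is an assumed variant (not from the post) that also tolerates a trailing semicolon, extra whitespace, and different letter case. The names and sample strings are hypothetical.

```python
import re

# The pattern from the post, unchanged.
POST_PATTERN = re.compile(r'iframe src="(&#\d{2,3}){3,}')

# Assumed variant (not from the post): optional ";" after each reference,
# flexible whitespace before src, and case-insensitive matching.
VARIANT_PATTERN = re.compile(r'iframe\s+src="(&#\d{2,3};?){3,}', re.IGNORECASE)


def has_decimal_encoded_iframe(markup: str, pattern: re.Pattern = VARIANT_PATTERN) -> bool:
    """Return True if the markup contains an iframe whose src starts with
    at least three consecutive decimal character references."""
    return pattern.search(markup) is not None


# Hypothetical test strings.
without_semicolons = '<iframe src="&#104&#116&#116&#112" width="1"></iframe>'
with_semicolons = '<IFRAME SRC="&#104;&#116;&#116;&#112;" width="1"></IFRAME>'
benign = '<iframe src="http://example.com"></iframe>'

for sample in (without_semicolons, with_semicolons, benign):
    print(has_decimal_encoded_iframe(sample, POST_PATTERN),
          has_decimal_encoded_iframe(sample))
```

Either pattern is easy to sidestep (single quotes, other attributes before src, hexadecimal references), which is in line with the caveat above that this is not an exhaustive signature.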