At ESIEA, I am doing research on metamorphic viruses. It is a new area for me, so I have been reading up on lots of new material. I am fascinated by some of the gambits and defenses in the war between virus writers and antivirus researchers.
In the past week, I have been experimenting with virus construction kits and Octave (a free alternative to MATLAB), and reading reams of papers on computer viruses, hidden Markov models, etc. I feel like I am going in about 12 directions at once. But as my master's thesis adviser once told me, "that's research".
A quick history of viruses...
The classic viruses were fairly easy to detect through a method known as "signature detection". Essentially, virus scanners look for a byte pattern associated with a known virus to identify an infected file. This method is still the predominant one, but newer viruses are being designed to evade it.
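As a rough illustration of signature detection, here is a minimal Python sketch. The function name `find_signatures`, the `SIGNATURES` table, and the byte pattern in it are all made up for the example; real scanners use much larger databases and faster multi-pattern matching, so treat this as a toy under those assumptions.

```python
def find_signatures(data: bytes, signatures: dict[str, bytes]) -> list[tuple[str, int]]:
    """Return (signature name, offset) for each known pattern found in data."""
    hits = []
    for name, pattern in signatures.items():
        offset = data.find(pattern)  # -1 means the pattern is absent
        if offset != -1:
            hits.append((name, offset))
    return hits


# Hypothetical signature table: the name and the bytes are placeholders.
SIGNATURES = {"demo-pattern": bytes.fromhex("deadbeef")}

clean_file = b"\x00\x01\x02\x03"
flagged_file = b"\x00\x01" + bytes.fromhex("deadbeef") + b"\x02"

print(find_signatures(clean_file, SIGNATURES))
print(find_signatures(flagged_file, SIGNATURES))
```

The weakness is visible right in the sketch: change even one byte inside the pattern and `find` no longer matches.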
"Encrypted viruses" attempt to evade scanners by encrypting the body of the virus. Typically, this would be done with a XOR operation, so that the same procedure can be used to both encrypt and decrypt the body of the virus. By itself, this approach is not especially useful -- the virus scanner can still identify the signature of the encryption/decryption code.
"Polymorphic viruses" improve on encrypted viruses by mutating the decrypter function. A simple version of the signature detection approach will then fail totally. Except... Modern scanners will decrypt the virus body, and then scan the virus. (I am still a little fuzzy on how they know when to decrypt the virus body.)
But polymorphic viruses point the way to a far more interesting approach. Rather than relying on encryption, "metamorphic viruses" mutate the body of the virus itself. This strategy can evade signature detection without any encryption at all. (Interestingly, DRM systems are apparently exploring this technique to resist reverse engineering efforts.)
Detecting metamorphic viruses is fairly challenging. Fortunately, most metamorphic viruses to date have not been particularly good at it. But some are. NGVCK (Next Generation Virus Construction Kit) was designed (apparently) as a proof of concept. It produces harmless but hard-to-detect viruses. (Its last release was in 2002 -- virus scanners might have caught up to it these days.)
Current research has been exploring statistical models, especially hidden Markov models (HMMs). The results seem promising, but the battle is not over. Some research suggests that attackers could tune the mutations to emulate benign files. Virus scanners are then left with the unpleasant choice of rejecting some benign files or accepting some malicious ones (and probably ending up with some of both).
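To give a feel for how an HMM can score a file, here is a small Python sketch of the standard forward algorithm with per-step rescaling. Everything specific in it is an assumption for illustration: the two-state model, its probabilities, and the three-symbol alphabet (standing in for opcode classes) are invented, not taken from any paper or trained on real data. In practice the model would be trained on sequences from a known family, and a file would be flagged when its score is close to that family's scores.

```python
import math


def log_likelihood(obs, start, trans, emit):
    """Forward algorithm: log P(obs | model), rescaling at each step to avoid underflow."""
    n = len(start)
    # alpha[i] is proportional to P(observations so far, current state = i)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    log_prob = 0.0
    for t in range(len(obs)):
        if t > 0:
            symbol = obs[t]
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][symbol]
                for j in range(n)
            ]
        total = sum(alpha)
        if total == 0.0:
            return float("-inf")  # the model cannot produce this sequence
        log_prob += math.log(total)
        alpha = [a / total for a in alpha]
    return log_prob


# Toy model (all numbers are made up): 2 hidden states, 3 observable symbols.
START = [0.6, 0.4]
TRANS = [[0.7, 0.3], [0.4, 0.6]]
EMIT = [[0.8, 0.1, 0.1], [0.1, 0.1, 0.8]]

typical = [0, 0, 0, 2, 2, 2]   # looks like what the model expects
atypical = [1, 1, 1, 1, 1, 1]  # symbol 1 is unlikely in either state

# Dividing by length makes sequences of different sizes comparable.
score_typical = log_likelihood(typical, START, TRANS, EMIT) / len(typical)
score_atypical = log_likelihood(atypical, START, TRANS, EMIT) / len(atypical)
print(score_typical, score_atypical)
```

The detection decision is then a threshold on the per-symbol score, which is exactly where the trade-off above bites: if a mutated file is padded to look statistically like benign code, its score drifts toward the benign range and no threshold separates the two cleanly.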
Anyway, it is an exciting new realm for me!