So this is an improvised talk, so don't blame it on me, blame it on Hugo.
Don't take the blame.
So anyway, I'm going to present a polymorphic virus I met three years ago.
So I'm going to present the features of this virus.
So it's a polymorphic virus, it's pretty infectious.
It uses entry point obscuring techniques, which basically means the virus is not changing
the address of the entry point in the headers.
Rather than this, it patches the original application to redirect execution through a few chunks of
obfuscation before actually reaching the real virus entry point.
It also has obfuscated layers using a shitload of junk instructions and different decryption
operations.
And the thing is, back then there was a bug in VMware 5 with one instruction, so the virus
was actually crashing on VMware 5.
So this was a pain in the ass.
So it was using the VERW instruction, and this instruction takes a register as a parameter
and obviously the register was random.
So it doesn't crash on a real computer, but it crashed on VMware.
So tracing through the layers was very boring and slow because they were pretty big and
you don't actually know how many layers the virus created because it's random.
So what I did is, I'm a lazy person, so I tried to find a pattern, and most viruses and
packers or anything doing encryption usually have a pattern that you can locate and exploit.
So in this one, fortunately, it's really simple: it's a CMP on two random registers followed
by a JB. Unfortunately there are fake CMPs followed by a JB too, but those fake ones are
actually jumping very close to their EIP, whereas the real ones are jumping very far
because the layers are pretty big, so you can actually build a pattern from this.
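As a rough sketch of that heuristic (not the script used in the talk): scan a buffer for a register-to-register CMP (opcode 3B) followed by a near JB (0F 82 rel32) and keep only the pairs whose jump distance is large. The function name and the 0x100 threshold are assumptions for illustration, and the sketch assumes the fakes also use the near form of the jump.

```python
import struct

def find_layer_end(code: bytes, min_distance: int = 0x100):
    """Return offsets of 'cmp r32, r32' + near 'jb' pairs that jump far away.

    min_distance is a hypothetical threshold separating fake loops
    (short jumps near EIP) from real layer loops (long jumps).
    """
    hits = []
    for i in range(len(code) - 7):
        is_cmp = code[i] == 0x3B and code[i + 1] >= 0xC0   # ModRM mod == 11
        is_jb = code[i + 2:i + 4] == b"\x0f\x82"           # jb rel32
        if is_cmp and is_jb:
            (rel,) = struct.unpack_from("<i", code, i + 4)
            if abs(rel) >= min_distance:
                hits.append(i)
    return hits
```

A tracer would put its breakpoint right after the 8 bytes found at each returned offset.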
So I made an OllyScript to automate the layer tracing so I could make some coffee
while it traces for me.
So this is an example, VERW AX, the instruction crashing the virus on the VM.
So the good thing with OllyScript or any scripting language for a debugger is you can
control the debugger and automate debugging sessions.
So it's very useful to script unpackers or any tracer for decrypting protections or
viruses or anything like this.
So I wrote a dodgy virus tracer.
What it does is single-step through each layer only once, remove the
anti-VM opcodes and locate the end of the layer using some pattern matching, a cheaty
technique.
And so you put a breakpoint after the JB and you resume execution. You only trace once
per layer, otherwise it will take forever.
But the thing is, since the anti-VM instruction uses random registers, I had to come up with some way
to identify it because otherwise I would have a lot of patterns.
So I was actually applying a mask to the opcodes so I could identify any VERW with any register.
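A minimal sketch of such a mask, assuming the anti-VM instruction is VERW with a register operand. VERW is encoded as 0F 00 /5, so with a register operand the ModRM byte is 11 101 rrr, and masking off the low three bits leaves 0xE8 whatever the register is. The helper names are made up for illustration.

```python
def is_verw_reg(code: bytes, off: int) -> bool:
    """True if a register-form VERW (0F 00 /5, ModRM mod == 11) starts at off."""
    if off + 3 > len(code):
        return False
    return (code[off] == 0x0F and code[off + 1] == 0x00
            and (code[off + 2] & 0xF8) == 0xE8)   # mask out the register bits

def nop_if_verw(code: bytearray, off: int) -> bool:
    """Overwrite a register-form VERW at off with three NOPs; report if patched."""
    if is_verw_reg(code, off):
        code[off:off + 3] = b"\x90\x90\x90"
        return True
    return False
```

Calling `nop_if_verw` on the current instruction at every single step is one way to get the "NOP every anti-VM instruction" behaviour with a single pattern.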
And so you have to know where you are going to stop the tracing through the layers.
So most viruses use a call followed by a pop to calculate a delta offset.
It's used in shellcodes and packers as well, just to know where your data is, because you
don't know in advance where the virus is going to be in memory.
So anyway I assumed that eventually the virus is going to get a delta offset so I did some
lame pattern matching but it worked.
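The stopping condition can be sketched like this, assuming the simplest form of the stub: a call to the next instruction (E8 00 00 00 00) immediately followed by a one-byte pop (58 to 5F). Real samples may put junk between the two, which this sketch does not handle; the function name is hypothetical.

```python
CALL_NEXT = b"\xe8\x00\x00\x00\x00"   # call $+5
REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def find_delta_stub(code: bytes):
    """Return (offset, popped register) of the first call/pop pair, or None."""
    pos = code.find(CALL_NEXT)
    while pos != -1:
        nxt = pos + len(CALL_NEXT)
        if nxt < len(code) and 0x58 <= code[nxt] <= 0x5F:   # pop r32
            return pos, REGS[code[nxt] - 0x58]
        pos = code.find(CALL_NEXT, pos + 1)
    return None
```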
So I'm going to do a demonstration now.
So I just first try to do it manually.
So you are inside the code section, and so this is the real entry point of the host application.
There are some junk instructions injected inside it.
So you get a lot of redirections until you find the actual start of the virus in the
last section.
So you just trace through it.
Okay, this one is VMware 6 so it doesn't crash anymore, but it did back
then.
So you see here this is a fake loop to avoid easy pattern matching but it's actually very
easy to identify this is a fake one.
So anyway if you trace through it this is still the first layer.
I haven't even reached the end of the layer yet.
So anyway it takes too much time.
So now with the script, as you can see, while it's tracing through it it's NOPing every anti-VM instruction
for me, and so it does all the hard work.
So it's not really generic but it works on every instance of the layers.
So it's only tracing once per layer and you can have 26 layers so it takes a lot of time
to do it manually.
It's a bit slow but it's fine.
I only worked on my slides for 20 minutes, so it's good to have some slow demo.
And blame it on Hugo.
Usually I make some coffee while it traces for me.
So imagine if you have to do that manually.
Actually, so it finds the end-of-loop pattern, puts the breakpoint after it and executes
the virus, so it stops on the breakpoint and starts tracing the next layer.
So you only trace each layer once and then you let the application run, so it does all
the iterations for you.
So it's a lot faster but still pretty slow.
Hopefully it's done soon.
Alright there we go.
So the virus is actually decrypted in memory now.
We can dump it and analyze it statically with IDA.
So if you look here you see the call, pop, sub, stuff, the delta offset.
This is what I use for the stopping condition.
So now the virus is decrypted in memory, you can start the hard work.
So we dump the process to disk and you can start static analysis with IDA.
You can also debug your virus without going through the layers anymore.
So now it's a lot faster.
So the virus is using a delta offset which makes static analysis a bit harder.
So in order to read the code easily in IDA, what you can do is load the file manually.
There's an option in IDA: you tick manual load and you can actually subtract
the delta offset from the image base, and the virus code is going to be a lot easier to read because
you can rename variables and everything, so you don't have to use an IDC script or a structure
or whatever.
But the rest of the host application is going to be relocated so you cannot analyze it anymore,
but we are interested in the virus anyway.
So you just do a manual load and you enter the image base as a hex address minus the hex value
of the delta offset.
You calculate it from the call, pop, sub on the register.
So now the file is loaded nicely and it's a lot easier to analyze.
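The arithmetic behind that manual load can be sketched as follows. The addresses are made-up example values, not taken from the sample: the call pushes the address of the pop, the sub removes the address that label had when the virus was assembled, and what is left in the register is the delta.

```python
MASK32 = 0xFFFFFFFF

def delta_offset(pop_va: int, sub_imm: int) -> int:
    """Delta = runtime address of the pop minus the immediate of the sub."""
    return (pop_va - sub_imm) & MASK32

def manual_load_base(image_base: int, delta: int) -> int:
    """Image base to type into the manual load dialog: base minus delta."""
    return (image_base - delta) & MASK32

# Hypothetical numbers: the pop sits at 0x40A005 in the infected host and
# the stub does 'sub ebp, 0x401005'.
delta = delta_offset(0x40A005, 0x401005)
new_base = manual_load_base(0x400000, delta)
print(hex(delta), hex(new_base))
```

With the file rebased this way, a displacement such as `[ebp+X]` in the virus body points at the address `X` in the database, which is why the variables can be named directly.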
Alright next slide, don't use this technique.
I made the screenshot a long time ago and I didn't have enough time during lunch, so again
complain to Hugo.
This is the actual code doing the, I don't know if you can see anything, but this is the
call, pop, sub EBP that calculates the delta offset.
So most viruses use this kind of code.
Then the next thing you actually encounter in the virus is, so I told you the entry point code
has been patched and it redirects execution to a few chunks of code with obfuscation,
polymorphic chunks, to avoid easy detection.
So the first thing it does is it patches everything back inside the host application.
So you have some sort of structure with the removed bytes, the pointer to the original
location and the number of bytes overwritten, as well as the number of chunks removed, and
every infected binary has a different number of chunks, obviously.
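To make the restore step concrete, here is a sketch over an invented record layout: a 32-bit chunk count, then for each chunk a 32-bit virtual address, a 32-bit length and the saved bytes. The talk only says which fields exist, so the field order, the sizes and the function name are all assumptions.

```python
import struct

def restore_chunks(image: bytearray, image_base: int, table: bytes) -> int:
    """Copy the saved host bytes back over the patched chunks.

    Hypothetical table layout: u32 count, then count records of
    (u32 virtual address, u32 size, size saved bytes).
    """
    (count,) = struct.unpack_from("<I", table, 0)
    pos = 4
    for _ in range(count):
        va, size = struct.unpack_from("<II", table, pos)
        pos += 8
        saved = table[pos:pos + size]
        pos += size
        off = va - image_base            # address back to an offset in the image
        image[off:off + size] = saved
    return count
```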
Then the next thing you encounter is a homemade GetProcAddress kind of function, because
the virus has no IAT so it's actually doing some dynamic API resolution.
This is pretty standard code and it's easy to analyze.
So anyway, then you have some lame anti-debugging techniques.
It raises an exception to throw a lame reverser off.
Anyway, so it generates an exception, then the exception handler catches it and it keeps
resuming the application.
Then there are some very lame anti-debugging techniques.
You can see here it accesses the PEB.
I don't know why but those guys are doing 100,000 loop iterations to try to detect the
debugger.
It's useless.
It's only reading a byte, so maybe it's just to throw off emulators because they are
going to stop execution at some point.
Anyway, then another one is using MeltICE, which is SoftICE detection.
This is also useless because the latest version of SoftICE is not detectable with this.
Anyway, then you have the infection, the actual file infection code.
So what it does is it opens files, then maps them with CreateFileMapping and
MapViewOfFile, and then it does some checks on the file.
First thing it does is it pushes MZ on the stack and pops it back.
So it looks if there is an MZ header.
If there is not, it's not a file that it can infect.
Then it checks for some marker, some random bytes at some useless field in the PE header,
the MZ header actually.
So most viruses use markers to know if the file is already infected or not, so they don't
reinfect the file.
Then the next thing it does is it gets the offset of the PE header and
if it's above 0x200 it just doesn't infect the file, so it's probably to avoid some crashes
or something on fancy files.
Then it checks for PE, which is the signature of PE files, and if all those checks pass
it's going to try to infect the application.
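The same sequence of checks, written from an analyst's point of view as a small Python sketch. The marker position and value are passed in as parameters because the talk does not give them; everything else follows the order described above (MZ signature, infection marker, PE header offset at most 0x200, PE signature).

```python
import struct

def passes_infection_checks(data: bytes, marker_off: int, marker: bytes) -> bool:
    """Replay the virus's pre-infection checks on a mapped file image.

    marker_off and marker are hypothetical: the real field and bytes
    are not stated in the talk.
    """
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False                              # no MZ header
    if data[marker_off:marker_off + len(marker)] == marker:
        return False                              # already infected
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if e_lfanew > 0x200:
        return False                              # PE header too far away
    return data[e_lfanew:e_lfanew + 4] == b"PE\x00\x00"
```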
So on this slide, I don't know if you can see anything, but on this slide you can see the
section manipulations.
So the virus is appending itself at the end of the host program.
Just before that there is a call to a function which is going to remove chunks of code from
the host application and generate the polymorphic layers too, before actually modifying the last
section.
So what it does is it plays with the section alignment so the file is still well aligned even if
the virus is at the end of it.
So it touches many things like the raw size of the section and the virtual size, and does many things
to align the file.
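The alignment part boils down to rounding sizes up to FileAlignment and SectionAlignment. A minimal sketch, assuming the virus body is appended right after the existing raw data of the last section; the function names and the exact way the sizes are recomputed are assumptions, not the virus's code.

```python
def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment

def grow_last_section(raw_size: int, virtual_size: int, virus_len: int,
                      file_align: int, section_align: int):
    """Return (new raw size, new virtual size) after appending virus_len bytes."""
    used = raw_size + virus_len
    new_raw = align_up(used, file_align)
    new_virtual = align_up(max(virtual_size, used), section_align)
    return new_raw, new_virtual
```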
Then it puts the infection marker, the one we just saw before, to mark the file as infected.
Okay so this was simple stuff.
So now if you want to analyze a polymorphic engine, a few things you have to know before
you start: polymorphic engines use a random number generator because they want to have some
randomness.
And most virus writers are very lazy so they usually use the same kind of code, and
there are some handy instructions to store a byte or a word or a dword, which are the STOS
instructions.
So a lot of polymorphic engines actually use this kind of thing.
Then you have loops to generate more than one layer.
So a helpful thing to have is the Intel opcode documentation to identify which instruction
is actually assembled by the engine, because you only see the bytes of code and you don't
know them by heart, well, I used to but I'm getting old.
So anyway it makes things easier and quicker.
So this is actually the start of the polymorphic generator.
So the first thing it does it gets a number between 0 and 26 which is the number of layers
it's going to generate inside the virus.
If it's 0 then it tries to get a new number.
So the one in the demonstration was only 16 layers, so you can have 26, which
is even slower.
Then it calls the polymorphic generator function.
So this is a wrapper around the pseudo random number generator function.
So this is just a function to generate a number between 0 and the parameter.
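A sketch of those two pieces together: a bounded-random wrapper and the layer-count loop that retries on zero. Python's `random.Random` stands in for the virus's own generator, and whether the upper bound is inclusive is an assumption (the sketch allows 1 to 26).

```python
import random

def rand_below(limit: int, rng: random.Random) -> int:
    """Hypothetical wrapper: a number n with 0 <= n < limit."""
    return rng.getrandbits(32) % limit

def pick_layer_count(rng: random.Random, max_layers: int = 26) -> int:
    """Draw a layer count, asking for a new number whenever 0 comes up."""
    while True:
        n = rand_below(max_layers + 1, rng)
        if n != 0:
            return n
```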
This is the actual pseudo random number generator.
It's easy to identify because usually you have some instructions like RDTSC and then
some fancy calculation, sometimes using a well-known constant from pseudo-random number
generators.
And then it does many things and starts to actually write the layers and everything.
So in this one you can see the delta offset generation. I don't know if you see anything,
but there is a mov with E8, which is the opcode for a call.
Then it does xor eax, eax so there is a null dword, and stores it after the E8, which gives a call
to the next instruction.
Then it generates a random register for the pop and then it generates a lot of junk instructions.
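In Python terms, the bytes emitted for that stub look roughly like this. The byte values (E8, a zero rel32, 58 plus the register number for the pop) are standard x86 encodings; excluding ESP from the random choice is an assumption of this sketch, not something stated in the talk.

```python
import random
import struct

def emit_delta_stub(rng: random.Random):
    """Emit 'call next; pop reg' and return (bytes, register number)."""
    out = bytearray()
    out.append(0xE8)                 # like: mov al, 0E8h / stosb
    out += struct.pack("<I", 0)      # like: xor eax, eax / stosd (rel32 = 0)
    reg = rng.choice([0, 1, 2, 3, 5, 6, 7])   # assumption: never ESP (4)
    out.append(0x58 + reg)           # pop r32
    return bytes(out), reg
```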
So every opcode generation is using the same thing: mov al with the opcode byte, store
it, and do that for every instruction they want to generate in their engine.
So for instance I told you the end of the loop was a cmp reg32, reg32 followed by a
JB.
So a cmp with two registers, so the opcode is 3B.
You can see mov al, 3Bh and then it stores the byte.
Then it uses a random value to get the registers for the parameters of the
cmp.
So it's storing the cmp, then it does mov ax, 820Fh, which stored in memory gives 0F 82, the opcode for a long JB,
the one used by the loop.
And then it will calculate where it's going to jump eventually.
It depends on the number of jumps it's going to generate.
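Put together, the end-of-loop sequence can be sketched as below. The encodings are the standard ones (3B with a register-to-register ModRM byte, then 0F 82 and a rel32 measured from the end of the jump); the function name and its parameters are illustrative.

```python
import struct

def emit_loop_end(reg_a: int, reg_b: int, site: int, target: int) -> bytes:
    """Emit 'cmp reg_a, reg_b; jb target' as if assembled at address site."""
    out = bytearray()
    out.append(0x3B)                               # cmp r32, r/m32
    out.append(0xC0 | (reg_a << 3) | reg_b)        # ModRM: mod=11, reg, rm
    out += struct.pack("<H", 0x820F)               # word 820Fh -> bytes 0F 82
    rel = target - (site + len(out) + 4)           # relative to next instruction
    out += struct.pack("<i", rel)
    return bytes(out)
```

For a layer loop the target is the start of the layer, far behind the jump, which is what makes the rel32 large and the pattern easy to tell apart from the fakes.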
So again it's calling the generate big trash block function.
There is also very simple encryption, but as the virus creates layers it modifies
itself and patches the operation to a subtraction, or an addition, or anything like this.
So the junk code engine is actually very basic.
It gets a random number between 0 and 22.
And it uses this as an index into an array of opcodes like this.
So each of these lines is actually the opcode of a one-byte instruction.
So you pick one randomly and store it and do it again.
This is for one-byte junk, but for bigger instructions it just has handlers for any
kind of instruction you want to generate.
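The one-byte junk generator amounts to a table lookup with a random index. The table below is a small stand-in of flag-only instructions chosen for this sketch; the virus's own table is not reproduced here.

```python
import random

# Stand-in table of harmless one-byte opcodes (not the virus's real table).
ONE_BYTE_JUNK = [
    0x90,  # nop
    0xF8,  # clc
    0xF9,  # stc
    0xF5,  # cmc
    0xFC,  # cld
]

def emit_junk(count: int, rng: random.Random) -> bytes:
    """Pick count random one-byte instructions from the table."""
    return bytes(ONE_BYTE_JUNK[rng.randrange(len(ONE_BYTE_JUNK))]
                 for _ in range(count))
```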
So yeah I know it's boring assembly code.
So it just puts the opcode in a register and does some magic and assembles it.
This one for the make register because you can do like make register register and everything
is random.
So I guess I'm done.
Do you have any questions?
No?
Thanks.