I'm Gera.
I'm going to present two very small tools that I wrote and find very useful.
So maybe some of you are going to find them useful too.
They're very small and I'm almost ashamed of presenting them to you.
But there's one guy who already said that he might be using one of them, so that's enough
for me.
The first one is an iterative decompilation framework.
That's probably too big a name for 80 lines of C code.
But it kind of works, so let's see.
Why did I do it? I have this kind of personal project, which is turning some dead
code that I found, and that I'm using, back to C.
I want it back in C. So decompilation, I don't know if you ever did it, but it's very tedious
work.
There are tools to help, if you had a chance to test Hex-Rays.
It helps a lot, but it's not the final solution.
Like, Hex-Rays is meant to help you understand the code, but not decompile it.
The main problem is when you are decompiling something, it's very hard to test what you
are doing, because until you have a good number of lines and self-contained code, it's very
hard to run it and debug it.
I like iterative approaches a lot, where you do small stuff and you test it and you go
again and do more.
So I thought, I'm doing this.
How does it work?
It's very simple, as I said.
It can't be complicated in 80 lines of code.
The idea is to decompile function by function, and I want to test each function or maybe
smaller versions of the function.
So the answer is there.
There is no secret.
I compile C to a DLL.
This is Windows-based.
The same can be done on Linux.
Actually, there is a guy, raise your hand if you want, who did it on Linux using LD_PRELOAD,
and I had also done it before.
So I know it's possible to do it.
On Windows, it's a DLL.
In kind of the main startup code of the DLL, I have all the hooks that I want to do.
It's kind of a list of the functions I want to replace.
Replace this address with this function.
And the startup code goes through all the addresses, maps the memory writable, and
just overwrites the first bytes of the function with a jump to my C code.
You understand this is very simple.
It's still, I insist, it's very useful.
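As a rough illustration of the hook list described above, here is a minimal Python sketch that builds the bytes such a patch would write. The talk's tool is C inside a DLL and its exact encoding is not shown, so the 5-byte x86 `jmp rel32` (opcode 0xE9) used here is an assumption, and `make_jump_patch`, `apply_hooks` and all addresses are hypothetical names and values. It patches a plain byte buffer standing in for the mapped image, not a live process.

```python
import struct

JMP_REL32_OPCODE = 0xE9  # x86 near jump with a 32-bit relative displacement
PATCH_SIZE = 5           # 1 opcode byte + 4 displacement bytes


def make_jump_patch(hook_address, replacement_address):
    """Return the 5 bytes that redirect hook_address to replacement_address."""
    # The displacement is relative to the instruction that follows the jmp.
    displacement = (replacement_address - (hook_address + PATCH_SIZE)) & 0xFFFFFFFF
    return struct.pack("<BI", JMP_REL32_OPCODE, displacement)


def apply_hooks(image, image_base, hooks):
    """Overwrite the first bytes of each hooked function in a writable image."""
    for hook_address, replacement_address in hooks:
        offset = hook_address - image_base
        image[offset:offset + PATCH_SIZE] = make_jump_patch(
            hook_address, replacement_address)


# "Replace this address with this function": (original, replacement) pairs.
image = bytearray(b"\x90" * 16)  # stand-in for the target's code section
apply_hooks(image, 0x1000, [(0x1004, 0x2000)])
print(image.hex())
```

In a real process the target page would first have to be made writable, which this sketch skips because it only edits a local buffer.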
So let's see an example of use.
I have only 30 minutes and I want to show you how it works for me.
So I don't know if you know, but notepad has one feature.
And that's, if the first line of the file, if the file starts with .LOG, every time you
open the file, notepad will add a timestamp at the end of the file, so you can keep a log
file.
So it's, like, very interesting.
So I tried to find a very stupid thing to show you how the tool works.
So this is it.
Okay, so oh, now I'm doing the talk.
So it's all fine.
So I save it and if I open it again, you'll see a new timestamp below that, right?
Okay, there you go.
So it's kind of working.
Nice.
I won't save that one.
So I thought, okay, let's see how that is implemented in notepad.
This is the code for notepad.
So that specific function, you can see it gets the time, has a strcat.
I'm not looking for security bugs, okay?
Formats the time, whatever.
So this is the function.
And I wanted to, let's say I want to reverse notepad, which I don't, of course.
Nobody does.
So I'm, like, at this particular point in the process of reversing notepad, or one specific
part that I'm interested in, and I'm reversing this function.
So I write, this is a simpler version.
It's obvious it's simple because you can see, oops, this one has at least three ifs, right?
My version does not.
So it's a simpler version.
But it does just the same.
It gets the time, formats it, and strcats it, okay?
If I execute notepad now from the side, we should see.
We have notepad, we open the file, and the breakpoint is hit, right?
Because the original notepad is kind of relinked with my new function in place.
So now I can debug the code I'm reversing.
Oh, it's so fine.
Good.
And if I want, I can continue reversing on assembly, right?
I don't have the source code for notepad.
I don't know if you do, but I don't.
So it continues on the assembly side.
When you are doing it, yeah, don't look at me.
I know it's simple.
It's like, this is how you do it.
You define the new version, and at the end you say, replace this address with that function.
That's pretty much it.
If you want to do another one, you just cut and paste.
That's how I do it.
You might be thinking, okay, what happens if this function that I'm replacing needs
to use global data or call a function already existing in the binary?
Well, that's possible.
I'm using defines to do that.
Just ignore that for a second.
This is just defining a pointer on a fixed address.
In this case, I'm actually using a global variable.
In your source code, it just looks like it's a variable.
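The speaker does this in C with defines, which the transcript does not show. As a loose Python analogue of "a pointer on a fixed address", the sketch below uses `ctypes` to view an integer that lives at a known address as if it were an ordinary variable. The names `backing`, `FIXED_ADDRESS` and `g_counter` are hypothetical, and the "fixed address" is taken from a local object so the example is safe to run. In the real tool it would be a constant address inside the target binary.

```python
import ctypes

# Stand-in for a global that already exists in the original binary.
backing = ctypes.c_int(42)

# In the real tool this would be a hard-coded address found while reversing;
# here it is taken from a local object so the sketch can run anywhere.
FIXED_ADDRESS = ctypes.addressof(backing)

# A typed view over that address: reads and writes go straight to the memory.
g_counter = ctypes.c_int.from_address(FIXED_ADDRESS)

g_counter.value += 1
print(backing.value)
```

The point is only that, once the view is defined, the rest of the code reads and writes `g_counter` like a normal variable while the storage stays where the original program put it.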
In the original function, you can see the function here finishes with a call to the
function, another function.
Does anybody know what that is?
That's a cookie check.
This function has buffers, and they compiled notepad with Visual Studio, so it has a
cookie check.
I was not paying much attention, so I said, okay, I need to add that function call because
I'm just doing it blindly.
Up here, I declared the function.
This function returns void and does not take any arguments, and it's at a constant,
fixed address.
I just did that, but of course, the cookie check failed and notepad broke, so I then
took it out.
Now, if you want to improve notepad, who did it nicer?
You can, for example, I like this frag formatting thing, so I recompiled it.
Actually, it's just running.
I don't think I need to show it.
I mean, it's obvious, right?
Oh, it's running there.
That's why it's not recompiled.
Actually, let's take the chance to show you that dynamic recomp, I don't know how you
call this, maybe dynamic recompilation or whatever.
This dynamic thing Visual Studio has that recompiles everything, so I don't even need
to exit the program and my modifications show up there, of course.
I mean, no secret.
I could, for example, add new functionality here.
WinExec, calc, five.
Oops, again, let's set a break on there.
So let's open the file again.
And if we continue now, oops, it recompiled.
What?
Oh, thank you.
So you're actually paying attention.
Okay, so we got a calculator plus notepad now.
That's super cool.
But there is something I was not expecting and I thought, yeah, we could do it.
And if I just run the original, okay, there is something I didn't say there in the slide,
I think.
I add a new import to notepad in this case.
So it just loads the DLL.
Then the main code on the DLL takes care of everything.
So suppose I want to, a little bit like the talk we did, we saw yesterday about Sue and
Sue doing, like, Dorian, Lipsy, like, runtime instrumentation.
So suppose I just suppose it, like, I hack into a box and there's this guy running all
these applications and I want to inject a backdoor or something on the process that
is running.
This also works for that.
So you just need the PID of notepad 1352 and use load DLL 1352.
First I show you this version, which is not as good.
It does not have calc, right?
So no calc.
I inject the DLL and now I open the file again.
Oops.
And it didn't work.
Of course it's a demo, right?
What would you expect?
Did it save?
Oh, there it goes.
Oh, no, it's, if I reopen it without closing it, it should be there.
I'm just recompiling because we did the dynamic recompilation thing and I'm not sure it works
with that.
That might be it.
If it doesn't work, we go to the next one.
So we need not 360.
That's a better number.
Actually that might be it.
That was it, actually.
So first to show you it doesn't work.
And now we inject the DLL.
I mean, you already know what's going to happen, right?
It's like why I'm showing it.
We got calc.
Okay.
So yeah.
So now we can do all the math we want and cheat on the exam.
So let's go for the next thing.
The next thing is a Python disassembling engine.
Again, probably too big a name for one file of Python, but it's still useful.
Like I do things to use them, so I try to keep them simple.
If I then need more, I add features.
Why did I do this?
Truly, because I wanted to test an idea. After Pedram presented PaiMei here last year, two
years ago here, I thought, like, there is something I want to try in a disassembler.
And I tried to do it on IDA and I couldn't do it.
I'll talk about that later.
So I thought, okay, we all want to rebuild IDA at some point of our life.
We want to like, yeah, don't lie to me.
I mean, you know you want.
So I started doing it.
It's the second time I do it.
So I kind of had some ideas and it didn't take too long because I'm using pgraph as
the graph backend.
It's not like a big library for graphs, but I want it to be integrated with PaiMei.
So pgraph is the graph library PaiMei uses.
And I thought, okay, if I do it on top of that, I get free PaiMei integration.
And it actually worked.
So I'm not going to write the instruction disassembler.
If you tried it, you know how hard that is.
So I just found this great library, wrapped by Ero, and I'm using that.
I'm also using pefile.
So I kind of didn't do anything.
You'll be wondering, what did this guy do?
He's just using libraries.
So the main thing is that what I want to do is rebuild the graph of the program.
Okay, I'm not interested in actually writing the instruction disassembler.
I want to rebuild the graph.
That's what I want to do.
So pretty much my thing, which does not have a name yet, I call it kuchi, but it's not
a name.
Okay, this is the analyze method on Python.
Can you read there?
Is it too small?
I mean, you just need to see that it is very small.
This is the analyze method for the module class.
The module class is like the root node.
And it has first, it has a function queue.
So each time the disassembler finds a new function, it adds the address to a queue and
it continues disassembling.
And when it finishes with the current function,
it takes a new one from the queue.
That's a way to do disassembly, I think.
I don't know if there's another one.
It first initializes the queue with some user-defined entry points in case you want to add a specific
entry point.
It also initializes the queue with the exports of the module
(remember, it's a PE) and the entry point of the module.
And then it pops, if you can read, one from the queue.
And it does not do much more than just say, function, analyze yourself.
And the same thing goes for function.
The function has also a queue, in this case a basic block queue.
It takes the original entry point and disassembles it, adding basic blocks to the queue as
it goes finding new jumps and that kind of thing.
So there is no magic behind this.
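The two queues described above can be sketched in a few lines of Python. This is not the speaker's code: the toy instruction table `CODE`, where every instruction is one unit long, and the names `analyze_function` and `analyze_module` are assumptions made for illustration. A real version would decode PE bytes and seed the queue with exports and the module entry point.

```python
from collections import deque

# Toy "binary": address -> (mnemonic, operand). Each instruction is 1 unit long.
CODE = {
    0: ("call", 10), 1: ("jz", 3), 2: ("nop", None), 3: ("ret", None),
    10: ("call", 20), 11: ("ret", None),
    20: ("ret", None),
}


def analyze_function(entry):
    """Walk one function with a basic block queue; return (blocks, callees)."""
    blocks, callees = {}, []
    leaders = {entry}              # every address known to start a block
    block_queue = deque([entry])
    while block_queue:
        start = block_queue.popleft()
        if start in blocks:
            continue
        addr, body = start, []
        while True:
            if addr != start and addr in leaders:
                break              # fell through into another known block
            mnemonic, operand = CODE[addr]
            body.append(addr)
            if mnemonic == "call":
                callees.append(operand)      # new function for the module queue
            elif mnemonic in ("jmp", "jz"):
                targets = [operand] + ([addr + 1] if mnemonic == "jz" else [])
                leaders.update(targets)
                block_queue.extend(targets)  # new basic blocks to visit
                break
            elif mnemonic == "ret":
                break
            addr += 1
        blocks[start] = body
    return blocks, callees


def analyze_module(entry_points):
    """Walk the module with a function queue seeded from its entry points."""
    functions = {}
    function_queue = deque(entry_points)
    while function_queue:
        entry = function_queue.popleft()
        if entry in functions:
            continue
        functions[entry], callees = analyze_function(entry)
        function_queue.extend(callees)
    return functions


functions = analyze_module([0])
print(sorted(functions))
```

Like the tool described in the talk, this sketch keeps things simple: it never goes back to split a block that was already built when a later jump lands in its middle.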
And I also want to show you two examples.
The second example of use, two case uses, kind of.
The second one will lead to, I'm talking fast because I only have 30 minutes, but I'm going
too fast.
The second one will lead me to tell you what it is that I wanted to try originally and that made
me write this.
The first one, if you ever used PyDBG, I mean, you could do the same with Immunity Debugger.
I just don't use it.
I don't know why.
I don't use Olly either.
The first example will just put a breakpoint on every basic block.
I show you the code.
This is for notepad.
So it runs notepad and we have breakpoints on every basic block.
Every breakpoint is just hit once.
I set them, like, do not repeat the breakpoints.
Okay, so it works.
The script to do that is less than a page.
This is the part where the module is analyzed.
I don't know if you get to see that.
I tried to increase the font, but I don't know how to do it.
Actually I have an idea.
Oops, that's bad.
That's worse.
Notepad might be handy.
Notepad, actually notepad is a great tool.
So you just create a new module, load notepad, and analyze it.
That's pretty much it.
When it comes down to setting the breakpoints, this is it.
bp_set is PyDBG syntax, so it's no more than just: for function in module nodes,
set a breakpoint on the function entry point, and then for basic block in function nodes,
set a breakpoint on the basic block entry point.
So I'm actually setting two breakpoints on the first basic block of each
function, just to show you how to set breakpoints on each function, and a breakpoint on each
basic block.
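The loop just described can be sketched as follows. The real script runs against PyDBG and the PaiMei graph classes, neither of which is used here: `Module`, `Function`, `BasicBlock` and `RecordingDebugger` are hypothetical stand-ins, and the `restore=False` argument is an assumption about how the one-shot breakpoints mentioned above were requested.

```python
from dataclasses import dataclass, field


@dataclass
class BasicBlock:
    entry_point: int


@dataclass
class Function:
    entry_point: int
    nodes: dict = field(default_factory=dict)  # address -> BasicBlock


@dataclass
class Module:
    nodes: dict = field(default_factory=dict)  # address -> Function


class RecordingDebugger:
    """Stand-in debugger that only records the breakpoints it is asked to set."""

    def __init__(self):
        self.calls = []

    def bp_set(self, address, restore=True):
        self.calls.append((address, restore))


def set_all_breakpoints(debugger, module):
    for function in module.nodes.values():
        # One breakpoint per function entry point...
        debugger.bp_set(function.entry_point, restore=False)
        # ...and one per basic block, so the first block is requested twice.
        for basic_block in function.nodes.values():
            debugger.bp_set(basic_block.entry_point, restore=False)


module = Module(nodes={
    0x1000: Function(0x1000, {0x1000: BasicBlock(0x1000),
                              0x1010: BasicBlock(0x1010)}),
    0x2000: Function(0x2000, {0x2000: BasicBlock(0x2000)}),
})
debugger = RecordingDebugger()
set_all_breakpoints(debugger, module)
print([hex(address) for address, _ in debugger.calls])
```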
What I wanted to show you here is that this comes from PaiMei.
It's not part of what I did.
I just integrated a new disassembly engine into PaiMei at some point.
It can be used, of course, outside of PaiMei.
You have tools to walk the graph and set breakpoints.
Just walk it.
You could query properties of instructions, too.
Each basic block has a list of instructions, so you can query instructions, too.
You can also...
I improved the renderers of the graph hierarchy in PaiMei a little bit, so you can dump
the graph to an XML and use it from another place.
So if I do that, we're going to get here an XML file that I'm not going to open because
it's very big.
But you can then, for example, you open it with this tool.
Do you get to see the graph?
Yeah, that's cool.
Is that better?
Okay.
This is the full call graph of the second example I want to show you.
I need to open it in IDA.
I don't know what it is, but it's kind of obfuscated and all this stuff.
So it's hard for IDA to disassemble it consistently.
Do you get to see the graph there?
Does anybody see the graph?
Yes?
No?
Yeah, you're lying because that's not a graph.
That's kind of a mess.
Sorry.
Tricky question.
Yeah, that's cool.
So this is the kind of function you see when there's, like, a jump inside the same instruction.
So my original idea was to try this. What happens in IDA is
you cannot have one byte as part of two different basic blocks.
That's kind of weird.
But it is possible that there are two different execution flows that use the same byte as
part of two different instructions.
That's very common.
But in IDA you cannot do that because at least I don't know how to do it.
I mean, of course, if I could do it, quite likely.
But the notion of a byte and whether it's a code or not is very deep, I think, in the
design.
So I found a way.
I actually tried to work around that.
And I thought, okay, I may actually copy the byte somewhere else in the binary image and
disassemble it there and then tie it together.
And then I thought, better I write IDA.
I always wanted to do it.
So I wrote my own IDA.
It's not IDA, not at all.
IDA is the best thing in IDA, in my opinion.
So one of the best.
But I thought, okay, but if you are thinking the image is disassembled and you got it in
a graph, there is no reason why you cannot have the same byte in two different basic
blocks, right?
And actually, if I wanted to constrain that, I would have to add more code to check if
the byte was already in a different basic block.
So although it was my original intent to test this idea, it was automatic to get the result.
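To make the shared-byte case concrete, here is a small Python sketch over five real x86 bytes, `EB FF C0 48 C3`, where a short jump lands inside its own encoding. The decoder `decode` covers only the four opcodes this example needs and is not the library used in the talk. The names `build_blocks` and `shared_bytes` are hypothetical. Nothing in the graph builder forbids two blocks from owning the same byte, which is the point made above.

```python
from collections import defaultdict, deque

# jmp short -1 ; the target is the jump's own second byte (0xFF),
# which then decodes as the start of "inc eax" (FF C0).
CODE = bytes([0xEB, 0xFF, 0xC0, 0x48, 0xC3])


def decode(code, offset):
    """Decode a tiny x86 subset: return (mnemonic, length, jump_target)."""
    opcode = code[offset]
    if opcode == 0xEB:
        rel8 = code[offset + 1]
        if rel8 >= 0x80:
            rel8 -= 0x100              # sign-extend the 8-bit displacement
        return "jmp short", 2, offset + 2 + rel8
    if opcode == 0xFF and code[offset + 1] == 0xC0:
        return "inc eax", 2, None
    if opcode == 0x48:
        return "dec eax", 1, None
    if opcode == 0xC3:
        return "ret", 1, None
    raise ValueError("opcode not covered by this sketch: %#x" % opcode)


def build_blocks(code, entry):
    """Follow execution flow from entry, one basic block per queue item."""
    blocks, queue = {}, deque([entry])
    while queue:
        start = queue.popleft()
        if start in blocks:
            continue
        offset, instructions = start, []
        while True:
            mnemonic, length, target = decode(code, offset)
            instructions.append((offset, mnemonic, length))
            if target is not None:
                queue.append(target)
                break
            if mnemonic == "ret":
                break
            offset += length
        blocks[start] = instructions
    return blocks


def shared_bytes(blocks):
    """Map each byte offset owned by more than one block to its owners."""
    owners = defaultdict(set)
    for start, instructions in blocks.items():
        for offset, _mnemonic, length in instructions:
            for byte_offset in range(offset, offset + length):
                owners[byte_offset].add(start)
    return {b: sorted(o) for b, o in owners.items() if len(o) > 1}


blocks = build_blocks(CODE, 0)
print(shared_bytes(blocks))
```

Preventing the overlap would take extra bookkeeping, such as the `owners` map, which matches the remark that constraining it would mean adding more code.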
So if we go here, and I think this is the same...
This is the call graph.
This tool, I didn't use it.
I didn't do it, of course.
Like, I just did a very simple Python thing.
This is a graph editing tool, which I love.
So this is the call graph for the whole program.
I could try to do it in IDA, but as I haven't tried before, I won't do it live now, because
you know demos fail always.
So this is the call graph, and if we go deep into the function, we get the full function,
the function graph with the basic blocks.
And I won't say it's prettier or anything, but at least it's different than what IDA
shows you.
And it may...I think it's at some point more useful, maybe, sometimes, for me, kind of.
And at least it gives you a different perspective.
We are used to seeing basic block graphs and function call graphs.
Here we have another one to play with.
I'm not sure which one is better.
We just have two different ones.
Like if we...let me see if I can find one example.
This is quite likely...no, no, it's not.
But at some point here, we will have jumps into the middle of instructions that will split
the graph into different paths, and in this way of working, we don't have any problems.
Like...whoa.
Okay.
Like these jumps, for example, that's typical for obfuscation, kind of.
You probably...if you go and try to find that in IDA on the other side, we see it either simplified,
because IDA actually does some simplification, or outside the function.
So, let's see...2, 1, 3, 6...2, 1, 3, 6...whoop.
Hello?
Somewhere over here.
So, yeah, simplified.
Oh, actually, it's here.
This is the chain of jumps.
We might see a jump into the middle somewhere, or not.
I don't know.
I had a good example to show you, but it's not there.
So, this thing has bugs.
Like, for example, I was expecting that if there's a byte split, let's call it that,
but then the execution flow joins again, I would expect to have two small blocks and
then the common part showing back in the same basic block, but it's not doing it.
I need to add one more check to do that, but I don't know.
It's possible.
So, I think I'm done.
Let's see what the presentation says.
Yeah, that's it.
This is how you set a breakpoint on every function and on every basic block, just walking
the graph.
This is kind of integrated into PaiMei.
We never saw a release of PaiMei, an official release, I may say, after REcon, but there
is a source repository on Google, Google Code, whatever.
I don't know.
Search for PaiMei Google.
You get it, and then there's going to be a branch where this is implemented and it's
integrated.
I don't know how many of you use PaiMei, but it used to be that you needed IDA, and you
ran a Python script in IDA to export the database to a PIDA file, and then from PaiMei you could
open this PIDA file, and you had it as a graph, pretty much as I have it now.
Now when you go to open module in PaiMei, you can open either a PIDA file or an EXE file, and
if you open an EXE file, it will disassemble it all within Python, whatever.
So thank you guys for wasting your time with me.
Thank you to the organizers for REcon.
I love it.
Please try to keep doing it.
I will eventually release this on OSS.
I mean, releasing 80 lines of C is like, ugh.
But I will put it there at some point.
OSS, open source software, at coresecurity.com.
That's where I work.
Since I have more time and I actually planned it, there are two more tools that I want to
actually go to the site.
There are a couple of very interesting tools.
The latest addition, just released this last Friday, is an iPhone debugger.
An iPhone, they call it kind of toolkit, whatever.
It's three things.
The debugger is based on weasel, but it's a little bit augmented.
It can understand symbols and whatever.
There's another tool to do TCP over USB when you have your iPhone plugged into the computer,
so you don't have to go through the wireless.
That's abusing the mobile, I don't know what, DLL on Windows, like the IFAC with PH.
I didn't say a bad word.
The last thing they put on the toolkit is it's a thing that lets you set breakpoints on libraries
on iPhone.
It's quite likely that libraries are mapped read-only, so you cannot write the breakpoint.
This is just a patch to the loader to map everything read-write.
You got to re-execute the application, but after that you can set breakpoints.
I think that's it.
Thank you very much.
Albert.