My name is Sebastien Doucet, and welcome to this presentation on 64-bit imports rebuilding and unpacking. Who am I? I'm a trainer and binary auditor for ETAC International Institute. I'm a moderator at the reverseengineering.net forum and the co-founder of video.reverseengineering.net. I'm also the evil moderator at crackmes.de, and I'm a member of ARTeam.

So what is CHimpREC? It's a 32- and 64-bit imports rebuilder. It's an improved version of ImpREC: it fixes many existing bugs, introduces new features, and was made especially for WoW64 (Windows on Windows 64) compatibility. It also allows for an all-in-one version. So here it is. It looks very similar to ImpREC, except that the buttons have been centralized and grouped together for ease of use and more logical stepping.

So why do this project? ImpREC is getting older. There is no public 64-bit imports rebuilder freely available on the internet right now. I was curious; I'm a reverser, it's what I do. And somebody had to do it, and that person happened to be me.

So why do this now? Well, imagine two guys sitting in their backyard, talking to each other. "Gee, somebody should be cutting their grass." "Yeah, somebody should really be cutting their grass." One month later: "Gee, somebody should be cutting their grass." "Yeah, somebody should really be cutting their grass." Six months later: "Gee, somebody should be cutting their grass." "Yeah, somebody should really be cutting their grass." And one year later, they have suffocated. It's too late. So the moral of the story is: don't wait for somebody else to do it in your place. Do it now, by yourself.

A quick overview: part one is about the basics of unpacking, part two about making a 32-bit imports rebuilder, part three about the evolution to 64-bit, and part four is two live 64-bit unpacking sessions. Oh no, the batteries in my remote have died. Okay, so part one, basics of unpacking. We'll see how simple packers work and general unpacking theory.
We'll also see the limitations of ImpREC, like the bugs with Vista and ASLR (address space layout randomization), and the bugs with Vista x64 Service Pack 1, which happen only under WoW64.

So, how simple packers work. We start with a program that has an import address table (IAT) and an import directory. The entry point usually points into the program code, and upon startup the import directory tells the Windows loader how to build the IAT. After the file has been packed, the program code has been encrypted and the import directory has been destroyed. Whoops. You can't really see it in the purple, but then comes the unpacker stub, then the unpacker IAT, then the unpacker import directory, and the packer moves the entry point to the unpacker stub. Now, upon startup, the loader uses the unpacker's import directory to build the unpacker IAT, and the unpacker stub has code to rebuild the original IAT manually. Then the program gets decrypted and execution returns to the original entry point (OEP). This is where we want to jump in: we dump the program and create a new import directory that tells the Windows loader directly how to rebuild the original IAT, without going through the unpacker stub.

So let's see the limitations of ImpREC. On Windows XP, or on Vista without ASLR, everything is normal and everything is valid. But if we switch to Vista with ASLR enabled, we can see that some modules were not identified successfully, and some imports could not be identified either. It only happens in three specific modules: GDI32, KERNEL32 and USER32. If we look more closely, the first group is without ASLR and the second group is with ASLR. In the top group, all the section mappings are contiguous; in the bottom group, all the sections have gaps between them.
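To make the ASLR failure concrete, here is a simplified model in Python (not ImpREC's actual code): a lookup that assumes a module's sections are mapped back to back misses addresses in a section that sits after a gap, while testing each section range separately still finds them. The `Module` type, the helper names and the addresses are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Module:
    name: str
    # (start_va, size) of each mapped section; gaps between sections are allowed
    sections: List[Tuple[int, int]] = field(default_factory=list)


def contiguous_lookup(addr: int, modules: List[Module]) -> Optional[str]:
    """Simplified model of the old assumption: sections sit back to back,
    so a module spans first_section_start .. first_section_start + total_size."""
    for m in modules:
        start = m.sections[0][0]
        total = sum(size for _, size in m.sections)
        if start <= addr < start + total:
            return m.name
    return None


def section_lookup(addr: int, modules: List[Module]) -> Optional[str]:
    """Gap-tolerant lookup: test each section range on its own."""
    for m in modules:
        for start, size in m.sections:
            if start <= addr < start + size:
                return m.name
    return None


# Hypothetical layout: USER32 with a gap between its first and second section
user32 = Module("USER32.dll", [(0x1000, 0x1000), (0x5000, 0x1000)])
print(contiguous_lookup(0x5100, [user32]))  # None: past the assumed contiguous end
print(section_lookup(0x5100, [user32]))     # USER32.dll
```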
This is why some imports could not be identified: ImpREC didn't expect ASLR. It was made a long time ago.

Now let's see another problem: Vista x64 Service Pack 1 with WoW64. You can see that one module is invalid. If we look more closely, this is because NTDLL.NtdllDefWindowProc_W has not been unforwarded to USER32.DefWindowProcW. We can tell from the context: all the other entries around it are from USER32 except that one.

So let's see the normal method on XP or Vista for this API to be called. It's a simple call, really, and it leads to a function. Now let's see the normal method on Vista x64, if everything were working correctly. Statically, the entry point of the function, the function RVA, is a fairly normal value. It leads to what is called a forwarder string, which forwards DefWindowProcW to NTDLL.NtdllDefWindowProc_W. This is how it should work normally.

Now let's see the "Microsoft drunk on the job" method that was introduced with Vista x64 Service Pack 1. This is a little export viewer that I made to display the same values as in the previous table, but at runtime. We can see that for the marked ones, the entry point of the function is even outside of USER32.dll: the RVA is bigger than the image size. So let's try to follow it. Starting from the IAT, if we look at that address, we land directly in NTDLL.dll. From there we jump to an offset, and following it takes us back into USER32. And this is what the graph looks like: it's exactly identical to the one that was not forwarded. So it ends up at the same place as on Vista without ASLR; it could have been called directly. I don't know why they did all this funky forwarding, because there was really no need to go to NTDLL and back to USER32. And once again, ImpREC was made a long time ago, so it didn't expect to see something like this.
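The forwarder-string check described above follows a documented PE rule: an export whose function RVA falls inside the export directory's own range points at a string like `NTDLL.NtdllDefWindowProc_W` instead of code. A minimal sketch, with an extra branch for the runtime anomaly where the RVA is larger than the image size; the function names are hypothetical, and the export-directory values would come from the module's headers.

```python
def classify_export(rva: int, export_dir_rva: int, export_dir_size: int,
                    size_of_image: int) -> str:
    """Classify a function RVA taken from a module's export address table."""
    if export_dir_rva <= rva < export_dir_rva + export_dir_size:
        # PE rule: an RVA inside the export directory points to a forwarder
        # string such as "NTDLL.SomeFunction", not to code.
        return "forwarder"
    if rva >= size_of_image:
        # The runtime anomaly described above: the RVA lies past the end of
        # the module image, so it cannot be code inside this module.
        return "outside_image"
    return "code"


def parse_forwarder(text: str):
    """Split 'DLL.Name' or 'DLL.#ordinal' into (dll, name) or (dll, ordinal)."""
    dll, _, target = text.rpartition(".")
    if target.startswith("#"):
        return dll, int(target[1:])
    return dll, target


print(parse_forwarder("NTDLL.NtdllDefWindowProc_W"))
```

Ordinal forwarders, such as the SHUNIMPL.#177 example later in the talk, parse to an integer ordinal instead of a name.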
Next, part two: making a 32-bit imports rebuilder. We'll see Toolhelp32 versus PSAPI, planning efficiently to save time, and the five steps of the program: Dump, IAT AutoSearch, Get Imports (with a deeper look at unforwarding), Show Invalid, and Fix Dump.

So, Toolhelp32 versus PSAPI. These are two groups of APIs used to look inside the memory of a running process. They both do about the same thing, but there are some nuances, especially when you go cross-architecture. First, the Toolhelp32 functions: CreateToolhelp32Snapshot, Process32First and Process32Next, Module32First and Module32Next, and Toolhelp32ReadProcessMemory. I pasted some material from MSDN on the slides, but it's mostly for offline reference.

CreateToolhelp32Snapshot requires only two things: a process ID, which is optional, and a flag. But if we look at one of the flags, TH32CS_SNAPMODULE32, the documentation says it includes all 32-bit modules of the process specified in th32ProcessID in the snapshot when called from a 64-bit process, and that it can be combined with TH32CS_SNAPMODULE or TH32CS_SNAPALL. But it is wrong. It doesn't work. It's been confirmed by other people, so it's not only me: it really doesn't work as advertised. I want my money back.

CreateToolhelp32Snapshot returns a handle. With it, we can call Process32First and Process32Next, which fill a PROCESSENTRY32 structure that, as you can see, has many, many members; it's quite complete. Module32First and Module32Next do the same thing with a different structure, MODULEENTRY32, which also has a lot of members. Then there is Toolhelp32ReadProcessMemory, which is really similar to ReadProcessMemory, except that ReadProcessMemory requires a handle to the process while Toolhelp32ReadProcessMemory requires only a process ID.
So it's basically the same thing; it depends on what you're working with, but they give back the same results.

Now the PSAPI family: EnumProcesses, EnumProcessModules and EnumProcessModulesEx, GetModuleInformation, GetModuleBaseName, and finally GetModuleFileNameEx. EnumProcesses retrieves the process identifier for each process object in the system; it returns an array of process IDs. For each of these processes we have to call another API to see the modules inside. That's EnumProcessModules, which again returns an array, this time of HMODULE handles. EnumProcessModulesEx is the same thing, but it takes a filter flag, and the flags work on this one, unlike the Toolhelp32 flag we saw earlier. It's also possible to get a similar structure, but it's much smaller, with fewer members than the ones we saw previously: GetModuleInformation returns a MODULEINFO structure, which is really basic. If you want more information on top of that, you have to call other APIs, GetModuleBaseName and GetModuleFileNameEx, which return the file names that were already available in the Toolhelp32 structure.

So, cross-architecture compatibility. When each API works within its own architecture, everything is all right; there's no problem. The trouble comes when you go cross-architecture, because the flags listed for CreateToolhelp32Snapshot don't work. CreateToolhelp32Snapshot is only partially working from x86 to x64: it can list all the modules, but it cannot see what they really are; it only sees the WoW64 emulator, which is a 64-bit process. So basically, x86 to x64 is almost impossible. From x64 to x86, it's possible with EnumProcessModulesEx, but not with EnumProcessModules. So why not use EnumProcessModulesEx all the time? Because it's a Vista-only API.
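The cross-architecture findings above can be summarized as a small decision function. It only encodes the behaviour reported in the talk on Vista-era systems; the function name and its string return values are hypothetical, and nothing here calls the Windows APIs themselves.

```python
from typing import Optional


def pick_module_enum_api(host_bits: int, target_bits: int,
                         vista_or_later: bool) -> Optional[str]:
    """Encode the reported compatibility findings as a decision function.
    Returns None where the talk reports no working option."""
    if host_bits == target_bits:
        # Same architecture: Toolhelp32 works and has the widest OS range.
        return "CreateToolhelp32Snapshot"
    if host_bits == 64 and target_bits == 32:
        # TH32CS_SNAPMODULE32 reportedly does not work, so use
        # EnumProcessModulesEx with LIST_MODULES_32BIT, which needs Vista.
        return "EnumProcessModulesEx" if vista_or_later else None
    # 32-bit tool inspecting a 64-bit process: described as almost impossible.
    return None
```

This is the reasoning behind shipping Toolhelp32-based single-architecture builds plus a separate Vista x64 all-in-one build.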
And who wants to make a program that works only on Vista? So in the first version I made, I used CreateToolhelp32Snapshot, because it has the widest range of compatibility.

Planning efficiently to save time: there are two single-architecture versions, x86 and x64, each on its own. The API used is CreateToolhelp32Snapshot, for the best OS compatibility range, and it allows the projects to share common sources and headers. Then there's the cross-architecture all-in-one version, which handles both x86 and x64 targets. It's made from a separate x64 project, it requires a 64-bit OS, and it uses EnumProcessModulesEx. So basically it runs only on Vista x64, which is even fewer people than the ones who run Vista. It's a small crowd.

Now the steps of the program. The first step is Dump. It copies the memory area of a process to a file once the process has reached its original entry point. Each section is dumped individually. Each section's raw size must be realigned from the file alignment to the section alignment, and the raw address must match the virtual address. All sections are made writable by adding the IMAGE_SCN_MEM_WRITE flag, and VirtualProtectEx is used to change the process memory to PAGE_EXECUTE_READWRITE. Here we can see the same program before and after dumping. Some values have changed: the virtual address is now identical to the raw address, and the virtual size is identical to the raw size.

Step two, IAT AutoSearch. It's a binary (byte-pattern) search looking for indirect call opcodes, like a MOV into a register followed by a CALL through that register, and for direct call opcodes, like CALL, JMP, and PUSH followed by RET. The search ignores relative calls and starts from the image base or the entry point. The first call found must lead to a valid import. From there, we search up for the beginning of the IAT and down for the end of the IAT, just like trying to identify a weird object in the dark.
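A sketch of the header side of the Dump step, assuming the section headers have already been parsed into dictionaries keyed by the IMAGE_SECTION_HEADER field names. It realigns each size to the section alignment, makes the raw address equal the virtual address, and ORs in IMAGE_SCN_MEM_WRITE (0x80000000). Reading the process memory and the VirtualProtectEx call are left out, and the helper names are hypothetical.

```python
IMAGE_SCN_MEM_WRITE = 0x80000000


def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def fix_sections_for_dump(sections, section_alignment):
    """Rewrite section headers so they describe a raw memory dump.

    sections: list of dicts with the IMAGE_SECTION_HEADER fields
    VirtualAddress, VirtualSize, PointerToRawData, SizeOfRawData and
    Characteristics. Returns new dicts; the input is left untouched."""
    fixed = []
    for s in sections:
        # In memory a section occupies its size rounded up to SectionAlignment
        size = align_up(max(s["VirtualSize"], s["SizeOfRawData"]), section_alignment)
        fixed.append({
            **s,
            "PointerToRawData": s["VirtualAddress"],  # raw address == virtual address
            "SizeOfRawData": size,
            "VirtualSize": size,                      # raw size == virtual size
            "Characteristics": s["Characteristics"] | IMAGE_SCN_MEM_WRITE,
        })
    return fixed
```

Because the file layout now mirrors the memory layout, each section can be written to the file at its RVA exactly as it was read from the process.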
So here we can see a program that starts with the standard MASM opening. The first call is a relative call, so it is discarded by the program. The first call to be registered is the JMP in the import stub: FF 25 was in the list of opcodes being searched for. Then we just take the address from its operand. It can be used directly, because 402000 leads straight into the import address table.

Step three, Get Imports. It identifies the elements of the IAT in the specified range, which is exactly the opposite of GetProcAddress, using custom-made reusable functions like GetProcModuleName, GetProcName, GetProcOrdinal, GetProcNameAndOrdinal, GetProcInfo to get everything at the same time, and also Unforward.

Now let's see what unforwarding really is. For a forwarded export, the entry point of the function is not code but a string. Exports are forwarded for compatibility between all the different versions of Windows. IDA recognizes everything when it loads this particular DLL, and in the import address table IDA identified it correctly. As you can see, there are many false positives, like the last one here: CreateProcessW, which would be resolved as coming from SHLWAPI.dll. That is hardly ever used; all of these are false positives, never used that way. But if you take every possible unforwarding as correct, you get really funky, weird results. Forwarding by ordinal is about the same thing, for imports or exports that don't have a name. As we can see, this one would be forwarded to SHUNIMPL.#177.

Step four, Show Invalid. It displays the unidentified IAT entries, like a text search in the interface, by checking all imports one by one for validity. It's the simplest step to implement.

Step five, Fix Dump.
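A sketch of IAT AutoSearch for the two most common forms, `FF 15` (CALL [mem]) and `FF 25` (JMP [mem]), like the import-stub JMP in the MASM example. On x86 the operand is an absolute address; on x64 it is relative to the next instruction. The MOV/CALL register and PUSH/RET patterns are not covered, and `read_ptr` and `is_valid_import` are hypothetical callbacks the caller supplies (for example, backed by process memory reads and the loaded modules' export lists).

```python
import struct
from typing import Callable, Iterator, Optional, Tuple


def candidate_slots(code: bytes, code_va: int, is64: bool) -> Iterator[int]:
    """Yield the pointer-slot VA referenced by each FF 15 (CALL [m]) or
    FF 25 (JMP [m]) found in code. Relative E8/E9 calls never match."""
    for i in range(len(code) - 5):
        if code[i] == 0xFF and code[i + 1] in (0x15, 0x25):
            (disp,) = struct.unpack_from("<i", code, i + 2)
            if is64:
                # x64: the operand is RIP-relative to the next instruction (6 bytes)
                yield code_va + i + 6 + disp
            else:
                # x86: the operand is an absolute address
                yield disp & 0xFFFFFFFF


def find_iat_bounds(code: bytes, code_va: int,
                    read_ptr: Callable[[int], Optional[int]],
                    is_valid_import: Callable[[int], bool],
                    is64: bool) -> Optional[Tuple[int, int]]:
    """Return [start, end) of the IAT around the first slot holding a valid import.
    Slots are accepted while they hold a valid import or a null separator."""
    ptr = 8 if is64 else 4

    def ok(va: int) -> bool:
        value = read_ptr(va)
        return value is not None and (value == 0 or is_valid_import(value))

    for slot in candidate_slots(code, code_va, is64):
        value = read_ptr(slot)
        if value is None or not is_valid_import(value):
            continue                      # the first hit must lead to a valid import
        start, end = slot, slot + ptr
        while ok(start - ptr):            # search up for the beginning
            start -= ptr
        while ok(end):                    # search down for the end
            end += ptr
        while start < end and read_ptr(start) == 0:
            start += ptr                  # trim leading null slots
        while end > start and read_ptr(end - ptr) == 0:
            end -= ptr                    # trim trailing null slots
        return start, end
    return None
```

Accepting null slots while scanning lets the search cross the separators between modules; leading and trailing nulls are trimmed afterward so the range starts and ends on real imports.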
It recreates the import directory to satisfy the Windows loader and restore the original import address table. It assembles structures that point to each other, like IMAGE_IMPORT_DESCRIPTOR and IMAGE_IMPORT_BY_NAME. It's like the gears in a clock: they all have to work together, and if one is off, the whole thing breaks down.

So let's see how they relate to each other. We start with an IMAGE_IMPORT_DESCRIPTOR, an import address table, and an IMAGE_IMPORT_BY_NAME structure. From the IMAGE_IMPORT_DESCRIPTOR, the loader gets the address of the IAT elements. If we check the file in IDA, the static analysis will say there is nothing there, or that it is unknown. But that's not true: there is something there. It's a pointer to an IMAGE_IMPORT_BY_NAME structure, which is read, and then the IAT entry is overwritten with the resolved address.

There's an alternate method, because sometimes you need to know which IMAGE_IMPORT_BY_NAME structure relates to which IAT entry: the INT, the import name table. Starting again, the loader looks at the IMAGE_IMPORT_DESCRIPTOR, then goes to the INT, which holds exactly the same values that the IAT holds before startup; but the INT doesn't get overwritten, so it's still available after the IAT has been. The INT points to the IMAGE_IMPORT_BY_NAME structures, and the IMAGE_IMPORT_DESCRIPTOR still points to the IAT. So there are two alternate ways. Both are functional, but if both are implemented, they both have to work. You cannot have one that works and one that doesn't; then neither of them works.

Part three, evolution to 64-bit: the changes from the PE to the PE32+ format, the changes in the import rebuilding process, and the planned improvements for the near future. First, the changes. All registers are extended to QWORDs: EAX becomes RAX, ESP becomes RSP. New registers were introduced, R8 to R15. All DLLs used must be 64-bit.
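The gears of Fix Dump can be sketched in Python with `struct`: one IMAGE_IMPORT_DESCRIPTOR (OriginalFirstThunk, TimeDateStamp, ForwarderChain, Name, FirstThunk) pointing to an INT and to the original IAT, both filled with the RVAs of IMAGE_IMPORT_BY_NAME entries (a WORD hint followed by the name). With `is64=True` the thunks become QWORDs, as in PE32+. This is a minimal single-DLL sketch: the function name is hypothetical, it assumes a free area at `base_rva`, ignores imports by ordinal, and leaves updating the import data directory entry in the optional header to the caller.

```python
import struct
from typing import List, Tuple


def build_import_directory(dll_name: str, func_names: List[str], iat_rva: int,
                           base_rva: int, is64: bool) -> Tuple[bytes, bytes]:
    """Lay out, starting at base_rva, one IMAGE_IMPORT_DESCRIPTOR plus the null
    terminator, its import name table (INT), the IMAGE_IMPORT_BY_NAME entries and
    the DLL name. Returns (new_area, iat_bytes): new_area goes at base_rva and
    iat_bytes is written back over the original IAT at iat_rva."""
    thunk_fmt, thunk_size = ("<Q", 8) if is64 else ("<I", 4)  # PE32+ thunks are QWORDs

    int_rva = base_rva + 2 * 20            # after two 20-byte descriptors
    names_rva = int_rva + (len(func_names) + 1) * thunk_size

    names = bytearray()
    by_name_rvas = []
    for name in func_names:
        by_name_rvas.append(names_rva + len(names))
        entry = struct.pack("<H", 0) + name.encode("ascii") + b"\0"  # Hint = 0
        if len(entry) % 2:
            entry += b"\0"                  # keep the next entry 2-byte aligned
        names += entry
    dll_name_rva = names_rva + len(names)
    names += dll_name.encode("ascii") + b"\0"

    # INT and IAT hold the same RVAs on disk; the loader overwrites only the IAT
    thunks = b"".join(struct.pack(thunk_fmt, r) for r in by_name_rvas)
    thunks += struct.pack(thunk_fmt, 0)    # a zero thunk ends the array

    # OriginalFirstThunk, TimeDateStamp, ForwarderChain, Name, FirstThunk
    descriptor = struct.pack("<5I", int_rva, 0, 0, dll_name_rva, iat_rva)
    descriptor += bytes(20)                # an all-zero descriptor ends the directory

    return descriptor + thunks + bytes(names), thunks
```

Returning the same thunk bytes for the INT and the IAT reflects the rule above: both paths must agree, or the loader gets neither right.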
The BaseOfData field has disappeared; it's no longer part of the optional header in 64-bit. There's also a new calling convention for APIs.

Now the changes in the import rebuilding process. Not many, really; it's very similar to 32-bit. IAT elements are QWORDs, the OriginalFirstThunk entries are QWORDs, and the image base is a QWORD. Exception handling and unwind information are now stored as structures in the new PE32+ exception directory, so all the SEH information is stored in the header.

Planned improvements for the near future: a resizable window, DLL support, an integrated disassembler, auto-trace support, and custom tracing plugins for the cases the auto-trace doesn't support.

Part four, live 64-bit unpacking sessions. I'm going to be using IDA Pro Advanced 64 and CHimpREC 64. The first example is MPRESS version 1.07, a simple UPX-like packer. Example number two is Armadillo 64. It's packing heat, using the standard protection, only one level beyond minimal. So it's really basic: it only touches the code a bit, and for the imports it doesn't do anything like nanomites or stuff like that. It uses only import relocation and emulation.

So let's go to the live portion. Let's start with the MPRESS UnpackMe. I drag and drop it into IDA and it opens. We can see "Portable executable for AMD64 (PE64)", and yes, it may have been packed. Like I said, it's a simple UPX-like packer, so we would expect to see a PUSHA, and a POPA later, but these two instructions have been removed from the 64-bit instruction set. That's why all the registers have to be pushed individually onto the stack. So: Debugger, Process options. We need to set up local remote debugging for that, with the debugging server. Okay. Now we can execute the file: run to cursor after the first push. Yes, I really want to debug it. Okay, so now the first register has been pushed.
Usually I would call this the ESP trick, but we have to get with the times, so it is now called the RSP trick. This is what was pushed onto the stack. We set a breakpoint on it. Okay, F9 to continue. The breakpoint has been hit. Yes, I want to see some instructions. We're at a jump, so let's follow it. Yes, again, I want to see instructions. This is the original entry point. That's it.

Before I made my program, I tried UnpackMes like this one in 64-bit. I would reach the original entry point and think: now what? Which program do I use to dump it or to rebuild the imports? After searching for a few hours, I realized that there weren't any. So I made one for myself, and that's why I'm here.

Okay, the OEP is 1B50. Now I switch over here: select the UnpackMe, enter the new original entry point, IAT AutoSearch, Get Imports. All imports are found. Dump the UnpackMe, then Fix Dump. That's it. I could have made it seem more difficult, but that's really how simple it is. And it works.

Now let's go on to the second example, Armadillo, which is a bit more complicated because it's more of a protector than a packer. So I'm going to close this one and not save. Now CalcArma. Again, the debugger options were saved, so everything is good. We start execution here and, again, run to cursor on the first instruction. Yes, I really want to debug it. Armadillo is a bit more complicated: it uses different threads to unpack the process, so we can identify the original entry point by looking at where those threads are created. Let's open View, Imports, to see the imports of the packer; those are not the ones that will be rebuilt. We search for CreateThread, then go into CreateThread and turn the code into a function by pressing P. There we are. I put my breakpoint on the call here: Add breakpoint, and let's make it a hardware breakpoint, on execute. Okay, play. It has been hit once. F9 a second time, and now we can see the nag screen from Armadillo.
No, I didn't pay for it just for this demo. So now we're back at the breakpoint, and we can remove it. Now we continue to follow the program. Blah, blah, blah. Yay, there is code again. P to create a function, so we get a graph. Going down, we can see that there is a return, but we probably won't reach it. There are two CALL RAX instructions just before that, so let's put a breakpoint there to see which one gets hit. F9 again. CALL RAX. Yes, I want to see instructions. This is the original entry point.

But that's not the whole job. Let's see what result CHimpREC gives. Select CalcArma, OEP 18CD0, IAT AutoSearch, okay, Get Imports. Yeah, it's a big mess; nothing has really been identified. There is fake stuff: fake separators between imports, emulated imports, and relocated imports. So we can't rebuild the IAT as it is right now; there are gaps everywhere.

So let's go to the original import address table in IDA. There. Now we need to turn all the entries into offsets by pressing O. There is a script to do that in 32 bits, but it couldn't be updated to 64 bits: it kept asking me what a QWORD is. So there are a bunch of them to do; really a lot of them. Is that the end? Yes, we've reached the end. Now we have to rebuild all the imports manually. Let's set the number of opcode bytes to 10. Okay. Let's check the last call. This leads to CreateFileA from KERNEL32, which couldn't really be here, because the current module is MSVCRT. This is what is called a fake import, or fake separator. So we patch it: Edit, Patch program, Change byte. Okay. Now the process has been modified, and if we get the imports again, you can see that the last one is not recognized anymore, because CHimpREC reads the process memory every time.
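The manual check in the demo ("CreateFileA from KERNEL32 couldn't be here because the current module is MSVCRT") can be phrased as a neighbour heuristic. A minimal sketch, assuming each IAT slot has already been resolved to a module name (None for a null slot); the function name is hypothetical. It only flags candidates: a genuine one-import module wedged between two others would be flagged too, so every hit still needs the kind of check done in IDA.

```python
from typing import List, Optional


def flag_fake_separators(modules: List[Optional[str]]) -> List[int]:
    """modules: the module each IAT slot resolves to, in IAT order
    (None for a null slot). Returns the indexes of slots whose module
    matches neither non-null neighbour."""
    def differs(a: Optional[str], b: Optional[str]) -> bool:
        # A null neighbour is a legitimate module boundary, so it never
        # counts against a slot
        return a is not None and b is not None and a != b

    flagged = []
    for i, mod in enumerate(modules):
        prev_mod = modules[i - 1] if i > 0 else None
        next_mod = modules[i + 1] if i + 1 < len(modules) else None
        if differs(mod, prev_mod) and differs(mod, next_mod):
            flagged.append(i)
    return flagged


slots = ["msvcrt", "msvcrt", "kernel32", "msvcrt", None,
         "user32", "shell32", "gdi32", "gdi32"]
print(flag_fake_separators(slots))  # [2, 6]
```

Index 2 is the KERNEL32 entry inside an MSVCRT block, and index 6 sits where a null separator between USER32 and GDI32 would be expected.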
All the APIs could be modified manually in CHimpREC, but I prefer to do it directly in IDA because of unforwarding: it depends on the context, on the API that comes before and the one that comes after. So we need to go through all of them and fix them. Let's see, the next one would be this one. P. No. Ah, this is why I didn't find anything: there really isn't anything there. It's a fake separator; we're switching from USER32 to MSVCRT. So this one can be removed too. Patch program, Change byte. Okay. Going up, we see another bad import. Let's look at it: it's called SetLastError. Look at the first four bytes. Back here: Edit, Patch program, Change byte. It's always the same thing for every one of them. Okay. No, it's not the last one. Anyway, this was the wrong one. Okay, let's continue. USER32, and we can... This is why I didn't want to put in an auto-tracing plugin: it can go wrong much more easily than it can go right. So, we can see it's in the dialog code. Yeah. Okay, next one. P. DialogBoxParamW. Patch program, Change byte. Okay. Going up again, the next bad one is this one: you can see it's GetWindowTextW. So it's not really more complicated than that. For every import, we have to use IDA to find which one it really is, because some have been relocated, which is called import relocation. Patch program, Change byte. Okay. Here we have SHELL32 alone and USER32 after, so both of these are fake separators. Some are funny, like this one: the third call leads to the first one, so it goes around in a loop forever. A badly made tracer could get very deep into that. Patch program, Change byte, to zero. There we go. This one too: Edit, Patch program, Change byte. Next, GlobalUnlock. Next one. This one is right there, CreateThread. I can't copy the code and the bytes directly, so I need to go into the list of imports, to the unpacker's import address table, and get those bytes there.
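The looping stub just shown is the kind of trap a tracer has to survive. A minimal sketch of a guarded chain follower: it stops on a real API, on a stub it has already visited, on an undecodable stub, or after a hop limit. `next_hop` and `is_api` are hypothetical callbacks (in practice they would decode the stub's jump and check the export tables); this is not CHimpREC's code, since the talk says it deliberately ships without a tracer.

```python
from typing import Callable, List, Optional, Tuple


def resolve_stub_chain(start: int,
                       next_hop: Callable[[int], Optional[int]],
                       is_api: Callable[[int], bool],
                       max_hops: int = 64) -> Tuple[Optional[int], List[int]]:
    """Follow redirection stubs from an IAT value until a real API is reached.

    next_hop(addr): jump target of the stub at addr, or None if it can't be decoded.
    is_api(addr):   True if addr is an exported API entry point.
    Returns (api_address, path), or (None, path) on a loop, dead end or hop limit."""
    seen = set()
    path: List[int] = []
    addr: Optional[int] = start
    while addr is not None and len(path) < max_hops:
        path.append(addr)
        if is_api(addr):
            return addr, path
        if addr in seen:
            return None, path        # loop: this stub was already visited
        seen.add(addr)
        addr = next_hop(addr)
    return None, path                # dead end or hop limit reached


# Hypothetical stubs: 0x100 -> 0x200 -> 0x300 -> 0x100 loops forever
hops = {0x100: 0x200, 0x200: 0x300, 0x300: 0x100, 0x400: 0x500, 0x500: 0x7010}
print(resolve_stub_chain(0x100, hops.get, lambda a: a >= 0x7000))
```

Returning the path alongside the result keeps the failure visible: when a chain ends in `None`, the recorded hops show where to look in IDA instead of silently producing a wrong import.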
CreateThread is here. Edit, Patch program, Change byte. CreateThread. Next one: CloseHandle. Okay. Going up again, we see KERNEL32; this one is GlobalLock. It's a bit repetitive, but it's really the most efficient method. Patch program, Change byte. We see this one here: GlobalUnlock. Going up again: TerminateProcess. Okay, KERNEL32 and NTDLL, so we'll have to check these ones too, to see what they really are. This one looks like a big mess, and big mess rhymes with GetProcAddress, so this must be it. So we have to go into the import table of the unpacker. Okay, this is it. Patch program, Change byte. If I look at this one, it's LoadLibraryA. Now we can deduce directly that the one between KERNEL32 and GDI32 is a fake one, so we can set it to zero. Another separator here. And finally we should be done.

But I messed up one. This one, I got wrong. 1310. Actually, E5D0. All right. So, the image base is... Go to entry E5D0. Okay. Anyway, I messed up one import, but it shouldn't really make a difference. Now if we look at the process again, we can see that everything is valid, and even the one where I messed up, SetLastError, has been recognized as valid, even though it's in the middle of USER32.dll. So let's dump the file and Fix Dump. If I didn't mess up too much on that one import, it should run perfectly now. Yeah... yeah, that's the import I messed up. That's bad. Okay, let's try to get it. I'll replace it with something else from USER32; a random one should work. Okay. Now all the entries are valid, and they have all been put back together in CalcArma. I hope that it works; if not, it looks bad. No. Anyway, if I hadn't messed up that one import, it would have been all right.

So, back to the presentation. I'd like to say thanks to all my beta testers from ARTeam and Team SMD, and an even bigger thanks to all the audience, the RECON staff, and the speakers.
CHimpREC will shortly be available from the ITAC website at www.itech.org and from Woodmann's Collaborative RCE Tool Library. Questions?

Yes. Yes, it could be done by, let's say, turning CHimpREC into a debugger and reading the context of the process directly. It could be done, but I tried to make it as non-intrusive as possible.

No options. Dump. Open. Maybe it will work. No. Anyway, it's only because I messed up one import that it doesn't work, because it worked before. That's also why I didn't want to include a tracer: if a tracer gets it wrong and you don't know why, or whether it's wrong at all, you'll probably get a result like this one and have to search a long time to find which traced API was bad. Other questions? Thank you.