Can everybody hear me? Woo! All right. All right. I'm just going to start now. I'm Cameron Hotchkies. I work at TippingPoint's DVLabs. Most of my job is vulnerability research and analysis. I work with the ZDI program. And we have a blog up on dvlabs.tippingpoint.com, so we'll have updates on any of our programs there. Hopefully all my slides and any code that I mention will be available later this weekend, so by Monday you'll be able to pull it all down. In the past, I've written and contributed to Pedram's PaiMei reverse engineering framework. Before then, a long time ago, I created Absinthe, which is a SQL injection tool. I don't do web stuff anymore. And on the side, recently I've started up a mailing list for OS X reversing. It's very niche, very small, very low traffic. But it's kind of nice, and there's some pretty smart people already on the mailing list. So if you're interested in this kind of stuff, feel free to join up. And once a month when somebody posts, you'll be able to read it.
So this is basically what I'm going to be covering today. File formats, tools, disassembly patterns that aren't necessarily related to specific technologies. Briefly look at Carbon, predominantly look at Objective-C, because that's the main programming focus that Apple pushes on people. And then I've got a quick list of links, but it's just for your reference later if you pull the slides down.
So applications in OS X are stored in a directory structure, commonly referred to as a bundle or package, and Finder, the equivalent of Windows Explorer, will launch it as if it's an executable itself when it's really a directory structure. But it's a self-contained directory that has the binary, various supporting files and resources. And this is what it looks like if you were to break it out in picture form. Program.app would be the name of the program itself. The program name obviously would be something of your own choosing.
And everything with a solid border is required. Everything with a dotted border is optional. And white is a directory, anything not white is a file. So the Contents directory, which is the only thing in Program.app, contains a PkgInfo file and an Info.plist property list file. Then there's a MacOS directory, which has the binary. It's suggested that the binary name should be the same as the application directory name. It doesn't have to be, but it just commonly is. And you have the Resources folder, and that's where you would have the sound files, media files, icons, and nib files. Nib files are an interface format, like the GUI layout stuff for coding on OS X. There's also xib, which is similar. Then you have, optionally, frameworks, plug-ins, shared frameworks, shared support.
Info.plist, I kind of blew through that because I have this slide here explaining it. It's an XML or binary list of application properties, including the author, the copyright date, just various stuff that's going to be loaded in to the application. It's fairly well documented by Apple. I'm not going to go through all of the fields here because it's a waste of time. And if it's in binary, there's the plutil command on OS X. A lot of the commands that I'm mentioning are stock. Some of them may be included with Xcode. If you're going to be reverse engineering on OS X, I'm going to assume you have Xcode installed. And if you weren't planning on it, just do it. It takes like 15 minutes. plutil will convert between the binary and the XML format. And I just added that thing from the man page because it's interesting; with some of the Apple stuff they just put random crap in their man files.
So the other file that is in application bundles is the PkgInfo file. Apple doesn't seem to be sure if it's required or not, because in some docs it says it is and in some docs it says it's optional. Generally it's there, and generally there's nothing really important in it that's not covered in the property list.
But you're going to get APPL, which is the application bundle type tag, four bytes of that, and then four bytes of the signature. So they say for, like, the text editor it's ttxt. But most of that, like I said, is going to be in the property list file.
So the actual binary is going to be in the Mach-O binary format. That's the standard binary format on OS X. It's got the magic number 0xFEEDFACE, unless it's 64-bit, then it's 0xFEEDFACF. And you also have fat or universal binaries, which have the code for PowerPC, which is what Apples were running up until a year or two ago, and they're also going to have the Intel stuff, which is all of the modern laptops and desktops. And the fat binaries have the 0xCAFEBABE magic number. So you might be like, hey, I've seen that before, Java, or Java, depending on where you're from. Yeah, it is the same. The main thing is that NeXT actually predated Java, so they can pull it off, and that's where it came from.
The other interesting thing, if you get into reversing on OS X, is you're going to Google for Mach-O occasionally, which comes out as "macho" and just random crap. So expect some interesting search results. I read that, looked at that paper, and I still don't know what it was about. I think it was auto-generated.
So, the Mach-O file format is fairly well documented by Apple, and it was fairly well documented by NeXT, which created it, and it's been documented by GNU. So I'm just going to quickly go through these. This is the __TEXT segment for the Mach-O binary. Anything with an asterisk in this chart was documented by NeXT at one point and is no longer being documented by Apple. So it was probably deprecated and removed at some point, but for completeness I'm just including it in case you're looking at really, really old binaries like the original copy of Doom. But so it's pretty straightforward. There's some C++ stuff in there. The 8-byte literals you're going to see all the time.
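Going back to the magic numbers for a second, here is a minimal Python sketch of how you might sniff them from the first four bytes of a file. It is not code from the talk: the function name `sniff_magic` and the label strings are made up for illustration, and the byte-swapped values assume the file was written on a little-endian (Intel) machine. Because it only looks at four bytes, it can't tell a fat binary from a Java class file.

```python
import struct

# Values as seen when the first four bytes of the file are read big-endian.
# The byte-swapped entries are what a little-endian (Intel) Mach-O looks like
# on disk.
MAGICS = {
    0xFEEDFACE: "Mach-O 32-bit, big-endian",
    0xCEFAEDFE: "Mach-O 32-bit, little-endian",
    0xFEEDFACF: "Mach-O 64-bit, big-endian",
    0xCFFAEDFE: "Mach-O 64-bit, little-endian",
    0xCAFEBABE: "fat/universal binary (or a Java class file)",
}


def sniff_magic(header: bytes) -> str:
    """Classify a file by its first four bytes (hypothetical helper)."""
    if len(header) < 4:
        return "too short"
    (magic,) = struct.unpack(">I", header[:4])
    return MAGICS.get(magic, "unknown")


if __name__ == "__main__":
    import sys

    # Usage: python sniff.py /path/to/binary
    for path in sys.argv[1:]:
        with open(path, "rb") as handle:
            print(path, "->", sniff_magic(handle.read(4)))
```

Pointing it at the binary inside an application bundle's MacOS directory is the obvious first thing to try.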
The C strings you're going to see all the time, and the text obviously you're going to see all the time. The interesting thing is that text is not the only place where code is going to be. There are coalesced text sections as well. IDA will pick it up, so just aim for the blue and you're good. Then you've got the data segment, standard data segment. I'm not going to waste a lot of time with this.
Then you've got the Objective-C segment. This is where all the Objective-C metadata is stored, and in the NeXT and Apple guides you get this statement which basically says we're not going to document it, so good luck, figure it out. Luckily it's fairly well named, so like class references, cls_refs, it's pretty straightforward, and the format doesn't take too long to figure out on your own. And three people at least have written IDA scripts to clean it up, so you don't have to spend a lot of time working on it unless you just want to figure it out for the sake of a boring Sunday afternoon.
Just because I didn't have anywhere else in my slides to put this, I'm going to mention vmmap. It's a kind of interesting tool that Tiller pointed out this morning for me to take a look at. It's just basically a memory map for any process that's running, and it will give you permissions on whether you can read and write to specific segments. Instead of loading up a binary in a debugger and trying to figure out which address got mapped where, it's just a quick way to look that up. You can just man vmmap or Google it.
So right now I guess we're in the tools section of my slides. First thing you're going to need when you're reversing on OS X is a hex editor. Hex Fiend is pretty cool. It's open source and the code is very well documented, so you can just go add random crap to it, but it's a hex editor. I'm not going to waste a lot of time on that either, but you can grab it from ridiculousfish.com. 0xED is another hex editor.
It's not open source but it's free, and there's a plug-in architecture for it, so you can add plug-ins for representing all the different data types down here at the bottom. Hopefully that's clear.
otool comes with a stock install of OS X and it's the equivalent of objdump. If you just do -tV, that's going to give you a disassembly straight on the command line. You can also do -L to list all the required libraries. There's a whole bunch of different flags for it. It just makes your life easier if you're stuck on the command line and you don't have any other tools available. And a lot of people have built on it. otx is one of those tools. It will basically run otool for you and then add symbols. I've put the symbols in blue so that they're a bit easier to read or identify, and it just basically cleans up the code so you can see where the Objective-C statements are in the actual dump from otool.
class-dump is similar to parts of otool, but the difference is that it won't dump code, it'll just dump the headers. So I think those flags are wrong. But this is an example of it. It's dumping it as straight Objective-C source code. So if you just want to see the data types, the signatures, then it will be right there.
IDA. Ilfak spoke earlier. If you still don't know what IDA is, that's kind of scary. IDA Pro on OS X runs great under Parallels. I use it all the time. There's no speed issues. There's a Mac OS X version for the console as well if you want to do batch mode. They're both commercial products but they're both worth the money, so for sure grab those if you don't already have them.
And then debuggers. There's a couple of scripting debuggers available. Charlie made some additions to PyDbg, a part of the PaiMei framework, so that it runs on OS X. Stock installs come with GDB, that's with Xcode, which comes on your install disks. PyGDB is available on Google Code. vtrace can work on OS X.
Weston and Beauchamp are going to be releasing Redebug soon, or REdebug. So you've got scripted debuggers and GDB-based debuggers. And I'm not going to spend a lot of time on this because it's being presented in two days, so I figured I'd give it a shout out.
So calling conventions: standard call for most binaries, every one I've looked at. And the main difference coming from a Windows perspective is that instead of pushing the arguments onto the stack, it's going to preallocate the stack space and just move into the preallocated stack variables. You can write scripts to automatically rename these to sensible names like argument one, argument two, selector, recipient. The one thing you're going to have to watch out for is the stack delta. Sometimes it changes midway through a function, so if you're just automatically looking at the frame and renaming those variables, you'll get bitten, so it's better to just rename them as you see them.
Also there's local addressing for position independent code. I refer to it as an anchor point. In functions they'll use it for jump table referencing. They'll use it for data referencing. And so sometimes IDA will get a little bit confused as to what that is, but I've written a couple of scripts to just automatically find the anchor point or table of contents, whatever you want to call it. And it's fairly simplistic, but it's going to be in the EBX register for the most part. And this is how they generate it. It's going to be early on in a function: a call to this function, which is two instructions just returning the program counter, or you'll have it directly inlined, which will be familiar if you've looked at any shellcode that's doing something similar to this. So here's an example of the anchor inlined at the top of the function. You can see the call $+5, pop EBX, and it's not actually being used in this example, but it's there. So they're probably going to do something with it later. Actually, that is used in this example. I totally just lied to you.
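For the inlined form of the anchor, a rough Python sketch of the idea is below. It is an illustration rather than one of the scripts mentioned in the talk: the function name `find_inline_anchors` is hypothetical, and it assumes 32-bit x86 code where the anchor lands in EBX, so it just searches for the bytes of `call $+5` (E8 00 00 00 00) followed by `pop ebx` (5B). The anchor value is the return address the call pushes, which is the address of the pop itself.

```python
# call $+5 ; pop ebx, as raw 32-bit x86 bytes
ANCHOR_PATTERN = b"\xe8\x00\x00\x00\x00\x5b"


def find_inline_anchors(code: bytes, base: int = 0):
    """Return (call_address, anchor_value) pairs for every inlined anchor.

    `code` is the raw bytes of a code section and `base` is the address
    the first byte is loaded at. This is a plain byte search, so it can
    produce false positives if the pattern shows up inside data.
    """
    results = []
    start = 0
    while True:
        index = code.find(ANCHOR_PATTERN, start)
        if index == -1:
            break
        call_address = base + index
        # The call pushes the address of the next instruction (the pop),
        # so that is the value EBX ends up holding.
        results.append((call_address, call_address + 5))
        start = index + 1
    return results


if __name__ == "__main__":
    # push ebp ; mov ebp, esp ; call $+5 ; pop ebx ; nop
    sample = b"\x55\x89\xe5\xe8\x00\x00\x00\x00\x5b\x90"
    for call_address, anchor in find_inline_anchors(sample, base=0x1000):
        print(hex(call_address), hex(anchor))
```

Data references later in the function are then expressed as offsets from that anchor value, which is what a cleanup script would resolve.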
Right underneath where they make the anchor, they have a reference to it, and then I've added comments. You can see that the NSAutoreleasePool is the reference after it's been dereferenced twice.
So Carbon is the 32-bit framework for interacting with the OS X system libraries. It's a descendant of the Mac Toolbox, the original Mac framework stuff, but Apple's sort of been downplaying it recently, encouraging people to use Cocoa and Objective-C as much as possible. But C programmers and people porting applications over from Windows that want to do it quickly tend to use Carbon for C-based applications. Interesting thing is you can tell if it's a Windows port simply by searching for HWND, which makes no sense in a Mac program, but it's all over Windows code and they will copy the names over. HI and CG are common prefixes for libraries that are written in Carbon.
But the bulk of stuff that's made, and that people talk about when they're talking about coding on OS X, is done in Objective-C. It was created in the 80s by Stepstone and then NeXT licensed it. It's object oriented, inspired by Smalltalk, but it's just a small extra little bit on top of C. There's not a whole lot to learn other than the initial calling and defining of the methods, and then it's pretty straightforward. But the biggest change from a reverse engineering standpoint is that functions aren't called. If you look at Objective-C, it looks like C, where you've got functions all over the place, but it's just a message sending architecture, and so all of a sudden all of your code cross references get mangled: they're all direct calls to objc_msgSend, so you've got all of your cross references converging on one spot and never coming back. You can get around that, though, but that's the biggest thing for looking at Objective-C in a binary.
Unicode strings are standard, but they actually store most strings in the C string section, so it's a two-byte Unicode string from the developer standpoint, but it's actually just stored as a null-terminated byte string. And libraries are referred to as frameworks interchangeably. So in the framework section you have a large set of frameworks to call from. NS and CF are commonly prepended to names in the frameworks: NeXTSTEP and Core Foundation respectively. Two letters tends to be a common prefix for most Apple stuff, and there's a toll-free bridge, is what they call it, so that NS and CF objects can be treated back and forth as similar objects. And as I mentioned earlier, the API is called Cocoa if you're coming from the Objective-C perspective. AppKit is a collection of the frameworks that are around GUI designing stuff. There's other stuff that's not necessarily GUI related, but that's the bulk of it. The iPhone doesn't use AppKit, so you're not going to see any AppKit code there. You're going to see UIKit. So AppKit's NS, UIKit's UI.
So this is actually Objective-C. This is what it looks like. It looks fairly straightforward. The only main difference is that you have these, I don't know if the red comes out well, but the square brackets; those are decorators telling the compiler that it's Objective-C code. The object that you're talking to is referred to as the recipient. The names of the arguments, which are always put in the proper order, are included in it; they're referred to as the selector. So I've been throwing these words around. This is what I'm actually talking about when I say that. And then the arguments, if you hadn't guessed that. And that's the Objective-C method call.
So what that really comes down to is a wrapper. It gets translated into a C function call to this objc_msgSend function. You can see the recipient, then the selector is concatenated as a single string, and the arguments are just attached to the end with an ellipsis.
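To make that translation concrete, here is a small Python sketch that takes a recipient and the keyword parts of a message and prints the C-style call they roughly correspond to. The helper name `to_msgsend` and its input format are invented for this illustration; the only things taken from the talk are that the selector parts are concatenated into one name and the arguments follow the recipient and selector.

```python
def to_msgsend(recipient, parts):
    """Render [recipient kw1:arg1 kw2:arg2] as an objc_msgSend call string.

    `parts` is a list of (keyword, argument) pairs. A message with no
    arguments is written as [(name, None)]. Hypothetical helper, for
    illustration only.
    """
    if len(parts) == 1 and parts[0][1] is None:
        # No arguments: the selector is just the bare name, no colon.
        selector = parts[0][0]
        arguments = []
    else:
        # Every keyword keeps its trailing colon and they are concatenated
        # in order into a single selector name.
        selector = "".join(keyword + ":" for keyword, _ in parts)
        arguments = [argument for _, argument in parts]
    call_arguments = [recipient, "@selector(%s)" % selector] + arguments
    return "objc_msgSend(%s)" % ", ".join(call_arguments)


if __name__ == "__main__":
    # [dict setObject:value forKey:key]
    print(to_msgsend("dict", [("setObject", "value"), ("forKey", "key")]))
    # [pool release]
    print(to_msgsend("pool", [("release", None)]))
```

In a disassembly you see the reverse of this: a selector string such as `setObject:forKey:` sitting in the second argument slot, and the number of colons tells you how many arguments follow.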
You have different types of message send calls. There's the superclass one. There's the floating point return one. This one, we'll go back: id actually is an object pointer. That's what it means in Objective-C. That's one of the small other things that it changes. But if it's the floating point one, obviously it can return floating point numbers. And the main difference is that if it's a structure, there's no return value. It's passed in through the arguments and that's modified. And that's usually when you'll see your stack delta changing mid-function. And then super_stret's pretty straightforward.
So this is what the message send call looks like when it's in assembly. Like I said earlier, you can see that the arguments are being moved in to var 38 and var 30, which is just going to be actually ESP and ESP plus four. So those are two message send calls. You can easily write scripts to clean it up and just dereference everything you possibly can into strings. All of this information is available if you just hover over it in IDA, but if you want to make it so it's automatically all there, I've got a couple of scripts to do that. And then it's really easy to pull out the actual Objective-C function calls and what's going on.
Now, when I showed you the Objective-C segment, this is one of those sections, a snippet from it. This is the instance methods section for a download delegate class. I've cleaned this up a little bit. Normally it's just the structure there, but I've propagated some of the names. So you can see the first thing is the actual string, which is in blue immediately after it. Then you have this offset to what looks like gibberish, and then you have an offset to the actual function that it's talking about. That gibberish stuff after it is a type encoding, and what that does is tell you about the return value type.
It gives you the type for every single argument, the structure, the signature of the function, and you can use this to propagate more information to your functions when you're trying to just clean them up. So here's an example from the previous page. First thing is going to be your return value, which is void, and the fact that you have an object being passed in is telling you that it's going to be an instance method, which is what the dash represents. If it's a plus, it's a class method in Objective-C. Then your colon there is going to indicate that it's a selector. There's actually two of them, but I didn't figure out how to make LaTeX show it properly. And then you have another at symbol, which is another object. id is just sort of an overarching object pointer. It could be specific, but it's only ever going to store it as an id, and then you have your second object there, so that's the second part of the selector.
And so the stack offsets are indicated. That's what the 0, 4, 8, 12 and 16 are used for, and so that tells you the order of where everything's showing up, and if it's not implicitly defined you can figure out the sizes of the variables, but most of them are going to be implicitly defined anyways. The return value says it's at the stack offset 16, but it's for the most part going to come back through EAX, unless it's one of the structures, then it's going to come back through the value being passed in. So the only reason I can think that they used the 16 is to give you the size of the last argument.
So memory management in Objective-C is a little different than C. You can still use all the C constructs, but you also have the alloc call and the init call. Well, the init is not really memory management, but they usually come one right after another. So that's how you allocate something. copy is also used to allocate memory. Both of those indicate that a reference is being added.
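Coming back to the type encoding walked through above, a minimal Python sketch of decoding such a string is shown here. The function name `decode_signature` is hypothetical, and the encoding string `v16@0:4@8@12` is reconstructed from the description in the talk (void return, object, selector, two more objects, with offsets 0, 4, 8, 12 and 16) rather than copied from the slide. It only handles simple one-character type codes, not structures, pointers, or qualifiers.

```python
import re

# A few of the simple one-character codes used in Objective-C type encodings.
TYPE_NAMES = {
    "v": "void", "@": "id", ":": "SEL", "#": "Class", "*": "char *",
    "c": "char", "i": "int", "s": "short", "l": "long", "q": "long long",
    "C": "unsigned char", "I": "unsigned int", "S": "unsigned short",
    "L": "unsigned long", "Q": "unsigned long long",
    "f": "float", "d": "double", "B": "bool",
}

# One type code followed by its decimal number.
TOKEN = re.compile(r"([v@:#*cislqCISLQfdB])(\d+)")


def decode_signature(encoding):
    """Split an encoding like 'v16@0:4@8@12' into (type, number) pairs.

    The first pair is the return type; the rest are the arguments with
    their stack offsets (the recipient and selector come first).
    """
    decoded = []
    position = 0
    while position < len(encoding):
        match = TOKEN.match(encoding, position)
        if match is None:
            raise ValueError("unsupported encoding at offset %d" % position)
        code, number = match.groups()
        decoded.append((TYPE_NAMES[code], int(number)))
        position = match.end()
    return decoded


if __name__ == "__main__":
    for type_name, number in decode_signature("v16@0:4@8@12"):
        print(type_name, number)
```

On the number attached to the return type: the runtime documentation is usually read as giving the total size of the argument frame there rather than an offset, which would fit the 16 in this example (four 4-byte arguments), but treat that as an assumption here.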
Then release. The way Objective-C handles memory management is that it is a reference counting language, or reference counting runtime. So what release does is remove a reference that was created at some point, and once the count hits zero the object is deallocated. If you get something passed in from another function and you want to keep it, or if you just want to add another reference, you call retain, and it just increases the references by one so that the object won't automatically get cleared out by some extra autorelease stuff, which is right here.
This autorelease pool, you're going to see a lot of this, and the way it works is that it manages all of the references for objects that are being passed back from child calls, so they can get cleaned up properly even though they've gone out of the scope of their creating function. Generally you're going to have the declaration at the top of a function and the pool release call right at the bottom of a function, but it's also used in loops. So at the top of certain loops that have a lot of calls in them you're going to have another autorelease pool declaration, and then at the end of the loop, or any place where it will exit the loop, you'll have this pool release, so you can use that in assembly to figure out how the loops are actually structured.
Garbage collection was added with Leopard, the latest release of OS X, last October. Classes that have it are going to have a finalize selector. If you see collectExhaustively or collectIfNeeded, that's probably going to be calls to the garbage collector forcing it to run. The big thing with garbage collection is that it's not on the iPhone, because it takes up too much memory and the iPhone has a smaller memory footprint.
There's a bunch of stuff that didn't make it into the iPhone, so you're probably not going to see much garbage collection there, but for regular desktop stuff it will automatically clean up a lot of people's stupid mistakes.
Categories are kind of interesting as well. It's a construct in Objective-C which allows you to either override or add methods to a class that you didn't create, so you don't have to have the source code and you don't have to have permission to do it. You can just sort of stick it on and force things to do things that they weren't actually intended to do. You could override the default, like NSString, so that whenever somebody calls length it's going to add seven for some reason because you think it's hilarious or whatever. But the interesting thing about categories is that all of a sudden the assumptions that you might be making about certain methods that are being called for framework classes don't hold: if there's a category in the binary, then it's going to have a different behavior. So oddly, a lot of people, when they first look at the Objective-C sections, will look at the category ones and just propagate the names. You should always try to keep in mind that it's going to be a category, so that way you don't make any assumptions. Like if you see NSString length but it's been overridden, you have something like NSString cat length, and then you'll know, okay, there's something different going on there and I'm not going to just assume what it's returning.
Another extra thing to point out is timers. We're all familiar with timers. You have a nag screen that comes up every so often. You can use standard C stuff for doing it, but there's also two things in Objective-C: NSTimer, which is the older timer object that will wake up every so often, and NSOperationQueue. NSOperationQueue was added in 10.5 as well, and it's suggested for multi-threading, especially when you want it to happen repeatedly, and a lot of those will be used on nag screens. And I talk a lot faster than I thought I did.
Wow, because we're already in the references section. Okay, well, a lot of OS X reversing stuff was done by Nemo. He isn't doing a lot of OS X reversing anymore, but he still has all of his papers and presentations available on his website. Also, itsme and fileoffset both wrote stuff for IDA that will propagate a lot of that metadata that I was talking about into the functions. itsme wrote it in IDC, and fileoffset took a really cool approach by taking the otx output and writing a Lua script that will create IDC, which you can automatically run, just adding the decorations in that way. And these are some extra links that were suggested by the OS X mailing list if anybody is interested in getting into this kind of stuff. And coming soon also, Dino and Charlie wrote a book, or are writing a book, on OS X hacking, and that will include some interesting stuff as well. They've got a couple of tools that they'll be releasing once this comes out, so I figured I'd give them big ups for their book, and apparently if you preorder now you can save $13.50. That's 27% off the list price.
So, since I spoke fast, you guys have got a lot of time for questions. Does anybody have any, or are we just going to go to the party? Dino. Absolutely. I'm not going to change my resolution though. We're just going into the iChat folder here. So, I don't know if you can read that, but trust me, you want it this small. It's /Applications, in the iChat.app folder, and then it's Contents/MacOS/iChat. And I went one too far. So, that went pretty fast, but it's kind of fooling you, because if you go through and do it through less you're going to be hitting space for a very long time; it just keeps going and going and going and going. So, this is what Dino was talking about. When I had that screenshot of the class-dump, that took me about 25 minutes to find a nice little one that would fit in and make sense in the slides, because it's just... and it's propagated from everything.
So, anything that this possibly calls is going to be coming through this. And here's actually IDA running in Parallels, if you're curious about it. I wrote a script as well to clean up stuff, and this is what I was talking about with renaming the variables. I just refer to them as recipient and selector, and they're just named on a per-case basis instead of just at the top of the function. But you can see I've added comments here that dereference everything. I always use A, but it doesn't really mean anything. It's just a placeholder variable so I know what's going on, because it's going to come back through EAX. And let me add something else. Here's basically a list; this is my sort of toolbox for IDA manipulation of Objective-C binaries, which I will upload to the DVLabs server tonight. It's by no means complete. It's sort of a work in progress, but as I come up with silly little IDAPython stuff, I just add it in there. So, there's even sections of it that are totally commented out where I'm like, I was totally wrong when I was doing this, but I just sort of keep it as a track record of what I was doing.
So, anybody else? Questions? To be honest, I haven't really looked at much PowerPC stuff. Dino, have you? The question was, are there any noticeable differences between PowerPC and Intel based Objective-C? Anybody else? Bueller? Bueller. All right. Well, see you guys later tonight.