Wednesday, January 16, 2008

Decompiling C and C++ programs in Linux

Suppose you are working on a project and a hard disk crash suddenly wipes out all your source code. Suppose, too, that you were not using a source code management system such as Subversion or CVS, and all that is left is the compiled binary of your project. What will you do?
Decompiling is the process of reconstructing source code from a compiled binary (e.g. file.c from a.out). I used the Mocha decompiler on an earlier Java project, when the customer gave us only the class files instead of the Java source code. We decompiled the classes, generated source code from them, and studied the structure and logical flow of the project. I must say it was tedious work.
Now suppose you have the source code, you compile it, and later you extract source code from the binary again. How will the two versions differ? Let us see.
For this purpose I am using the well-known decompiler Boomerang. It is available for both Windows and Linux. For Linux, download it from http://boomerang.sourceforge.net/. It depends on a separate libgc, so download that from the same site too and copy it to the /lib directory. It also depends on libexpat; just create a link to the existing expat library in /lib and it won't complain again :)

My C program to test decompilation using boomerang is
/**************************************************/
/* Program to check the characteristics of malloc */
/*                               */
/**************************************************/

#include <stdio.h>
#include <stdlib.h>
int *fun(void)
{
    int *a;
    a = (int *) malloc(sizeof(int));
    free(a);    /* a is freed here, so the pointer returned below is dangling */
    return a;
}

int main()
{
    int *j;
    j = fun();
    *j = 5;     /* writes through a freed pointer: undefined behavior in C */
    printf("%d\n", *j);
    return 0;
}

Then compile it with
cc test.c
which gives us the much-awaited a.out.

Then run Boomerang on a.out. You will see something like this:
./boomerang a.out
Boomerang alpha 0.3 13/June/2006
setting up transformers…
loading…
Warning: dynamic symbol table hack used!
decoding entry point…
decoding anything undecoded…
finishing decode…
found 2 procs
decompiling…
decompiling entry point main
 considering main
  considering fun
  decompiling fun
 decompiling main
generating code…
output written to ./output/a
completed in 0 secs.

Go to output/a and you will find another test.c:
// address: 0x80483d9
int main(int argc, char **argv, char **envp) {
    int local7;                 // r24
        
    local7 = fun();
    *(int*)local7 = 5;
    printf("%d\n", 5);
    return 0;
}           
     
// address: 0x80483b4
fun() {
    int local5;                 // r24
   
    local5 = malloc(4);
    free(local5);
    return local5;
}  
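
One visible difference in the listing above is that printf receives the literal 5, while the original source passed *j. Somewhere between compilation and decompilation the load was replaced by the value that had just been stored. The sketch below only illustrates why that substitution preserves the result; the function names are hypothetical and are not produced by Boomerang, and it assumes the pointer is valid and nothing else modifies the slot between the store and the read.

```c
/* Mirrors the shape of the original main(): store 5 through the pointer,
   then read the value back from memory. */
int load_after_store(int *slot)
{
    *slot = 5;
    return *slot;
}

/* Mirrors the shape of the decompiled main(): store 5, then use the
   literal directly instead of reading memory again. */
int propagated_constant(int *slot)
{
    *slot = 5;
    return 5;
}
```

Called with a pointer to an ordinary int, both functions leave the same value in the slot and return the same result, which is why the decompiled printf call still describes the same program.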
   
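Well, it is almost the same, minus the header files and with a few simple changes: the pointer variables are declared as int, and fun() has lost its return type. But it does exactly what the original source code was meant to do :)
I will definitely call it a 90% success.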
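
The remaining 10% is hand work: putting the headers back and restoring the pointer types and the return type. Here is a minimal sketch of what such a cleanup could look like. The names alloc_int and store_and_read are my own (hypothetical, not Boomerang output), and unlike the test program this version frees the memory only after using it, so its behavior is well defined.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical cleaned-up counterpart of the decompiled fun():
   the int local becomes an int *, and malloc(4) is written as
   malloc(sizeof *p). Freeing is left to the caller. */
int *alloc_int(void)
{
    int *p = malloc(sizeof *p);
    return p;               /* NULL if the allocation failed */
}

/* Hypothetical counterpart of the decompiled main() body: store a value,
   read it back, print it, then release the memory.
   Returns the value read, or -1 if the allocation failed. */
int store_and_read(int value)
{
    int *j = alloc_int();
    int result;

    if (j == NULL)
        return -1;

    *j = value;
    result = *j;
    printf("%d\n", result);
    free(j);                /* freed only after the last use */
    return result;
}
```

Calling store_and_read(5) from a main() prints the stored value followed by a newline, the same thing the original program was trying to print.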

Posted by maxinbjohn at 04:57:41