Tikalon Header Blog Logo

Obfuscated Software

August 14, 2013

When I was in graduate school, one of my fellow students commented to a professor that he would be happy when he got his degree, since he wouldn't need to take any more tests after that. The professor replied that after his Ph.D., every working day as a scientist would be a test. You're always being judged by your peers; and, most importantly, by the funding agencies.

As a personal example, at the end of a staff meeting, our research vice president distributed copies of an unsolicited proposal which was routed to him for comment. This was our daily test. We scanned the proposal quickly, and one of the other physicists started to explain that what was proposed violated Heisenberg's uncertainty principle. Actually, he failed his test, since quantum mechanics did not apply. I outlined the problem using classical electrodynamics. As my reward for passing my daily test, I was the one who had to write the report!

When at school, your reward for passing a test is a good grade and a slight forward movement towards your degree. Outside of academia, your reward is usually your salary, occasionally the esteem of your colleagues, and sometimes a monetary award. Mathematicians can reap a huge reward by solving one of the Millennium Prize Problems, valued at a million dollars each, and there are large monetary awards in all major fields.

Often, competitions are devised whose only award is the recognition of your colleagues. One such competition that I've followed for more than two decades is the International Obfuscated C Code Contest.[1] The goal of this contest is the creation of working programs in the C programming language which are nearly impossible to decipher. I could insert a joke here about my own C programs, but I won't.

C is a good language for such a contest, since it's oblivious to whitespace, and it has several language constructs that lend themselves to obfuscation. I'll mention, also, an inherently obfuscating programming language, Whitespace, for which any program looks like a blank page. The language is coded using unique sequences of whitespace characters as its syntax.

Cover of the 1978 first edition of The C Programming Language by Brian W. Kernighan and Dennis M. Ritchie

The C programming language is still popular more than thirty years after its creation by Brian Kernighan and Dennis Ritchie of Bell Labs.

Cover of the 1978 first edition of Kernighan and Ritchie's book, The C Programming Language.[2]

(Scan of author's copy.
A graphical reproduction of this cover by Jules Mazur is available on Wikimedia Commons)


I've attached one example of an obfuscated C program, dorssel.c, written by Frans van Dorsselaer of Bakker Industrial Automation, Leiden, The Netherlands. This short program, which compiles without error using the GNU C compiler (gcc), but with a few warnings, converts ASCII characters to Morse code, and Morse code to ASCII characters. However, it's hard to discern its purpose by looking at the source code. This program won the 1998 Obfuscated C award for "Obsolescent Feature."

Although such a program is obfuscated, its decoding is not impossible. After all, an experienced programmer wrote the program, so reverse-engineering by another experienced programmer should be possible. When we do obfuscation by an algorithm, however, things will get harder if the algorithm is unknown to the decoder. PHP is an interpreted language, so the only way its programs can be distributed is by source code. There are obfuscator programs available which are designed to automatically obfuscate your PHP programs to prevent their being modified; and, perhaps redistributed by others claiming the program as their own.

These obfuscator programs still produce code that's decipherable, albeit with considerable difficulty. A team of computer scientists from UCLA, IBM Research and the University of Texas at Austin has just developed an obfuscator approach for which reverse-engineering requires the solution of mathematical problems so complex that their solution would take hundreds of years.[3-4] Their approach is described in a 46-page paper scheduled for presentation at the 54th Annual IEEE Symposium on Foundations of Computer Science, October 27-29, 2013, at Berkeley, California.[4]

Mathematical jigsaw puzzle

Do you remember the order of operations?

To solve any of these equations, it's important to remember to multiply before adding. I always use parentheses in my code to make things more obvious.

(UCLA Engineering image.)


Their method is difficult to summarize, but the authors liken it to a mathematical jigsaw puzzle, as artistically pictured above.[3] Says coauthor Amit Sahai, who specializes in cryptography at UCLA's Henry Samueli School of Engineering and Applied Science,
"The real challenge and the great mystery in the field was: Can you actually take a piece of software and encrypt it but still have it be runnable, executable and fully functional... It's a question that a lot of companies have been interested in for a long time... You write your software in a nice, reasonable, human-understandable way and then feed that software to our system... It will output this mathematically transformed piece of software that would be equivalent in functionality, but when you look at it, you would have no idea what it's doing."[3]

Professor Amit Sahai, UCLA

Professor Amit Sahai of UCLA.

(UCLA Engineering photograph.)


The research team has applied the same principles of obfuscation in the creation of a process called functional encryption. Instead of sending an encrypted message, you send an encrypted function in its place. Functional encryption has the property that a single message could be transmitted to a group of people, and each person would receive a different decrypted message.[3-4]

This research was funded by the National Science Foundation and other agencies.[3-4]

References:

  1. International Obfuscated C Code Contest Web Site.
  2. Brian W. Kernighan and Dennis M. Ritchie , "The C Programming Language," Prentice Hall (Englewood Cliffs, New Jersey), First Edition, 1978, ISBN-10: 0-13-110163-3, paperback: 228 pages.
  3. Matthew Chin, "Computer scientists develop 'mathematical jigsaw puzzles' to encrypt software," UCLA Press Release, July 29, 2013.
  4. Sanjam Garg, Craig Gentry, Shai Halevi, Mariana Raykova, Amit Sahai and Brent Waters, "Candidate Indistinguishability Obfuscation and Functional Encryption for all circuits," July 21, 2013, Paper prepared for the 54th Annual IEEE Symposium on Foundations of Computer Science, October 27-29, 2013, at Berkeley, California.

Permanent Link to this article

Linked Keywords: Graduate school; professor; Doctor of Philosophy; Ph.D.; scientist; peer review; judged by your peers; funding agency; research; vice president; research proposal; physicist; Heisenberg's uncertainty principle; quantum mechanics; classical electrodynamics; academia; salary; collegiality; colleague; award; Mathematician; Millennium Prize Problem; competition; decade; International Obfuscated C Code Contest; computer program; C programming language; whitespace character; language construct; obfuscation; Whitespace programming language; syntax; Brian Kernighan; Dennis Ritchie; Bell Labs; Jules Mazur; Wikimedia Commons; dorssel.c; Bakker Industrial Automation; Leiden, The Netherlands; GNU C compiler; gcc; ASCII character; Morse code; source code; programmer; reverse-engineering; algorithm; PHP; interpreted language; computer scientist; University of California, Los Angeles; UCLA; IBM Research; University of Texas at Austin; mathematics; mathematical problem; 54th Annual IEEE Symposium on Foundations of Computer Science; Berkeley, California; order of operations; equation; multiplication; multiply; addition; adding; parentheses; author; jigsaw puzzle; Amit Sahai; cryptography; Henry Samueli School of Engineering and Applied Science; functional encryption; encryption; encrypted; function; National Science Foundation; Brian W. Kernighan and Dennis M. Ritchie , "The C Programming Language," Prentice Hall (Englewood Cliffs, New Jersey), First Edition, 1978, ISBN-10: 0-13-110163-3, paperback: 228 pages.