5.5 Provably Secure Obfuscation: Can It Be Saved?

It may certainly be possible to come up with an encryption scheme that supports a particular set of operations, but it is time-consuming to create and test each new encryption system. Moreover, encryption schemes that support other functions, such as division, have also been difficult to develop. However, if a provably secure obfuscation scheme were developed, these problems would vanish. To see that, have a look at this program:

   long homomorphicEncryption(long x, long y) {
      long privateKey = 0x1234...1234;
      long decryptX = decrypt(privateKey, x);
      long decryptY = decrypt(privateKey, y);
      long result = f(decryptX, decryptY);
      return encrypt(privateKey, result);
   }

If we had provably secure obfuscation, a program such as this one could be obfuscated and used as the homomorphic operator. The program uses an embedded private key to decrypt both of its arguments, applies an arbitrary operation, f, to them, and then uses the embedded private key to re-encrypt and return the result. The program itself is obfuscated, and the security of the obfuscation would guarantee the secrecy of the key.

Another use of provably secure obfuscation could be to develop new types of public key encryption schemes. In fact, with a provably secure obfuscator you would be in a position to turn any private key encryption scheme into a public key encryption scheme. Given the difficulty of developing good public key encryption techniques, this would be hugely advantageous. To develop a new public key encryption scheme, you would embed your private key in a program like this:

   long publicEncrypt(long x) {
      long privateKey = 0x1234...1234;
      return privateEncrypt(privateKey, x);
   }

You would then obfuscate this program and release it as your public key. To encrypt a message to send to you, anyone could run the program with the message as argument. To decrypt the message, you would use your private key and decryption routine. The security of the obfuscation would ensure that your embedded private key would remain secure.
5.6 Discussion

unimportant. It helps us identify where the core difficulty in carrying out obfuscation lies. Using this understanding, we were able to explore two workarounds for provable obfuscation that are applicable to a smaller class of programs: secure multi-party computation, where security is gained by distributing the computation among more parties, and whitebox remote procedure calls. Whether or not obfuscation is possible appears to depend strongly on how you define obfuscation, the properties you want to protect, and the class of programs you are considering.

There are many open questions in each of these workarounds. For example, are there interesting commonly used applications that cannot be executed using this design for obfuscated execution? To avoid the tyranny of the impossibility of obfuscation, you need to prevent the program you are obfuscating from being able to signal a single bit of information to a potential attacker. As a result, your encoded program on the client must be difficult to distinguish from any other encoded program in the class you are interested in. This is true for its execution time, memory usage, and any other factor that could be used by the program as a side channel. As you grow the class of programs you would like your obfuscator to obfuscate, it becomes increasingly difficult to prevent a program from being able to send such a signal.
5.3 Provably Secure Obfuscation: It's Possible (Sometimes)!

Algorithm 5.2 Overview of algorithm OBFNS.

OBFUSCATEDATABASE(DB):
1. Create a new obfuscated database DB′.
2. For each record consisting of (key, value) in DB:
   (a) Generate two random numbers r1 and r2.
   (b) Create an obfuscated key by generating a keyed hash of key with key r1.
   (c) Create a value key by computing the xor of value and the hash of key with key r2.
   (d) Store the obfuscated key and value in DB′.

   Phonebook
   name        phone
   Alice       555-1000
   Bob         555-3124
   Charles     555-1516
   David       555-9986
   Einstein    555-7764

To look up a phone number for Alice, it is possible to search the database to find an entry with a name field of "Alice" and to return the corresponding phone field. Here's the same phone book database obfuscated with Algorithm OBFNS:

   Obfuscated Phonebook
   Obfuscated name        r1      Obfuscated phone                  r2
   hash(15144Charles)     15144   hash(16783Charles) ⊕ 555-1516     16783
   hash(11114Alice)       11114   hash(87346Alice) ⊕ 555-1000       87346
   hash(13623David)       13623   hash(46395David) ⊕ 555-9986       46395
   hash(12378Einstein)    12378   hash(35264Einstein) ⊕ 555-7764    35264
   hash(11114Bob)         11114   hash(25234Bob) ⊕ 555-3124         25234
What Is Surreptitious Software?

and the keys it uses. Dynamic attacks are also possible. For example, cryptographic algorithms have very specific execution patterns (think tight loops with lots of xors) and if they're not heavily obfuscated, they'd be easy to find using a dynamic trace of the program.

The keys themselves are a weak point. They're long strings of bits with a high degree of randomness, and as such, unusual beasts in most programs. So Axel could simply scan through the player code looking for, say, a 512-bit long string that's more random than expected. Any code that uses this string is likely to be the decryptor. Once Axel has found the location of the decryptor, he should have little problem finding where the decrypted media is generated and sent to the decoder. He can then simply add some code that writes the decrypted content to a file, and he's done.

What we learn from this is that Doris needs to obfuscate her code so that a simple pattern-match against it won't reveal the location of the decryptor or decoder, or the interfaces between them. She needs to tamperproof the code so that Axel can't insert new code, she needs to obfuscate not only the static code but also the dynamic behavior of the player, and she needs to obfuscate static data (the keys) in the code as well. And, still she has to assume that these defense measures are only temporary. Given enough time, Axel will bypass them all, and so she needs to have a plan for what to do when the system is broken.

1.4.1.3 Mobile Agent Computing

In our next scenario, Doris sends out a mobile shopping agent, which visits online stores in order to find the best deal on a particular CD. The agent traverses the Web and asks every store it encounters if they have the CD and how much it costs, records the best price so far, and eventually, returns to Doris with the site where she can get the best deal. Of course, if evil Axel runs a store there's no reason why he wouldn't cheat.
First of all, he can just erase the information that the agent has collected so far and substitute his own price:

   [Figure: the mobile shopping agent at CDAxel.com, now carrying "Best price: $12.95, Best vendor: CDAxel.com"]

This strategy will only help him if the agent returns directly to Doris when it's done with Axel's site. Much better (for Axel) would be to manipulate the code so that regardless of which stores it visits after his, it will still record his (higher) price as the best one.
6.3 Encryption

Algorithm 6.4 Overview of algorithm OBFCKSP. P is the program to be obfuscated; G is its call graph.

OBFUSCATE(P, G):
1. For every function call f → g for which there is no other call h → g, decrypt g before jumping to it and re-encrypt g after returning from the call:

      decrypt g using hash(f) as key
      encrypt g using hash(f) as key

   Modify P by encrypting g with the hash of the cleartext of f as the key.
2. For every set of function calls f1 → h1 → g, f2 → h2 → g, ... insert code to decrypt/encrypt g in all fi using a combination of the hashes of the encrypted h1, h2, ... as key:

      decrypt g using hash(h1) ⊕ hash(h2) ⊕ ... as key
      decrypt hi using hash(fi) as key
      encrypt hi using hash(fi) as key
      encrypt g using hash(h1) ⊕ hash(h2) ⊕ ... as key

   Modify P by encrypting g with the hash of the ciphertexts of h1, h2, ... as key.
3. For every function g, insert code to encrypt g's caller on entry and to re-encrypt it when g returns, using the hash of the cleartext of g as the key:

      encrypt g's caller using hash(g) as key
      decrypt g's caller using hash(g) as key

4. Add the functions in Listing 6.7 and a hash function from Section 7.2.2 to P.
Obfuscation Theory

These restrictions are what we have to live with for the benefit of having provably secure obfuscation.

5.3.1 Algorithm OBFLBS: Obfuscating with Point Functions

   Goal
   Asset:            Private data
   Attacker Limits:  None
   Input Programs:   Point functions

An interesting asset you may wish to hide in a program is a binary check. These are functions that underlie access control systems that use password protection. For example, to log into a Unix system, a login program must first check that the password a user enters matches the password stored in the system:

   boolean isValidPassword(String password) {
      if (password.equals("yellowblue"))
         return true;
      else
         return false;
   }

Of course, in practice you would never use such a program because the password is embedded in the clear and anyone with access to the program could reverse engineer it. To prevent access to the login program from revealing the password of users, passwords cannot be stored in cleartext; they must be obfuscated.

Algorithm OBFLBS [2 う引 is the generalization of the commonly used practice of using hashing to securely obfuscate password checking. Instead of embedding a user's password, only the hash of the password is stored. It is then compared with the hash of an entered password:

   boolean isValidPassword_PF(String password) {
      if (sha1(password).equals("642031ad5e946766bc9a25f35dc7c2"))
         return true;
      else
         return false;
   }
Listing 5.1 Hello world program written in Perl. The code can be de-obfuscated by replacing the magic quotes (') in the shaded section with a print statement.

Does this result make the search for good obfuscating transformations futile? We don't think so. As you will see, the result of Barak et al. is a very strong and important one. It shows that programs exist that cannot be obfuscated. But it does not necessarily prevent any specific program from being obfuscated! For example,
3.4 Pragmatic Analysis

Problem 5.12 No one knows whether the software complexity metrics defined in the literature actually provide good measurements of artificially obfuscated programs. Obfuscate some sample programs with transformations from Chapter 4, measure their complexity before and after obfuscation using relevant metrics, and measure the effort needed by human subjects to understand the programs with and without obfuscation. Is there a correlation?

It's conceivable that complexity metrics could be used by the bad guys too! For example, for performance reasons it's common to only apply obfuscating transformations to security-critical parts of a program. If complexity metrics measurements are, in fact, significantly different on original and obfuscated code, then an attacker could use them to zero in on the suspicious parts of a program that he should examine first.

3.4.3 Software Visualization

Large programs can consist of millions of lines of code, tens of thousands of functions, and thousands of modules and classes. For an adversary who wants to gain a complete understanding of your program, the sheer size of it can be a serious impediment. If your program isn't very big in itself, many of the obfuscation algorithms in Chapter 4 (Code Obfuscation) are designed to automatically make it bigger by duplicating code or inserting bogus code.

To aid in program comprehension of large programs, many techniques have been invented for software visualization. The techniques can be both static and dynamic. The data structures you saw in Section 3.1 (control-flow graphs, data-dependence graphs, call graphs, inheritance graphs) grow with the size of the program, and a reverse engineer can benefit from being able to explore them visually and interactively. The static structures of a program can be large, but they pale in comparison to the truly enormous amounts of data that can be collected from a running program.
The address trace, the dynamic call graph, and the heap graph are three structures that can be useful for a reverse engineer to visualize, and all three will be large for long-running programs. The size itself can be a challenge for a visualization system, but the fact that the structures are continuously changing makes the problem that much harder.

A visualization system consists of five components. The first component collects information from the static code or from the executing program. Data from an execution such as function calls, thread switches, and system calls (commonly known as interesting events) are usually collected by instrumenting the code. The
Software Similarity Analysis

common Java libraries with its own versions. This can be an obfuscation not only because it obscures the interface between the application code and the library, but because it makes the library code amenable to obfuscation as well.

What's even more interesting is that some standard library functions cannot be replaced. If you want to write something to a file (or a socket, and so on) on Unix, you have to use the write system call or one of the library functions that call write. Similarly, in Java, if you want to open a window, you had better create an instance of java.awt.Frame or one of its subclasses.

There are several birthmarking algorithms that are based on this observation. The idea is that the way a program uses the standard libraries or system calls (we will collectively call them APIs from now on) is not only unique to that program but also difficult for an adversary to forge.

As we already mentioned, not all APIs are created equal. To be really useful for birthmarking purposes, the use of an API needs to be difficult to obfuscate. That is, it needs to be

   • atomic (i.e., a call can't be obfuscated by splitting it up in pieces),
   • state-full (i.e., calls can't easily be added or removed, since this would affect the state of the system),
   • non-forgeable (i.e., its use can't be replaced by the use of another API), and
   • common in real code.

For example, on Unix, the read system call isn't a good candidate for an API birthmark, since it isn't atomic; a single call can be obfuscated by splitting it into two or more calls. The Unix gettimeofday() and getpid() system calls also aren't good candidates, since they are state-less; an adversary can disrupt the birthmark by sprinkling calls to these functions willy-nilly all over the program. The use of the standard malloc library can easily be forged by replacing it with another one.
Finally, the ioctl (I/O control) system call is atomic, non-forgeable, and state-full, but basing a birthmarking algorithm on the use of this call would not be very useful since it occurs sparingly or not at all in most application programs.

10.4.1 Algorithm SSTNMM: Object-Oriented Birthmarks

Tamada and co-authors have presented a collection of algorithms for collecting birthmarks from Java API types and method calls. SSTNMMsmc computes the birthmark from the sequence of method calls within a class, SSTNMMis computes the birthmark of a class from the inheritance path from the root class to the class,
4.5 Data Encodings

4.5.1.1 Algorithm OBFBDKMRVnum: Number-Theoretic Tricks

Many integer obfuscations are based on number-theoretic tricks. In this transformation, an integer y is represented as N * p + y, where N is the product of two close primes, and p is a random value:

   typedef int T4;
   #define N4 (53*59)
   T4 E4(int e, int p) {return p*N4 + e;}
   int D4(T4 e) {return e%N4;}
   T4 ADD4(T4 a, T4 b) {return a + b;}
   T4 MUL4(T4 a, T4 b) {return a*b;}
   BOOL LT4(T4 a, T4 b) {return D4(a)<D4(b);}

N4 must be larger than any integer that you need to represent. De-obfuscation is simply removing N*p by reducing modulo N4. What's nice about this representation is that addition and multiplication can both be performed in obfuscated space. Before comparisons can be performed, however, the argument values need to first be de-obfuscated.

Notice that this is a parameterized obfuscation; you can create a whole family of representations by choosing different values for N. It's a good idea to hide p by computing it as an opaque value at runtime.

If two differently obfuscated integers need to be operated on, then one needs to be first de-obfuscated and then re-obfuscated to the correct representation. Here, x is obfuscated using type T3 and v using T4, and as they're multiplied and added together, their values have to be converted from one representation to the other and back:

   int x = 7;
   int v = 6;
   v = x * v;
   x = v + x;
   printf("%i\n", x);

   T3 x = E3(7);
   T4 v = E4(6, 3);
   v = MUL4(E4(D3(x), 5), v);
   x = ADD3(E3(D4(v)), x);
   printf("%i\n", D3(x));

Ideally, you have a transformation that directly takes you from one representation to the other without having to go through cleartext.

4.5.1.2 Algorithm OBFBDKMRVcrypto: Encrypting Integers

It is natural to think of obfuscating variables by encrypting them using one of the many standard