functions - みる会図書館

う 3 Provably secure Obfuscation: lt's Possible (Sometimes) ! 引 5 password functions are in the class Of functions called point-functions¯they are false everywhere except at a specific point. ln the case Of password functions' they are false everywhere except where the password is correct. The password checking function is now obfuscated, because computing the pre-image 0f a hashing function is a difficult cryptographic problem. Whereas ln the previous chapter we made use Of the difficulty Of program analysis tO create obfuscated programs, we now make use 0f cryptographic problems. If users choose passwords from a large set and an attacker given isValidPassword-PF is able tO deduce the user's password, then he must be able to invert SHA- 1. ln other wordS' the complexity 0f cracking this obfuscation is at least as hard as cracking SHA- 1. Hashing functions have been widely studied in cryptogr 叩 hY' and building an obfuscation that we can shOW tO be at least as hard tO break as hash functions them- selves is significant progress. Nevertheless, recently Wang け 6 引 showed weaknesses ⅲ the MD5 hashing function that may allow an attacker t0 eventually deduce the secret. rhat kind of function would you need instead of SHA- 1 if you wanted an obfuscation that you could prove was uncrackable? You would need t0 be able t0 prove that the function is one way. ln Other words, given the output Of the function' you should not be able to predict the input. UnfortunatelY, no known function has been proven t0 have this property. If you had a universally accessible function and a good source Of randomness, you COtlld a class RandomHash { HashtabIe hashes = new Hashtab1e() ; lnteger randomHash(String value) { if (hashes. contains(value)) return (String)hashes. get(value) ; lnteger newhash = new lnteger (Math. random() ) ; hashes . put (value , newhash) ; return newhash ; ln the randomHash function, the first time the hash Of an Object is requested' a random number from a large set is returned. The return value is also stored. Thereaften whenever that Object hash is requested, the original value is returned.

2. Surreptitious Software

316 Obfuscation Theory Many banks use a similar system when creating debit cards. BIank cards with preprinted lD numbers are sent to different branches. 、 en a customer requests a card, one is selected arbitrarily. However, once it is selected, the bank stores the relationship between the customer s account and the card ID number. ln this way, if the card is lost or stolen, a malicious attacker who finds the card will not be able t0 identify the customer simply from the ID number on the card. lt's lmportant tO note that unless the mapping of a customer s account to a card ID number is kept secret by the bank, the security of the scheme is compromised. Similarly, if an attacker were able to access memory and view the entire hashes table ⅲ RandomHash ()s opposed to probing for individual entries), the security of the obfuscation would be lost. This is especially unfortunate because every program that wishes tO use the random hashing functlon needs access to the same instance of the program. You can visualize such programs as existlng on a remote computer in such a way that they can only be probed for output by one particular input at a time. Mathematically, such programs are called 〃ノ 0 0 0 な and we will revisit them ⅲ the next sectlon. You can compose finitely many point functions to create 4 〃 z ・ - oz 〃 / functlons that have the property of being false everywhere except at a given set of points. The scheme for obfuscating point functions can be trivially extended to apply to multi- point functions. Here, the original function responds to a fixed set of passwords that are revealed by examimng the program: boolean isVaIidPassword(String password) { if (password. equals("yellowblue")) then return true ; else if (password. equals("redgreen")) then return true ; else if (password. equals("turquoise")) then return true ; else return false; And here we've obfuscated the function and the passwords are hidden: boolean isVaIidPassword-PF(String password) { if(shal(password) .equa1s("ce5a522e817efcac33decd1b667e28c277b5b5f@")) then return true ;

3. Surreptitious Software

206 Code Obfuscation lnlining is essentially a merge of a function caller and its callee. Similarly, out- lining is a split Of a function intO t 从 , 0 pieces. You can also merge t 从 , 0 unrelated functions. For pure functions (those without side effects) , this is trivial: Just merge the parameter lists, merge the function bodies, and replace any call to either of the functions with a call tO the merged function. When a function has side effects, you have tO be more careful. Here's an example where function f has side effects but function g IS pure: float f00 [ 1 ①① ] ; void f(int a,float b) { foo[a] = b; float g(float c , char d) return c*(float)d; f ( 42 , 42 . の ; int main() float v = g ( 6. ①ド a り ; float float foo[a] = bc; if (which==l) char d , int which) { fg(int a, float bc , f00 [ 1 ①① ] ; float v = fg ( 99 , 6. ①ド記 , 2 ) ; fg ( 42 , 42 . ①ドド , 1 ) ; int main() { return bc*(float)d; The merged function fg takes an extra argument which, to ensure that the code that causes side effects is executed only when the function is called as an f-function, not as a g-function.We alSO had to add extra bogus arguments to fg to fill out the parameter list. 4.1.1.4 AIgorithm OBFCFcopy: Copying Code A common obfuscation technique is tO make the program larger by cloning pieces of it. Size itself is an impediment tO reverse englneerlng, and if you can make the copied code 100k different from the original code, you may force the attacker to carefully examine all pairs of code blocks in the program to figure out which ones share the same functionality. Here's an example where we've cloned a function f into a bogus function fl. ・ We've further obfuscated fl by modifying the array index computation and adding some bogus COde that doesn't affect program semantics:

4. Surreptitious Software

180 Program An alysis Algorithm .2 Overview of Algorithm REHM. ( 朝〃廝め FINDCFGs(): gaps ぐー {undecoded byte ranges} repeat for every range [start... end] e gaps do while start く end dO cfg 一 CFG built from instructions starting at start if cfg is well ー fo 肥 d and has at least one CTI then functions ぐー functions CJ {start} findFunctions (start , end) continue with the next range else start 十十 gaps ← - {undecoded byte ranges} until no more changes to gaps (functions that don't contaln any function calls) might have a simplified prologue, for example. After FINDPROLOGUES can find no more functions, FINDCFGs sees if it's possible to build a reasonable control flow graph from the remaming undecoded bytes. Algorithm REHM [ わ 4 ] defines "reasonable" as, "There are no jumps into the middle of another instructlon, and the resulting function contains at least two control transfer instructlons. The second column ⅲ Figure う . 分、 177 shows the example program disassembled with Algorithm REHM. The first phase of the algorithm proceeds exactly the same as for a normal recursive disassembler. Any remainlng gaps are then decoded using FINDPROLOGUES and FINDCFGs. Notice how function f43 is only called indirectly, and function f48 isn't called at all, but the disassembler still finds them by searching for their prologue lnstructions. The disassembler next starts at location うう , realizes that 42 isn't a valid opcode, moves to location 54 , from where it's able to build a valid control flow graph. Notice that function f54 doesn t start with a prologue instruction. Bytes う 9 2 consist of a branch into the middle of another branch which findCFG discards as invalid. ln their tests, the authors found that the enhanced recursive disassembler re- covered 9 う .6 % of all functions over a set of Windows and Linux programs. 33.2 Decompilation Conceptually, a decompiler runs after the executable file has been disassembled taking as input the sequence of control flow gr 叩 hs that the disassembler discov- ered. The decompiler analyzes each control flow graph and turns basic blocks into a

5. Surreptitious Software

456 Software Tamperproofing Algorithm 7.5 Overview of algorithm TPZG. / is a function selected for splitting' and is a 10Ca1 variable in /. PROTECT(), の : 1. Compute a forward slice 0f / , starting with the statements that define 眦 2. Determine which variables should be completely hidden (). e. , should reside only on the server) and which should be partially hidden (). e. , should reside both on the client and the server). . Examine each statement ⅲ the slice and split it between the client function Of and the server function Hft. 4. If x is a partially hidden variable, then ・ translate x ←ーゞ tO x ← - Hft (x) where Hft (x) updates x on the server. ・ translate tO x ←明 ( ) ; / ん← where Hft (x) gets the current value Of x from the server. problem is な〃 . Networked applications are carefully designed t0 tolerate high latency—functions where high latency is unacceptable are kept on the client side' and the remaming functions can be kept on the server. If you move additional functions t0 the server side, latency may become intolerable. The final problem is み 4 〃ノル / ノ / ん If the client keeps large data structures and some Of the operations on these structures are moved tO the server, there may not be enough bandwidth tO move the data back and forth. Algorithm TPZG bypasses the network latency and bandwidth problems by only considering scalar data (functions on arrays and linked structures are kept on the client) and restricting the client and server tO bOth run on the same 10Ca1 area network. Algorithm 7. う sketches how t0 split a function / into one part' ()f' which runs on the client, and several parts, Hf which run on the server and which the client accesses through procedure calls.

6. Surreptitious Software

205 4.1 Semantics-Preserving Obfuscating Transformations If your obfuscation t001 now decides tO reorder a and b, the code will mysteriously cease tO function. 4.1.13 Algorithm OBFC 島 Splitting and Merging Functions As a program- mer, you use 4 ろ立 ZO 〃 tO manage the inherent complexity Of larger programs. You break long pieces Of code intO smaller functions, you collect related functions IntO modules or classes, you collect related classes 1ntO packages, and SO on. ln section 4.6 277 , we'll shOW you var10t1S ways in which you can obfuscate a program by breaking these abstractions. Fu 〃は z わ〃 inlining IS a common code optimizatlon technique that replaces a call tO a function with its code. This can also be an effective obfuscation technique, S1nce it breaks the abstraction boundary that the programmer created. The opposite Of inlining is / 4 〃あ〃 0 ″ / / ツ / 〃 g , where a piece Of code is extracted intO its 0 嶬 , n function and replaced with a call tO this function. Outlining is also an effective obfuscation technique, S1nce it inserts a bogus abstraction where there previouslywas none. an example where a Plece Ofthe modular exponentiation routine ()n light gray) is extracted intO its own function f and replaced with a call ( ⅲ dark gray): ・ 1 ・ 1 ・ 1 - 土 1 ・ 1 ・ 1 0 tn 0 ・ 1 ・ 1 ・ 1 ・ 1 ・ 1 , int x ロ , int modexp(int y ,int n) { i n t W int R, L ; int k i n t S w h i 1 e ( k く w ) R *R % n ; = ☆ 十十 L ; r e t u r n L ; r e t u r n

7. Surreptitious Software

80 Dynamic ObfuscatIon Algorithm 63 Overview of Algorithm OBFMAMDSB. P is the program to be are the functions in P , G is its call graph, and D is pro- obfuscated, 石 , ゐ , filing data collected over rep resent ative runs. OBFUSCATE( P, G, D): 1. Create a set each element is a set Of the functions that should belong to the same cluster: C ← CLUSTER( P, G, D). 2. For each cluster in C do the following : (a) Create a template containmg the intersection Of the code-bytes of the functions in た . (b) For each function / ⅲ create an edit script の・ such that applying t0 the code-bytes 0f creates the code-bytes of ん引 Return a new program consisting Of the main program from P , the templates , the edit scripts , and a code edit routlne patch( , の・ ). CLUSTER( P, G, D): 5. Return C. merges are possible. has been exceeded, or no more reached, the maxlmum overhead level of obfuscation has been 4. Repeat from 2 until the requlred う . C ← C ーに , } U }. graph. path between them ⅲ the call same time: i. e. , there must be no functions that can be active at the edits. ら IJ 0 / must not contain tWO mlnrmize the number Of function t0 merge. Merging them will the tWO cheapest clusters の and 0 2. Use the profiling data D to choose cluster,i. e. ,C ← { { ん , { ゐ } , .. ・ 1. Put each function in its 0 从 rn it would be unwlse tO put and ゐ in the same cluster, since much time would be spent editing the template as control goes back and forth between the tWO functions. profiling data collected from representative runs can help you make the best cluster asslgnment. NO tWO functions on the same call chain can be ⅲ the same cluster, since that would reqtllre them tO bOth reside in cleartext in the template at the same time, which Of course possible. Have a 100k at Listing 6 う、う 81 for a simple example. The unobfuscated program looks like this : int val void fl(int* v) { ☆ v = 99 ; } void f2(int* v) { ☆ v = 42 ; } int main (int argc , char *argv[] ) { fl(&val) ; f2 (&val) ;

8. Surreptitious Software

48 What ls 、 " 印 / ″あ、ゞ 0 アルを SeveraI birthmark-extracting functions are possible. You could, for example, make the birthmark be the sequence of calls to standard library functions: く fopen, fscanf, atoi, fscanf, atoi, fclose 〉 Or you could ignore the actual sequence, since some calls are independent and could easily be reordered. The resulting birthmark is now a set of the calls the function makes.• み 2 ( x ) = {atoi, fclose, fopen, fscanf} Or you could take intO account the number of times the function calls each library function. The birthmark becomes a set Of system calls and their frequency: = {atoi ト 2 , fclose トチ 1 , fopen 1 , fscanf ト→ 2 } An attacker would get a copy ofx, include it in his own program, p, and perform a variety Of transformations tO confuse our birthmark extractor. Here, he's renamed the function, added calls to gettimeofday and getpagesize (functions he knows have no dangerous side effects), reordered the calls t0 fclose and atoi, and added further bogus calls tO fopen and fclose: v 0 i d y ( ) { . int f() { FILE *fp fopen("myfile" c h ar s t r [ 1 ①① ] ; S t r u c t t i m e v a 1 t v ・ gettimeofday(&tv , NULL) ; fscanf(fp,"%s",str); a t 0 i ( S t r ) ; int vl fscanf(fp,"%s" , S t r ) ; fclose(fp); int v2 atoi(str); r e t u rn v 1 ☆ v 2 ・は 09S fp ) ; fp =[fppeå("myfile" int p = き土 04 豆 s ま名 ( ) ☆立的 , 03 を ( ) ; v 0 i d z ( ) { .

9. Surreptitious Software

246 Code Obfuscation Problem 4.10 lt seems like it will be difficult to build stealthy branch functions; their dynamic behavior is Just t00 distinct from ordinary functions. lnstead, to en- hance stealth, would it be possible to make every ordinary function behave like a branch function? Failing that, could you introduce enough complicated side effects in the branch function to prevent automatic removal? How about having multi- ple branch functions, the correct behavior of each one depending on the correct behavior of the others? ln Section 9.4 592 in Ch 叩 ter 9 (Dynamic Watermarking) , you will see a dynamc watermarking algorithm wMCCDKHLSbf, where the watermark is embedded by adding bogus entries into the branch function. Algorithm wMCCDKHLSbf attempts to overcome the REMASB attack by making the program s control flow depend on the branch function being left in place. Specifically, the branch function is extended tO compute an address that is later used as the target of a jump. This means that any attack that blindly removes the branch function will cause the program to break. ProbIem 4.11 Can you think of an antidote against REMASB-styIe dynamic attacks that makes use Of the fact that the attack is nonconservative? Specifically, can you protect your program ln such a way that there are regions that are very rarely executed (and hence unlikely to show up on a dynamic trace by an adversary who has an incomplete input data set) but that are highly likely to show up に厩 ~ If SO, can you insert tamperproofing code (such as the code we'll show you ⅲ Chapter 7 , Software Tamperproofing) into these regions that will trigger if the branch function is tampered with? 4.4 Opaque Predicates When we first started looking for opaque predicates, we went to the library, found a few books on number theory, and looked in the 戸 ro みん sections for statements such as "Show that Vx, ッ e Z : 戸 (), ア ) , " where 戸 is a predicate over the integers. Assummg that the author wasn't playing a trick on his readers, trymg to make them prove a false statement, we'd found another opaque predicate ! And we'd found a predicate, one would hope, that would be moderately hard to break, at least for number-theory graduate students. l)uring this search we found some cute lnequalities (such as Vx, ァ e Z : x2 ーう 4 舅 / 1 ) and many statements on divisibility (VxeZ:21x2 + x). There are, however, serlous problems with these number-theoretic predicates. First, you have to assume that if your obfuscator keeps a table of such predicates,

10. Surreptitious Software

8.4 Watermarking by Permutation 489 Algorithm 8.4 Overview of algorithm WMDM. Embedding and recognition can be repeated for some or all functions in the program. Long watermarks can be split over several functions. If the recognizer is non-blind, it needs the original, unwatermarked program as input. EMBED(), Ⅵ : 1. Select a function / from P and build its CFG G. 2. Canonicalize G by (a) optimizing away any unnecessary branches and (b) making any identical basic blocks different (insert redundant instructions, renumber regi Sters , reorder independent instructlons, etc. ). う . Let perm=int2perm(W). 4. If length(pem) is greater than the number of basic blocks abort. う . Linearize G into お = くあ , あ , 6. Reorder B according to perm, inserting branches where the fall-through case has changed. RECOGNIZE( P [ , Porig] ) : and お ' and output pem2int(perm). identifying identical blocks in お 4. Extract the permutation perm by embedding. using the same algorithm as during ろ . Linearize G' into お ' = く名 , have the same basic blocks. and (after canonicalization) G ′ with CFG G' ⅲ Porig such that G Non-blind: Find a function / ′ unnecessary branches. verslon Of G, optinuzrng away any BIind: Let G' be a canonical G, prior to watermark embedding: 2. Find a CFG G' corresponding to build its CFG G. .. 〉 and linearization お = くあ , あ , 1. Select a function / from P with There are two wrinkles with this method. First, the basic block reordering will sometimes cause fall-through cases to change, and this means that extra branches may need tO be inserted tO malntain the correct semantics.We've marked these branches in light gray in the example in Figure 83 -488. A second problem is that the same function may contain several basic blocks that are identical. lfyou can't tell one block from another, the recognizer can no longer recover the permutation ! TO prevent this situation, the embedder can either simply ignore functions with identical basic blocks, or it can insert bogus code (such as nops) until all blocks are unique.