functions - みる会図書館

422 Software Tamperproofing

    daddr = start + (((uint32)daddr ^ x) & mask);
    h = h + *daddr;
    h = ROTL(h);
    return h;

(The rotation makes the function immune to a particular attack.) To ensure predictable execution time, you must also consider cache effects. First, the hash function itself must be small enough to fit inside the instruction cache of the CPU, and second, the region you're hashing must fit into the data cache.

The Skype VoIP client inserts hundreds of hash functions that check each other and check the actual code. We'll show you the details of their protection scheme later, in Section 7.2.4, but for now, here is the family of functions they're using:

    uint32 hash7() {
       addr_t addr;
       addr = (addr_t)((uint32)addr ^ (uint32)addr);
       addr = (addr_t)((uint32)addr + 0x688E5C);
       uint32 hash = 0x320E83 ^ 0x1C4C4;
       int bound = hash + 0xFFCC5AFD;
       do {
          uint32 data = *((addr_t)((uint32)addr + 0x10));
          goto b1; asm volatile(".byte 0x19"); b1:
          hash = hash ⊕ data;
          addr -= 1;
          bound--;
       } while (bound != 0);
       goto b2; asm volatile(".byte 0x73"); b2:
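Returning to the add-and-rotate fragment at the top of this excerpt, the following is a minimal sketch of one thing the rotation step changes: a purely additive checksum gives the same value when two words of the region are swapped, while the add-then-rotate version does not. The excerpt does not say which attack the rotation defends against, so this only illustrates order sensitivity. The names `rotl1`, `hash_add`, and `hash_add_rotl` are hypothetical, and the simple sequential walk stands in for the fragment's address computation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rotate a 32-bit word left by one bit (a stand-in for the ROTL macro). */
static uint32_t rotl1(uint32_t h) {
    return (h << 1) | (h >> 31);
}

/* Plain additive checksum: the result does not depend on word order. */
uint32_t hash_add(const uint32_t *region, size_t nwords) {
    assert(region != NULL);
    uint32_t h = 0;
    for (size_t i = 0; i < nwords; i++)
        h += region[i];
    return h;
}

/* Add, then rotate, mirroring h = h + *daddr; h = ROTL(h). */
uint32_t hash_add_rotl(const uint32_t *region, size_t nwords) {
    assert(region != NULL);
    uint32_t h = 0;
    for (size_t i = 0; i < nwords; i++)
        h = rotl1(h + region[i]);
    return h;
}
```

For the regions {1, 2, 3, 4} and {2, 1, 3, 4}, `hash_add` returns 10 for both, whereas `hash_add_rotl` returns 52 and 60 respectively.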

2. Surreptitious Software

676 Hardware for Protecting Software

The extend(i, add) function doesn't just assign a new value to a PCR[i], but rather extends it by computing PCR[i] = SHA1(PCR[i] ‖ b). This way, each PCR[i] register becomes a combination of all hashes ever assigned to it. Furthermore, the construction preserves the order in which the measurements were added.

11.2.3 The TPM

Let's turn our attention to the TPM itself, since it's clearly at the heart of the system. For right now, you can think of it as having the following data and functions:

    static class TPM {
       private static KeyPair EK;
       private static byte[][] PCR;
       private static byte[] ownerSecret;
       public static void extend(int i, byte[] b)
       public static Object[] quote(byte[] nonce)
       public static byte[] SHA1(byte[] b)
       public static byte[] signRSA(Object data, PrivateKey key)
       public static KeyPair generateRSAKeyPair()
       public static byte[] RND(int size)
       public static void atManufacture()
       public static void takeOwnership(byte[] password)
       public static void POST()
    }

A TPM goes through three important life events, which we represent by the three functions atManufacture(), takeOwnership(), and POST(). At manufacturing time, the TPM gets a unique identity, an RSA key pair called the endorsement key (EK). Once the TPM has been given an EK, no one can ever give it another one. When the owner takes possession of the computer, he invokes the takeOwnership function to initialize the TPM with a secret password. This prevents others from issuing sensitive commands to it. Finally, at start-up time, the POST function clears internal data to get ready for a new round of measurements. In particular, it zeros out the PCR registers. Here are the three life-event routines:
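The extend operation described at the top of this excerpt can be sketched in a few lines. This is only an illustration under loud assumptions: a 64-bit FNV-1a hash stands in for SHA-1 (it is not cryptographically secure), registers are 8 bytes rather than a SHA-1 digest, and the names `tpm_post`, `tpm_extend`, `tpm_read_pcr`, `NUM_PCRS`, and `MAX_MEASUREMENT` are hypothetical rather than taken from the book's class.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NUM_PCRS 24          /* assumption: register count picked for the sketch */
#define MAX_MEASUREMENT 256  /* assumption: upper bound on one measurement's size */

static uint64_t pcr[NUM_PCRS];

/* Stand-in for SHA-1: 64-bit FNV-1a. NOT a secure hash, illustration only. */
static uint64_t toy_hash(const unsigned char *data, size_t len) {
    uint64_t h = 14695981039346656037ULL;
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= 1099511628211ULL;
    }
    return h;
}

/* POST: zero out all PCR registers before a new round of measurements. */
void tpm_post(void) {
    memset(pcr, 0, sizeof pcr);
}

/* extend: PCR[i] = H(PCR[i] || b), so old contents are folded in, not replaced. */
void tpm_extend(int i, const unsigned char *b, size_t len) {
    assert(i >= 0 && i < NUM_PCRS);
    assert(b != NULL && len <= MAX_MEASUREMENT);
    unsigned char buf[sizeof(uint64_t) + MAX_MEASUREMENT];
    memcpy(buf, &pcr[i], sizeof pcr[i]);      /* current register value first */
    memcpy(buf + sizeof pcr[i], b, len);      /* then the new measurement */
    pcr[i] = toy_hash(buf, sizeof pcr[i] + len);
}

uint64_t tpm_read_pcr(int i) {
    assert(i >= 0 && i < NUM_PCRS);
    return pcr[i];
}
```

Extending register 0 with "bios" and then "loader" leaves a different value than extending with "loader" and then "bios", which is the order-preservation property the text mentions; repeating the same sequence after `tpm_post()` reproduces the same value.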

4. Surreptitious Software

10.2 Definitions 611

1. Copy one or more sections of code from P into Q.
2. Compile Q, as necessary, into binary code or bytecode.
3. Apply semantics-preserving transformations, such as obfuscation and optimization, to Q and distribute the resulting program.

Birthmark detection, then, becomes the process of extracting properties of code that are invariant to common semantics-preserving transformations:

[Figure: the birthmark bm_P = sig(P) is compared against the birthmark bm_Q = sig(Q).]

Here, we've extracted birthmarks from programs P and Q and determined that a large part of P has been included in Q. Some birthmark detection algorithms assume that all of P's code has been lifted into Q. That is, Q is simply a version of P that has been sufficiently obfuscated to make it possible for the adversary to argue that Q is an independently developed program whose behavior is identical to that of P. In a more common scenario, some central part of P is lifted into Q, maybe no more than a few functions or a few modules.

10.2 Definitions

The definitions of birthmarking and of attacks on birthmarks mirror those you saw for watermarking in Section 8.3. The major difference is that unlike watermarking, birthmarking has no embed function. Also, all the functions in a birthmarking system are unkeyed. As a result, in a birthmarking system, the extract function extracts the birthmark b directly from a program p:

    extract(p) → b

5. Surreptitious Software

7.2 Introspection

... is based on the hash7 family of functions you saw in Section 7.2.2. They are executed randomly. The test on the hash function value is not a simple if (hash() != ...) .... Instead, the hash function computes the address of the next location to be executed, which is then jumped to.

You've seen the technique of hash functions checking each other before, namely, in Algorithm TPCA. Here, however, the network is much simpler, with each real region (light gray) checked by a large number of checkers (dashed), each of which, in turn, is checked by one other checker (dark gray):

[Figure: checker network]

Also, unlike Algorithm TPCA, the Skype client doesn't attempt to repair itself when it has detected tampering. Instead, it simply crashes, but does so in a clever way. On detection, the client allocates a random memory page, randomizes the register values, and then jumps to the random page. This loses track of all the stack frames, which makes it hard for the attacker to trace back to the location where the detection took place.

In addition to the tamperproofing, the client code is also obfuscated. The target addresses of function calls are computed at runtime, i.e., all function calls are done indirectly. Dummy code protected by opaque predicates is also inserted. The code is also obfuscated by occasionally raising a bogus exception only for the exception handler to turn around, repair register values, and return to the original location.

Problem 7.4 It is interesting to note that although Skype is a distributed application, it doesn't use any of the distributed tamperproofing techniques you'll see later in Section 7.5. The reason might be that much of the communication is client-to-client rather than client-to-server. Can you think of a way for clients to check each other in a peer-to-peer system without being able to collude?

7.2.4.1 Algorithm REBD: Attacking the Skype Client

The ultimate goal of an attack on the Skype client is to be able to build your own binary, complete with your own RSA keys. To do that, you need to remove the encryption and tamperproofing.
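The idea mentioned earlier in this excerpt, that the hash value selects the next location instead of feeding an if test, can be sketched as follows. This is a toy model under stated assumptions, not Skype's code: a small table of function pointers stands in for raw jump addresses, the index is reduced modulo the table size so a wrong hash lands on a harmless decoy instead of crashing, and every name (`region_hash`, `make_offset`, `next_step`, `step_ok`, `step_decoy`) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int (*step_fn)(void);

static int step_ok(void)    { return 1; }  /* the intended continuation */
static int step_decoy(void) { return 0; }  /* where a wrong hash ends up here */

#define TABLE_SIZE 4u
static const step_fn table[TABLE_SIZE] = {
    step_decoy, step_decoy, step_ok, step_decoy
};

/* Add-then-rotate hash over a byte region. */
uint32_t region_hash(const unsigned char *region, size_t len) {
    assert(region != NULL);
    uint32_t h = 0;
    for (size_t i = 0; i < len; i++) {
        h += region[i];
        h = (h << 1) | (h >> 31);
    }
    return h;
}

/* Protection time: pick an offset so the untampered hash maps to target_index.
   Unsigned wraparound is intentional. */
uint32_t make_offset(const unsigned char *region, size_t len,
                     uint32_t target_index) {
    return target_index - region_hash(region, len);
}

/* Run time: no comparison against an expected value appears anywhere;
   the hash is folded directly into the choice of the next step. */
step_fn next_step(const unsigned char *region, size_t len, uint32_t offset) {
    return table[(region_hash(region, len) + offset) % TABLE_SIZE];
}
```

With the bytes 0x55, 0x8B, 0xEC, 0xC3 and target index 2, `next_step` returns `step_ok`; after patching the last byte to 0x90, the same call returns `step_decoy`. In this four-entry toy some modifications still map to the correct entry, which a real address computation over the full hash value would make far less likely.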

6. Surreptitious Software

6.3 Encryption 585

fine/coarse granularity: Do you decrypt small pieces (say, a basic block) at a time (inefficient but ensures that only a small piece of code is ever in cleartext), or do you decrypt at the function or module level?

key-as-data/key-as-code: Do you store the key as data values in the program, or do you derive the key from code?

keep-in-the-clear/re-encrypt: Do you keep the decrypted code decrypted, or do you re-encrypt each piece of code as it is no longer needed, making sure that as little code as possible is ever in cleartext?

Regardless of the choices you make, whether in the end it's worth the trouble is still highly dubious.

We're going to show you two algorithms in this section. OBFCKSP is a straightforward medium-grained (function-level), on-demand, key-as-code, re-encrypting algorithm. Algorithm OBFAGcrypt is an extension to OBFAGswap from Section 6.2.2 that xors the code cells with a key stream as they are moved back and forth between the upper and lower memory regions. Later in the book (Section 7.2.4) you will see how the Skype VoIP client is protected using a bulk, coarse, key-as-data, keep-in-the-clear encryption scheme.

6.3.1 Algorithm OBFCKSP: Code as Key Material

The goal of code encryption is to keep as little code as possible in the clear at any point in time during execution. At one extreme, you can decrypt the next instruction, execute it, re-encrypt it, decrypt the next instruction, execute, and so on. This way, only one instruction is ever in the clear, but as a result, the performance overhead will be huge. At the other extreme, you can decrypt the entire program once, prior to the start of execution, and leave it in cleartext. This will have little impact on execution time but will make it easy for the adversary to capture the decrypted code.

Algorithm OBFCKSP takes an intermediate approach and decrypts one function at a time. When the program starts up, it's all encrypted, all except for the main function, which is in the clear. Before you jump to a function, you first decrypt it, and when the function returns to the caller, you re-encrypt it. However, if that was all that you did, all the functions on the current call chain (from main to the current function) would be in the clear. For this reason, the first thing a function must do when it's called is to encrypt the function that called it, and before it returns, it must decrypt it again. You can think of this as walking around a house where every door is always kept locked: you unlock one door at a time, walk through, and relock it behind you. This way, Algorithm OBFCKSP makes sure that at any one point in time, two functions at most will be in the clear.
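The decrypt-on-call, encrypt-the-caller discipline described above can be simulated to check the "two functions at most" claim. The sketch below is a model, not the algorithm itself: function bodies are small byte arrays, a fixed single-byte XOR key stands in for the key-as-code derivation (which this excerpt does not spell out), and all names (`func_t`, `init_func`, `simulate_calls`, `max_in_clear`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define CODE_LEN 8

typedef struct {
    unsigned char code[CODE_LEN];  /* stand-in for a function body */
    int in_clear;                  /* 1 if currently decrypted */
} func_t;

static int clear_count = 0;  /* functions currently in cleartext */
static int max_clear = 0;    /* high-water mark of clear_count */

static void toggle(func_t *f, unsigned char key) {
    for (size_t i = 0; i < CODE_LEN; i++)
        f->code[i] ^= key;         /* XOR both encrypts and decrypts */
    f->in_clear = !f->in_clear;
    clear_count += f->in_clear ? 1 : -1;
    if (clear_count > max_clear)
        max_clear = clear_count;
}

static void decrypt(func_t *f, unsigned char key) { assert(!f->in_clear); toggle(f, key); }
static void encrypt(func_t *f, unsigned char key) { assert(f->in_clear);  toggle(f, key); }

/* Set up a function whose cleartext is CODE_LEN copies of fill. */
void init_func(func_t *f, unsigned char fill, int in_clear, unsigned char key) {
    for (size_t i = 0; i < CODE_LEN; i++)
        f->code[i] = in_clear ? fill : (unsigned char)(fill ^ key);
    f->in_clear = (in_clear != 0);
    if (f->in_clear && ++clear_count > max_clear)
        max_clear = clear_count;
}

/* Simulate chain[0] calling chain[1] calling chain[2] and so on. */
void simulate_calls(func_t **chain, size_t n, unsigned char key) {
    if (n < 2) return;
    func_t *caller = chain[0], *callee = chain[1];
    decrypt(callee, key);                   /* before the jump: unlock the callee */
    encrypt(caller, key);                   /* first thing in callee: lock the caller */
    simulate_calls(chain + 1, n - 1, key);  /* callee body, possibly calling deeper */
    decrypt(caller, key);                   /* before returning: unlock the caller */
    encrypt(callee, key);                   /* after the return: relock the callee */
}

int max_in_clear(void) { return max_clear; }
```

Starting with only `main` in the clear and simulating the chain main, f, g, `max_in_clear()` reports 2, and afterwards only `main` is decrypted again.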

8. Surreptitious Software

4.3 Complicating Control Flow 245

However, if you look carefully at this graph you can see that it's in fact possible for the light gray node to be executed! To reduce the risk of leaving out realizable paths, Algorithm REUDM folds in possible paths using static analysis. If you deploy a constant-propagation data flow analysis along the paths that were actually executed, you discover that next can take on the value 1, and hence, that the light gray block is, in fact, reachable.

It is still possible for the de-obfuscator to miss possible edges. The authors report that between 0.01% and 1.7% of real edges are left out of the de-obfuscated control flow graph. It is also unable to remove all of the fake edges. On average, in their test cases 21.4% of the fake edges remained in the de-obfuscated graphs.

4.3.6.2 Algorithm REMASB: Dynamic Attacks Against Branch Functions

The problem with branch functions is that they're dynamically unstealthy. Madou et al. report that for one benchmark program (gcc), the branch function is called more than 13,000 times, and, furthermore, none of these calls return to the place from whence they came! Algorithm REMASB makes use of this fact in a very simple attack.

The first step is to instrument the program and execute it on representative inputs. Unless the attacker has input data that will exercise every path in the program, the dynamic traces he collects won't be complete enough to build a control flow graph. Instead, the dynamic trace is used as input to a recursive traversal disassembler. Normally, such a disassembler would build a control flow graph given access only to the binary executable, but in this hybrid static/dynamic approach, it can start from a set of instructions that will be correctly disassembled, since they were actually executed on a real run. The result is a control flow graph that, while possibly not 100% correct, is more accurate than what could have been built from a static or dynamic approach alone.

The second step is to locate a function with the signature of a branch function: It should have many call sites and it shouldn't return to the location from which it was called.

The third step is to statically locate all the calls to the branch function and to monitor them while executing the program. This can easily be done under a debugger. All that is necessary is to set a breakpoint at the end of the branch function and then record the address to which it will branch.

The final step is to replace the calls to the branch function with unconditional jumps to the actual target. This is trivial, unless the branch function has been designed to have side effects to prevent exactly such a replacement attack! If that's the case, the attacker will have to do a more thorough analysis of the semantics of the branch function to ensure that the attack is semantics-preserving.
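The second and third steps above can be sketched over a recorded call trace. This is a simplified model under assumptions of my own: each trace record holds the callee's entry address, the fall-through address after the call instruction, and the address where execution actually continued. The record layout and the names `call_event`, `looks_like_branch_function`, and `recorded_target` are hypothetical, and a real attack would gather these records with instrumentation or a debugger.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t callee;       /* entry address of the called function */
    uint32_t fallthrough;  /* address of the instruction after the call */
    uint32_t landed;       /* where execution actually continued */
} call_event;

/* Step 2: a candidate has at least min_sites distinct call sites and
   never returns to the instruction following the call. */
int looks_like_branch_function(const call_event *ev, size_t n,
                               uint32_t callee, size_t min_sites) {
    assert(ev != NULL);
    size_t sites = 0;
    for (size_t i = 0; i < n; i++) {
        if (ev[i].callee != callee) continue;
        if (ev[i].landed == ev[i].fallthrough) return 0;  /* ordinary return */
        int seen = 0;
        for (size_t j = 0; j < i && !seen; j++)
            seen = ev[j].callee == callee &&
                   ev[j].fallthrough == ev[i].fallthrough;
        if (!seen) sites++;
    }
    return sites > 0 && sites >= min_sites;
}

/* Step 3: look up the target recorded for one call site, i.e. the address an
   unconditional jump would use in the final step. Returns 0 if unobserved. */
uint32_t recorded_target(const call_event *ev, size_t n,
                         uint32_t callee, uint32_t fallthrough) {
    assert(ev != NULL);
    for (size_t i = 0; i < n; i++)
        if (ev[i].callee == callee && ev[i].fallthrough == fallthrough)
            return ev[i].landed;
    return 0;
}
```

A call site that was never exercised has no recorded target, which is the same coverage limitation the text notes for dynamic traces in general.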

9. Surreptitious Software

442 Software Tamperproofing

Algorithm 7.4 Overview of algorithm TPTCJ. P is the program to be protected, I is the profiling input, δ is the desired threshold distance between corruption and failure sites, and T is a function-distance matrix. SELECT computes a set of good corruption sites C for each global variable v.

PROTECT(P, I, δ):
1. Execute P with I as input and construct matrix T so that T[f, g] expresses the distance (in terms of elapsed time and number of function calls) between functions f and g.
2. Let R ← SELECT(P, T, δ) be a set of possible variable/corruption sites.
3. Let R′ be a set of random variable/corruption sites from R.
4. Modify P by adding a layer of indirection to any non-pointer global variables in R′.
5. Modify P by inserting tamper-detection code that corrupts the global variables in R′.

SELECT(P, T, δ):
    V ← set of P's global variables
    G ← P's call graph
    for v ∈ V do
        C ← set of functions of P
        F ← set of functions of P in which v is used
        for each f ∈ F do
            for each ancestor g of f in the call graph G do
                C ← C − {g}
            for each function c ∈ C do
                if T[c, f] < δ then
                    C ← C − {c}
        return (v, C)

It's important that you keep any legal implications of your tamper-response mechanism in mind. Deliberate destruction of user data is likely to invite legal repercussions, particularly if the user can show that the tamper response was issued erroneously ("I forgot my password, and after three tries the program destroyed my home directory!"). But what about data that gets destroyed as an unintended consequence of the tamper response? If the tamper response is the least bit probabilistic (which you would like it to be!), then how can you ensure that the eventual failure happens in a "safe" place? It's easy to imagine a scenario where the program crashes with a file open and the last write still pending, leaving user data in a corrupted and unrecoverable state.

Algorithm 7.4 shows an overview of TPTCJ. The basic idea is for RESPOND to set a global pointer variable to NULL, causing the program to crash when the pointer is later dereferenced. If the program doesn't have enough pointer variables, TPTCJ creates new ones by adding a layer of indirection to non-pointer variables. The algorithm assumes that there are enough global variables to choose from; while this

10. Surreptitious Software

1.7 Software Similarity 47

This is where the concept of birthmarking comes in. The idea is to extract "signals" from Q and from M, and then look for M's signal within Q's signal rather than looking for M directly within Q:

[Figure: containment of M's signal within Q's signal]

Here, f is a function that extracts the signal, which we call a birthmark, from a program or module. The idea is to select the birthmark so that it's invariant under common code transformations such as obfuscation and optimization.

We know of at least one case where birthmarking was successfully used to argue code theft. In a court case in the early 1980s [128], IBM sued a rival for theft of their PC-AT ROM. They argued that the defendant's programmers pushed and popped registers in the same order as in the original code, which was essentially a birthmark. They also argued that it would be highly unlikely for two programs to both say push R1; push R2; add when push R2; push R1; add is semantically equivalent.

1.7.4 A Birthmarking Example

To be effective, a birthmarking algorithm must extract the mark from a language feature that is hard for an attacker to alter. One idea that has been reinvented several times and that we'll explore further in Chapter 10 (Software Similarity Analysis) is to compute the birthmark from the calls the program makes to standard library functions or system calls. Some of these functions are difficult for the adversary to replace with his own functions. For example, the only way to write to a file on Unix is to issue the write system call. A birthmark extracted from the way the program uses write system calls should therefore be reasonably robust against semantics-preserving transformations.

Consider this function that reads two strings from a file, converts them to integers, and returns their product:

    #include <stdio.h>
    #include <stdlib.h>

    int x() {
       char str[100];
       FILE *fp = fopen("myfile", "r");
       fscanf(fp, "%s", str);
       int v1 = atoi(str);
       fscanf(fp, "%s", str);
       int v2 = atoi(str);
       fclose(fp);
       return v1*v2;
    }
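One way to make the "look for M's signal within Q's signal" idea concrete for a function like x() is to treat the ordered list of library calls as the birthmark and measure how much of M's list survives, in order, inside Q's list. The sketch below is only an illustration, not the book's algorithm: it scores containment as the length of the longest common subsequence divided by the length of M's list, and the names `birthmark_containment` and `MAX_CALLS` are hypothetical. Extracting the call lists from real binaries is assumed to happen elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_CALLS 64  /* assumption: upper bound on birthmark length in this sketch */

/* Fraction of M's call sequence that appears, in order, within Q's sequence.
   Computed with a standard longest-common-subsequence table. */
double birthmark_containment(const char *const *m, size_t m_len,
                             const char *const *q, size_t q_len) {
    assert(m_len <= MAX_CALLS && q_len <= MAX_CALLS);
    if (m_len == 0) return 1.0;  /* an empty signal is trivially contained */
    static size_t dp[MAX_CALLS + 1][MAX_CALLS + 1];
    for (size_t i = 0; i <= m_len; i++) {
        for (size_t j = 0; j <= q_len; j++) {
            if (i == 0 || j == 0)
                dp[i][j] = 0;
            else if (strcmp(m[i - 1], q[j - 1]) == 0)
                dp[i][j] = dp[i - 1][j - 1] + 1;
            else
                dp[i][j] = dp[i - 1][j] > dp[i][j - 1] ? dp[i - 1][j]
                                                       : dp[i][j - 1];
        }
    }
    return (double)dp[m_len][q_len] / (double)m_len;
}
```

For x(), the call list would be fopen, fscanf, atoi, fscanf, atoi, fclose. A suspect program whose list interleaves those six calls with unrelated ones still scores 1.0, while a program that only calls fopen, fread, fclose scores 2/6. Using a subsequence rather than an exact substring is a design choice here, so that extra calls inserted by an obfuscator do not break the match.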