7.2 lntrospection 7 an instruction fetch. The OS then 100kS 叩 the virtual address in the page tables and either updates the instruction TLB or the data TLB with the new virtual-to-physical mappmg ・ To implement the attack against a self-hashing algorithm, you need to do four things : 1. Copy P to Portg 2. Modify P however you like. ろ . Arrange the physical memory so that frame / comes from the hacked p and frame / + 1 is the corresponding original frame from 4. Modify the kernel so that if a page table 100k 叩 yields a →戸 virtual-to-physical address mapping, I-TLB is updated with →戸 and D-TLB with v →カ + 1. Here's an example to illustrate this: Virtual Address 03 05 02 00 PhysicaI frames 1 2 TLB mss Page TabIes I—TLB if (false) abort ( ) ; if 。 ( 0 ス pir 総め lnstruction fetch 4 D—TLB 5 Data fetch Here, the attacker has modified the program to bypass a license-expired check. The original program pages (dark gray) are interleaved with the modified program pages (light gray) in physical memory. Page う and page 4 are actually the same page, except that in page the license check has been hacked. When the program tries to read lts own code in order tO に x にび″ it, the processor throws an 姦 7- 'L お - な exception, and the OS 100kS up the correct m 叩 ping in the page tables and updates the I-TLB tO refer tO the modified page. 、 en the program tries to read its own code in order to る 4 紡 it, the processor throws a D-TLB-miss exception, and the OS updates the D-TLB to refer to the original, unmodified page. The result is that during hashing the original dark gray pages will be read and the tamper-response code won't be triggered, and during execution the light gray code that the attacker has modified tO his heart's content will be run.
292 Code Obfuscation Java virtual machine expects tO have a particular name and signature. Furthermore, any obfuscated Object (such as a string or an integer that has been encoded using any Of the techniques ⅲ Section 4. う第 258 ) needs to be converted to the original Java type before being passed to a library method. Any such remnants of the original program can provide valuable clues tO an attacker. The authors report that the code generated by their obfuscator was about 1() times larger and 4 tO 20 times slower than the original. What's even more depressing is that reversing the transformatlons was easy; in fact, de-obfuscation was 4 立 e お than obfuscation ! We let the authors describe the design of their de-obfuscator themselves [ 1 の , pp. 2 う一 24 ] : We developed a "de-obfuscator" (actually more of an analysis assistant) for our obfuscation t001. lt worked by searching for patterns in the input pro- gram and runmng selected parts of the program. lt was largely successful in that it was able to determme method entry points, the structure of the class hierarchy, which methods were constructors, etc. The de-obfuscator runs the く clinit> (static initialization, run when a Java class is loaded) of the program to retrieve the virtual tables and jump tables, The control flow leaving each block was determined by pattern match- ing on the DAG representation of the basic block. Since our obfuscator produces basic blocks that use data-driven Jumps, there is a calculation in each obfuscated basic block that returns the block to jump to. Our obfus- cator output [sic] only a fixed number of formats for this calculation, so the de-obfuscator can match against those formats. After simplifying the control flow leaving each basic block, the de- obfuscator used the virtual tables (read from running the initializer) to deternune which basic blocks were method entry points. Since no Java method ( ⅲ the original program) can jump to code ⅲ other methods, this allowed complete determination of method composition. The virtual tables alSO associated methods with classes. certain facts about the class hierarchy could also be determined from the virtual tables, such as superclass and lnterfaces. ln conclusion, the authors note that removing high-level information from a Java program and obfuscating data provides no extra security, and it comes at a high cost Of implement ation :
284 Code Obfuscation dynamIC memory allocation and deallocation, and so (n) into the low-level opera- tions supported by the hardware and operating system. This is, essentially, what the SPMA t001 does while staying within the Java bytecode domain. ln other words, both the original and obfuscated programs consist ofJava class files, but in the ob- fuscated program there are no classes ⅲ the constant P001 (every class is represented by a umque integer), virtual method dispatch is done through explicit virtual tables rather than theJava bytecode invokevirtual instruction, all memory is in the form of a flat array (which includes the method call stacks), exception handling does not use Java's built-in mechanism, and so on. ln addition to these transformations, SPMA alSO performs traditional data obfuscations. Have a 100k at Figure 4.4 2 め , where we've sketched how SPMA transforms a program. The original program contains two methods, p and Q. Their control flow graphs are merged and flattened using chenxification. Whenever possible, methods from the standard library are merged the same way.. Since theJVM limits the size of a method, the obfuscated program will have several methods contaming the basic blocks from the original program. ln Figure 4.4 第 , 2 め , the basic blocks from p, Q, main, and Hashtab1e. put are all merged into two methods, ml and m2. Java expects some methods tO never change name or signature. For example, the program has tO have a method named main (where executlon starts), and it has tO have exactly one argument, an array of strings. Therefore, in the obfuscated program, main becomes a stub that jumps to the appropriate flattened method. Similarly, to handle GUI events, you have to implement methods with a particular name and signature (such as mousePressed(MouseEvent e) ) , and these, t00 , have tO be converted tO stubs. ln other words, even though the entire program has been flattened intO just a few huge static methods, the obfuscated code will still contain remnants Of the original one, with hints to the adversary as to where particular parts of the executlon orlglnate. 4.63.1 An Example To remove the high-level structures from the Java bytecode, SPMA flattens control flow ()o make it difficult to reconstitute source-level control structures), turns typed data into (almost) untyped data, merges methods ()o remove procedural abstraction) , replaces classes with unique integers, implements method dispatch through its own virtual method tables, makes use of its own method call stack, and implements its own verslon of common classes from the Java library (to avoid leaving hints from unobfuscatable library calls). ln addition to synthesizing high-level structures from low-level primitives, SPMA does some simple transfor- mations on data, such as you saw in section 4. う 258. Since the SPMA tools were never released in the public dommn, we can't know the exact design decisions, so
8.7 lmproving Stealth 8.7 lmproving SteaIth 505 There are essentially three approaches to making introduced watermarking code more stealthy. You can: 1. Make the introduced code 100k morelike original code, or 2. Make original code 100k more like the introduced code, or ろ . Make all code, original and introduced, 10 。 k like some third type of code. ln this section, we'll show you three algorithms that attempt to improve stealth. The first one, wMMIMIT にう 1 ー 2 う引 , makes a copy of an existing function and makes small changes to it in order to embed the mark. Since the function origmates in the cover program, presumably it 100kS like the type of code the programmer typically writes. The second algorithm, WMVVS [ 79 800 う 40 う第 , creates a function that embeds the watermark from scratch. lt then adds a uniform mess of branches all over the program to make it appear as if the watermark belongs to the cover program. ln Other words, it turns the entire program into something that is definitely unstealthy, unusual, and suspicious-looking, but at least the cover program 4 〃ノ the watermark function will 100k equally funky. Algorithm WMCC [ 9 う , 96 ] tries to be both statically and dynamcally stealthy: lt only inserts integer additions and multiplications (very common operators), and the inserted code executes ⅲ tandem with the code of the cover program. UnfortunateIy, as you will see, the integer constants that it inserts and the integer values that it computes at runume are both highly unusual and thus prime targets for attack. 8.7.1 Algorithm wMMIMIT: Mapping lnstructions Algorithm VMMIMIT [ 251 ー 2 う引 is a very simple watermarking algorithm, where embedding a mark is done in three steps: 1. Add an unrelated dummy function D to the program. 2. Modify D's opcodes and literal arguments to embed the mark. ろ . Add a bogus call if (PF) D(), protected by an opaque predicate, to tie D to th e cover program. For example, in Java bytecode you can embed three bits of watermark for any binary integer arithmetic instruction that you find, replacing the instruction with an
8.7 lmproving Stealth 513 flow can't be optimized away. Method m4 uses a global variable bogus ⅲ the guards of all control flow statements. An adversary or compiler would need to do some ser10tlS lnter-procedural static analysis to determine that m4 or any of its internal flow could be safely removed. The maln contribution of WMVVS is the way the algorithm ties the watermark code to the surrounding program. The idea is to add bogus control flow between not only the real code and the watermark code, but different parts of the original program as well. AS long as you make sure that the watermark functions have no bad side effects or cannot get into an infinite mutual calls to them just about anywhere. Calls to the original functions in the program will need to be protected by false opaque predicates unless you can determine that they are side effect free. ln Listing 8.22141S an example with two of the five watermark methods, m3 and m4, one original method P, and the main method. Notice how some of the calls to the watermark methods have been used as opaque predicates. For example, since m3 (x) is designed to always return a non-negative lnteger, the expression m3 ( 9 ) > = ① is opaquely true and won't affect the while-loop in which it occurs. ProbIem 8.9 Algorithm wMMIMIT embeds the mark by modifying arithmetic instructions Of a function already in the program. This may lead to more stealthy watermark functions than building the CFG from scratch, as suggested here. ls it possible tO combine these two ideas? Can you modify a CFG or combine several CFGs already in the program to embed the mark? Problem 8.10 Exception-handling code can result in very convoluted control flow graphs. Can you think of a way to encode RPG watermark graphs using exception handlers? ls the result more or less stealthy than using normal control constructs? 8.7.2.2 Recognition To extract the watermark, you must somehow be able to de_ tect which basic blocks in the program belong to the watermark CFG and which be- long t0 the cover program. WMVVS does this by 4 z ツ g the basic blocks. One way to
532 Key Key Permutation 16 ー 15 Obfus cation %eory PIain Text lnitial Permutation R16 = わ 15 工 ( R15 ) = んル 1 ①工田ル 1 ) lnverse Permutation Figure 5.2 DES transforms 64-bit blocks using an inltial permutation, 16 "rounds" and a final permutation. ln each round, an alternate half of the input is scrambled with a portion of the key before being recombined with the remarning half. ln other words, the 58th bit of the key becomes the 1 st bit of the permuted key, the 50th bit becomes the 2nd bit, the 42nd bit becomes the third bit and so on. Notice that there are only う 6 entries in the table. As a result of the permutation, 8 bits of the original key are dropped by the DES algorithm, giving an effective 56-bit key. Thus, from the original key K, you get a 56-bit permuted K': K' = 010100110110100001110111011001010111010001101000011000010101110 As shown in Figure う .2 , to encrypt plaintext, it is divided into blocks of 64 bits each. For each block of bits of plaintext, DES permutes the block according to a special table. lt then divides the result into a le 丘 (Lo) and right は 0 ) half, each う 2 bits long. The new left half ( 新 ) is simply the old right half ( R() ). A keyed substitution on the right half ( & ) is performed by rotating it 1 飜 by one or two bits (depending on the round). This result is then permuted. An exclusive-or is then performed between this
660 Hardware for Protecting S0ftware 11.1.1.2 CD-ROM protection Software distributed on CDs uses a different stan- dard , IS09660 [ 17 う ]. Many p rotection techniques that have been invented are still based on the same idea: Abuse the standard so that disk-copymg software or CD burners will have problems with it 6 , Chapter 4 ]. For exampl% IS09660 restricts the name of the disk t0 be made up of the characters [A-Z@-9-] , and a simple trick to trip up CD burners is to insert a blank in the name. SimilarlY' the TOCs could report the wrong sizes Of files, or the tOtal amount Of data stored could appear to be larger than the physical media st1PP0rtS ・ You can alSO insert frame errors on the disk ⅲ areas that don't contain real data. These areas won't be read when a program is executed 0 任 the disk' but disk-to-disk copiers will fail when trymg t0 read them. For exampl% you can change the error correctlng code in a frame SO that hardware won't be able tO correct for the error and give up on copying the disk. ln contrast tO audiO CDS, software distributions consist Of executable content. This allows the program t0 check that the original CD is ⅲ the CD drive' refusing to run if it's not. This presupposes that it's possible t0 produce original CDs with ん″ん 4 ゞ , traits that are unique t0 original CDs but that will not be reproduced by copy- ing software and CD burners. The CD-COPS 6004 ] (http ://www.linkdata.com/ system measures and verifies the angle between tWO sectors Of the something that will not be mamtained in a burned COPY. The algorithm goes something like this: 1. Add the following code t0 your program: int expected = ro 戸 / the 4 e をん the ve ″ア 04 / あ〃 0 ノ e ル″ 7 〃 0 〃 the CD; に〃万坊 4 / 4 CDis / 〃坊ノ″ツ礪 = 尾 4 ノ坊に五ハ / 0 尸 0 〃 the CD; Sector first = 尾 4 ノ / な立 0 〃 / 厖 CD; Sector last int angle = じ 0 戸 4 the 4 〃 g なみル e 〃 first 4 〃ノ last ; if (angle ! = expected) abort() ; 2. produce the gold master for your CD and send it to the plant to be manufactured. う . Receive a shipment of CDs. They've all been produced from the same glass master and hence they're all identical. 4. Measure the angle between the first and the last sectors on one 0f the CDs and print it as the verification code on the cover Of all CDs.
245 43 Complicating Control 日 ow An interesting development are interactive de-obfuscation envlronments such as LOCO にう 6 , 2 う刀 (for X86 binaries) and SandMark [ 81 ] (for Java bytecode). These envlronments provide user interfaces that a Ⅱ 0 从 , a reverse tO representauons (such as control flow, call graphS' and inheritance graphs) 0f a program, and, in the case 0f LOCO, manually modify these representations. 45.6.1 AIgorithm REUDM. ・ Dynamic Attacks Against Control-Flow Flattening Algorithm REUDM けう 2 ] uses a hybrid static-dynamic approach t0 recover the original control flow from a program flattened bY Algorithm OBFWHKD. Have a 100k at this control flOW graph Of the modular exponentiatlon routine: ( 1 ) next=@ false true = ① next true false = 1 next つっ ) 4 next=2 false true k<W ( 6 ) next=6 : next=2 lt's essentially the same graph as the one you saw ⅲ Figure 4.2 228 , except that ℃ ve modeled the switch statement as a series Of branches. ・ we've also numbered each statement. ln order tO turn this graph intO the original one in Figure 4.1 》 227 , you need to figure out what are 秀ルな and ツ秀ルな paths. ln Section 43. う 229 , you saw hO 嶬 , one trick tO confuse a de-obfuscator was tO add a block tO the switch statement that would never be executed. TO properly de-obfuscate the routine' you need tO dO (the eqtllvalent (f) a constant-propagation data flOW analysis tO determine that the next variable can never take on a value that would make this block execute.
う .1 Definitions 5 Let's start by reiteratlng your lntent. Your intent is tO protect S01 れ e property 0f a program by transforming the program 1nt0 a new one in which that property is harder to deduce. Unfortunately, while you may intuitively understand what this means, When such an intent is expressed in prose it is difficult tO reason about. For example, you need t0 state formally what you mean bY Pr0PertY 0f a program" and by "harder to deduce. " ln formalizing your lntent, however, you must make many simplifying assumptions. As you read the remainder Of this chapten we encourage you tO question the simplifying assumptions that are necessary and tO consider how well they capture your lntent—a proof that some obfuscation technique is secure will not be very useful unless the definition 0f obfuscation agrees with your original lntent. UnIike the previous chapter, here you also have the challenge 0f defining a very general formal model so that the conclusions you reach remam applicable no matter what new techniques for program analysis are discovered. Our first reqtllrement is one ofcorrectness. An obfuscated program must behave the same as the original unobfuscated one. This is the same reqtllrement we had on obfuscated programs in Chapter 4. Definition 5.1 (Correct). Let an input ー from a set of inputs 工 be the input t0 a program p. An obfuscating transformation T 0f a program P is correct if and only if: vl eT: T(P)(I)= P(I) ln other words, an obfuscating transformation is correct if an obfuscated pro- gram still computes the same function as the original program•What about error states? Unlike the prevrous chapter, from here on we will assume that an error lS simply another possible output 0f p , and we reqmre that an obfuscated program mamtain this error. The second reqtllrement iS one Of useftllness. An obfuscation must hide S01 e - thing. Let's define assets as the feature Of a program that you want tO hide: Definition 5.2 (Asset). An 4 asset( ・ ) is a derivable property of a program P and its set of inputs ム such that asset(P,T) = 1. What kinds of assets are you usually interested ⅲ hiding? You may be interested ⅲ protecting data such as a secret key embedded in your program' or data encoded by your program, or you may want t0 hide some functionality such as a regular expression, an algorithm, Other combination Of properties.
8.4 Watermarking by Permutat10n 487 ln many languages, for example, the order of declarations is に巳 This means that functions or variable declarations can be reordered to embed a mark. ln the exam- ple in Listing 1.6 》・ろ 9 , we showed that the case-labels in a switch-statement may be reordered tO embed a mark. You can even reorder assignment statements within a function, as long as there are no dependencies ℃ en them. For example, because of the data dependencies in the example below (see section タ 13 第 132), statements A and B can be reordered but B and ( cannot, and neither can c and D: 十 「」 8 9 一 Recognition isn t as straightforward. To extract the permutation from the wa- termarked program, you need to know what each item's order was in the 0 ″ gz ツ 4 / program. Say, for example, that you used the permutation int2perm(4) = く 1 , (), 2 〉 t0 reorder the three original statemenis to the left, yielding the marked program on the right : 〔」 8 9 一 0- ・ 1 っ一 8 「 ) 9 一 1 画 -0- っ 4 Unfortunately, during recognition there's no way of knowing that y=8 was originally the second statement, and so there's no way of recovering the permutation ! There are two ways of solving this problem. The first is to make the recognizer 〃 0 〃訪 / ツノ 1. e. , tO give it access tO bOth the original and the watermarked program. The second is tO put the program ⅲ a 〃 0 〃な 4 /. ん prior to marking. Here, we first sort the statements lexicographically and then apply the permutation: 「つ 8 9 一 9 一 8 「 ) -0 一 11 つん ・ 1 0- っ 4 TO extract the permutation, you compare the location of each statement with its location in the lexicographically sorted order.