you - みる会図書館

REGISTER Addison WesIey THIS PRODUCTU informit.com/register Register the Addison-Wesley, Exam Cram, Prentice Hall, Que, and Sams products you own tO unlock great benefits. TO begin the registration process, simply go tO informit.com/register tO Sign in 0 「 create an account. You will then be prompted tO enter the IO- or 13-digit ISBN that appears on the back cover Of your product. Registering your products can unlock the following benefits: ・ Access tO supplemental content, including bonus chapters, source COde, 0 「 project files. ・ A coupon tO be used on your next purchase. Registration benefits vary by product. Benefits will be listed on your Account page under Registered Products. About lnformlT ー THE TRUSTED TECHNOLOGY LEARNING SOURCE 4F0 日 M げ旧 HOME TO THE LEADING TECHNOLOGY PUBLISHING IMPRINTS Addison-WesIey professional, CiSCO press, Exam 旧 M press' Prentice 日訓 ProfessionaI, Quej and sams. Here you will gain access tO quality and trusted content and resources from the authors, creators, innovators, and leaders Of technology. Whether you're IOOking for a bOOk on a new technology, a helpful article' timely newsletters' 0 「 access to the safari BOOks Online digitallibrary,lnformlT has a solution fO 「 you. THE TRUSTED TECHNOLOGY LEARNING SOU 日 CE inform T.C ① m Addison-WesIey ー CiSCO Press ー Exam Cram 旧 M Press ー Que ー Prentice Hall Sams SAFARI BOOKS ONLINE

2. Surreptitious Software

586 Dynamic Waterm arking time it was executed and compute a bit from subsequent executions' a 0 bit if it goes the same way as the first time, a 1 bit otherwise. For example' consider the trace ⅲ Table 9.2 第め . when you take the branch from B 1 (trace point # 0 ) the first time, you don't generate a bit but simply record that you Jumped to block B2. For the subsequent six umes that you d0 the test at BI (trace points # 4 , 8 , 12 17 21 26 ) , you alSO jump tO B2, which generates six 0 bits. The seventh time' at trace point う 0 , you lnstead jump tO B6, which therefore generates a 1 bit. The full trace generates the bit sequence ①①①①① 1 ①①① 1 ①① 1. To embed a watermark simply reqmres you t0 add new branches t0 the program that don't change its behavior but add the watermark bits tO the trace when the program is run with the special input. WMCCDKHLSpaths has two ways of encoding the watermark bits, as a sequence Of if-statements (one per bit Of watermark) and as a 100P. Let's say you want t0 embed the watermark 100102 at location B4 in the example program. At B4, there are the three available interesung user variables b' and i that you can potentially reuse t0 encode your bits. You 100k at the trace ⅲ Table 9.2-58 ぅ and notice that the first time executlon passes B4 (trace point # 1 う ) , a=b=i=4' and the second time (trace point # 24 ) , a=b=4 , i = 6. perfect ! Now you have t0 d0 t0 embed 100102 is to insert these branches: int u = ① ; if (A2: a==b) A3 : u + + ; if (A6:i==b) A7 : u + + ; if (PF) live—var + = u; How does this work? The first time you reach i==b and you branch t0 AI. You don't generate a watermark bit at this point but instead record the fact that was followed by AI. You record the same information for the remainder 0f the if-statements: A2 is followed by A3 , A4 by and SO on. The next time you execute these statements, you write down a 0 bit if the branch goes tO the same location as the first time, and write down a 1 bit otherwise. At the branch fails (since i=6 , b=4) and you branch tO A2. Since the first time you branched tO AI' you write down a 1 bit. At A2, you generate a () since a=b and you branch tO A3' just as the first time.

3. Surreptitious Software

11.2 Authenticated Boot Using a Trusted Platform ModuIe 671 (such as IBM laptops) already have a TPM soldered onto the motherboard, and there are research systems built on top ofthem to provide trust. However, at present, we're aware of no actual deployed systems. The AEGIS に 1 22 ] system from a group at the University of pennsylvania seems to be the first published account of a trusted bOOt architecture. ln this section, we re going to roughly follow the implementation put forth by a group at IBM 卩 12 , 引引 . SO how does a system convlnce you that it's benign? 、Ⅱ , it uses a combination Of ゞ 0 夜 2 / 〃 4 立 ("You should trust p because everyone else does"), ゐ ar ノル 4 尾〃 4 立 ("You should trust that the secrets stored in p are safe because we ve wrapped them ⅲ plastic") , and / 〃立 7 加 e 〃郷 / ("lf you trust p and p trusts R, you really should trust ("). Let's start by looking at hardware and transitive trust, and leave social trust for later. 11.2.1 Trusted Boot Let'S consider a scenariO where you want your program tO computer. Your program contalns a secret, however, so you encrypt it, and you tell Bob to boot a special operating system eO 、ゞ that handles encrypted executables. But you don't want to hand over the program to eOS until Bob has convinced you that nothing could possibly go wrong—that there's no way he'll be able to extract the secret. so what び 04 な go wrong? well, first, Bob could tell you he booted into the version of eOS you gave him, when in fact he first hacked it to leak the encryption keys. so you ask him t0 compute a cryptographic hash over the kernel and send it to you. If the hash matches that of the eOS you trust, you should be fine, assuming for a moment that Bob can't lie about the hash. Of course, the kernel was started by a bootloader, and Bob could have hacked ″ to modify the eOS image before loading it. So you ask Bob to send you a hash of the bootloader. Of course, the bootloader was started by the BIOS, and ″ could have been hacked ! So Bob sends you a hash 0f ″ , but .... You see where this is golng. ス〃ッ / んツ g that's running on Bob's computer could affect whether you should trust it or not. We mentioned the OS, the BIOS, and the bootloader, but other application programs, firmware [ 1 う 9 ] in connected devices, kernel modules, OS configuration files, and scripts could also do damage. The idea of a trusted boot is to に 4 4 尾 ve ヴ potentially hostile piece of code runmng on a computer and compare lt against a library Of 走〃 0 ル〃 g00 ノ 4 尾 e 〃な . ・ Wl ・ len we say measure" we really mean "compute a cryptographic hash of" and a "known good measurement is a hash of a program you trust. If Bob can convince you that his computer, from the ground up, only runs code you trust, then you should have no problem handing him ()r selling him) a program to run.

4. Surreptitious Software

348 Obfus cation Theory The algorithm starts by making all the participants stand around in a circle. The first person, Alice, begins by thinking of a random number N, adds her allowance x, and whispers the sum to Bob. Bob takes that number, adds his own allowance tO it and whispers it tO the next person in the circle, carol. She, ⅲ turn, adds her own allowance, and SO on. ・ When the total finally returns to Alice, she substracts the random number N that she began with and announces the total to the group. Every member of the group can then divide the total by the number of people in the group and know the average allowance. lt is instructive tO briefly examine how this protocol works. No one member at any time has sufficient informatlon to decode the "runnlng total. " What at- tacks are possible ⅲ this scenario? What would happen if two people ⅲ the group were to collude? For example, if Bob and David colluded by sharing their totals, they could compute Carol's allowance. If the two colluding parties are separated by a larger number of people, they may not be able to tell what each person be- tween them had as an allowance, but they would be able to have a much more accurate approximation Of their tOtal income than can be determined from the average. A second attack that is possible is that any participant could lie about their lncome. While it is always the case ⅲ multiple-party computations that a paruci- pant can lie about his input, in this problem it is particularly advantageous for a participant to lie. If he is the only party to lie, at the end of the round he would be able to calculate the true total while every other person ⅲ the circle would have an lncorrect value. Can we do better than this? Can these problems be resolved? Can the protocol be extended tO calculate functions other than the average? 5.5.2.2 Millionaire Problem Go 引 Asset: Private data Attacker Limits: None lnput Programs: AII possible programs Suppose you and a friend wish to calculate which of the two of you is the richer but you dO not want to reveal this informatlon to each other. One way of managmg this would be for each person to reveal their wealth to a trusted third party, who in turn wou]d tell you who is richer. ln such a scheme, no one friend would have

5. Surreptitious Software

7.4 State lnspecuon 445 SO what's the alternative? If you can't check that your program is intact by verifying the correctness Of the code, what can you do? ・に 11 , you can try to verify the cor- rectness of the or the び 0 〃な 0 / ア 0 ル ! The idea is that you might be able to detect tampering 0f the code by the にアな the code produces, 1. e. , in the way that the code makes data change and in the way it forces control to take different execution paths. ln addition to being more stealthy than introspection, these techniques have the advantage that you can use them not only on binary code but also on type-safe code like Java bytecode. You might want to think of this as an advanced form of assertion checking—you'll be adding checks to the program to ensure that at no point does it get intO an unacceptable state. You're not checking, "ls this code correct? " but rather, "ls the program ln a reasonable state? " The state is a combination of all your static data, the stack, the heap, and the program counter, and you can access all of these whether you're running a binary executable or Java bytecode. (The pc you can t access directly from Java, of course, but you know 〃防尾ⅲ the code you are when you're doing the assertion check, which amounts to the same thing. ) Unfortunately, automatically adding assertion checks to a program isn t easy. The current state of a program depends on all previous states, so how could you possibly analyze your user s program to come up with non-trivial invarlants to add as checks? An alternative to adding assertions on the entire state of the running program is t0 call, say, a function at a time, feeding it challenge data and checking that the result is as expected: int challenge int expected 12 ① ; int result factorial(challenge) ; if (result ! = expected) abort() ; "Hash values" (the expected results of the functions, gwen the challenge inputs) are easy tO compute at protection t1me—Just generate random challenge inputs, call the functions, and record the results ! Just like you saw ⅲ Algorithm rrpHMST however, you have to be careful not to generate suspicious-looking hash values or challenge data that the attacker could easily spot as unusual: if (factoria1(17) ! = 355687428 ① 96 ①①の abort ( ) ; Neither 17 nor ろ 5 う 687428096000 is likely to occur frequently ⅲ real programs.

6. Surreptitious Software

4.5 Data Encodings 259 new into the 01d. The analogy with the び々〃 and 々〃 functions Ekey() Dkey() is intentional—in fact, you could imagine using encryption tO obfuscate data. You will see examples Of this in the next section. TO make this a bit more formal, imagine that you have an abstract data type 7 ' with operations OT, ØT, type OT : T x T → 7 ' 〇 7 , : T x T → T To obfuscate T, you construct a new abstract data type with operations for converting between T and T', and one new operation operating on for every Dr, : T' ET' : T type operation operating on T: & 卩 : T' x T ′→ T' : 7- ' ′ x T ′→ T ′ T ′ even if their underlying representation is based on the same obfuscation algorithm. so that you can create a large number 0f different-looking obfuscated variables tation tO be parameterized. ln Other words, you want a - な Of representations ldeally, to prevent pattern-matching attacks, you want the obfuscated represen- operation on the de-obfuscated values. directly on the new representation, but for Others you will have tO perform the that, for a particular obfuscated representatlon, some operations can be performed directly. ln practice, however, you will often have t0 compromise. You may find much prefer for every operation tO be performed on the obfuscated representation cated type it reveals the de-obfuscated values 0f the arguments and the result! You This really isn t a good idea, since every time you perform an operauon on an Obfus- x ØT' ア x ①ア convert back tO T, perform the operation, and then again convert tO The simplest way tO obfuscate is for every operation on values Of type 7 ' ' tO first

7. Surreptitious Software

Preface XXIII read Chapter 1 and Chapter 2. If you're a researcher with a background in compiler design, you can skip Chapter う . lt's advantageous t0 read the algorithm chapters in order, since, for example, the watermarking chapter relies on ideas you've learned in the obfuscation chapter. still, we've tried tO make each chapter as self-contained as possible, so skipping around should be possible. If you're an engineer charged with adding protection to your company s product line, you should read Chapter う carefully and maybe complement that by reading up on static analysis ⅲ a good compiler text. You can then move on tO the algorithm chapter relevant tO you. If you re a graduate student reading this bOOk for a class, read the entire thing from cover tO cover, and don't forget tO review for the final ! We hope this book will do two things. First, we want t0 convince you, dear reader, that code obfuscation, software watermarking, birthmarking, and tamper- proofing are interesting ideas well worth studying, and that they are viable alter- natives tO protecting the intellectual property you have invested in your software. Second, we want this b00k t0 bring together all available information on the subject SO that it can serve as a startlng point for further research in the field. Christian and Jasvir Tucson and Mountain View Groundhog Day, 2009 P.S. There is a third reason for writing this book. lf, while reading this book, you are struck by the cleverness Of an idea and, as a result, you become inspired tO make your own contributions t0 the field, well, then, dear reader, our goal with this b00k has really been met. And when you ve come up with your new clever algorithm please remember tO let us know SO we can include it in the next edition !

8. Surreptitious Software

1.9 1)iscusslon 57 And that's not to mention bugs. Many 0f the techniques you will see ⅲ this b00k are based on code transformations. As such, they're very similar tO the op- tim1Zing code transformations you'll find in sophisticated compilers. And as such, they're susceptible t0 the usual menagerie 0f bugs. As a result, your program that was perfectly functional before protection was added may exhibit erratic behavior afterward, either as a result Of a bug ⅲ the protection t001 itself or because its COde transformations revealed a latent bug ⅲ your own code. 1.93 So Which Algorithms ShouId I Use? SO you ve written your fantast1C new ・ program and for whatever reasons you ve decided that it makes sense tO add some sort Of software protection tO it. You've bought this b00 い and read it cover t0 cover search 0f the One True Algorithm t0 implement. Unfortunately, if this was your attitude, you will have been sorely disappointed. The problems facing software protectlon research are multitudinous, but it bOils down tO one central issue: を IO 嶬ー dO you evaluate the effectiveness Of an algorithm? Unfortunately, the state of the art is sorely lacking. ldeally, this book would end with a single large table over all the algorithms that lists for each one ガ 0 ″ / 0 切戸な e 〃 / , e_ 〃 0 ″知 4 / , な / な / ツん / な 4 〃 , and 4 〃化 0 ノ .6 As you may have already guessed, there's no such table. Without knowing how many more hours/days/weeks/months your program will remain uncracked if you protect it with the algorithms in this bOOk, hOW can you know if software protection will make sense ln your particular case? And without being able tO compare tWO algorithms tO say that one is better than another, hOW can you pick the right algorithm t0 use? And how does the field progress when a paper purporting tO present a new, improved algorithm can't substantiate its claims? Disturbing questions, indeed. ln practice, software protection takes a belt- and-suspenders approach tO security. You layer protection algorithms until you ve convinced yourself that your program IS secure "enough" while at the same tlme remaimng within performance bounds. If you can afford t0' maybe you employ a professional red-team t0 try t0 break through the defenses and give you some feel for how long the protection will survive ⅲ the wild. But in the end' you have t0 accept the fact that you re engaged ⅲ a cat-and-mouse game: If crackers deem your program sufficiently crack-worthy, eventually they'll succeed. This situatlon may 5.nmnks! 6. Effort is measured in person wall-clock hours. A parallelizable attack can be sped up bY simply throwing more crackers at the problem.

9. Surreptitious Software

93 Algorithm wMCCDKTILSpaths: Expanding Execution paths 587 TO make sure that a simple static analysis won't decide that these statements are dead code, it's a good idea to bind the generated code to the rest of the program. ln this example, we simply added the statement if (P ) live_var + = u where an opaque statement makes it appear as if the variable u is actually being used. Since the trace gives you exact knowledge ofvariable values at everypoint during the recognition run, lt's easy to generate arbitrarily complex conditions to confuse a pattern-matching attack. If you need to generate a true statement and the trace tells you that a=4, then you could generate a>@, a & 1 = = ① , a ! = 3 , and so on. And once you have several true statements available, you can J0in them together with a conJunction ( ) tO make other true statements. Note that this is different (and easier) than generating opaque predicates that have tO be true for a range of input conditions. ProbIem 9.10 Depending on the behavior of the original trace, it might be possible to hijack bit strings generated by the unwatermarked program. lf, for example, you want tO insert ① 11 ① 11 ① and the original trace already has ① 11 ① 11 , you just have to add one extra branch. How feasible is this? Assuming that your watermark pieces are 64-bit blocks with random distribution of bits, do a preliminary study of the behavior of some real programs to see if they exhibit the required behavior. You can also encode the watermark as a 100P. Let's say you want to insert the five watermark bits 110102. During the 100P , it will be most convenient to shift out the bits from the right, so first reorder the bits so that the least significant bit becomes the most significant, i. e. , 010112. Then add a priming 0 bit to the least significant position, which gets you 0101102 = 1616. You insert this value into a 100P like this: int u = ① ; long bits ① X16 ; for(int k=@ ; k く 6 ; k + + ,bits>>=l) if ((bitsA&I)==1) AI : u + + ; else A2: if (PF) live-var + =u; The 100P will consider the bits in the order く (), 1 , 1 , 0 , 1 , () 〉 . The first time around the 100P , the priming 0 bit is tested; the test fails; and you branch to block A2. The next iteration, you shift out a 1 from the bits variable, and the branch goes to block AI. Since this is different from the priming block, you write down a 1 bit. The next iteration also goes to AI, and so you write down another 1 bit. During the next iteration, however, the test fails, and you branch to A2, the same block as the priming block, and so you write down a 0 bit.

10. Surreptitious Software

53 Reconstituting Source 171 Your program is called p. ( , and you use the cc compiler tO turn it intO assembly code, p. s. You then assemble it, using as, gettlng an Object file P. 0. ln addition tO the code and data in the . text and . data segments, the Object file contains a symbol table and relocation information. You then link p. 0 with ld, merging in necessary system libraries. FinallY, you strip the code 0f its symb01 table and the relocation information, and you ship the resulting program P ' tO your users (and attackers). somewhere during this process, you transform the program tO make it less susceptible tO attack. But where? You could dO a source-to-source transformation on the C source code, you could edit the assembly code, or you could transform the binary code in p ・ 0 , p, or p ' AII these can be valid options, depending on what information the protection algorithm needs, what your development process 100kS like, and what tools you have available. If your algorithm needs much semantic information, then a source-to-source process might work best. You start by building a compiler ()r better yet' retrofitting an exlstlng one), perform your protection transformatlons on the abstract syntax tree ()r any Other convenient intermediate code) , and tell the compiler tO generate source as its output. If your transformation targets low-level code rather than high-level programming language constructs, you could dO source-to-source on the assembly code. If your transformatlon needs detailed type information, however, you need tO recreate it, S1nce much Of it 嶬は S lOSt in the translation process. Binary-to-binary transformations, finally, reqtllre you tO crack open the exe- cutable file, extract the functions, turn each function intO something manageable ()t least, a list Of assembly code instructlons, but more commonly, control flOW graphs), perform your protective transformation, generate new binary and produce a new executable e. This is the scheme you saw illustrated ⅲ the fig- ure above. The advantage Of this scheme is that it interferes very little with the rest Of the development process. Once the development team is satisfied that they have a releasable product, you run the executable through the protection t001 , run the regression tests again (just t0 be sure that the t001 didn't add any bugs)' and get a new executable that's ready t0 be shipped. The disadvantage 0f this scheme is that, in addition tO much semantic information from the original pro- gram's being lost and the fact that working on binary code is a the process 0f breaking open the executable and turning the raw bytes 1ntO something useful is non-trival. The good news is that it's non-trivial for the attacker also. The attacker, unlike you, doesn't have many choices as t0 which level 0f code to work on. Whatever you send him, that's what he's got. Typically, you'll send him stripped binary code. SO his process 100kS something like this: