Search - みる会図書館

Search target: Surreptitious software obfuscation watermarking and tamperproofing for software protection

453 hits found in Surreptitious software obfuscation watermarking and tamperproofing for software protection.

Surreptitious software obfuscation watermarking and tamperproofing for software protection


Contents

9.4.3 Tamperproofing the Branches
9.4.4 Discussion
9.5 Discussion
10 Software Similarity Analysis
10.1 Applications
10.1.1 Clone Detection
10.1.2 Software Forensics
10.1.3 Plagiarism Detection
10.1.4 Birthmark Detection
10.2 Definitions
10.2.1 Similarity Measures
10.3 k-gram-Based Analysis
10.3.1 Algorithm SSSWAWINNOW: Selecting k-gram Hashes
10.3.2 Algorithm SSSWAMOSS: Software Plagiarism Detection
10.3.3 Algorithm SSMCkgram: k-gram Java Bytecode Birthmarks
10.4 API-Based Analysis
10.4.1 Algorithm SSTNMM: Object-Oriented Birthmarks
10.4.2 Algorithm SSTONMM: Dynamic Function Call Birthmarks
10.4.3 Algorithm SSSDL: Dynamic k-gram API Birthmarks
10.5 Tree-Based Analysis
10.5.1 Algorithm SSEFM: AST-Based Clone Detection
10.6 Graph-Based Analysis
10.6.1 Algorithm SSKH: PDG-Based Clone Detection
10.6.2 Algorithm SSLCHY: PDG-Based Plagiarism Detection
10.6.3 Algorithm SSMCwpp: Dynamic Whole Program Birthmarks
10.7 Metrics-Based Analysis
10.7.1 Algorithm SSKK: Metrics-Based Clone Detection
10.7.2 Algorithm SSLM: Metrics-Based Authorship Analysis
10.8 Discussion
11 Hardware for Protecting Software
11.1 Anti-Piracy by Physical Distribution
11.1.1 Distribution Disk Protection
11.1.2 Dongles and Tokens
11.2 Authenticated Boot Using a Trusted Platform Module
11.2.1 Trusted Boot
11.2.2 Taking Measurements
11.2.3 The TPM
11.2.4 The Challenge

Surreptitious software obfuscation watermarking and tamperproofing for software protection


10.6.2 Algorithm SSLCHY: PDG-Based Plagiarism Detection

Like Algorithm SSKH, Algorithm SSLCHY uses PDGs but does so for plagiarism detection rather than clone detection. The major differences between the two algorithms are that SSLCHY employs a general-purpose subgraph isomorphism algorithm rather than slicing and, in order to speed up processing, uses a preprocessing step to weed out unlikely plagiarism candidates. Algorithm 10.8 gives an overview of the method.

The first problem you need to solve is what it should mean for one PDG to be considered a plagiarized version of another. Since you expect some manner of obfuscation of the code on the part of the plagiarist, you can't require the two PDGs to be completely identical. You instead need to relax the requirement to say that the two PDGs should be γ-isomorphic, in accordance with the definition given in Section 10.2. Liu et al. set γ = 0.9, argue that "overhauling [...]

Algorithm 10.8 Overview of Algorithm SSLCHY. P is the original program and Q the plagiarism suspect. K is the minimum number of nodes a PDG should have to be considered. γ is a relaxation parameter for the subgraph isomorphism test.

DETECT(P, Q, K, γ):
1. For each function in P (and Q), construct its program dependence graph G_i (H_j).
2. Let R be the set of all pairs of graphs (G_i, H_j).
3. Filter out unlikely plagiarism candidates from R:
   (a) Remove from R any pair (G_i, H_j) such that G_i or H_j has fewer than K nodes.
   (b) Remove from R any pair (G_i, H_j) such that |H_j| < γ|G_i|.
   (c) Let f(g) = (f_1, ..., f_k) be the frequencies of the k different node kinds in PDG g. Remove from R any pair (G_i, H_j) where f(G_i) is not sufficiently similar to f(H_j).
4. Do a pairwise comparison of the remaining pairs in R:
   for each pair of graphs (G_i, H_j) ∈ R do
      if G_i is not γ-isomorphic to H_j then
         remove (G_i, H_j) from R
5. Return R, the set of plagiarism candidates.
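The filtering in step 3 of Algorithm 10.8 can be sketched in Python as follows. This is only an illustration under stated assumptions: each PDG is reduced to a list of node kinds (one entry per node), the names `prefilter`, `freq_distance`, and `max_freq_distance` are hypothetical, and the sum of absolute frequency differences stands in for whatever similarity test on node-kind frequencies the original method uses. The γ-isomorphism test of step 4 is deliberately left out.

```python
from collections import Counter
from itertools import product


def freq_distance(g, h):
    """Sum of absolute differences between relative node-kind frequencies.

    Assumption: a simple stand-in for the frequency comparison in step 3(c).
    """
    fg, fh = Counter(g), Counter(h)
    kinds = set(fg) | set(fh)
    return sum(abs(fg[kind] / len(g) - fh[kind] / len(h)) for kind in kinds)


def prefilter(p_graphs, q_graphs, k, gamma, max_freq_distance):
    """Return the candidate pairs that survive steps 3(a) to 3(c).

    p_graphs and q_graphs map a function name to the list of node kinds
    of that function's PDG (edges are not needed for the filters).
    """
    candidates = []
    for (g_name, g), (h_name, h) in product(p_graphs.items(), q_graphs.items()):
        if len(g) < k or len(h) < k:
            continue  # 3(a): one of the graphs is too small
        if len(h) < gamma * len(g):
            continue  # 3(b): H_j is too small relative to G_i
        if freq_distance(g, h) > max_freq_distance:
            continue  # 3(c): node-kind frequencies differ too much
        candidates.append((g_name, h_name))
    return candidates


original = {"sort": ["assign"] * 4 + ["control"] * 3 + ["call"] * 3}
suspect = {
    "sort2": ["assign"] * 4 + ["control"] * 3 + ["call"] * 3 + ["return"],
    "tiny": ["assign", "call"],
    "short": ["assign"] * 3 + ["control"] * 3 + ["call"] * 2,
    "io": ["call"] * 10,
}
print(prefilter(original, suspect, k=5, gamma=0.9, max_freq_distance=0.3))
```

Only the pairs returned here would be passed on to the expensive subgraph isomorphism test, which is the point of the preprocessing step.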

Surreptitious software obfuscation watermarking and tamperproofing for software protection


Algorithm 10.4 Overview of birthmarking algorithm.

DETECT(P, Q, threshold):
   bm_P ← signal extracted from P
   bm_Q ← signal extracted from Q
   if similarity(bm_P, bm_Q) > threshold then
      return COPY
   else
      return NOT COPY

outsourcing his assignments to an unscrupulous third party. Such "programming mills" can be found on the Internet or simply by posting a note for "help with CS101" on a university bulletin board. Elenbogen and Seliya instead suggest using a form of style analysis like you saw in the section on software forensics. The idea is to track the style of the student as it changes throughout the course of the semester. Presumably, the style will improve as he gets feedback from the instructor, but this can be controlled for by comparing his changing style with the similarly changing style of the rest of the students in the class. Assuming everyone gets the same feedback, their style should change at the same pace. There are a lot of assumptions here, and it's unclear whether it's possible to collect data that's believable enough that a student will fess up when confronted with it, let alone convince a plagiarism review board that the student needs to be reprimanded.

10.1.4 Birthmark Detection

Our primary interest in this chapter is in birthmark detection. Although birthmarking is also concerned with detecting similarities between programs, it differs in some respects from software forensics and plagiarism detection.

First, birthmarks are primarily extracted from executable code, such as x86 binaries or Java bytecode, rather than from source code. Clone detection and plagiarism detection by definition work on source code, and most algorithms proposed for authorship analysis do too, although there's a real need for analysis of malware binaries.

Second, birthmark detection assumes a much more active and competent adversary than what you've previously seen. Unlike clone detection and plagiarism detection, there's no need to keep the code pretty enough to remain fit for further human consumption. Therefore, the adversary is free to mangle the code in any way he can think of in order to thwart the birthmark detector. In the birthmarking scenario, the assumption is that the adversary goes through the following steps to lift code from some program P into his own program Q:
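As an aside to Algorithm 10.4 at the top of this excerpt, here is a minimal Python sketch of the same compare-against-threshold structure. The excerpt does not say which signal is extracted or how similarity is measured, so both are passed in as functions. `opcode_kgrams` and `jaccard` are hypothetical choices made only for this illustration.

```python
def opcode_kgrams(opcodes, k=2):
    """Hypothetical birthmark: the set of k-grams over an opcode sequence."""
    return {tuple(opcodes[i:i + k]) for i in range(len(opcodes) - k + 1)}


def jaccard(a, b):
    """Hypothetical similarity measure: |a ∩ b| / |a ∪ b| for two sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def detect(p, q, threshold, extract=opcode_kgrams, similarity=jaccard):
    """Mirror of DETECT(P, Q, threshold) from Algorithm 10.4."""
    bm_p = extract(p)  # signal extracted from P
    bm_q = extract(q)  # signal extracted from Q
    return "COPY" if similarity(bm_p, bm_q) > threshold else "NOT COPY"


p = ["iload", "iload", "iadd", "istore", "return"]
q = ["iload", "iload", "iadd", "istore", "nop", "return"]  # one opcode inserted
print(detect(p, q, threshold=0.4))
print(detect(p, q, threshold=0.6))
```

The single inserted `nop` already lowers the similarity of the two opcode sequences to 0.5, which hints at why a birthmark has to be chosen with an active adversary in mind.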

Surreptitious software obfuscation watermarking and tamperproofing for software protection


algorithm sketches. In subsequent sections, you'll see more detailed algorithms based on different program representations.

10.1.1 Clone Detection

After a program has gone through several development cycles, it tends to contain many instances of duplicated code. Often the duplicates are the result of a copy-paste-modify style of programming: A programmer finds an existing code segment that's almost, but not quite, what he's looking for, simply makes a copy of it, specializes the code as necessary, and adds the copy to the program. A better strategy would be to abstract the code segment into its own function and replace both the original and the copy with appropriately parameterized calls, but this requires both time and a deeper understanding of the original code. Copy-paste-modify is both simpler and faster, at least in the short run. In the long run, however, it becomes a maintenance headache, since a bug in the original code will need to be fixed in all the clones.

Clone detection is the process of locating similar pieces of code in a program. The detection phase is followed by an abstraction phase, where clones are extracted out into functions and replaced with calls to these functions. Here, the clone detector found that function f3 is a clone of f1:

[Figure: program P with functions f1, f2, and f3 before and after abstraction]

In the abstraction phase, the clone detection tool created a new version of P with f1 and f3 removed and replaced by f(r). f(r) is a version of f1 where the parameter r represents the differences to f3. For such a replacement to make sense, f1 and f3 have to be similar enough that creating f(r) is easy and that this parameterized version appears natural to the programmers who have to continue to maintain the code. Algorithm 10.1 (p. 604) shows a high-level sketch of a clone detector.

Clone detectors are a software maintenance tool, and as such they aim at improving the program's source code. Some work on the source directly, but most first transform it into a higher-level representation (rep in Algorithm 10.1, p. 604), such as a token sequence, an abstract syntax tree, or a program dependence graph.

Unlike the scenarios you'll see next, in the clone detection scenario you don't expect programmers to be malicious. They don't deliberately try to hide that they've copied a piece of code; it just becomes naturally "obfuscated" as the result of the
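To make the abstraction phase described in this excerpt concrete, here is a small hypothetical Python example. The function names and the arithmetic are invented for illustration. `total_with_tax` plays the role of f1, `total_with_discount` is its copy-paste-modify clone f3, and `scaled_total` is the parameterized version f(r), where the parameter captures the one literal that differs.

```python
def total_with_tax(prices):
    """Original code segment (the role of f1)."""
    total = 0.0
    for price in prices:
        total += price * 1.08
    return total


def total_with_discount(prices):
    """Copy-paste-modify clone (the role of f3): only the literal differs."""
    total = 0.0
    for price in prices:
        total += price * 0.90
    return total


def scaled_total(prices, rate):
    """Abstraction f(r): the parameter `rate` captures the difference."""
    total = 0.0
    for price in prices:
        total += price * rate
    return total


prices = [10.0, 20.0, 5.5]
print(scaled_total(prices, 1.08) == total_with_tax(prices))
print(scaled_total(prices, 0.90) == total_with_discount(prices))
```

After abstraction, a bug in the summation loop needs to be fixed in one place only, which is the maintenance benefit the text describes.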

Surreptitious software obfuscation watermarking and tamperproofing for software protection


Listing 1.3 Obfuscated voting code. (Continued)

      logVote(today, vote);
   } catch (Exception e) { }
   totalVotes++;
   if (vote [...] INVALID_VOTE) invalidVotes++;
   else tally[vote]++;
   [...]
   printSummary();
   [...]
   public static void main(String args[]) {
      Voting voting = new Voting();
      voting.go();
   [...]

Can you tell whether the program produces a fair count or not, or has it, in fact, been deliberately manipulated to favor a particular candidate? Before reading the answer in the footnote, time yourself to see how long it took to analyze this 58-line program. Now, how long would it take for you to find a potential problem in a real-world voting system comprising hundreds of thousands of lines of code? What if the techniques used in Listing 1.3 were combined with those in Listing 1.1? Might it be possible to provide enough confusion so that by the time the obfuscation has been penetrated and any irregularities identified, the next American Idol has already been selected (or, in the case of the United States' presidential election, the Electoral College has convened and cast their votes)?

1.4.3.2 Obfuscating Viruses

As you will see in this book, there are important real-world problems that, to date, only obfuscation is able to tackle. Unfortunately, however, it's in black hat code scenarios where obfuscation has had its most spectacular successes: Obfuscation is used by malware writers to protect their viruses, worms, trojans, and rootkits from detection. It is entertaining to examine techniques that hackers have invented and successfully used to foil security researchers, and to see whether the good guys can make use of the same techniques. Virus writers and virus scanner writers are engaged in a game: when a detection technique is invented, the virus writers counter with a more sophisticated

Surreptitious software obfuscation watermarking and tamperproofing for software protection


Algorithm 10.1 Sketch of algorithm for clone detection and abstraction. P is a program, threshold is the minimum similarity between two segments of code to consider them clones, and minsize is the minimum size of a code segment to consider for abstraction.

DETECT(P, threshold, minsize):
1. Build a representation rep of P from which it is convenient to find clone pairs. Collect code pairs that are sufficiently similar and sufficiently large to warrant their own abstraction:
   res ← ∅
   rep ← convenient representation of P
   for every pair of code segments f, g ∈ rep, f ≠ g do
      if similarity(f, g) > threshold && size(f) ≥ minsize && size(g) ≥ minsize then
         res ← res ∪ ⟨f, g⟩
2. Break out the code pairs found in the previous step into their own function and replace them with parameterized calls to this function:
   for every pair of code segments ⟨f, g⟩ ∈ res do
      h(r) ← a parameterized version of f and g
      P ← P ∪ h(r)
      replace f with a call to h(r1) and g with a call to h(r2)
3. Return res, P.

specialization process. To make the copied code fit in with its new environment, it's common for the programmer to rename variables and to replace literals with new values. More unusual are major structural changes such as removing or adding statements.

This book is about protecting the intellectual property contained in programs. We therefore don't have any particular interest in clone detection per se, since cloning code is a case of the programmer "stealing" from himself or from his teammates, not from another party! However, as you will see, the techniques developed for clone detection are similar to those used to detect malicious copying of code, and it's reasonable to believe that the two communities could learn from each other. Also, in Section 7.2.4 you saw how the Skype binary was protected by adding several hundred hash functions. This protection was ultimately defeated
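Step 1 of Algorithm 10.1, the collection of sufficiently similar and sufficiently large code pairs, might be sketched in Python like this. The representation is assumed to be a token list per code segment, and `difflib.SequenceMatcher` from the standard library is used as a stand-in similarity measure. Neither choice comes from the excerpt, and the name `detect_clones` is hypothetical. The abstraction step is not shown.

```python
from difflib import SequenceMatcher
from itertools import combinations


def detect_clones(segments, threshold, minsize):
    """Return the pairs of segment names that qualify as clones.

    segments maps a segment name to its token list (the assumed
    representation `rep`); size is measured in tokens.
    """
    res = []
    for (f, f_tokens), (g, g_tokens) in combinations(segments.items(), 2):
        if len(f_tokens) < minsize or len(g_tokens) < minsize:
            continue  # too small to be worth abstracting
        matcher = SequenceMatcher(None, f_tokens, g_tokens, autojunk=False)
        if matcher.ratio() > threshold:
            res.append((f, g))
    return res


segments = {
    "f1": "total = 0 ; for x in xs : total += x * 2".split(),
    "f2": "print ( 'hello' )".split(),
    "f3": "sum = 0 ; for y in ys : sum += y * 3".split(),
}
print(detect_clones(segments, threshold=0.5, minsize=5))
```

Because f3 renames variables and changes a literal, its raw token similarity to f1 is only moderate. A real tool would likely normalize identifiers and literals before comparing, which is one reason most detectors work on a higher-level representation.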

Surreptitious software obfuscation watermarking and tamperproofing for software protection


We say that G1 is γ-isomorphic to G2 if [...]

For the graphs G1 and G2 above, you get similarity(G1, G2) = [...] and containment(G1, G2) = [...].

10.3 k-gram-Based Analysis

With definitions out of the way, let's now look at actual algorithms! The remainder of the chapter is organized around different program representations. Some will be simpler than others, some will allow more powerful comparisons between programs, and some support more advanced attack scenarios. First we'll look at k-grams.

Comparing sets of k-grams of two documents is a popular method of computing their similarity. This idea has been used for plagiarism detection of text documents and source code, for authorship analysis of code, and for birthmark detection of executable code. In this section, you'll see Algorithm SSSWAWINNOW [6, 7, ...], which is used in the MOSS source-code plagiarism detection system. MOSS is used extensively to catch cheating in programming assignments in computer science courses. Algorithm SSMCkgram also uses k-gram-based analyses, but for computing Java bytecode birthmarks.

10.3.1 Algorithm SSSWAWINNOW: Selecting k-gram Hashes

A k-gram is a contiguous length k substring of the original document. To illustrate the idea, consider the short document A consisting of the string yabbadabbadoo:

[Figure: the characters of document A with their positions]

By sweeping a window of size 3 over A, you get the set of 3-grams for A:

A: yab abb bba bad ada dab abb bba bad ado doo

We'll also call these shingles. Here's a second document C consisting of the string doobeedoobeedoo:

[Figure: the characters of document C with their positions]
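Sweeping a window of size k over a document takes only a few lines of Python. The function name `kgrams` is chosen for this sketch and does not come from the excerpt.

```python
def kgrams(document, k):
    """Return all contiguous length-k substrings of document, in order."""
    return [document[i:i + k] for i in range(len(document) - k + 1)]


a = kgrams("yabbadabbadoo", 3)
c = kgrams("doobeedoobeedoo", 3)
print(" ".join(a))
print(" ".join(c))
print(sorted(set(a) & set(c)))  # 3-grams the two documents share
```

A document of n characters yields n - k + 1 k-grams, so the 13-character document A produces the 11 3-grams listed above.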

Surreptitious software obfuscation watermarking and tamperproofing for software protection


important library from your program and incorporates it into his own code. This is a serious threat in the computer gaming industry, where libraries for graphics, physics, scripting, and so on are commonly supplied by third-party vendors, and these vendors' revenue is entirely tied to the licensing fees they can extract from the manufacturers of the games.

In this chapter, we'll examine four intellectual property protection scenarios: software birthmarking, software forensics, plagiarism detection, and clone detection. In all four, the essential operation is determining how similar two programs (or pieces of programs) are to each other or whether one program is contained (partially or in full) within another.

In software forensics, you want to determine who wrote a particular program. The idea is to compare it against a corpus of programs by possible authors and see whose programming style is most similar. In plagiarism detection, you want to determine if a student has copied a program (in whole or in part) by comparing it to programs from other students in the class. Software birthmarking is similar to plagiarism detection in that you want to determine if two programs share the same origin. The difference is that birthmark detection algorithms typically compare programs at the binary rather than the source-code level and usually assume there is a much more active and potent adversary. Clone detection is a branch of software engineering that locates similar pieces of code that are the result of copy-paste-modify programming. This isn't an intellectual property protection problem, per se, but we discuss it here since the algorithms developed for clone detection could be adapted for the other scenarios.

Some algorithms for identifying similarities in two programs work directly on the source or binary code. More commonly, the program is first converted to a more convenient representation, such as trees or graphs, and these are then compared. For this reason, we've divided this chapter into sections for k-gram-based analysis (Section 10.3), API-based analysis (Section 10.4), tree-based analysis (Section 10.5), graph-based analysis (Section 10.6), and metrics-based analysis (Section 10.7). However, we'll start by examining the different applications of software similarity analysis (Section 10.1) and different possible definitions of similarity (Section 10.2). We'll conclude with a discussion in Section 10.8.

10.1 Applications

Let's start by looking at a few different intellectual property protection scenarios that are all based on being able to compare two or more programs for similarity or containment. To give you a general idea of how tools for clone detection, forensic analysis, plagiarism detection, and birthmarking work, we'll give very high-level

Surreptitious software obfuscation watermarking and tamperproofing for software protection


There will be approximately the same number of hashes as there are tokens in the original document. It therefore becomes impractical to keep more than a small number of them. A common approach is to only keep those that are 0 mod p, for some p. This has the disadvantage of possible long gaps in a document from which no hash is selected. A better approach is to use a technique known as winnowing. The idea is to sweep a window of size W over the sequence of hashes and choose the smallest one from each window (in case of ties, choose the rightmost smallest). This ensures that there's no gap longer than W + k − 1 between two selected hashes. Here are the windows of size 4 for document A above, where we've marked the selected hashes in dark gray:

[Figure: the windows of size 4 over the hashes of A, with the selected hash in each window shaded]

The final set of hashes chosen from A then becomes {A1 = 2, A2 = 3, A6 = 2, A10 = 1}. Algorithm 10.5 (p. 617) shows a sketch of how to construct the k-grams.

10.3.2 Algorithm SSSWAMOSS: Software Plagiarism Detection

In software plagiarism detection, you need to do pairwise comparison of n programs. For this reason, for large n and large programs, performance can be a problem. Algorithm SSSWAMOSS handles this by postponing the quadratic step for as long as possible. To see how this works, let's continue with our example. In addition to the A document yabbadabbadoo, which has the hashes {A1 = 2, A2 = 3, A6 = 2, A10 = 1}, let's add a B document scoobydoobydoo

[Figure: the characters of document B with their positions]

with the hashes

{B0 = 9, B1 = 10, B2 = 11, B6 = 1, B7 = 11, B11 = 1}
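The winnowing step described in this excerpt (smallest hash per window, rightmost on ties) can be sketched in Python as below. The function name `winnow` is hypothetical. The excerpt gives only the selected hashes of A, not the full hash sequence, so the values in `hashes_a` are made up for this illustration and chosen so that the result matches the set quoted above, with repeated 3-grams receiving equal hashes.

```python
def winnow(hashes, w):
    """Select the smallest hash from every window of size w.

    Ties are broken by taking the rightmost smallest hash. The result
    maps the position of each selected hash to its value.
    """
    selected = {}
    for start in range(len(hashes) - w + 1):
        window = hashes[start:start + w]
        smallest = min(window)
        # Search the reversed window to find the rightmost occurrence.
        offset = w - 1 - window[::-1].index(smallest)
        selected[start + offset] = smallest
    return selected


# Assumed hash values for the eleven 3-grams of "yabbadabbadoo":
# yab abb bba bad ada dab abb bba bad ado doo
hashes_a = [5, 2, 3, 7, 9, 4, 2, 3, 7, 6, 1]
print(winnow(hashes_a, 4))
```

Consecutive windows overlap in all but one position, so they usually agree on their minimum. That is why only four of the eleven hashes are kept here, even though eight windows are examined.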