120 When Life is Linear

Figure 12.1. A fictional season played between three NCAA basketball teams. A directed edge points from the winning team to the losing team. The weight of an edge indicates the difference between the winning and losing scores. In (a), we see a season where transitivity of scores holds. In (b), scores are given that do not match transitivity, which is often the case in a season of data.

Additional games would add more rows to M and p. For most sports with a season of data, M will have many more rows than columns. There are no values for r_1, r_2, and r_3 that will make all three equations simultaneously true. We turn to least squares, which leads us to multiply both sides of the equation by the transpose of M, in which the rows become columns. So, we solve

$$M^T M \mathbf{r} = M^T \mathbf{p},$$

which, for the three games in Figure 12.1, becomes

$$\begin{bmatrix} 2 & -1 & -1 \\ -1 & 2 & -1 \\ -1 & -1 & 2 \end{bmatrix} \begin{bmatrix} r_1 \\ r_2 \\ r_3 \end{bmatrix} = M^T \mathbf{p},$$

where the entries of M^T p are computed from the point differentials on the edges of the graph. This system has infinitely many solutions. So, we take one more step and replace the last row of the matrix on the lefthand side of the equation with [1 1 1] and the last entry in the vector on the righthand side with a 0. This will enforce that the ratings sum to 0. Finding the ratings corresponds to solving

$$\begin{bmatrix} 2 & -1 & -1 \\ -1 & 2 & -1 \\ 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} r_1 \\ r_2 \\ r_3 \end{bmatrix} = \begin{bmatrix} (M^T \mathbf{p})_1 \\ (M^T \mathbf{p})_2 \\ 0 \end{bmatrix}. \qquad (12.1)$$
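To make the steps concrete, here is a minimal Python sketch (not from the book) of the Massey computation just described. The list of games and the point differentials below are hypothetical stand-ins for the season pictured in Figure 12.1.

```python
import numpy as np

# Hypothetical season: (winner, loser, point differential) for teams 0, 1, 2.
games = [(0, 1, 5), (1, 2, 4), (0, 2, 9)]
n_teams = 3

# One row of M and one entry of p per game: r_winner - r_loser = differential.
M = np.zeros((len(games), n_teams))
p = np.zeros(len(games))
for row, (winner, loser, diff) in enumerate(games):
    M[row, winner], M[row, loser] = 1.0, -1.0
    p[row] = diff

# Normal equations M^T M r = M^T p, then replace the last row with ones
# (and the last right-hand entry with 0) so the ratings sum to zero.
A = M.T @ M
b = M.T @ p
A[-1, :] = 1.0
b[-1] = 0.0

ratings = np.linalg.solve(A, b)
print(ratings)  # higher rating suggests a stronger team
```

Adding a season's worth of games only lengthens the games list; the normal equations stay n_teams by n_teams.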
90 When Life is Linear

Figure 9.5. A color image of a mandrill (a), and then, in (b), after downsampling using the SVD for clustering. Image by Malene Thyssen (commons.wikimedia.org/wiki/File:Mandril.jpg) CC BY 2.5 (creativecommons.org/licenses/by/2.5/deed.en).

9.4 Losing Some Memory

In Chapter 8, we saw how eigenvectors can be used to cluster. The SVD can also be used for clustering. An application of clustering is in downsampling images, which can reduce the memory necessary to save a picture. The image of the mandrill in Figure 9.5 (a) contains pixels of various shadings. We will create a picture with pixels containing only the colors of white and black. Which pixels become white and which become black? Clustering will decide.

There is more than one way to store a color image. We'll use three matrices to store the red, green, and blue channels of a color image. So, one pixel will be stored as a triplet like (15, 255, 10), which, in this case, is predominantly green with some red and a little less blue. To store the pixel information of an n x m image, we'll form an (nm) x 3 matrix T where each row contains the red, green, and blue intensities of a pixel.

Since the elements of T are all nonnegative, we recenter them prior to clustering. Since the values range between 0 and 255, we will subtract 127 from each value. Now, we are ready for the SVD, which can be found for any matrix. This flexibility in the SVD is important since we have a very long and skinny matrix. Now, we form the SVD and get UΣV^T. The dimensions of U and T
72 When Life is Linear

Figure 8.2. An original image is repeatedly scaled via a linear transformation.

This matrix's only eigenvalue is λ, and any 2D column matrix is an eigenvector. Figure 8.2 shows the effect of repeatedly applying this transformation with λ = 1/3 in (a) and with λ = -1/3 in (b). In both figures, we have placed the original image and the images produced by three applications of this transformation on top of each other. Knowing the values of λ, can you determine which of the four images in (a) and in (b) is the starting image?

8.3 Finding Groupies

Clustering is a powerful tool in data mining that sorts data into groups called clusters. Which athletes are similar, which customers might buy similar products, or what movies are similar are all possible topics of clustering. There are many ways to cluster data, and the type of question you ask and the data you analyze often influence which clustering method you will use. In this section, we'll cluster undirected graphs using eigenvectors.

We'll see how eigenvectors play a role in clustering by partitioning the undirected graph in Figure 8.3. A graph is a collection of vertices and edges. The vertices in this graph are the circles labeled 1 to 7. Edges exist between pairs of vertices. This graph is undirected since the edges have no
What Are the Chances? 93

Figure 10.1. Game of Snakes and Ladders on cloth from the 19th century in India. From "Hinduism Endures: 1100 to 1850," 2009, Hinduism Today, www.hinduismtoday.com/. Copyright 2009 by Himalayan Academy. Reprinted with permission.

game board in Figure 10.2. The game starts on square 1. Reaching square 9 results in winning. On each turn, roll a die. If you roll 1 or 2, do not move and stay on your current square. If you roll 3 or 4, move ahead one square. If you roll 5 or 6, move ahead two squares. If you land on square 4, you slide down the snake to square 2. If you land on square 6, you climb the ladder to square 8. When you are on square 8, you can roll 3, 4, 5, or 6 and land at square 9 to complete the game.

We will ask a couple of questions, which will be answered with linear algebra. What is the minimum number of moves in the game? How many moves are necessary for it to be 50% likely that someone has won? To answer these, we'll use Markov chains, which use matrices in which the elements correspond to probabilities. This means that the sum of each row of every such matrix will be 1.

Our first step is forming a transition matrix, which contains the probabilities of going from one square to another. Row i of the matrix corresponds
Stretch and Shrink

Multiplying a vector x by a matrix results in another vector, possibly in a different dimension. Take the 3D vector [1 3 -2] and multiply it by a matrix with two rows. This results in a vector which is in 2D. Some matrices, when multiplied by a vector, will stretch or shrink the vector. These types of vectors result in an equation of the form Av = λv. Said in words, this type of vector v has the property that the matrix A multiplied by v results in a scalar multiple of v. In such cases, v is called an eigenvector of A and λ its associated eigenvalue. The terms eigenvalue and eigenvector are derived from the German word "Eigenwert," which means proper value. The word eigen is pronounced "eye-gun."

8.1 Getting Some Definition

Let's take a closer look at an eigenvector and eigenvalue. We have Av = λv, which indicates that λ is an eigenvalue associated with its corresponding

69
124 When Life is Linear

Figure 12.3. A small portion of the 2014 March Madness bracket. Each vertical line represents a game between two teams, the winner of which is placed on the horizontal line immediately to the right. The matchups on the solid lines at the left are the known games announced at the beginning of the tournament, and the teams on the dashed lines show an example of a fan's predictions for the winner of each matchup.

often comes in predicting the winners of each game (dashed lines) to see who advances to eventually win the tournament. By the following Thursday, bracket predictions must be completed and entered in pools in which friends, colleagues, and even strangers compete for pride, office pool winnings, or even thousands of dollars. In fact, in 2014, Warren Buffett insured a billion dollar prize for anyone who could complete a perfect bracket for the tournament. By the end of the second round (or 48 games) none of the over 8 million brackets had perfectly predicted the 2014 tournament. As the tournament progresses, game results either uphold or overturn the predictions of millions of fans. For instance, for the matchups shown in Figure 12.3, Connecticut beat St. Joseph's, which matches the prediction, and Villanova beat Milwaukee, which goes against the prediction. The actual results from 2014 are shown in Figure 12.4.

There seems little madness in applying the Colley and Massey methods to creating brackets for the NCAA basketball tournament. A bracket is formed by assuming a higher ranked team wins. But this results in everyone who is using the methods having the same bracket, which may not be desired. An adjustment to the system requires mathematical modeling decisions and results in personalized brackets. To this end, one follows the work introduced in [8] and detailed in [13] and decides a weighting for each game. For

Figure 12.4. A small portion of the 2014 March Madness tournament bracket results.
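As a rough sketch of the bracket-filling idea (this code is not from the book or from [8] and [13]), the following Python snippet advances the higher-rated team in each matchup. The rating values are made up for the four teams in Figure 12.3; a real bracket would use ratings computed by the Colley or Massey method.

```python
# Hypothetical ratings; higher means the method ranks the team as stronger.
ratings = {"Connecticut": 0.81, "St. Joseph's": 0.55,
           "Villanova": 0.92, "Milwaukee": 0.34}

# First-round matchups listed in bracket order.
round_one = [("Connecticut", "St. Joseph's"), ("Villanova", "Milwaukee")]

def play_round(matchups):
    """Advance the higher-rated team in every matchup."""
    return [a if ratings[a] >= ratings[b] else b for a, b in matchups]

winners = play_round(round_one)
print(winners)  # ['Connecticut', 'Villanova']

# Winners pair off to form the next round's matchups.
print(play_round(list(zip(winners[::2], winners[1::2]))))  # ['Villanova']
```

Reweighting games, as described above, would perturb the ratings and so personalize the winners this procedure picks.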
84 When Life is Linear

easy. We take the first column of U, denoted u, the largest singular value (diagonal element) of Σ, denoted σ, and the first row of V^T, denoted v. Referring to the matrices U, Σ, and V^T found for the matrix B considered earlier in this section,

$$\mathbf{u} = \begin{bmatrix} -0.1520 \\ -0.3940 \\ -0.6359 \\ -0.6459 \end{bmatrix}, \quad \sigma = 21.0244, \quad \text{and} \quad \mathbf{v} = \begin{bmatrix} -0.6012 & -0.6881 & -0.4064 \end{bmatrix}.$$

To form the closest rank 1 approximation under the Frobenius norm, we compute B_1 = σuv:

$$B_1 = 21.0244 \begin{bmatrix} -0.1520 \\ -0.3940 \\ -0.6359 \\ -0.6459 \end{bmatrix} \begin{bmatrix} -0.6012 & -0.6881 & -0.4064 \end{bmatrix} = \begin{bmatrix} 1.9216 & 2.1995 & 1.2989 \\ 4.9796 & 5.6997 & 3.3660 \\ 8.0376 & 9.2000 & 5.4330 \\ 8.1641 & 9.3449 & 5.5186 \end{bmatrix}.$$

We can verify that this is a rank 1 matrix by noting that the first row of B_1 is v times the product of the largest singular value (which equals 21.0244) and the first element of u (which equals -0.1520). Similarly, the second row of B_1 is v times the product of 21.0244 and the second element of u (which equals -0.3940). So, each row of B_1 is a multiple of v, and B_1 is rank 1. It is a fact that a matrix produced by multiplying a column vector by a row vector will be a rank 1 matrix. Can you see why this would be true?

The SVD gives us much more than the best rank 1 approximation. It can also give us the best rank k approximation for any k, as long as k is less than or equal to the rank of our matrix. How do you form it? You'll produce three matrices. The first matrix U_k is the first k columns of U. The second matrix Σ_k is a submatrix formed from the first k rows and k columns of Σ, which forms a k x k matrix. Then, V_k^T equals the first k rows of V^T. Our rank k approximation is U_k Σ_k V_k^T. Let's practice again on the SVD of the matrix B. The rank of B is 3, so we can ask for, at most, a rank 3 matrix, which would equal B. Let's produce
82 When Life is Linear

matrix. Let's consider a matrix that has more rows than columns:

$$B = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \\ 10 & 11 & 0 \end{bmatrix}.$$

Performing the SVD on a matrix returns three matrices, which are denoted U, Σ, and V. For B, our SVD calculator would calculate

$$U = \begin{bmatrix} -0.1520 & 0.2369 & 0.8684 \\ -0.3940 & 0.3626 & 0.2160 \\ -0.6359 & 0.4883 & -0.4365 \\ -0.6459 & -0.7576 & 0.0936 \end{bmatrix}, \quad \Sigma = \begin{bmatrix} 21.0244 & 0 & 0 \\ 0 & 7.9811 & 0 \\ 0 & 0 & 0.5255 \end{bmatrix},$$

and

$$V^T = \begin{bmatrix} -0.6012 & -0.6881 & -0.4064 \\ -0.3096 & -0.2682 & 0.9122 \\ -0.7367 & 0.6742 & -0.0518 \end{bmatrix}.$$

The product UΣV^T will equal B, where U is a matrix with the same number of rows and columns as B, and V^T and Σ will have the same number of columns as U. Further, V^T and Σ have the same number of rows as columns. Σ is a diagonal matrix, which means it only contains nonzero values on the diagonal, which are called the singular values. We will assume that the singular values of Σ are ordered from highest to lowest, with the highest appearing in the upper lefthand corner of the matrix. It is interesting that the columns of U and the rows of V^T in the SVD are related to our computations for PCA in Section 8.4.

While the matrix product UΣV^T equals B, the three matrices can be used to approximate B, but that's getting a bit ahead of ourselves. Before discussing the approximation of matrices, let us think a bit about the approximation of numbers. What do we mean when we say a number is a good approximation to another number? Generally, we are referring to distance. For example, 3.1111 would be considered a closer approximation to π than 3.0. Similarly, we can define distance for matrices. One such measure is the Frobenius norm, which is denoted ‖M‖_F, defined as the square root of
Math to the Max 59

How do we pick among an infinite number of choices for m and b? Somewhat amazingly, we need only to multiply both sides of the equation by M^T, which simply means we want to solve M^T M x = M^T b. For our small system, this corresponds to solving

$$\begin{bmatrix} 2012 & 1984 & 1896 \\ 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} 2012 & 1 \\ 1984 & 1 \\ 1896 & 1 \end{bmatrix} \begin{bmatrix} m \\ b \end{bmatrix} = \begin{bmatrix} 2012 & 1984 & 1896 \\ 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} 9.63 \\ 9.99 \\ 12 \end{bmatrix},$$

which, after performing the matrix multiplications, becomes

$$\begin{bmatrix} 11579216 & 5892 \\ 5892 & 3 \end{bmatrix} \begin{bmatrix} m \\ b \end{bmatrix} = \begin{bmatrix} 61947.72 \\ 31.62 \end{bmatrix},$$

or m = -0.0210 and b = 51.8033.

Let's revisit what makes this the best line. At x = 2012, it estimates the time to be y = 2012(-0.0210) + 51.8033 = 9.5513. The vertical distance from the point (2012, 9.63) to the line is the distance between (2012, 9.5513) and (2012, 9.63), which equals 9.63 - 9.5513 = 0.0787. For the year 1896, the line estimates the time as 11.9873, so the error is 12 - 11.9873 = 0.0127. Finally, for 1984, the line would return y = 10.1393, and 10.1393 - 9.99 = 0.1493. The sum of the squares of the distances is

$$(0.0787)^2 + (0.0127)^2 + (0.1493)^2 = 0.0286.$$

This line, produced using the method of least squares, chose the m and b such that this sum is as small as possible. In this sense, it is the best fit of the data. No other line will produce a smaller sum.

What if we use all the data? Then the normal equations become

$$\begin{bmatrix} 106998484 & 54726 \\ 54726 & 28 \end{bmatrix} \begin{bmatrix} m \\ b \end{bmatrix} = \begin{bmatrix} 567962.44 \\ 290.84 \end{bmatrix},$$

which produces the line y = -0.0133x + 36.31, graphed in Figure 7.1. Notice how close the line is to various points. For 2012, it predicts Usain Bolt's time at 9.599, which is only about .03 seconds off his actual time!

Let's look at our least squares line more closely. Its slope is -0.0133. Remember, x denotes the year and y is the time. So, the slope predicts that every year, we expect the Olympic gold medal time to drop by just over a hundredth of a second.

At what point would someone reach the limit of speed? No one has ever run the race in 2 seconds. So, there is some limit between 2 and 9.63 seconds. The book, The Perfection Point, considers such a question. It is written by John Brenkus, the host of ESPN's Sports Science show. Brenkus analyzes four distinct phases of the race: reacting to the gun, getting out of the blocks, accelerating to top speed, and hanging on for dear life at the end.
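As a quick check of the arithmetic above, here is a small Python/NumPy sketch (not from the book) that builds M for the three data points and solves the same normal equations.

```python
import numpy as np

# The three data points used above: (year, winning time in seconds).
years = np.array([2012.0, 1984.0, 1896.0])
times = np.array([9.63, 9.99, 12.0])

# Each row of M is [year, 1], so M @ [m, b] approximates the times.
M = np.column_stack([years, np.ones_like(years)])

# Solve the normal equations M^T M x = M^T times for the slope and intercept.
m, b = np.linalg.solve(M.T @ M, M.T @ times)
print(m, b)  # approximately -0.0210 and 51.8033

# Sum of squared vertical distances from the points to the fitted line.
print(np.sum((times - (m * years + b)) ** 2))  # approximately 0.0286
```

Swapping in all of the Olympic years and times reproduces the full-data line y = -0.0133x + 36.31.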
24 When Life is Linear

the vector, ‖u‖₂ = √(u₁² + u₂² + ⋯ + uₙ²). There is also the taxicab norm, which is computed as ‖u‖₁ = |u₁| + |u₂| + ⋯ + |uₙ|.

Suppose we are interested in the points (-1, 5) and (2, 4). The distance between them is the length of the vector u = [-3 1] under the vector norm you choose. Using the Euclidean norm, the length of the vector is √((-3)² + (1)²) = √10, which again correlates to the Euclidean distance between the points. Using the taxicab norm, the length of the vector u equals |-3| + |1| = 4.

Sometimes, we don't want or need to specify which norm we'll use. In such cases, we can write a general norm by writing ‖v‖, which is a function that maps vectors in n-dimensional space to the set of real numbers. Regardless of what norm you choose, the norm will have specific properties. In particular, for any two n-dimensional vectors u and v and real number a,
1. ‖av‖ = |a| ‖v‖.
2. ‖u + v‖ ≤ ‖u‖ + ‖v‖, which is called the triangle inequality.
3. If ‖v‖ = 0, then v equals the zero vector.
This allows mathematics to be developed without being specific about a norm. In some cases, you want and need to be specific. Sometimes, you can work generally and later decide if you are using the Euclidean norm, taxicab norm, or another norm of your choosing.

4.1 Recommended Movie

The Euclidean norm can be extended into more than two dimensions. In fact, let's see how to use vector norms in the fourth dimension to find people with similar tastes in movies. Let's look at two of my friends, Oscar and Emmy, and compare their preferences for a few movies to my own. To keep things simple, we'll look at films that were nominated for a 2014 Academy Award for best picture: American Hustle, Gravity, Her, and Philomena. Each person rates the films between -5 and 5. A rating of 5 correlates to wanting to see the film (for the first time or again) and -5 means definitely not wanting to see it again. To keep things general, I'll let a random number generator create ratings for all of us. The ratings appear in Table 4.1.

Now, we take the ratings and create preference vectors. For instance, my preferences become the vector t = [-3 3 5 -4], Oscar's is o =