Web Design & Development

Building the Realtime User Experience

The Web is increasingly happening in realtime. With websites such as Facebook and Twitter leading the way, users are coming to expect that all sites should serve content as it occurs, on smartphones as well as computers. This book shows you how to build realtime user experiences by adding chat, streaming content, and more, without making big changes to the existing infrastructure. You'll also learn how to serve realtime content beyond the browser.

Throughout the book are many practical JavaScript and Python examples that you can use on your site now. And in the final chapter, you'll build a location-aware game that combines all of the technologies discussed.

■ Use the latest realtime syndication technology, including PubSubHubbub
■ Build dynamic widgets on your homepage to show realtime updates from multiple sources
■ Learn how to use long polling to "push" content from your server to browsers
■ Create an application using the Tornado web server that makes sense of massive amounts of streaming content
■ Understand the unique requirements for setting up a basic chat service
■ Use IM and SMS to enable users to interact with your site outside of a web browser
■ Implement custom analytics to measure engagement in realtime

Previous experience developing web applications is recommended.

—Marshall Kirkpatrick, ReadWriteWeb

Ted Roden is a member of the Research and Development group at The New York Times, where he has extensively researched topics related to realtime user experience.

O'REILLY  oreilly.com

US $34.99  CAN $43.99
ISBN: 978-0-596-80615-6

Free online edition for 45 days with purchase of this book. Details on last page. Safari Books Online.
CHAPTER 5
Taming the Firehose with Tornado

In the previous chapter, we built an application that gets the data to users as quickly as possible. However, that application is limited by how fast the authors can post new content. This chapter is about speed and about handling huge amounts of data. We're going to build an application that shows off just how fast the realtime web can work. We'll be combining one of the best sources of big data that is publicly available with a truly great piece of software designed for creating realtime web experiences.

Using Twitter's streaming APIs, we're going to build an application to make sense of the firehose of information they make publicly available. Instead of Cometd, this example will use a web server and framework called Tornado (http://www.tornadoweb.org/). Tornado is a set of Python libraries that were originally created by FriendFeed (http://www.friendfeed.com) and open sourced shortly after the company was acquired by Facebook (http://www.facebook.com).

Tornado

Tornado is a really fantastic piece of software that is both a web framework and a nonblocking web server written entirely in Python. The nonblocking architecture of the web server allows it to scale up to handle thousands of simultaneous connections. On top of being able to handle so many connections, it can also keep them open for long periods of time, making it perfectly suited for realtime web applications. The server was built specifically for the realtime aspects of FriendFeed, and it enables each user to maintain an active connection to the server.
the connection to the client. We do that by calling finish with the output that we want to send to the browser. Simply supplying a Python dictionary structure to the finish method is good enough because the Tornado framework will convert it to JSON automatically.

Now that we've prepared Tornado to accept and serve an influx of updates from Twitter, it's time to start collecting them.

Twitter's Realtime Streaming APIs

Twitter has a lot of data. Every second, amazing numbers of tweets are created by users and sent to Twitter. Users are constantly reporting everything from their latest meal to firsthand accounts of airplanes landing in the Hudson River. Almost as frequently, and with almost as many different purposes, applications repeatedly ping Twitter's APIs and web browsers pound their servers in hopes of grabbing the latest content.

For a long while, Twitter has offered fairly complete but also standard API functions to developers wanting to grab the Twitter feed. For example, if a developer wanted to get some tweets from the public timeline, she would make a request to the public timeline API, which would return 20 or so of the latest tweets. Recently, though, Twitter has started providing streaming APIs to developers. So instead of grabbing 20 updates per request, the developer can make one web request and just leave it open. Instead of sending out a set number of updates, Twitter will keep sending them until the developer closes the connection.

Twitter offers several different streaming methods for several different purposes. Each of these methods requires a certain level of access by the user. The basic methods can be used by anyone, but if you want the totally full feed, you're going to have to talk directly with Twitter to set that up. For us, though, there's plenty of data to work with. The different streaming API methods are listed next.

statuses/filter
    This allows you to get a subset of the public feed while filtering out specific keywords.
    The developer can specify keywords to track and users to follow. Anybody can filter up to 200 keywords; more than that requires special permission.

statuses/firehose
    The firehose is the largest feed of all. This is an unfiltered view of the public timeline. It is not generally available to developers and requires a special level of access.

statuses/links
    This returns all of the public statuses containing strings that look like links. If a tweet has http:// or https://, it'll be in this stream. This method is also restricted and requires special access.

Twitter's Realtime Streaming APIs | 87
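To make the streaming model concrete, here's a minimal sketch of the consuming side: one open response that delivers newline-delimited JSON, read one line at a time. This is not Twitter's or the book's code; the function name and the in-memory `io.StringIO` stream (standing in for the open HTTP response) are hypothetical, but the line-by-line shape matches what the streaming methods above deliver.

```python
import io
import json

def consume_stream(stream, handler, max_items=None):
    """Read a newline-delimited JSON stream and hand each decoded
    update to `handler`. With a real streaming connection this loop
    simply never ends until the server or client closes it."""
    count = 0
    for line in stream:
        line = line.strip()
        if not line:                      # keep-alive lines are blank
            continue
        handler(json.loads(line))
        count += 1
        if max_items is not None and count >= max_items:
            break
    return count

# A simulated stream standing in for the open HTTP response:
fake_feed = io.StringIO(
    '{"text": "first tweet"}\n'
    '\n'                                  # keep-alive blank line
    '{"text": "second tweet"}\n'
)
seen = []
consume_stream(fake_feed, seen.append)
```

Swapping `fake_feed` for a real open connection is the only change the loop itself would need.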
[Figure 5-7 shows the application running in a browser at http://localhost:8888/, with tabs for All (630), #hashtags, @ats, retweets, and links, and tweets streaming in as they arrive.]

Figure 5-7. Data streaming through our Twitter application

If you look at the terminal window where you started the server, you'll notice tons of logging information streaming by very quickly (see Figure 5-8). Because the data is coming in so fast and being served out in real time, this log moves very quickly. If you point different browsers at the URL, you'll see even more logging information.

When working with large amounts of data and concurrent users, you'll want to be careful with the amount of logging that your application does. If you were writing this log to a file, it could easily fill up a hard drive very quickly.

[Figure 5-8 shows the server's terminal output: a fast-moving stream of GET / and POST /updates log lines interleaved with "N recent tweets" messages from runner.py.]

Figure 5-8. Watching the logging info from our application

100 | Chapter 5: Taming the Firehose with Tornado
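One common answer to that logging caution is to sample the log rather than write every event. The sketch below is not from the book; the `SampleFilter` class is a hypothetical illustration using Python's standard logging module, and `BufferingHandler` stands in for the file handler you would use in practice.

```python
import logging
from logging.handlers import BufferingHandler

class SampleFilter(logging.Filter):
    """Let through only every Nth record, so a firehose of
    per-tweet log lines doesn't fill the disk."""
    def __init__(self, every=100):
        super().__init__()
        self.every = every
        self.count = 0

    def filter(self, record):
        self.count += 1
        # Passes the 1st, 101st, 201st, ... record
        return self.count % self.every == 1

logger = logging.getLogger("firehose-demo")
logger.propagate = False
buf = BufferingHandler(capacity=10000)   # in practice: a FileHandler
logger.addHandler(buf)
logger.addFilter(SampleFilter(every=100))
logger.setLevel(logging.INFO)

for i in range(250):
    logger.info("tweet %d processed", i)
```

With 250 events and `every=100`, only three lines are actually recorded, which keeps the log readable at firehose speeds while still showing that the pipeline is alive.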
I mentioned SMS, but the mobile experience does not end there. These days, users have phones in their pockets with full-fledged web browsers that in some cases offer more functionality than their desktop-based brethren. Among other things, mobile browsers can handle offline data storage, GPS sensors, and touch-based interfaces. Their impressive feature set, coupled with nearly ubiquitous wireless broadband, means they cannot be treated as second-class Internet citizens. Applications and user experiences simply must be built with mobile devices in mind.

Push Versus Pull

For about as long as the Web has been around, there have been two main ways of getting content to a user: push and pull. Pull is the method in which most interactions have worked: the user clicks a link and the browser pulls the content down from the server. If the server wants to send additional messages to the user after the data has been pulled down, it just waits and queues them up until the client makes another request. The idea behind push technology is that as soon as the server has a new message for the user, it sends it to him immediately. A connection is maintained between the server and the client, and new data is sent as needed.

In the scheme of the Internet, push technology is not a new development. Throughout the years there have been different standards dictating how it should work. Each proposed standard has had varying levels of support amongst browser makers and different requirements on the server side.

The differing behaviors and requirements of the two technologies have led many developers to use one or the other. This has meant that many sites wanting to offer dynamic updates to their users had to resort to Ajax timers polling the site every X seconds to check for new content. This increased number of requests is taxing on the server and provides a far less graceful user experience than it should have.
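The trade-off can be seen in a toy model. The class and method names below are hypothetical, invented purely to illustrate the two delivery styles: with pull, most polls come back empty; with push, every message reaches the client and no polling requests are made at all.

```python
class MessageServer:
    """Toy server holding pending messages for one client."""
    def __init__(self):
        self.pending = []
        self.requests_served = 0
        self.push_listener = None   # callback registered by a push client

    def publish(self, message):
        if self.push_listener:      # push: deliver immediately
            self.push_listener(message)
        else:                       # pull: queue until the next poll
            self.pending.append(message)

    def poll(self):
        """One pull-style request; returns (and clears) queued messages."""
        self.requests_served += 1
        messages, self.pending = self.pending, []
        return messages

# Pull: the client polls 10 times, but only one message is waiting.
server = MessageServer()
server.publish("a")
received_pull = []
for _ in range(10):
    received_pull.extend(server.poll())
server.publish("b")
received_pull.extend(server.poll())   # 11th request for 2 messages

# Push: every publish reaches the client with zero extra requests.
pushed = []
server.push_listener = pushed.append
server.publish("c")
```

Eleven requests to deliver two messages versus zero requests for instant delivery is exactly the imbalance the paragraphs above and below describe.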
Pushing content out to the user as it happens gives the user a much more engaging experience and uses far fewer resources on the server. Fewer requests means less bandwidth is used and less CPU consumed, because the server is not constantly checking and responding to update requests (see Figure 1-3).

Prerequisites

This book assumes the reader is comfortable with most aspects of web development. The example code in this text uses Java, JavaScript, PHP, and Python. You are encouraged to use the technologies that make the most sense to you. If you're more comfortable with PostgreSQL than MySQL, please use what you're more familiar with. Many of the command-line examples assume you're using something Unix-like (Mac OS X, Linux, etc.), but most of this software runs on Windows, and I'm confident that you can translate any commands listed so that they work in your environment.

4 | Chapter 1: Introduction
done, we just loop through the tags in the XML looking for the link tag with the proper rel attribute. Once we find it, we return the SUP link that we found.

To run this script, run the following command:

    ~ $ php sup-id-aggregator.php
    Checking for SUP-IDs on 30 Atom feeds
    1 Found SUP-ID: (03654a851d)
    2 Found SUP-ID: (033097ff53)
    3 Found SUP-ID: (cdeb48b69)
    ...
    29 Found SUP-ID: (4791d29f89)

This command may take a little bit of time to finish because it's downloading and parsing a number of RSS feeds, but once it does finish, you'll have a bunch of feeds and SUP-IDs saved locally to the database. The time it takes to download these feeds should help illustrate the beauty of SUP. Without SUP, you would have to download all of those feeds again every time you wanted to check whether they've been updated. But from here on out, you don't need to do that. We just need to check the main SUP feed and check the files it tells us to use.

Checking the SUP feed

Now that we have a good number of Atom feeds and we know their corresponding SUP URLs and SUP-IDs, we can start pinging the common SUP feed to check for updates. Prior to SUP, if we wanted to check all of these feeds for updates, we'd have to grab each and every one and compare them to the data we have. We'd do that every time we wanted to check for new content. With SUP, we simply have one ping tell us when things are updated.

This process, while fairly straightforward, is a bit more complex than the previous one, so we're going to step through it piece by piece. Open your text editor and create a file called sup-feed-aggregator.php:

    <?php
    include_once("db.php");

    $sup_url = "http://enjoysthin.gs/api/generate.sup?age=60";
    $sup_data = @json_decode(@file_get_contents($sup_url));
    if(!$sup_data)
        die("Unable to load the SUP URL: {$sup_url}");

Getting started is simple; we just need to download the SUP file used by all of our feeds.
Normally, you'd want to check the database and get all of the SUP files needed for the data feeds you need to check, but since all of our feeds are coming from the same spot, I removed that complexity. We just download the file and use PHP's json_decode function, which builds it into a native PHP object.

    PHP provides json_decode and json_encode, two of the most useful functions in the language for dealing with web services and data exchange on the Internet. If you're not familiar with them, you should seriously consider giving them a look.

18 | Chapter 2: Realtime Syndication
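The check the PHP script performs can also be sketched language-independently. The function below is a hypothetical Python illustration, not the book's code, and it assumes the SUP document's "updates" member is a list of [sup_id, update_id] pairs, as in the SUP draft format; the feed URLs and SUP-IDs are made up for the example.

```python
import json

def updated_feeds(sup_document, feeds_by_sup_id):
    """Given a decoded SUP document and our local mapping of
    SUP-ID -> feed URL (built by the aggregator earlier), return
    the feed URLs that the SUP feed says have new content."""
    updated = set()
    for sup_id, _update_id in sup_document.get("updates", []):
        if sup_id in feeds_by_sup_id:
            updated.add(feeds_by_sup_id[sup_id])
    return updated

# A hand-built SUP payload standing in for the downloaded file:
raw = json.loads("""
{"period": 60,
 "updates": [["03654a851d", "1"], ["ffffffffff", "2"]]}
""")
known = {"03654a851d": "http://example.com/feed1.atom",
         "033097ff53": "http://example.com/feed2.atom"}
changed = updated_feeds(raw, known)
```

Only the feeds named in the SUP document need to be re-fetched; everything else can be left alone, which is the whole point of the protocol.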
CHAPTER 2
Realtime Syndication

Interacting on the realtime web involves a lot of give and take; it's more than just removing the need to refresh your web browser and having updates filter in as they happen. Acquiring content from external sources and publishing it back also must happen in realtime. On the Web, this is called syndication, a process in which content is broadcast from one place to another.

Most syndication on the Web happens through the transmission of XML files, specifically RSS or Atom, from the publisher to the consumer. This model has always been fairly simple: a publisher specifies a feed location and updates the content in that file as it's posted to the site. Consumers of this content, having no way of knowing when new content is posted, have to check that file every half hour or so to see whether any new content has arrived. If a consumer wanted the content faster, they'd have to check the feed more often. However, most publishers frown upon that type of activity and specifically prohibit it in their terms of service. If too many consumers start downloading all of the feeds on a site every minute, it would be very taxing on the server.

Although this has been a problem on the Web for as long as RSS feeds have been around, only recently have people put serious effort into fixing the issue. There are a good number of competing standards aimed at solving this problem. Each of these solutions has had varying degrees of success in getting sites to adopt their technologies. We're going to focus on two of the bigger winners at this point, SUP and PubSubHubbub, but it's worth acknowledging the other standards.

9
last ID that we receive from FriendFeed and then grab everything after that on the next time through. FriendFeed updates get reinserted and reordered when another user comments or performs any other action on the update. If we stopped adding lines after finding one that we've seen, we'd end up missing a good deal of updates. This method should clear out old entries after a certain point instead of caching them indefinitely, but that is left as an exercise for the reader.

As we loop through each entry, we increment a counter variable. This variable is used to keep track of exactly how many updates are added to the widget. Once the loop is finished, we update the title of the widget with the new count using the Home.updateTitle method.

Once again, we have a finished widget. Open up index.html in your browser to check the results. It should resemble Figure 3-2.

[Figure 3-2 shows index.html in a browser: a "FriendFeed Updates" panel listing realtime entries and comments from FriendFeed users as they arrive.]

Figure 3-2. The Twitter trending FriendFeed widget running

LiveImages

Until now, the widgets primarily show text updates in realtime. So in this widget we're going to limit it to a realtime view of images that are streaming into Twitter at any given moment. To get this bit started, once again we'll need to modify the Home.init method. Adjust yours to look like the following:

    // Get everything started
    Home.init = function() {
        // load the trending topics from twitter
        Home.appendJS('http://search.twitter.com/trends.json?callback=Home.catchTrends');

LiveImages | 51
    import base64
    import urllib2

    class TweetFirehose(threading.Thread):
        def run(self):
            status_sample_url = 'http://stream.twitter.com/1/statuses/sample.json'
            request = urllib2.Request(status_sample_url)

            # Be sure to use your own twitter login information
            auth_basic = base64.encodestring('USERNAME:PASSWORD')[:-1]
            request.add_header('Authorization', 'Basic %s' % auth_basic)

            # open the connection
            firehose = urllib2.urlopen(request)
            for tweet in firehose:
                if len(tweet) > 2:
                    tweetQueue.put(tweet)

            firehose.close()

You can see this creates a class called TweetFirehose. This class is defined as a subclass of Python's Thread class and defines only one function: run. In this function we start out by building an HTTP request to Twitter using urllib2. At the moment, the only method of authentication to the streaming APIs is via HTTP Basic Auth. To do that, we join our username and password together with a colon, encode it using Base64 encoding, and add the Authorization header to the HTTP request.

Having built the request, the only thing left to do is open it. If you've ever used urllib2 to download a file from the Web, this works a lot like that, but with one key difference: it doesn't end. After calling urllib2.urlopen with our request, we'll start receiving tweets one line at a time, each one encoded into JSON. The following is an example tweet as received from the streaming API feed (most fields elided):

    {"text":"Hah! I just ...", ..., "in_reply_to_status_id":false, "truncated":false}

In the code, we just start looping through those lines as fast as we get them. Assuming we don't get any blank lines, we add them to the tweetQueue that we created at the top of the file. We pretend to close the handle to the firehose, but in all honesty, the code will probably never get there.

That's all the code it takes to stream the data in from Twitter in realtime. All it really takes is making one request to their streaming URL and finding some place to put all the data.
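The hand-off just described, one thread putting raw tweets on a queue and another taking them off, is the classic producer/consumer pattern. Here's a self-contained sketch of that pattern with the network replaced by canned data; note the book's listing targets Python 2 and urllib2, while this sketch uses Python 3's standard queue and threading modules, and all names here are hypothetical.

```python
import json
import queue
import threading

tweet_queue = queue.Queue()
SENTINEL = None          # tells the consumer the stream has ended

def producer(lines):
    """Stands in for the firehose thread: push raw JSON lines."""
    for line in lines:
        if len(line) > 2:            # skip keep-alive blank lines
            tweet_queue.put(line)
    tweet_queue.put(SENTINEL)

def consumer(results):
    """Stands in for the processing thread: decode each queued tweet."""
    while True:
        item = tweet_queue.get()
        if item is SENTINEL:
            break
        results.append(json.loads(item)["text"])

canned = ['{"text": "hello"}', '\n', '{"text": "world"}']
out = []
threads = [threading.Thread(target=producer, args=(canned,)),
           threading.Thread(target=consumer, args=(out,))]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because the queue is thread-safe, the slow consumer never blocks the fast producer, which is exactly why the firehose thread can keep up with Twitter.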
In our case, the place we found to put the data is a simple queue. It's up to another thread to pull the tweets off of the queue and process them.

To actually process these tweets, we're going to create another thread in the same way we created the previous one. Underneath the TweetFirehose class, add the following code:

    import re
    import simplejson as json

    class TweetProcessor(threading.Thread):

Twitter's Realtime Streaming APIs | 89
Upon receiving a subscription request, the hub will try to verify that the server at the end of the callback URL is actually trying to subscribe to the feed. To do this, it makes a verification request to the callback URL by POSTing a challenge string. If the subscriber actually meant to subscribe to the feed, it will verify the request by echoing the challenge string back to the server in a normal HTTP response. If the subscriber wants to deny this verification request, it should return an HTTP 404 "Not Found" response. The parameters that will get posted to the client are the following:

hub.mode
    The mode will be either subscribe or unsubscribe.
hub.topic
    This is the topic URL that was supplied when initiating the subscription request.
hub.challenge
    This is a random string generated by the hub. If the subscriber verifies this request, it must be echoed back in the body of the HTTP response.
hub.lease_seconds
    This is the number of seconds the subscription should remain active. If the subscriber wants uninterrupted pings, it should resubscribe before this period of time runs out.
hub.verify_token
    If this field was supplied during the main subscription call, it will be present in this field. If it's wrong, or supplied but unexpected, the subscriber should reject this verification request.

Publishing content

Publishing content, or notifying the hub of new content, is actually the simplest part of the whole protocol. When the publisher has new content and wants to inform the hub, it pings the hub with the URL. No authentication is needed, because the publisher is only pinging the hub to suggest that it check for new updates. There are two fields in this request:

hub.mode
    The mode will always be publish during the publish notification.
hub.topic
    This is the topic URL that has been updated.

After this ping, the hub server will check the topic URL for any updates and notify subscribers if needed.
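The subscriber's side of the verification handshake boils down to a small decision: is this a topic we asked for, does the verify token match, and if so, echo the challenge. The function below is a hypothetical Python sketch of that logic (the names, the `subscriptions` mapping, and the URLs are invented for illustration); a real subscriber would wire this into whatever web framework serves its callback URL.

```python
def handle_verification(params, subscriptions):
    """Decide the HTTP response to a hub verification request.
    `params` holds the posted hub.mode / hub.topic / hub.challenge
    (and optionally hub.verify_token); `subscriptions` maps topic
    URL -> the verify token we sent (or None if we sent none).
    Returns (status_code, body)."""
    topic = params.get("hub.topic")
    if topic not in subscriptions:
        return 404, "Not Found"          # we never asked for this feed
    expected_token = subscriptions[topic]
    if expected_token is not None and \
            params.get("hub.verify_token") != expected_token:
        return 404, "Not Found"          # token mismatch: reject
    # Intent confirmed: echo the challenge back in the body.
    return 200, params["hub.challenge"]

subs = {"http://example.com/feed.atom": "s3cret"}
ok = handle_verification(
    {"hub.mode": "subscribe",
     "hub.topic": "http://example.com/feed.atom",
     "hub.challenge": "abc123",
     "hub.verify_token": "s3cret"},
    subs)
bad = handle_verification(
    {"hub.mode": "subscribe",
     "hub.topic": "http://evil.example.com/feed.atom",
     "hub.challenge": "abc123"},
    subs)
```

Keeping the check as a pure function like this makes it easy to test without running a server.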
After receiving the publish ping, the hub server will make a GET request to the topic URL. This is where the hub gets the new Atom/RSS content. This request is a totally standard GET request, except that it's accompanied by the header field X-Hub-Subscribers.

PubSubHubbub | 29
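Since the publish notification is just a form-encoded POST with the two hub fields, it can be built with nothing but the standard library. This sketch uses Python 3's urllib (not the book's PHP); the hub and topic URLs are examples, and a real publisher would pass the request to urlopen after building it.

```python
from urllib.parse import urlencode
from urllib.request import Request

def build_publish_ping(hub_url, topic_url):
    """Build the POST request that tells the hub `topic_url` has
    new content. No authentication is sent: the ping is only a
    hint that the hub should re-fetch the topic."""
    body = urlencode({"hub.mode": "publish",
                      "hub.topic": topic_url}).encode("ascii")
    return Request(hub_url, data=body,
                   headers={"Content-Type":
                            "application/x-www-form-urlencoded"})

req = build_publish_ping("http://pubsubhubbub.appspot.com/",
                         "http://example.com/feed.atom")
```

Because the ping carries no secret, a lost or duplicated notification is harmless; the hub simply fetches the topic and diffs it against what it already has.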